FHIR Data Model

Resource schemas, field definitions, and relationship graph for the CMS NPD FHIR R4 dataset

FHIR R4 · Public Domain

Resource Relationship Graph

Resource | Records
Practitioner | 7,441,212
PractitionerRole | 7,180,732
Organization | 3,605,261
Location | 3,494,239
Endpoint | 5,043,524
OrganizationAffiliation | 439,599

From Resource Field | To Resource | Completeness
PractitionerRole.practitioner | Practitioner | 100%
PractitionerRole.organization | Organization | 98.1%
PractitionerRole.location[] | Location | 78.0%
PractitionerRole.endpoint[] | Endpoint | 6.2%
Location.managingOrganization | Organization | 76.6%
Endpoint.managingOrganization | Organization | 19.2%
OrganizationAffiliation.organization + participatingOrganization | Organization | 100%
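
To make the relationship graph concrete, here is a minimal sketch of collecting the outgoing links of one PractitionerRole record. It assumes references use the common FHIR form `ResourceType/id` (or an absolute URL ending in that form); the helper names `split_reference` and `role_links` and the ids in the usage note are hypothetical, not part of the dataset.

```python
from typing import Optional, Tuple


def split_reference(ref: Optional[dict]) -> Optional[Tuple[str, str]]:
    """Split a FHIR Reference such as {"reference": "Organization/o1"} into (type, id)."""
    value = (ref or {}).get("reference")
    if not value or "/" not in value:
        return None
    # Keep only the last two path segments so absolute URLs also work.
    resource_type, resource_id = value.rstrip("/").split("/")[-2:]
    return resource_type, resource_id


def role_links(role: dict) -> dict:
    """Group the ids referenced by a PractitionerRole by target resource type."""
    links = {"Practitioner": [], "Organization": [], "Location": [], "Endpoint": []}
    refs = [role.get("practitioner"), role.get("organization")]
    # location and endpoint are arrays and are often absent (78.0% and 6.2% present).
    refs += (role.get("location") or []) + (role.get("endpoint") or [])
    for ref in refs:
        parsed = split_reference(ref)
        if parsed and parsed[0] in links:
            links[parsed[0]].append(parsed[1])
    return links
```

For a role that references `Practitioner/p1`, `Organization/o1`, and two locations, `role_links` returns one id under Practitioner and Organization, two under Location, and an empty list under Endpoint.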

Resource Schemas

Practitioner (7,441,212 records)

Individual healthcare providers. Each record has a unique NPI, name, gender, qualifications (NUCC taxonomy), and optional communication languages.

Field | Type | Notes
id | string | Internal FHIR resource ID
identifier[NPI] | Identifier | National Provider Identifier (100% present)
active | boolean | 96.5% active in sample
name | HumanName | Family + given names (100% present)
gender | code | male, female, other, or unknown
qualification | Qualification[] | NUCC taxonomy codes (95.2% present)
communication | CodeableConcept[] | Spoken languages (2.8% present)
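
As an illustration of the `identifier[NPI]` slice, the sketch below pulls the NPI out of a resource's identifier list. It assumes the identifier uses the standard FHIR NPI system URI `http://hl7.org/fhir/sid/us-npi`; this page does not state which system URI the dataset uses, and `get_npi` is a hypothetical helper name.

```python
from typing import Optional

# Assumption: the dataset tags NPIs with the standard FHIR NPI system URI.
NPI_SYSTEM = "http://hl7.org/fhir/sid/us-npi"


def get_npi(resource: dict) -> Optional[str]:
    """Return the first NPI value found in resource["identifier"], or None."""
    for ident in resource.get("identifier") or []:
        if ident.get("system") == NPI_SYSTEM:
            return ident.get("value")
    return None
```

The same helper should apply to Organization records, which carry a second identifier (the pseudo-EIN) alongside the NPI.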

PractitionerRole (7,180,732 records)

PractitionerRole acts as the join table of the NPD. Each record links a Practitioner to an Organization and, where present, to one or more Locations and Endpoints. It also carries NUCC specialty codes and telecom details for the role.

Field | Type | Notes
practitioner | Reference<Practitioner> | 100% present
organization | Reference<Organization> | 98.1% present
location | Reference<Location>[] | 78.0% present
endpoint | Reference<Endpoint>[] | 6.2% present
specialty | CodeableConcept[] | NUCC specialty (46.1% present)
telecom | ContactPoint[] | Phone/fax (75.2% present)
active | boolean | 55.2% active in sample

Organization (3,605,261 records)

Healthcare entities: hospitals, clinics, health systems, pharmacies. Every record carries an NPI and a CMS-assigned pseudo-EIN identifier; most also have an address and telecom details.

Field | Type | Notes
identifier[NPI] | Identifier | National Provider Identifier (100% present)
identifier[pseudo-EIN] | Identifier | CMS-assigned pseudo-EIN (100% present)
active | boolean | 100% active in sample
name | string | Organization name
address | Address[] | Physical address (88.2% present)
telecom | ContactPoint[] | Phone/fax (88.7% present)
type | CodeableConcept[] | Organization type code

Location (3,494,239 records)

Physical service locations. All have addresses, 46.6% have GPS coordinates, and 76.6% reference a managing organization.

FieldTypeNotes
statuscodeactive | suspended | inactive (100% active)
modecodeinstance (100% in sample)
addressAddressPhysical address (100% present)
positionPositionGPS lat/lon (46.6% present)
managingOrganizationReference<Organization>76.6% present
physicalTypeCodeableConceptBuilding type code
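
The "% present" figures in these tables are field-completeness measurements over a sample. The page does not say how they were computed; the following is one plausible sketch, in which a field counts as present when it exists and is not null or empty. The function name `completeness` is hypothetical.

```python
from typing import Iterable


def completeness(resources: Iterable[dict], field: str) -> float:
    """Percentage of resources where `field` is present and non-empty."""
    total = 0
    present = 0
    for resource in resources:
        total += 1
        # Treat null, empty string, empty list, and empty object as missing.
        # A boolean False still counts as present.
        if resource.get(field) not in (None, "", [], {}):
            present += 1
    return 100.0 * present / total if total else 0.0
```

Applied to a sample of Location records with `field="position"`, a function like this would produce a figure comparable to the 46.6% quoted above, depending on how the sample was drawn.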

Endpoint (5,043,524 records)

FHIR API endpoints for healthcare interoperability. In the sample, all are active HTTPS URLs using the hl7-fhir-rest connection type.

Field | Type | Notes
status | code | active (100% in sample)
connectionType | Coding | hl7-fhir-rest
address | url | HTTPS FHIR endpoint URL (100% HTTPS)
payloadType | CodeableConcept[] | not-applicable (100%)
managingOrganization | Reference<Organization> | 19.2% present

OrganizationAffiliation (439,599 records)

Inter-organization relationships. 80.7% are 'Member' affiliations, defining healthcare network membership.

Field | Type | Notes
organization | Reference<Organization> | Parent org (100% present)
participatingOrganization | Reference<Organization> | Member org (100% present)
code | CodeableConcept[] | Member (80.7%) or other
active | boolean | 100% active in sample
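
Because `organization` is the parent and `participatingOrganization` is the member, a network membership map can be built by grouping members under their parent. This is a minimal sketch assuming references of the form `Organization/id`; `ref_id` and `members_by_parent` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Iterable, Optional


def ref_id(ref: Optional[dict]) -> Optional[str]:
    """Return the id part of a FHIR Reference like {"reference": "Organization/o1"}."""
    value = (ref or {}).get("reference", "")
    return value.rsplit("/", 1)[-1] or None


def members_by_parent(affiliations: Iterable[dict]) -> dict:
    """Map each parent organization id to the list of its member organization ids."""
    network = defaultdict(list)
    for aff in affiliations:
        parent = ref_id(aff.get("organization"))
        member = ref_id(aff.get("participatingOrganization"))
        if parent and member:
            network[parent].append(member)
    return dict(network)
```

This sketch ignores the `code` field; filtering on it first would restrict the map to 'Member' affiliations only.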

NDJSON Format

Each file is newline-delimited JSON (NDJSON): one complete FHIR resource per line. Because every line parses independently, the files are well suited to streaming, parallel processing, and bulk loading into databases.
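
Since every line is a standalone JSON document, a file can be processed one record at a time without loading it into memory. A minimal sketch using only the standard library follows; the function names are hypothetical, and any file name passed to `open` would be whatever the release actually ships.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed FHIR resource per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_resource_types(lines: Iterable[str]) -> Counter:
    """Count resources by their resourceType while streaming."""
    return Counter(r.get("resourceType", "unknown") for r in iter_ndjson(lines))
```

An open text file is itself an iterable of lines, so `count_resource_types(open(path, encoding="utf-8"))` streams a decompressed file directly.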

zstd Compression

Files use zstd level 12 compression, achieving ~93% size reduction (40.7 GB → 2.8 GB). Decompress with: zstdcat file.ndjson.zst | jq '.'
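
The same pipeline can be driven from Python by reading the output of the `zstdcat` command shown above. This is a sketch that assumes `zstdcat` is installed and on the PATH; `stream_zst_ndjson` and `reduction_percent` are hypothetical helper names, and the second one only re-derives the quoted size reduction from the two sizes given.

```python
import json
import subprocess
from typing import Iterator


def stream_zst_ndjson(path: str) -> Iterator[dict]:
    """Yield FHIR resources from a .ndjson.zst file by piping it through zstdcat."""
    # Assumption: the zstd command-line tools are installed.
    with subprocess.Popen(
        ["zstdcat", path], stdout=subprocess.PIPE, text=True, encoding="utf-8"
    ) as proc:
        for line in proc.stdout:
            if line.strip():
                yield json.loads(line)


def reduction_percent(original_gb: float, compressed_gb: float) -> float:
    """Size reduction as a percentage of the original size."""
    return 100.0 * (1.0 - compressed_gb / original_gb)
```

With the sizes quoted above, `reduction_percent(40.7, 2.8)` comes out at roughly 93, which matches the stated figure.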

Data Specification

Full schema documentation is available in the HTE Data Release Specifications on GitHub: ftrotter-gov/HTE_data_release_specifications