A4-1: Validating Electronic Health Record Demographic Data Using Self-reported Data from the Autism Registry Survey

  • September 2014
  • 109.2
  • DOI: https://doi.org/10.3121/cmr.2014.1250.a4-1

Abstract

Background/Aims Administrative patient data from the electronic health record (EHR) are important in research, so it is critical to have a sense of the validity and relative accuracy of key data in this widely available source. The HMORN Virtual Data Warehouse (VDW) is an example of a large repository of administrative data used in numerous studies. The recent Mental Health Research Network (MHRN) Autism Spectrum Disorder (ASD) Registry Survey was hosted by four HMORN sites, which invited Kaiser Permanente members to complete a Web-based questionnaire on behalf of their child identified in the EHR as having an ASD diagnosis.

Methods In addition to an extensive battery of questions regarding ASD, the survey collected a number of demographic data elements that are also available in the VDW: age, gender, race/ethnicity (child), income (household), and education (household adult). A total of 1155 adults responded to the ASD Registry Survey. Their responses were matched with demographic data from the VDW for the same child.

Results Preliminary examination of race/ethnicity and gender data shows a good to excellent level of agreement between the two sources when data are non-missing in both. A total of 992 records had non-missing race/ethnicity data in both sources. In 81.1% (805/992) of these cases, the two sources agreed (kappa = 0.68, CI = 0.64–0.72). The categories with the highest level of agreement were White, Black, and Asian, while the Hispanic and multi-racial groupings had a much lower level of agreement (46.9%). For gender, the level of agreement was very high (99.0%, 1118/1129, kappa = 0.97, CI = 0.95–0.99).
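The agreement statistics reported above (percent agreement and kappa) can be illustrated with a minimal Python sketch. It assumes the kappa reported is the standard unweighted Cohen's kappa, which the abstract does not state explicitly. The function names and the six-record toy sample are hypothetical and are not study data; only the counts 805/992 and 1118/1129 come from the abstract.

```python
from collections import Counter


def percent_agreement(source_a, source_b):
    """Share of paired records where both sources give the same category."""
    if len(source_a) != len(source_b) or not source_a:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(source_a, source_b))
    return matches / len(source_a)


def cohens_kappa(source_a, source_b):
    """Unweighted Cohen's kappa: agreement corrected for chance agreement."""
    n = len(source_a)
    p_observed = percent_agreement(source_a, source_b)
    counts_a = Counter(source_a)
    counts_b = Counter(source_b)
    # Chance agreement: sum over categories of the product of marginal shares.
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        # Both sources use a single identical category for every record.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy sample (NOT study data), paired by child.
ehr = ["White", "White", "Black", "Asian", "Hispanic", "White"]
survey = ["White", "White", "Black", "Asian", "Multi-racial", "Hispanic"]

toy_agreement = percent_agreement(ehr, survey)
toy_kappa = cohens_kappa(ehr, survey)

# Percent agreement recomputed from the counts reported in the abstract.
race_agreement_pct = round(805 / 992 * 100, 1)
gender_agreement_pct = round(1118 / 1129 * 100, 1)
```

Kappa is lower than raw percent agreement whenever some agreement would be expected by chance alone, which is why the abstract reports both. Reproducing the reported kappa values and confidence intervals would require the full cross-tabulation of categories, which the abstract does not provide.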

Conclusions In this study, race in the electronic health record was a very accurate measure for major race categories but was far less accurate when reporting the emergent and important multi-racial category and Hispanic ethnicity. Gender had an excellent level of agreement between the two sources. Age, gender, and race-ethnicity are key covariates for many studies analyzing EHR data. Income and education can serve to illuminate socioeconomic factors very relevant to health care research. Having a sense of the validity and accuracy of these data is crucial to the research process.
