Abstract
Background/Aims Accurate EMR-based cohort identification is crucial in the conduct of comparative effectiveness research. The Chronic Hepatitis Cohort Study (CHeCS) is a longitudinal study of chronic hepatitis B (CHB) and C (CHC) infection being conducted at 4 HMORN sites. Subjects were identified using automated, EMR-based ICD-9 diagnosis and laboratory inclusion criteria, and about 12,000 patients were identified in the initial cohort selection. After CHB/CHC status was confirmed through chart abstraction, the false discovery rates (FDRs) of these criteria were 13.6% for CHB and 11.3% for CHC. An adaptive approach was therefore proposed to optimize EMR-based cohort selection.
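The abstract does not define the FDR denominator. A minimal sketch, assuming the FDR is the share of criteria-flagged patients whose diagnosis was not confirmed by chart abstraction, reproduces the 13.6% CHB figure from the counts given in the Results (2518 flagged, 2176 confirmed). The function name `false_discovery_rate` is hypothetical.

```python
def false_discovery_rate(n_identified: int, n_confirmed: int) -> float:
    """FDR = false positives / all patients flagged by the EMR criteria.

    Assumption (not stated in the abstract): every flagged patient whose
    diagnosis was NOT confirmed by chart abstraction is a false positive.
    """
    if n_identified <= 0:
        raise ValueError("n_identified must be positive")
    false_positives = n_identified - n_confirmed
    return false_positives / n_identified


# CHB counts from the Results paragraph: 2518 flagged, 2176 confirmed.
chb_fdr = false_discovery_rate(2518, 2176)
print(f"CHB FDR: {chb_fdr:.1%}")
```

Under this assumption, 342 of 2518 flagged patients are false positives, which rounds to the reported 13.6%.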
Methods Classification and Regression Tree (CART) analysis was performed to identify a set of electronic variables (or variable combinations) that predict confirmed CHB and CHC status. The candidate variables (classifiers) included not only all of the initial cohort identification criteria but also HIV status, any outpatient order or pharmacy claim for CHB/CHC antiviral medication, and 41 other liver disease-related procedures/diagnoses. CART models were built on one subset of the data (the learning set) and then validated on the other subset (the testing set).
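To make the learning/testing workflow concrete, the following is a minimal sketch, not the study's implementation: the abstract does not state the software, tree depth, pruning, or stopping rules used. It grows a small binary classification tree by greedy Gini-impurity splits (the CART splitting criterion for classification) on a learning half and reports the FDR on both halves. The indicators `var_a`, `var_b`, `var_c`, the toy records, the alternating 50/50 split, and `max_depth=2` are all hypothetical and carry no clinical meaning.

```python
from typing import Dict, List, Tuple

# (binary EMR indicators, chart-confirmed label: 1 = case, 0 = not a case)
Record = Tuple[Dict[str, int], int]


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 labels, the CART splitting criterion."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels: List[int]) -> int:
    """Majority label of a node (ties go to 1); empty nodes predict 0."""
    return int(2 * sum(labels) >= len(labels)) if labels else 0


def build_tree(data: List[Record], features: List[str],
               depth: int = 0, max_depth: int = 2) -> dict:
    """Grow a binary tree by greedily choosing the Gini-minimizing split."""
    labels = [y for _, y in data]
    node_gini = gini(labels)
    if depth == max_depth or node_gini == 0.0 or not features:
        return {"leaf": majority(labels)}
    best = None
    for f in features:
        on = [r for r in data if r[0][f] == 1]
        off = [r for r in data if r[0][f] == 0]
        # Weighted impurity of the two children produced by splitting on f.
        score = (len(on) * gini([y for _, y in on])
                 + len(off) * gini([y for _, y in off])) / len(data)
        if best is None or score < best[0]:
            best = (score, f, on, off)
    score, f, on, off = best
    if score >= node_gini:  # no split reduces impurity: make a leaf
        return {"leaf": majority(labels)}
    rest = [g for g in features if g != f]
    return {"feature": f,
            "on": build_tree(on, rest, depth + 1, max_depth),
            "off": build_tree(off, rest, depth + 1, max_depth)}


def predict(tree: dict, x: Dict[str, int]) -> int:
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["on"] if x[tree["feature"]] == 1 else tree["off"]
    return tree["leaf"]


def fdr(tree: dict, data: List[Record]) -> float:
    """Share of patients flagged as cases who are not confirmed cases."""
    flagged = [y for x, y in data if predict(tree, x) == 1]
    return (len(flagged) - sum(flagged)) / len(flagged) if flagged else 0.0


def rec(a: int, b: int, c: int, y: int) -> Record:
    return ({"var_a": a, "var_b": b, "var_c": c}, y)


# Hypothetical toy cohort (NOT CHeCS data).
toy = ([rec(1, 1, 0, 1)] * 40 + [rec(1, 0, 0, 1)] * 10
       + [rec(1, 0, 1, 0)] * 8 + [rec(0, 0, 0, 0)] * 42)
learning, testing = toy[0::2], toy[1::2]  # deterministic 50/50 split

tree = build_tree(learning, ["var_a", "var_b", "var_c"])
baseline = {"feature": "var_a", "on": {"leaf": 1}, "off": {"leaf": 0}}
print(tree)
print("learning FDR:", fdr(tree, learning), "testing FDR:", fdr(tree, testing))
print("single-criterion FDR:", round(fdr(baseline, testing), 3))
```

On this toy data the tree splits on `var_a` and then `var_c`, so the combination of two indicators yields an FDR of 0 on both halves versus 4/29 for `var_a` alone. Production CART software typically also applies cost-complexity pruning, which this sketch omits.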
Results Of the 12,144 patients identified for the initial CHeCS cohort, 2518 met the initial CHB criteria and 9844 met the initial CHC criteria, including 218 who met both. Of these, 10,825 patients (2176 CHB and 8724 CHC, including 75 co-infected) had their diagnoses confirmed through chart abstraction; the remaining 1319 were excluded. CART model FDRs for CHB were 8.5% on the learning data and 7.1% on the testing data; for CHC they were 4.9% and 5.7%, respectively, with sensitivities and specificities >91% for CHB and >84% for CHC. Overall FDRs (7.8% for CHB, 5.3% for CHC) were significantly lower than those yielded by the initial inclusion criteria alone (13.6% and 11.3%; P < 0.001).
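The cohort counts above can be checked by inclusion-exclusion, since co-infected patients are counted in both the CHB and CHC totals. The sketch below performs that check and also defines sensitivity and specificity from a 2×2 confusion table. The abstract reports only the resulting rates, not the underlying table, so the `Confusion` counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from comparing predicted status with chart-confirmed status."""
    tp: int  # predicted case, confirmed case
    fp: int  # predicted case, not confirmed
    fn: int  # predicted non-case, confirmed case
    tn: int  # predicted non-case, not confirmed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def union_size(n_a: int, n_b: int, n_both: int) -> int:
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return n_a + n_b - n_both


# Cohort sizes reported in the Results paragraph.
initial = union_size(2518, 9844, 218)    # met CHB and/or CHC criteria
confirmed = union_size(2176, 8724, 75)   # confirmed by chart abstraction
print("initial:", initial, "confirmed:", confirmed, "excluded:", initial - confirmed)

# Hypothetical 2x2 counts for illustration only.
example = Confusion(tp=90, fp=10, fn=5, tn=95)
print(f"sensitivity={example.sensitivity():.3f} specificity={example.specificity():.3f}")
```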
Conclusions Our adaptive approach to using electronic data for prediction of CHB/CHC status is feasible, can be used for sequential CHeCS cohort identification, and may be useful in other studies to identify patients diagnosed with CHB/CHC.




