Abstract
Background/Aims HMG-CoA reductase inhibitors (statins) are a common treatment for elevated cholesterol, and prospective studies have shown that they effectively reduce hepatic cholesterol synthesis. Estimating statin exposure-time from EHR data is difficult because prescriptions are often recorded in unstructured notes, and clinicians prescribe various ‘pill splitting’ regimens while frequently adjusting dosage to keep cholesterol levels under control. Here, we present logic for estimating statin exposure-time from EHR data.
Methods The EHRs of individuals studied in the Electronic Medical Records and Genomics (eMERGE) study were interrogated for evidence of exposure to statin medications. Exposure was identified via natural language processing (NLP; used from 1998 to 2007) and extraction from the medication orders system (beginning in 2004). Pill-splitting was identified in text notes using the following patterns: ‘1/2’, ‘0.5’, ‘one-half’, ‘half of’, ‘1 1/2’, ‘1.5’, ‘one and one half’, ‘bid’, ‘two qd’, ‘2 q.d.’, ‘twice daily’, ‘1/4’, ‘0.25’, ‘one quarter’. Where conflicts occurred between NLP and medication orders, order data took precedence for both dosage and drug identification. Electronic charts were validated manually to determine gold-standard dosage and exposure dates. Because statin data are frequently difficult to interpret, even in medical charts, the study team relied on source documents, even though these may contradict the notes.
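The pattern matching and precedence rule described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the study's implementation: the grouping of phrases into multipliers (0.25, 0.5, 1.5, 2), the choice to multiply a tablet fraction by a doubling factor, and the names `dose_multiplier` and `resolve` are all assumptions made for the example.

```python
import re

# Assumed grouping of the listed note phrases into tablet-fraction factors.
# More specific phrases come first so that "1 1/2" is not read as "1/2".
FRACTIONS = [
    (1.5, ["1 1/2", "1.5", "one and one half"]),
    (0.5, ["1/2", "0.5", "one-half", "half of"]),
    (0.25, ["1/4", "0.25", "one quarter"]),
]
# Assumed: each of these phrases doubles the daily amount.
DOUBLING = ["bid", "two qd", "2 q.d.", "twice daily"]


def _matches(phrase, text):
    # Reject hits embedded in longer tokens, e.g. "bid" in "morbid"
    # or "0.5" in "10.5".
    pattern = r"(?<![\w./])" + re.escape(phrase) + r"(?![\w/])"
    return re.search(pattern, text, re.IGNORECASE) is not None


def dose_multiplier(sig_text):
    """Return a hypothetical multiplier on the nominal tablet strength."""
    fraction = 1.0
    for factor, phrases in FRACTIONS:
        if any(_matches(p, sig_text) for p in phrases):
            fraction = factor
            break
    doubling = 2.0 if any(_matches(p, sig_text) for p in DOUBLING) else 1.0
    return fraction * doubling


def resolve(nlp_value, order_value):
    # Order data takes precedence over NLP whenever an order value exists.
    return order_value if order_value is not None else nlp_value
```

For example, `dose_multiplier("take 1/2 tablet daily")` gives 0.5, and `resolve(40, 20)` keeps the order-system value of 20. A production version would need to handle far more phrasing variants than the fourteen patterns listed.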
Results Within the eMERGE subset examined (N=4,427), an indication of statin exposure was identified in 51% of individuals (2,540/4,427). The average age at first exposure was 65 years. Both exposed and non-exposed populations had lengthy follow-up periods (Avg 27 vs. 28 yr, Std. Dev 6.1 vs. 6.9 yr, Min 1 vs. 3 yr, Max 33 vs. 33 yr). Where statin exposure was detected, the average patient received 2.1 (max=7) different statins over 8 years (Avg 8.1 yr, Std. Dev 4.9 yr, Min 0.002 yr, Max 16.8 yr). The statin-exposed population had slightly more comorbidity based on the Charlson comorbidity index (scores: 0.72 vs. 0.55). Vitals and lab results were comparable between the two groups: BMI (30 vs. 29), LDL (113 vs. 114), Triglycerides (80 vs. 79).
Discussion Combining the medication data obtained via NLP with the medication orders substantially improved the estimation of statin exposure-time compared to the raw EHR source data.




