Abstract
Background/Aims A large volume of clinical data is captured in electronic medical records (EMRs), and the ability to extract these data to define clinical phenotypes is valuable to health care research. We designed an algorithm to define abdominal aortic aneurysm (AAA) cases and controls. We implemented the algorithm in our institutional data warehouse and propose using the HMORN Virtual Data Warehouse (VDW) to replicate our findings.
Methods The cohort consisted of individuals enrolled in the Geisinger MyCode biobank or consented for research in other studies (such as in the Vascular Department). The Structured Query Language (SQL) algorithm used CPT codes, ICD-9 codes, and vital signs data to classify individuals as cases, controls, or exclusions. AAA cases were defined as having an AAA repair procedure, at least one vascular clinic encounter with a ruptured AAA, or at least two vascular clinic encounters with an unruptured AAA. AAA controls were neither exclusions nor cases, had an encounter within the past 5 years, and never had an ICD-9 code of 441.3, 441.4, or 441.9. Individuals were excluded based on certain medical conditions, age younger than 40 or older than 89, not having an encounter within the past 5 years, or having an ICD-9 diagnosis code of 441.
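The classification rules above can be sketched in Python as follows. This is an illustrative sketch only, not the SQL algorithm used in the study: the `Patient` record and all of its field names are hypothetical, the order in which the rules are applied is an assumption (the abstract does not state how cases interact with the 441 exclusion), and the use of vital signs data is omitted because it is not described here.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # All field names are hypothetical; they stand in for values that would
    # be derived from CPT codes, ICD-9 codes, and encounter records.
    age: int
    years_since_last_encounter: float
    has_excluding_condition: bool
    had_aaa_repair: bool
    ruptured_aaa_vascular_encounters: int
    unruptured_aaa_vascular_encounters: int
    icd9_codes: frozenset = frozenset()


def classify(p: Patient) -> str:
    """Return 'case', 'control', or 'exclude' for one individual."""
    # Exclusions for medical conditions and age (younger than 40, older than 89).
    if p.has_excluding_condition or p.age < 40 or p.age > 89:
        return "exclude"
    # Exclusion for no encounter within the past 5 years
    # (treating exactly 5 years as "within" is an assumption).
    if p.years_since_last_encounter > 5:
        return "exclude"
    # Case definition: repair procedure, or >= 1 ruptured-AAA vascular clinic
    # encounter, or >= 2 unruptured-AAA vascular clinic encounters.
    if (
        p.had_aaa_repair
        or p.ruptured_aaa_vascular_encounters >= 1
        or p.unruptured_aaa_vascular_encounters >= 2
    ):
        return "case"
    # Assumption: the 441 exclusion applies only to individuals who did not
    # meet the case definition. It also covers 441.3, 441.4, and 441.9, so
    # anyone reaching the final return never had those codes.
    if any(c == "441" or c.startswith("441.") for c in p.icd9_codes):
        return "exclude"
    return "control"


print(classify(Patient(65, 1.0, False, False, 0, 2, frozenset({"441.4"}))))
```

Under these assumptions, an individual with a single unruptured-AAA encounter and a 441.4 code is excluded rather than counted as a case or a control.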
Results We screened the records of 29,770 individuals, identifying 1,155 AAA cases and 17,523 controls. We excluded 337 individuals based on predisposing genetic conditions, 109 individuals without a visit within the past 5 years, and 10,398 individuals based on age. To ensure that we had true AAA cases, 248 individuals with ICD-9 codes of 441.x (which include thoracic aneurysm and aneurysm of unspecified site) were excluded. The algorithm was validated on a subset of individuals by manual chart review and demonstrated a positive predictive value (PPV) of 94% and a sensitivity of 100%.
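As a consistency check on the counts reported above, the six groups can be tallied against the number of records screened. The short Python sketch below uses only figures stated in the Results; the dictionary keys are labels chosen here for illustration.

```python
# Counts as reported in the Results; key names are illustrative labels.
screened = 29_770
groups = {
    "cases": 1_155,
    "controls": 17_523,
    "excluded_genetic_conditions": 337,
    "excluded_no_visit_5_years": 109,
    "excluded_age": 10_398,
    "excluded_icd9_441x": 248,
}

total = sum(groups.values())
# The groups are mutually exclusive only if the total matches the screened count.
print(total, total == screened)
```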
Conclusions We designed an ePhenotyping algorithm to identify AAA cases and controls from the EMR with the high PPV and sensitivity necessary for research purposes. The VDW provides an excellent opportunity to broaden the characteristics of the study population and replicate the findings.




