Original Research
Russell A. Wilke, MD, PhD, Center for Human Genetics, Marshfield Clinic Research Foundation and Department of Internal Medicine, Marshfield Clinic, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Richard L. Berg, MS, Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Peggy Peissig, MBA, Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Terrie Kitchner, Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Catherine A. McCarty, PhD, Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Bozana Sijercic, MD, Department of Internal Medicine, Marshfield Clinic, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Daniel J. McCarty, PhD, Marshfield Epidemiology Research Center, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, Wisconsin 54449
Reprint Requests: Russell A. Wilke, MD, PhD, Center for Human Genetics, Marshfield Clinic Research Foundation, 1000 North Oak Avenue, Marshfield, WI 54449, Tel: 715-389-3885, Fax: 715-389-3808, E-mail: wilke.russell{at}mcrf.mfldclin.edu
Received: October 10, 2006.
Revised: December 22, 2006.
Accepted: January 8, 2007.
| Abstract |
|---|
≥50 years living in one of the target PMRP ZIP codes. Based upon diabetic diagnostic codes alone, we observed a false positive case rate ranging from 3.0% (in subjects with the highest glycosylated hemoglobin values) to 44.4% (in subjects with the lowest glycosylated hemoglobin values). We therefore developed an improved case finding algorithm that utilizes diabetic diagnostic codes in combination with clinical laboratory data and medication history. This algorithm yielded an estimated prevalence of 24.2% for diabetes mellitus in adult subjects aged ≥50 years.
Key Words: Metformin; Natural language processing; Pharmacogenetics; Sulfonylurea
The current obesity epidemic represents a major international health problem.1 Genetic markers may be the most efficient way to identify individuals at risk for obesity-related medical complications. One of the most costly obesity-related co-morbidities is diabetes mellitus (DM).2 Hyperglycemia is the clinical hallmark of DM, but the etiology of this heterogeneous disorder likely involves multiple genetic and environmental interactions that ultimately result in alterations in insulin secretion, insulin action or both.3,4 Large population-based cohorts will be needed to characterize the genetics of complex diseases such as DM.5,6
The Marshfield Clinic Personalized Medicine Research Project (PMRP) is a population-based DNA biobank developed to facilitate research in pharmacogenetics, genetic epidemiology and population genetics (www.mfldclin.edu/pmrp).7 In 2003, the PMRP was mentioned in an article by Dr. Francis Collins and colleagues from the National Human Genome Research Institute in relation to their grand challenge to "develop robust strategies for identifying the genetic contributions to disease and drug response."8 A PMRP Working Group was therefore formed to select diseases for which electronic algorithms could be developed to classify exposure and outcome status using the electronic medical records contained within the database. The diseases represent a range of anticipated difficulty in using purely electronic methods to identify disease onset, disease progression and outcome. The first three diseases were selected from a list of diseases that are screened for during routine health maintenance examinations in adults. Listed in order from greatest to least expected difficulty for electronic algorithms, the three diseases are: (1) glaucoma, (2) osteoporosis, and (3) DM. The purpose of the current study was to pilot the process of electronically and manually abstracting information from the electronic medical record of adults served by Marshfield Clinic to define DM specifically, so that the PMRP database could eventually be utilized for studies designed to characterize the genetic epidemiology and pharmacogenetics of this disease.
| Methods |
|---|
Briefly, PMRP is a large biobank containing DNA and sera from approximately 19,000 Marshfield Clinic patients. Each PMRP participant has also provided informed consent allowing their genetic and serologic data to be linked to all available clinical data within their electronic medical record using a confidential and secure encryption process. PMRP therefore provides a unique opportunity to conduct very large genetic studies on a variety of common diseases.
Medical Record
Electronic medical records have been utilized at Marshfield Clinic since the 1960s, and the vast majority of patient records within this system have been electronic for over a decade. A variety of data are captured. One of the key features of the Marshfield Clinic electronic medical record is a Windows application called the combined medical record (CMR). CMR integrates data from all Marshfield Clinic facilities and cooperating hospitals, including Saint Joseph's Hospital (Marshfield). CMR includes indices to all events and encounters that patients have experienced within the Marshfield Clinic system of care, and it can be used to access all textual documentation such as office notes, operative reports and discharge summaries. CMR also includes comprehensive lists of patient problems, a summary of each clinic encounter (diagnoses and procedures), a variety of medication alerts, and online access to over a decade of laboratory and radiology results. Since nearly everyone residing in the target ZIP code for the current study receives their health care through Marshfield Clinic, this record is considered comprehensive.
Study Population
Subjects were considered eligible for this study based on the following criteria: (1) age 50 years or older, (2) alive on December 31, 2002, (3) seen at Marshfield Clinic between January 1, 2000 and December 31, 2002, and (4) residing in ZIP code 54449 (Marshfield). Electronic medical record data for the eligible subjects were searched to determine the presence (or absence) of diabetes diagnostic codes from the International Classification of Diseases, Ninth Revision (ICD-9 codes). These codes included primary diagnostic codes for diabetes (ICD-9 codes 250.00–250.92), and secondary diagnostic codes for diabetic neuropathy (ICD-9 code 357.2), retinopathy (ICD-9 codes 362.01–362.02) and nephropathy (ICD-9 code 583.81). For each potential study subject, clinical laboratory data were scanned electronically to identify relevant test results. These included all available glucose and glycosylated hemoglobin (HbA1c) values. Each glucose value was assumed to be random (i.e., non-fasting) unless otherwise specified. Maximum values were determined for each subject.
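The eligibility criteria and diagnostic code screen described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual implementation: the `Subject` record, its field names, and the helper functions are hypothetical, and the choice of December 31, 2002 as the reference date for computing age is an assumption (the text does not state when age was assessed).

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List, Optional

# ICD-9 code ranges listed in the text (inclusive bounds).
ICD9_RANGES = [
    (Decimal("250.00"), Decimal("250.92")),  # diabetes (primary codes)
    (Decimal("357.2"), Decimal("357.2")),    # diabetic neuropathy
    (Decimal("362.01"), Decimal("362.02")),  # diabetic retinopathy
    (Decimal("583.81"), Decimal("583.81")),  # diabetic nephropathy
]

WINDOW_START = date(2000, 1, 1)
WINDOW_END = date(2002, 12, 31)   # also used as the age reference date (assumption)
TARGET_ZIP = "54449"

@dataclass
class Subject:  # hypothetical record layout
    birth_date: date
    death_date: Optional[date]
    zip_code: str
    visit_dates: List[date] = field(default_factory=list)

def is_diabetic_code(code: str) -> bool:
    """True if an ICD-9 code falls in one of the listed ranges."""
    try:
        value = Decimal(code)
    except InvalidOperation:
        return False  # non-numeric codes (e.g., V or E codes) never match
    return any(low <= value <= high for low, high in ICD9_RANGES)

def age_on(birth: date, ref: date) -> int:
    """Completed years of age on the reference date."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))

def is_eligible(s: Subject) -> bool:
    """Apply the four eligibility criteria from the text."""
    old_enough = age_on(s.birth_date, WINDOW_END) >= 50
    alive = s.death_date is None or s.death_date > WINDOW_END
    seen = any(WINDOW_START <= d <= WINDOW_END for d in s.visit_dates)
    return old_enough and alive and seen and s.zip_code == TARGET_ZIP
```

Comparing codes as `Decimal` values rather than strings avoids mismatches between equivalent spellings such as `250.0` and `250.00`.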
Medication History
We have previously utilized natural language processing (NLP) software to reconstruct complete retrospective medication use histories for all research subjects participating in the PMRP Biobank.10 We have also shown previously that these data are amenable to electronic abstraction, and that they can be managed programmatically to yield high quality drug exposure histories in the context of lipid lowering therapy (e.g., 100% sensitive and 96% specific, with a precision of 95%).11 In the current study, clinic records from all eligible subjects were re-interrogated electronically for text mention of three classes of glucose lowering medications. This involved the application of NLP software entitled FreePharma (Language & Computing; http://www.landc.be). All 8101 subject records were searched electronically to identify and catalogue dates for all text notes mentioning any sulfonylurea agent known to be commercially available within the past decade. This included four "first-generation" sulfonylureas (acetohexamide, chlorpropamide, tolbutamide, tolazamide) and three "second-generation" sulfonylureas (glimepiride, glipizide, glyburide). A similar approach was taken to identify all text notes containing mention of any therapeutic agent mapping to the generic drug names metformin (the only clinically approved glucose-lowering biguanide) and insulin (table 1).
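As a rough illustration of cataloguing dated text notes by drug class, the sketch below uses plain keyword matching in Python. It is not FreePharma and does not reproduce its behavior: it matches only the generic names listed above, does not map brand names to generics, and does not handle negation ("no insulin required" still counts as a mention). The function name `find_mentions` and the `(date, text)` note format are hypothetical.

```python
import re
from datetime import date
from typing import Dict, Iterable, List, Tuple

# Generic drug names taken from the text, grouped by class.
DRUG_CLASSES = {
    "sulfonylurea": [
        "acetohexamide", "chlorpropamide", "tolbutamide", "tolazamide",  # first generation
        "glimepiride", "glipizide", "glyburide",                         # second generation
    ],
    "metformin": ["metformin"],
    "insulin": ["insulin"],
}

# One case-insensitive, whole-word pattern per drug class.
PATTERNS = {
    drug_class: re.compile(r"\b(?:" + "|".join(names) + r")\b", re.IGNORECASE)
    for drug_class, names in DRUG_CLASSES.items()
}

def find_mentions(notes: Iterable[Tuple[date, str]]) -> Dict[str, List[date]]:
    """Return, for each drug class, the sorted dates of notes mentioning it."""
    mentions: Dict[str, List[date]] = {drug_class: [] for drug_class in DRUG_CLASSES}
    for note_date, text in notes:
        for drug_class, pattern in PATTERNS.items():
            if pattern.search(text):
                mentions[drug_class].append(note_date)
    return {drug_class: sorted(dates) for drug_class, dates in mentions.items()}
```

A subject with at least one dated mention in any class could then be flagged as exposed to a glucose-lowering medication, subject to manual validation of the kind the authors describe for lipid lowering therapy.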
≥126 on two occasions or a single random glucose >200).
| Results |
|---|
Diagnostic Codes Absent
Electronic interrogation of the entire medical record for each of the 8101 unique subjects in this study revealed that 6679 of these subjects had no diabetic diagnostic codes contained within their electronic medical record (figure 1, right side). Of these, 5597 (84%) had clinical laboratory data containing at least one glucose value or at least one HbA1c level. Since it was likely that some of these 5597 potential false negative cases were actually either undiagnosed diabetics or diabetics treated without a corresponding provider-entered diagnostic code, relevant clinical laboratory data were re-abstracted electronically for all 5597 subjects. These clinical laboratory data are summarized in figure 2. For both axes (glucose and HbA1c), the mean is represented by a "+" located within box plots corresponding to the 25th, 50th and 75th percentiles, respectively, for the entire dataset (n = 5597). The horizontal dashed line delineates a glucose level of 200 mg/dl.
Of the 6679 unique subjects in this study with no diabetic diagnostic codes (figure 1, right side), 1082 (16%) had no clinical laboratory data that could be used to discriminate between diabetic and non-diabetic (i.e., no glucose levels and no HbA1c levels). These 1082 subjects are assumed to be true negative cases (i.e., not diabetic). The design of this study (retrospective chart review) does not allow the discrimination of false negative cases within this specific sub-sample because the research subjects were neither interviewed nor examined during the conduct of the study. However, this population is known to be highly compliant with primary prevention screening visits.12 Among the 5597 potential false negative case subjects with laboratory data but no diagnostic codes, 4477 (80%) were found to have at least one glucose level within 2 years. Based upon these observations, and the additional observation that patients residing in the target study ZIP code receive nearly all their healthcare (90% of outpatient visits, 95% of inpatient visits) through Marshfield Clinic,9 it is reasonable to assume that the frequency of false negative cases would be low in the sub-sample of 1082 subjects with no relevant clinical laboratory data.
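The subgroup sizes reported in this section can be cross-checked with simple arithmetic. The short Python sketch below uses only counts stated in the text; the number of subjects with diagnostic codes is derived here by subtraction rather than quoted, and the variable names are illustrative.

```python
# Counts stated in the text.
total_subjects = 8101    # all eligible subjects
no_codes = 6679          # subjects with no diabetic diagnostic codes
with_labs = 5597         # no codes, but at least one glucose or HbA1c value
recent_glucose = 4477    # of those, at least one glucose level within 2 years

# Counts derived by subtraction (not quoted from the text).
with_codes = total_subjects - no_codes   # subjects with diagnostic codes
no_labs = no_codes - with_labs           # no codes and no relevant laboratory data

def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)

print(with_codes)                          # 1422
print(no_labs)                             # 1082
print(percent(with_labs, no_codes))        # 84
print(percent(no_labs, no_codes))          # 16
print(percent(recent_glucose, with_labs))  # 80
```

Both the 84% and 16% figures use the 6679 subjects without diagnostic codes as the denominator, and the two subgroups sum to that total.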
Prevalence Estimate
We propose the electronic case-finding algorithm shown in figure 3. The observations outlined above (Diagnostic Codes Present versus Diagnostic Codes Absent) suggest that the first branch point in this algorithm can be based upon diagnostic codes. The two subsequent branches of the algorithm then apply differential logic, reflecting the following two assumptions. First, in the situation where diabetic diagnostic codes are present, any purely electronic algorithm simply needs to confirm the diagnosis. This can be done by documenting either abnormal laboratory data (HbA1c>ULN, or glucose criteria established by the ADA) or treatment with one of three known medications used as first line therapy for DM. Conversely, in the situation where diabetic diagnostic codes are absent, the algorithm needs to establish the diagnosis. Since this latter step is more than simply confirmatory, the rightward arm of the algorithm needs to be sufficiently stringent to minimize (and, if possible, avoid altogether) false positive case assignment. Based upon the distribution of laboratory data observed in figure 2 (sub-sample with n=5597), we recommend that the identification of false negative case subjects within this sub-sample be made by first using the presence of an HbA1c test to suggest a reasonable clinical index of suspicion for DM, and then, second, by accepting a maximum glucose value >200 mg/dl as diagnostic.
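The two-branch logic described above can be written as a short Python function. This is a sketch under stated assumptions, not the authors' implementation: the `Record` fields are hypothetical, the HbA1c upper limit of normal (`HBA1C_ULN`) is a placeholder because the text does not give the laboratory's value, and the fasting glucose rule (at least 126 mg/dl on two occasions) is assumed from the ADA criteria the text refers to.

```python
from dataclasses import dataclass, field
from typing import List

HBA1C_ULN = 6.0               # placeholder upper limit of normal (%), an assumption
FASTING_GLUCOSE_CUTOFF = 126  # mg/dl, assumed ADA fasting criterion (two occasions)
RANDOM_GLUCOSE_CUTOFF = 200   # mg/dl, random glucose must exceed this value

@dataclass
class Record:  # hypothetical per-subject summary
    has_diabetic_code: bool
    hba1c_values: List[float] = field(default_factory=list)
    random_glucose_values: List[float] = field(default_factory=list)
    fasting_glucose_values: List[float] = field(default_factory=list)
    on_glucose_lowering_drug: bool = False  # sulfonylurea, metformin or insulin

def meets_glucose_criteria(r: Record) -> bool:
    """Fasting glucose at or above the cutoff twice, or one random glucose above 200."""
    fasting_hits = sum(v >= FASTING_GLUCOSE_CUTOFF for v in r.fasting_glucose_values)
    random_hit = any(v > RANDOM_GLUCOSE_CUTOFF for v in r.random_glucose_values)
    return fasting_hits >= 2 or random_hit

def is_dm_case(r: Record) -> bool:
    """Classify a subject as a DM case using the two-branch logic."""
    if r.has_diabetic_code:
        # Codes present: confirm with abnormal labs or medication history.
        abnormal_hba1c = any(v > HBA1C_ULN for v in r.hba1c_values)
        return abnormal_hba1c or meets_glucose_criteria(r) or r.on_glucose_lowering_drug
    # Codes absent: require an HbA1c test on record AND maximum glucose > 200 mg/dl.
    has_hba1c_test = len(r.hba1c_values) > 0
    max_glucose = max(r.random_glucose_values, default=0.0)
    return has_hba1c_test and max_glucose > RANDOM_GLUCOSE_CUTOFF
```

Note the asymmetry between the branches: when codes are present, any one supporting finding is enough to confirm the diagnosis, whereas when codes are absent, two conditions must hold together, which is what keeps false positive assignment low in that arm.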
≥50 years and living in the target ZIP code), identifying 1960 (24.2%) unique subjects with DM.
| Discussion |
|---|
The final algorithm also reduces the frequency of false negative cases by identifying subjects with DM in the absence of a diabetic diagnostic code. However, this portion of the algorithm is conservative in that it requires the presence of an elevated random glucose level (≥200 mg/dl) specifically within the context of a subject record also containing at least one HbA1c value. We opted not to accept an elevated glucose level alone, since in the absence of diagnostic codes for diabetes, a random glucose value can be elevated for a variety of non-diagnostic reasons (e.g., steroid therapy or intravenous fluid replacement containing dextrose). Since the presence of at least one HbA1c test (whether normal or elevated) indicates an increased clinical index of suspicion for DM, an elevated random glucose level can be considered diagnostic in this context. Although stringent, our inclusion of a strategy to reduce false negative cases was necessary in this study population because the Centers for Disease Control and Prevention have estimated that a significant proportion of all adult diabetic subjects in the United States remain undiagnosed.13
Application of the final algorithm yielded an estimated DM prevalence of 24.2% for adults aged ≥50 years residing in the target ZIP code (i.e., the algorithm identified 1960 of the 8101 study subjects as diabetic case subjects). The prevalence of DM is highly associated with age, and our observation is consistent with previously published estimates.13–15 This work adds to a growing body of literature supporting the utility of electronic medical records for case-finding specifically within the context of DM.16–18 Further, the current study extends these observations through the development of an electronic algorithm that considers clinical laboratory data and medication history in addition to diagnostic codes. Since the target ZIP code characterized in the current study is located within the geographic region represented by the Marshfield Clinic PMRP database, the resulting algorithm will be useful for identifying DM cases in this database.
| Outlook |
|---|
| Acknowledgments |
|---|
The Working Group for the Personalized Medicine Research Project Phenotyping Engine includes Dr. Philip Giampietro, Dr. Robert Greenlee, Dr. Catherine McCarty, Dr. Daniel McCarty, Ms. Peggy Peissig and Dr. Russell Wilke.
| References |
|---|