Abstract
Background The Virtual Data Warehouse (VDW) was created as a mechanism for producing comparable data across sites for purposes of proposing and conducting research. It is “virtual” in the sense that the data remain at the local sites; there is no multi-site physical database at a centralized coordinating center. At the core of the VDW is a series of standardized file definitions. Content areas and data elements that are commonly required for research studies are identified, and data dictionaries are created for each content area, specifying a common format for each of the elements: variable name, label, description, code values, and value labels. Local site programmers have mapped the data elements from their HMO’s data systems into this standardized set of variable definitions, names, and codes, as well as onto standardized SAS file formats. This common structure of the VDW files enables a SAS analyst at one site to write one program to extract and/or analyze data at all participating sites.
Methods This poster describes the data available in the Harvard Pilgrim Health Care (HPHC) VDW files.
Results The HPHC VDW files contain details on 88 million pharmacy dispensings (2000–2010) and nearly 89 million unique medical encounters (2000–2010), including 0.67 million hospitalizations and 69.6 million ambulatory visits, with 183 million diagnoses and 209 million procedures recorded. Vital signs and lab results are available for about 10% of HPHC enrollees who receive care at Harvard Vanguard Medical Associates, a local multi-specialty practice group, and are provided on a per-project basis. The VDW Enrollment, Demographic, and Census files are derived from internal HPHC historical membership files; the VDW Pharmacy and Utilization files are derived from internal HPHC claims files; the VDW Tumor data are derived from Massachusetts Cancer Registry data; the VDW Death data are derived from the Massachusetts death registry and HPHC internal data.
Conclusions The VDW at HPHC provides an easy-to-use, unified central repository of data from all available source files. This resource enables the sharing of compatible data in multi-site studies. It also improves programming efficiency, accuracy, and completeness for local single-site studies, because the resources needed to link these legacy systems are expended only once.
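
The common-format idea described in the Background can be illustrated with a minimal sketch. The VDW itself is implemented as SAS files; Python is used here only for illustration. All site names, field names, and code values below are hypothetical and are not taken from the actual VDW data dictionaries. Each site keeps its own data and its own mapping to the shared specification, and one analysis routine then runs unchanged on each site's standardized records.

```python
# Hypothetical illustration only: site names, field names, and codes are
# invented here and do not come from the real VDW data dictionaries.

# Each site declares how its local fields and codes map onto the shared spec.
SITE_MAPPINGS = {
    "site_a": {
        "rename": {"member_id": "mrn", "sex_cd": "gender"},
        "recode": {"gender": {"1": "M", "2": "F"}},
    },
    "site_b": {
        "rename": {"patient_no": "mrn", "gender_text": "gender"},
        "recode": {"gender": {"male": "M", "female": "F"}},
    },
}


def standardize(record, mapping):
    """Return a copy of a local record using the shared names and codes."""
    out = {}
    for local_name, value in record.items():
        # Translate the local field name, then the local code value if one is listed.
        name = mapping["rename"].get(local_name, local_name)
        codes = mapping["recode"].get(name, {})
        out[name] = codes.get(value, value)
    return out


def count_by_gender(records):
    """One analysis routine that works on any site's standardized records."""
    counts = {}
    for rec in records:
        counts[rec["gender"]] = counts.get(rec["gender"], 0) + 1
    return counts


# Raw data stay at each site; only the layout is made common.
site_a_raw = [{"member_id": "A1", "sex_cd": "1"}, {"member_id": "A2", "sex_cd": "2"}]
site_b_raw = [{"patient_no": "B1", "gender_text": "female"}]

site_a_std = [standardize(r, SITE_MAPPINGS["site_a"]) for r in site_a_raw]
site_b_std = [standardize(r, SITE_MAPPINGS["site_b"]) for r in site_b_raw]
```

In this sketch the same `count_by_gender` function is applied separately to `site_a_std` and `site_b_std`, mirroring the way a single program can be distributed to each participating site and run against its local files.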




