Abstract
Background/Aims The Virtual Data Warehouse (VDW) was created to produce comparable data across sites for proposing and conducting research. It is “virtual” in the sense that the data remain at the local sites rather than at a centralized data coordinating center. At the core of the VDW is a series of standardized file definitions. Content areas and data elements commonly required for research studies are identified, and a data dictionary is created for each content area, specifying a common format for each element: variable name, label, description, code values, and value labels. Local site programmers have mapped the data elements from their HMO’s data systems into this standardized set of variable definitions, names, and codes, and onto standardized SAS file formats. This common structure enables a SAS analyst at one site to write one program to extract and/or analyze data at all participating sites.
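The mapping idea described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than SAS, and every name in it (the variables `mrn` and `gender`, the code values, and the two site mappings) is a hypothetical stand-in, not taken from the actual VDW data dictionaries. Each site translates its local fields and codes into one shared definition, after which a single analysis function runs unchanged on data from any site.

```python
from collections import Counter

# Hypothetical illustration only: variable names and code values are invented
# for this sketch and are not taken from the actual VDW data dictionaries.

# A tiny "data dictionary" for one content area:
# standard variable name -> set of allowed code values (None = no code list).
STANDARD_DEMOGRAPHICS = {
    "mrn": None,
    "gender": {"F", "M", "U"},
}

# Each site supplies its own mapping from local field names and local codes
# to the standardized names and codes.
SITE_A_MAPPING = {
    "fields": {"patient_id": "mrn", "sex_cd": "gender"},
    "codes": {"gender": {"1": "M", "2": "F", "9": "U"}},
}
SITE_B_MAPPING = {
    "fields": {"member_no": "mrn", "gender": "gender"},
    "codes": {"gender": {"male": "M", "female": "F"}},
}


def to_standard(record, mapping, dictionary):
    """Translate one local record into the standardized layout."""
    out = {}
    for local_name, value in record.items():
        std_name = mapping["fields"].get(local_name)
        if std_name is None:
            continue  # local field has no standardized counterpart
        # Recode the value if the site defines a code mapping for this variable.
        value = mapping["codes"].get(std_name, {}).get(value, value)
        allowed = dictionary[std_name]
        if allowed is not None and value not in allowed:
            raise ValueError(f"{std_name}: unexpected code {value!r}")
        out[std_name] = value
    return out


def count_by_gender(standard_records):
    """One 'analysis program' that works on standardized data from any site."""
    return Counter(rec["gender"] for rec in standard_records)


site_a = [to_standard(r, SITE_A_MAPPING, STANDARD_DEMOGRAPHICS)
          for r in [{"patient_id": "A001", "sex_cd": "2", "zip": "20850"}]]
site_b = [to_standard(r, SITE_B_MAPPING, STANDARD_DEMOGRAPHICS)
          for r in [{"member_no": "B7", "gender": "male"}]]
```

In this sketch the site-specific knowledge lives entirely in the mapping tables, so `count_by_gender` never needs to know which site produced a record.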
Methods This poster demonstrates the range of data sources used at Kaiser Permanente in the Mid-Atlantic States (KPMAS) to feed information into our local implementation of the VDW datasets.
Results The KPMAS local implementation of the VDW contains detailed medical information on KPMAS members. These files contain details on 33 million pharmacy dispensings (2004–2011) and nearly 27 million unique medical encounters (2005–2011), including 0.5 million hospitalizations and 19 million ambulatory visits, along with 80 million diagnoses and 46 million procedures. The data also include 19 million vital signs observations and 40 million lab results. The VDW Enrollment and Demographic files are derived from several historical and current membership files; the VDW Utilization and Pharmacy files are derived from national Kaiser Permanente systems, augmented with data from the KPMAS electronic health record and claims systems; the VDW tumor data are derived from the MD, VA, and DC state registries.
Conclusions The KPMAS VDW provides a centralized, tested repository of data from all available sources. This resource enables data sharing for multi-site studies, and also improves programming efficiency, accuracy, and completeness for KPMAS studies by providing an integrated regional data warehouse.




