Abstract
Background/Aims Clinical text is an integral part of research at Group Health Research Institute (GHRI), supporting ongoing projects and new research proposals. Our clinical notes are stored in a single, full-text-indexed table that is populated nightly with all new notes in Group Health’s Electronic Medical Record (EMR). These notes include those generated internally and those received from partner organizations via electronic interfaces. Recently, one of our partners changed its interface from sending formatted text to sending binary PDF files. While this had little effect on clinical users, it made a significant amount of text data inaccessible to computerized processing. This text was crucial to ongoing research, so we needed to find a way to preserve access.
Methods A team of technologists with diverse skills collaborated to solve the problem with the following process. When the nightly HL7 file arrives at Group Health, it is copied to a research server, and the message ID is stored with the appropriate encounter in the EMR. A scheduled Python script then extracts the message ID and the base64-encoded PDF from each HL7 message. It decodes the base64 payload into a PDF file and names the file with the message ID so that it can be linked to the encounter. Finally, the script converts the PDF file into a text file using an open source library. The PDF and text files are archived, as is the nightly HL7 file.
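The extraction and decoding steps described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used at Group Health: it assumes that the message ID is the HL7 v2 message control ID (MSH-10) and that the PDF travels as an encapsulated-data (`ED`) value in OBX-5 with the base64 payload in the fifth component. The function names `extract_pdf` and `save_pdf` are hypothetical. The PDF-to-text step is omitted because the abstract does not name the open source library used.

```python
import base64
from pathlib import Path


def extract_pdf(hl7_message: str) -> tuple[str, bytes]:
    """Return (message_id, pdf_bytes) from one HL7 v2 message.

    Assumed layout (not confirmed by the source): the ID is MSH-10 and the
    PDF is an ED-typed OBX-5 value of the form source^type^subtype^encoding^data.
    """
    # HL7 v2 segments end with carriage returns; tolerate newlines as well.
    segments = [s for s in hl7_message.replace("\n", "\r").split("\r") if s]
    message_id = None
    pdf_bytes = None
    for segment in segments:
        fields = segment.split("|")
        if fields[0] == "MSH" and len(fields) > 9:
            # MSH-1 is the field separator itself, so MSH-10 sits at index 9.
            message_id = fields[9]
        elif fields[0] == "OBX" and len(fields) > 5 and fields[2] == "ED":
            components = fields[5].split("^")
            if len(components) >= 5:
                pdf_bytes = base64.b64decode(components[4])
    if not message_id or pdf_bytes is None:
        raise ValueError("message lacks an MSH-10 ID or an embedded PDF")
    return message_id, pdf_bytes


def save_pdf(hl7_message: str, out_dir: Path) -> Path:
    """Write the embedded PDF to out_dir, named by message ID for linking."""
    message_id, pdf_bytes = extract_pdf(hl7_message)
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf_path = out_dir / f"{message_id}.pdf"
    pdf_path.write_bytes(pdf_bytes)
    return pdf_path
```

Naming each file after the message ID keeps the link back to the encounter in the file name itself, so no separate lookup table is needed for the archived PDF and text files.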
Results From June to October 2013, this process converted and archived over 30,000 notes from our partner that otherwise would not have been available for research. The files are also available to Group Health’s claims auditing department, enabling Group Health to see a monetary return on the effort.
Conclusions Thanks to the open source community and strong connections between informatics management at Group Health Cooperative and GHRI, we were able to quickly salvage critical research data when an externally driven change affected how that information was delivered. Before the end of 2013, we expect to integrate these converted text notes with our clinical text storage.




