Abstract
Protected health information (PHI) in HMORN medical records must be systematically removed before the records are used or shared with co-investigators at other institutions. The problem is particularly acute when clinical and follow-up data must be linked with pathologic specimens from cancer patients in studies of prognostic and predictive tumor markers. Natural language processing (NLP) can be customized to identify PHI in selected clinical records and remove it by substituting nonsense characters. Such deidentification can proceed largely without human intervention and, if successful, allows clinical notes to be linked efficiently with similarly deidentified specimens for laboratory investigators at other institutions. The Shared Pathology Informatics Network (SPIN) was funded by the Cancer Diagnosis Program of the National Cancer Institute to develop a computer program that would search pathology department text files and reports from several institutions and retrieve into a database all records that met the search criteria. SPIN was designed to access information from each institution’s electronic records without altering those records and to collate data from several sites into a single report. Our current project, the Specimen Retrieval System, requires identifying large numbers of cancer specimens of specific types from Kaiser Permanente Northwest (KPNW) and, subsequently, from other Cancer Research Network sites; retrieving those specimens; and linking them with clinical annotations that describe staging, treatment, and outcome. We used the existing databases of the KPNW Tumor Registry and Department of Pathology to identify cases. The EPIC electronic medical record and several other computerized text files provided the clinical notes for these patients. Using SPIN technology, we then processed the text files to ‘scrub’ them of PHI.
All records were then manually inspected to assess the completeness of the process and to determine which elements of PHI persisted and which had been successfully ‘scrubbed’. We processed several hundred files from more than 100 patients and will report a detailed statistical analysis. This technology has demonstrated significant capability to facilitate searches for pathologic specimens and clinical annotations using conventional reports from pathology departments. It is a valuable tool for removing PHI, deidentifying medical records, and easing the sharing of clinical information with investigators at other sites.
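To make the idea of ‘scrubbing’ by substituting nonsense characters concrete, the following is a minimal sketch in Python. It is not the SPIN software and does not reflect its actual rules: the patterns, the `scrub` function, and the `X` filler are hypothetical, and a usable scrubber would need far broader coverage (patient and provider names, addresses, and so on). The per-category counts it returns are one simple way a manual review might tally which kinds of PHI were caught.

```python
import re

# Hypothetical, deliberately narrow patterns chosen for illustration only.
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "mrn": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),  # assumed record-number format
}


def scrub(text, filler="X"):
    """Replace each matched PHI span with filler characters of equal length.

    Returns the scrubbed text and a count of replacements per category.
    """
    counts = {}
    for label, pattern in PHI_PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, n = pattern.subn(lambda m: filler * len(m.group(0)), text)
        counts[label] = n
    return text, counts


if __name__ == "__main__":
    note = "Seen 03/14/2005, MRN: 1234567. Call 503-555-0100."
    scrubbed, counts = scrub(note)
    print(scrubbed)
    print(counts)
```

Replacing each match with a filler of the same length keeps the layout of the note intact, which can make side-by-side manual inspection of the original and scrubbed text easier.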




