C-C1-03: Identifying Respiratory-Related Clinical Conditions From ED Reports With Topaz

  Clinical Medicine & Research, March 2010, 8 (1) 53; DOI: https://doi.org/10.3121/cmr.8.1.53-b

Abstract

Background: Identifying patients for research studies often requires time-consuming and expensive manual chart review. Automated searching of textual records yields many false-positive cases, because a large proportion of the clinical conditions described in reports occurred in the past (past history of pneumonia), were experienced by a relative and not the patient (father suffered from CHF), or are described as being absent (patient denies chest pain). Natural Language Processing (NLP) promises to automatically encode the clinical conditions and modifiers necessary for accurate identification of patients matching a given case definition.

Methods: We developed an NLP system called Topaz that currently identifies 55 respiratory-related clinical conditions from clinical reports and assigns each condition the value acute, chronic, or absent in a three-step process. Module 1: Find instances of the conditions in the text. Module 2: For each instance marked by Module 1, assign the following values: Existence (present, absent); Temporality (recent, historical, hypothetical); Experiencer (patient, other). Module 3: Integrate information from the individual annotations to assign a single value to each of the 55 conditions.
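
To make the data flow between Modules 2 and 3 concrete, the following is a minimal Python sketch, not Topaz's implementation. The abstract does not state how Module 3 combines annotations, so the rule in `integrate` is purely an assumption for illustration: a present, recent, patient-experienced mention yields "acute"; otherwise a present, historical, patient-experienced mention yields "chronic"; everything else yields "absent". The names `Annotation` and `integrate` and the sample annotations are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One marked instance of a condition with its Module 2 values."""
    condition: str
    existence: str    # "present" or "absent"
    temporality: str  # "recent", "historical", or "hypothetical"
    experiencer: str  # "patient" or "other"


def integrate(annotations, condition):
    """Hypothetical Module 3 rule (an assumption, not the published one).

    Only mentions that are present and experienced by the patient count.
    A recent mention wins over a historical one.
    """
    relevant = [
        a for a in annotations
        if a.condition == condition
        and a.existence == "present"
        and a.experiencer == "patient"
    ]
    if any(a.temporality == "recent" for a in relevant):
        return "acute"
    if any(a.temporality == "historical" for a in relevant):
        return "chronic"
    return "absent"


# Invented annotations mirroring the examples in the Background.
report_annotations = [
    Annotation("pneumonia", "present", "historical", "patient"),
    Annotation("CHF", "present", "recent", "other"),
    Annotation("chest pain", "absent", "recent", "patient"),
    Annotation("cough", "present", "recent", "patient"),
]

for name in ("pneumonia", "CHF", "chest pain", "cough"):
    print(name, integrate(report_annotations, name))
```

Under this assumed rule, a condition mentioned only for a relative or only as negated resolves to "absent", which is the behavior the Background motivates.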

Results: We evaluated each module on a set of 60 emergency department reports by comparing its output against physician annotations. Sensitivity and positive predictive value, respectively, for Modules 1 and 2 were as follows. Module 1: marking conditions (99%, 89%). Module 2: negation (98%, 96%), historical (76%, 50%), hypothetical (100%, 81%), experiencer (100%, 33%). The weighted kappa between the values Topaz assigned to the 55 conditions (Module 3) and the physician annotations was 0.85, indicating high agreement between Topaz and a physician.
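
For readers unfamiliar with the reported measures, the sketch below shows how sensitivity, positive predictive value, and a weighted kappa can be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say which weighting scheme was used, so linear weights are assumed, and the ordering absent < chronic < acute is also an assumption. The function names are hypothetical and the ratings are toy data, not study data.

```python
from collections import Counter


def sensitivity(tp, fn):
    """True positives divided by all reference-standard positives."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """True positives divided by all system-reported positives."""
    return tp / (tp + fp)


def weighted_kappa(rater_a, rater_b, categories):
    """Cohen's weighted kappa with linear disagreement weights (assumed).

    `categories` must be listed in their assumed order. The result is
    undefined (division by zero) if expected disagreement is zero, for
    example when both raters use only one and the same category.
    """
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))
    marg_a = Counter(rater_a)
    marg_b = Counter(rater_b)
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for a in categories:
        for b in categories:
            weight = abs(index[a] - index[b]) / (k - 1)
            observed_disagreement += weight * observed[(a, b)] / n
            expected_disagreement += weight * (marg_a[a] / n) * (marg_b[b] / n)
    return 1.0 - observed_disagreement / expected_disagreement


# Toy ratings for six conditions (invented, not from the study).
CATEGORIES = ["absent", "chronic", "acute"]  # assumed ordering
system = ["absent", "acute", "chronic", "absent", "acute", "absent"]
physician = ["absent", "acute", "acute", "absent", "acute", "chronic"]

print(sensitivity(tp=99, fn=1), ppv(tp=89, fp=11))
print(weighted_kappa(system, physician, CATEGORIES))
```

With linear weights, an acute versus absent disagreement is penalized twice as heavily as an acute versus chronic one, which is why a weighted statistic suits the three ordered values better than simple percent agreement.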

Conclusion: In spite of mistakes in marking conditions and assigning contextual features, Topaz performed similarly to a physician in assigning 55 conditions the values acute, chronic, or absent based on the ED report. Topaz could improve the sensitivity of case retrieval by addressing the linguistic variation common in clinical reports (e.g., synonymy and abbreviations). Topaz could also improve the precision of case retrieval by addressing the temporality, experiencer, and negation of clinical conditions.
