An Introduction to Clinical Natural Language Processing

Wendy W Chapman (a), Dina Demner-Fushman (b), Hua Xu (c), Stéphane M Meystre (d)

a Division of Biomedical Informatics, University of California, San Diego, La Jolla, California, USA

b U.S. National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

c School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA

d Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA



Natural language processing (NLP) is the umbrella term used to describe the automated structuring and extraction of information formatted as free text. The demand for NLP technologies in medicine will grow significantly in the coming years. This growth will be fueled by the continuing adoption of the electronic medical record, increasing emphasis on quality measurement and improvement initiatives, and the growing need for evidence to be used as part of evidence-based medicine.

This half-day tutorial is designed to introduce clinicians and informaticians to the practice, tools, techniques, and science of clinical NLP. Instruction will be interactive and case driven. The tutorial will focus primarily on clinical NLP, although related uses and methods such as literature-based NLP and text mining will be discussed to lend context.

Topics covered include: an overview of clinical NLP and its uses in medicine; a brief history of clinical NLP and the evolution of NLP methods; the challenges to NLP; the approaches used to process natural language and the strengths and weaknesses of each; implementation considerations; creating annotated corpora as training or test sets; evaluation of NLP; and a review of open source tools for natural language processing. Demonstrations and in-class exercises will be used to help tie the theory of NLP to everyday research problems addressed by these technologies. The tutorial will be taught by four instructors experienced as researchers, developers, and users of a variety of tools and approaches to clinical NLP.


  1. Overview: What is NLP and how is it being used in medicine?

    1. Literature

    2. Clinical reports

    3. Applied to bioinformatics

  2. What makes clinical NLP so difficult?

    1. Overview of policies affecting NLP

    2. Characteristics of the clinical documentation environment

  3. Different approaches to clinical NLP

    1. Simple rules-based

    2. Statistical

    3. Symbolic or grammatical

    4. Hybrid approaches

  4. The clinical NLP process

    1. The various components of clinical NLP

    2. Annotated corpora for training or testing

    3. The pipeline or clinical NLP software

    4. Evaluation (its role in the process)

  5. Available open source tools and components

  6. Implementation considerations

  7. Evaluating clinical NLP (in greater detail)

Learning objectives

By the end of this tutorial, attendees should be able to:

  • Describe the current uses of clinical NLP

  • Describe the relationship between clinical NLP and related techniques such as text and data mining

  • Understand the challenges to clinical NLP

  • Describe the various approaches to clinical NLP and their strengths and weaknesses

  • Understand the process of clinical NLP and its various components

  • Find available open source clinical NLP components, frameworks, and packages

  • Identify potential implementation concerns and challenges

  • Understand the process of creating and using annotated corpora

  • Interpret the performance of published clinical NLP research


Target audience

Any clinician or medical informatician with an interest in learning more about clinical NLP.