Fundamentals of Clinical and Translational Research Informatics


Elmer V. Bernstam (a, b), Hua Xu (a), Funda Meric-Bernstam (c, d)

a School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA

b Department of Internal Medicine, Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA

c Institute for Personalized Cancer Therapy, The University of Texas MD Anderson Cancer Center, Houston, TX, USA

d Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA



Molecular biology, electronic health records and computation have the potential to accelerate the translation of research into clinical practice. However, clinical and translational research is inefficient, labor-intensive and expensive. In this tutorial, we will present an overview of the emerging field of clinical research informatics, which attempts to lower the barriers to clinical and translational research. The tutorial has three parts. 1) We will first cover the basics of clinical and translational research, including different research designs such as observational studies, phase I, II and III clinical trials, and post-marketing surveillance of drugs. Examples will be drawn from oncology and drug development, in part because of the relatively advanced informatics infrastructure available to oncologists, but will be generally applicable. Illustrative topics will include genotype-phenotype association studies, personalized (or precision) therapy and information retrieval to complement genomic data. 2) We will then introduce clinical research informatics as a sub-field of biomedical informatics. We will describe landmark studies in the field, along with major challenges and solutions, including “big data,” data integration (e.g., UMLS), electronic data capture (e.g., REDCap), data cleaning and integrity, mitigation of duplicate clinical records, data redundancy, platforms for discovery research (e.g., i2b2) and collaboration facilitation (e.g., VIVO, SciVal Experts). 3) The third and final part will cover the reuse of clinical data for research to enable a “learning health care system”, focusing on natural language processing (e.g., MedLEE, cTAKES, MetaMap). At the conclusion of this tutorial, students will understand the history of clinical research informatics, its central problems and the solutions developed by its practitioners.

Outline of topics

The tutorial will consist of three parts covering: 1) clinical and translational research, 2) clinical research informatics and 3) reuse of clinical data with an emphasis on natural language processing (NLP). Each part will be motivated by an example and will cover both theoretical issues and applied work. Each part will include 45 minutes of lecture, 10 minutes of questions and answers, and a 5-minute break.

  1. Clinical and translational research

    1. Motivating example: personalized cancer therapy

    2. Define clinical/translational research

    3. Types of clinical studies

    4. The information challenges to clinical/translational research: barriers and pain points

  2. Clinical research informatics

    1. Motivating example: Learning health care system

    2. Theoretical foundations

    3. “State of the Art” Applications (Computational Infrastructure for Research)

  3. Reuse of clinical data for research

    1. Motivating example: high-throughput phenotyping

    2. Challenges to the reuse of clinical data for research

    3. Natural language processing (NLP) and concept extraction


Learning objectives

  1. Understand the information challenges inherent in clinical and translational research

  2. Understand the scope of clinical research informatics (CRI)

  3. Understand fundamental CRI challenges

  4. Be aware of major theoretical and applied work in CRI, including specific projects and systems that have attempted to address important issues in CRI, focusing on widely adopted open systems

  5. Understand the barriers to the reuse of clinical data for research

  6. Understand the importance of natural language processing (NLP) to CRI

  7. Be aware of specific natural language processing (NLP) projects and systems important to CRI

  8. Understand important CRI concepts and programs including comparative effectiveness research, regulatory barriers to research, patient-centered outcomes research and the learning health care system


Target audience

Clinicians, biomedical researchers, computer scientists, informaticians, system developers, programmers, information technology professionals