Extracting fine-grained disease phenotypes and adverse drug reactions from electronic patient records

Prof. Søren Brunak


A fundamental question is whether specific adverse drug reactions (ADRs) stem from variation in a patient's individual genome, from drug/environment cocktail effects, or from both. We have developed a text mining pipeline for temporal analysis of electronic patient records that identifies ADRs directly from the free-text narratives describing patient disease trajectories over time. Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases, drugs and genetic information. Linking these data is a huge undertaking that will soon become a major challenge, now that it has become feasible to sequence the DNA of entire populations at low cost. By extracting phenotype information and information on adverse drug reactions from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. We characterize the similarity of the ADR profiles of approved drugs using drug-ADR networks and report on the relationship between the chemical similarity of drugs and their ADRs.



Søren Brunak, Ph.D., is professor of Bioinformatics at the Technical University of Denmark and professor of Disease Systems Biology at the University of Copenhagen. Prof. Brunak is the founding Director of the Center for Biological Sequence Analysis, which was formed in 1993 as a multidisciplinary research group of molecular biologists, biochemists, medical doctors, physicists, and computer scientists. He has been highly active in biological data integration, where machine learning techniques have often been used to integrate predicted or experimentally established functional genome, metagenome and proteome annotation. His current research combines molecular-level systems biology with healthcare sector data such as electronic patient records and biobank questionnaires. The aim is to group and stratify patients not only by genotype but also phenotypically, based on the clinical descriptions in the medical records. Adverse drug reactions are now an additional focus area.