Essentials of Data Mining in Clinical Applications: Development, Evaluation, and Use

Niels Peek

Department of Medical Informatics, University of Amsterdam, Amsterdam, The Netherlands

 

Abstract

The increasing amount of clinical data captured digitally today creates ample opportunity to expand our knowledge. Consequently, clinical applications of data mining are becoming commonplace, and data mining techniques are finding their way into the modern medical informatician’s curriculum. The objective of this tutorial is to offer an accessible treatment of data mining in clinical applications, covering issues pertaining to the development, evaluation, and clinical use of data mining methods. Specifically, the learning objectives of the tutorial are to understand the main concepts pertaining to knowledge discovery in clinical databases; to appreciate how these concepts fit into the lifecycle of real-world applications; and to provide “best practice” guidelines for developing clinical data mining applications. The tutorial targets researchers and practitioners at the beginner and intermediate levels. It presents a framework for understanding the essentials of data mining in clinical applications, which should allow attendees to approach their own applications in a principled manner and to scrutinize and criticize other approaches to learning from data. Various case studies from the teachers’ own experiences will be presented.

 

Outline of topics

After a general introduction to the concepts of Data Mining and Knowledge Discovery in Databases, the tutorial will be structured around a discussion of the principal paradigms for machine learning, i.e. supervised learning (classification and prediction), unsupervised learning (cluster analysis), and semi-supervised learning (subgroup discovery). For each of the paradigms, a brief lecture will be given covering the following topics: (a) rationale, underlying assumptions, and application scenarios; (b) overview of learning algorithms, and an example algorithm; (c) interpretation and validation of results; (d) clinical case study; and (e) practical recommendations.

 

Learning objectives

  1. To understand the main concepts pertaining to knowledge discovery in databases in clinical applications, including:

    1. Clinical requirements necessitating data mining

    2. The preprocessing-development-evaluation-implementation-use lifecycle

    3. Learning paradigms and spectrum of available learning algorithms

    4. Methods for quantitative and qualitative evaluation of models

  2. To appreciate how these concepts fit together by discussing design decisions along the lifecycle of various real-world case studies.
  3. To provide “best practice” guidelines for developing clinical data mining applications.

 

Target audience

The tutorial targets researchers and practitioners at beginner and intermediate levels. Basic knowledge of probability theory, statistical estimation and hypothesis testing, Boolean logic, and computer algorithms is assumed.

 
