Data Mining and Medical Knowledge Management - Cases and Applications




Data Mining and Medical Knowledge Management: Cases and Applications

Petr Berka
University of Economics, Prague, Czech Republic

Jan Rauch
University of Economics, Prague, Czech Republic

Djamel Abdelkader Zighed
University of Lumiere Lyon 2, France

Medical Information Science Reference
Hershey • New York

Director of Editorial Content: Kristin Klinger
Managing Editor: Jamie Snavely
Assistant Managing Editor: Carole Coulson
Typesetter: Sean Woznicki
Cover Design: Lisa Tosheff
Printed at: Yurchak Printing Inc.

Published in the United States of America by
Information Science Reference (an imprint of IGI Global)
701 E. Chocolate Avenue, Suite 200
Hershey PA 17033
Tel: 717-533-8845
Fax: 717-533-8661
E-mail: cust@igi-global.com
Web site: http://www.igi-global.com/reference

and in the United Kingdom by
Information Science Reference (an imprint of IGI Global)
3 Henrietta Street
Covent Garden
London WC2E 8LU
Tel: 44 20 7240 0856
Fax: 44 20 7379 0609
Web site: http://www.eurospanbookstore.com

Copyright © 2009 by IGI Global. All rights reserved. No part of this publication may be reproduced, stored or distributed in any form or by any means, electronic or mechanical, including photocopying, without written permission from the publisher.

Product or company names used in this set are for identification purposes only. Inclusion of the names of the products or companies does not indicate a claim of ownership by IGI Global of the trademark or registered trademark.

Library of Congress Cataloging-in-Publication Data

Data mining and medical knowledge management : cases and applications / Petr Berka, Jan Rauch, and Djamel Abdelkader Zighed, editors.
p. ; cm.
Includes bibliographical references and index.
Summary: "This book presents 20 case studies on applications of various modern data mining methods in several important areas of medicine, covering classical data mining methods, elaborated approaches related to mining in EEG and ECG data, and methods related to mining in genetic data"--Provided by publisher.
ISBN 978-1-60566-218-3 (hardcover)
1. Medicine--Data processing--Case studies. 2. Data mining--Case studies. I. Berka, Petr. II. Rauch, Jan. III. Zighed, Djamel A., 1955-
[DNLM: 1. Medical Informatics--methods--Case Reports. 2. Computational Biology--methods--Case Reports. 3. Information Storage and Retrieval--methods--Case Reports. 4. Risk Assessment--Case Reports. W 26.5 D2314 2009]
R858.D33 2009
610.0285--dc22
2008028366

British Cataloguing in Publication Data
A Cataloguing in Publication record for this book is available from the British Library.

All work contributed to this book is new, previously-unpublished material. The views expressed in this book are those of the authors, but not necessarily of the publisher.

If a library purchased a print copy of this publication, please go to http://www.igi-global.com/agreement for information on activating the library's complimentary electronic access to this publication.

Editorial Advisory Board

Riccardo Bellazzi, University of Pavia, Italy
Radim Jiroušek, Academy of Sciences, Prague, Czech Republic
Katharina Morik, University of Dortmund, Germany
Ján Paralič, Technical University, Košice, Slovak Republic
Luis Torgo, LIAAD-INESC Porto LA, Portugal
Blaž Župan, University of Ljubljana, Slovenia

List of Reviewers

Riccardo Bellazzi, University of Pavia, Italy
Petr Berka, University of Economics, Prague, Czech Republic
Bruno Crémilleux, University Caen, France
Peter Eklund, Umeå University, Umeå, Sweden
Radim Jiroušek, Academy of Sciences, Prague, Czech Republic
Jiří Kléma, Czech Technical University, Prague, Czech Republic
Mila Kwiatkowska, Thompson Rivers University, Kamloops, Canada
Martin Labský, University of Economics, Prague, Czech Republic
Lenka Lhotská, Czech Technical University, Prague, Czech Republic
Ján Paralič, Technical University, Košice, Slovak Republic
Vincent Pisetta, University Lyon 2, France
Simon Marcellin, University Lyon 2, France
Jan Rauch, University of Economics, Prague, Czech Republic
Marisa Sánchez, National University, Bahía Blanca, Argentina
Ahmed-El Sayed, University Lyon 2, France
Olga Štěpánková, Czech Technical University, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Arnošt Veselý, Czech University of Life Sciences, Prague, Czech Republic
Djamel Zighed, University Lyon 2, France

Table of Contents

Foreword
Preface
Acknowledgment

Section I
Theoretical Aspects

Chapter I
Data, Information and Knowledge
Jana Zvárová, Institute of Computer Science of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic; Center of Biomedical Informatics, Czech Republic
Arnošt Veselý, Institute of Computer Science of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic; Czech University of Life Sciences, Czech Republic
Igor Vajda, Institutes of Computer Science and Information Theory and Automation of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic

Chapter II
Ontologies in the Health Field
Michel Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Radja Messai, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Gayo Diallo, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Ana Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France

Chapter III
Cost-Sensitive Learning in Medicine
Alberto Freitas, University of Porto, Portugal; CINTESIS, Portugal
Pavel Brazdil, LIAAD - INESC Porto L.A., Portugal; University of Porto, Portugal
Altamiro Costa-Pereira, University of Porto, Portugal; CINTESIS, Portugal

Chapter IV
Classification and Prediction with Neural Networks
Arnošt Veselý, Czech University of Life Sciences, Czech Republic

Chapter V
Preprocessing Perceptrons and Multivariate Decision Limits
Patrik Eklund, Umeå University, Sweden
Lena Kallin Westin, Umeå University, Sweden

Section II
General Applications

Chapter VI
Image Registration for Biomedical Information Integration
Xiu Ying Wang, BMIT Research Group, The University of Sydney, Australia
Dagan Feng, BMIT Research Group, The University of Sydney, Australia; Hong Kong Polytechnic University, Hong Kong

Chapter VII
ECG Processing
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Václav Chudáček, Czech Technical University in Prague, Czech Republic
Michal Huptych, Czech Technical University in Prague, Czech Republic

Chapter VIII
EEG Data Mining Using PCA
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Vladimír Krajča, Faculty Hospital Na Bulovce, Czech Republic
Jitka Mohylová, Technical University Ostrava, Czech Republic
Svojmil Petránek, Faculty Hospital Na Bulovce, Czech Republic
Václav Gerla, Czech Technical University in Prague, Czech Republic

Chapter IX
Generating and Verifying Risk Prediction Models Using Data Mining
Darryl N. Davis, University of Hull, UK
Thuy T.T. Nguyen, University of Hull, UK

Chapter X
Management of Medical Website Quality Labels via Web Mining
Vangelis Karkaletsis, National Center of Scientific Research “Demokritos”, Greece
Konstantinos Stamatakis, National Center of Scientific Research “Demokritos”, Greece
Pythagoras Karampiperis, National Center of Scientific Research “Demokritos”, Greece
Martin Labský, University of Economics, Prague, Czech Republic
Marek Růžička, University of Economics, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Enrique Amigó Cabrera, ETSI Informática, UNED, Spain
Matti Pöllä, Helsinki University of Technology, Finland
Miquel Angel Mayer, Medical Association of Barcelona (COMB), Spain
Dagmar Villarroel Gonzales, Agency for Quality in Medicine (AquMed), Germany

Chapter XI
Two Case-Based Systems for Explaining Exceptions in Medicine
Rainer Schmidt, University of Rostock, Germany

Section III
Specific Cases

Chapter XII
Discovering Knowledge from Local Patterns in SAGE Data
Bruno Crémilleux, Université de Caen, France
Arnaud Soulet, Université François Rabelais de Tours, France
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Céline Hébert, Université de Caen, France
Olivier Gandrillon, Université de Lyon, France

Chapter XIII
Gene Expression Mining Guided by Background Knowledge
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Filip Železný, Czech Technical University in Prague, Czech Republic
Igor Trajkovski, Jožef Stefan Institute, Slovenia
Filip Karel, Czech Technical University in Prague, Czech Republic
Bruno Crémilleux, Université de Caen, France
Jakub Tolar, University of Minnesota, USA

Chapter XIV
Mining Tinnitus Database for Knowledge
Pamela L. Thompson, University of North Carolina at Charlotte, USA
Xin Zhang, University of North Carolina at Pembroke, USA
Wenxin Jiang, University of North Carolina at Charlotte, USA
Zbigniew W. Ras, University of North Carolina at Charlotte, USA
Pawel Jastreboff, Emory University School of Medicine, USA

Chapter XV
Gaussian-Stacking Multiclassifiers for Human Embryo Selection
Dinora A. Morales, University of the Basque Country, Spain
Endika Bengoetxea, University of the Basque Country, Spain
Pedro Larrañaga, Universidad Politécnica de Madrid, Spain

Chapter XVI
Mining Tuberculosis Data
Marisa A. Sánchez, Universidad Nacional del Sur, Argentina
Sonia Uremovich, Universidad Nacional del Sur, Argentina
Pablo Acrogliano, Hospital Interzonal Dr. José Penna, Argentina

Chapter XVII
Knowledge-Based Induction of Clinical Prediction Rules
Mila Kwiatkowska, Thompson Rivers University, Canada
M. Stella Atkins, Simon Fraser University, Canada
Les Matthews, Thompson Rivers University, Canada
Najib T. Ayas, University of British Columbia, Canada
C. Frank Ryan, University of British Columbia, Canada

Chapter XVIII
Data Mining in Atherosclerosis Risk Factor Data
Petr Berka, University of Economics, Prague, Czech Republic; Academy of Sciences of the Czech Republic, Prague, Czech Republic
Jan Rauch, University of Economics, Prague, Czech Republic; Academy of Sciences of the Czech Republic, Prague, Czech Republic
Marie Tomečková, Academy of Sciences of the Czech Republic, Prague, Czech Republic

Compilation of References
About the Contributors
Index

Detailed Table of Contents

Foreword
Preface
Acknowledgment

Section I
Theoretical Aspects

This section provides a theoretical and methodological background for the remaining parts of the book. It defines and explains basic notions of data mining and knowledge management, and discusses some general methods.

Chapter I
Data, Information and Knowledge
Jana Zvárová, Institute of Computer Science of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic; Center of Biomedical Informatics, Czech Republic
Arnošt Veselý, Institute of Computer Science of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic; Czech University of Life Sciences, Czech Republic
Igor Vajda, Institutes of Computer Science and Information Theory and Automation of the Academy of Sciences of the Czech Republic v.v.i., Czech Republic

This chapter introduces the basic concepts of medical informatics: data, information, and knowledge. It shows how these concepts are interrelated and can be used for decision support in medicine. All discussed approaches are illustrated on one simple medical example.

Chapter II
Ontologies in the Health Field
Michel Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Radja Messai, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Gayo Diallo, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France
Ana Simonet, Laboratoire TIMC-IMAG, Institut de l’Ingénierie et de l’Information de Santé, France

This chapter introduces the basic notions of ontologies, presents a survey of their use in medicine, and explores some related issues: knowledge bases, terminology, information retrieval. It also addresses the issues of ontology design, ontology representation, and the possible interaction between data mining and ontologies.

Chapter III
Cost-Sensitive Learning in Medicine
Alberto Freitas, University of Porto, Portugal; CINTESIS, Portugal
Pavel Brazdil, LIAAD - INESC Porto L.A., Portugal; University of Porto, Portugal
Altamiro Costa-Pereira, University of Porto, Portugal; CINTESIS, Portugal

Health managers and clinicians often need models that try to minimize several types of costs associated with healthcare, including attribute costs (e.g. the cost of a specific diagnostic test) and misclassification costs (e.g. the cost of a false negative test). This chapter presents some concepts related to cost-sensitive learning and cost-sensitive classification in medicine and reviews research in this area.

Chapter IV
Classification and Prediction with Neural Networks
Arnošt Veselý, Czech University of Life Sciences, Czech Republic

This chapter describes the theoretical background of artificial neural networks (architectures, methods of learning) and shows how these networks can be used in the medical domain to solve various classification and regression problems.

Chapter V
Preprocessing Perceptrons and Multivariate Decision Limits
Patrik Eklund, Umeå University, Sweden
Lena Kallin Westin, Umeå University, Sweden

This chapter introduces classification networks composed of preprocessing layers and classification networks, and compares them with “classical” multilayer perceptrons on three medical case studies.

Section II
General Applications

This section presents work that is general in the sense that each chapter describes a variety of methods or a variety of problems.

Chapter VI
Image Registration for Biomedical Information Integration
Xiu Ying Wang, BMIT Research Group, The University of Sydney, Australia
Dagan Feng, BMIT Research Group, The University of Sydney, Australia; Hong Kong Polytechnic University, Hong Kong

This chapter introduces biomedical image registration and fusion, an effective mechanism to assist medical knowledge discovery by integrating and simultaneously representing relevant information from diverse imaging resources. It covers fundamental knowledge and major methodologies of biomedical image registration, and major applications of image registration in biomedicine.

Chapter VII
ECG Processing
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Václav Chudáček, Czech Technical University in Prague, Czech Republic
Michal Huptych, Czech Technical University in Prague, Czech Republic

This chapter describes methods for preprocessing, analysis, feature extraction, visualization, and classification of electrocardiogram (ECG) signals. First, preprocessing methods mainly based on the discrete wavelet transform are introduced. Then classification methods such as fuzzy rule-based decision trees and neural networks are presented. Two examples (visualization and feature extraction from Body Surface Potential Mapping (BSPM) signals, and classification of Holter ECGs) illustrate how these methods are used.

Chapter VIII
EEG Data Mining Using PCA
Lenka Lhotská, Czech Technical University in Prague, Czech Republic
Vladimír Krajča, Faculty Hospital Na Bulovce, Czech Republic
Jitka Mohylová, Technical University Ostrava, Czech Republic
Svojmil Petránek, Faculty Hospital Na Bulovce, Czech Republic
Václav Gerla, Czech Technical University in Prague, Czech Republic

This chapter deals with the application of principal components analysis (PCA) to the field of data mining in electroencephalogram (EEG) processing. Possible applications of this approach include separation of different signal components for feature extraction in the field of EEG signal processing, adaptive segmentation, epileptic spike detection, and long-term EEG monitoring evaluation of patients in a coma.

Chapter IX
Generating and Verifying Risk Prediction Models Using Data Mining
Darryl N. Davis, University of Hull, UK
Thuy T.T. Nguyen, University of Hull, UK

In this chapter, existing clinical risk prediction models are examined and matched to the patient data to which they may be applied using classification and data mining techniques, such as neural nets. Novel risk prediction models are derived using unsupervised cluster analysis algorithms. All existing and derived models are verified as to their usefulness in medical decision support on the basis of their effectiveness on patient data from two UK sites.

Chapter X
Management of Medical Website Quality Labels via Web Mining
Vangelis Karkaletsis, National Center of Scientific Research “Demokritos”, Greece
Konstantinos Stamatakis, National Center of Scientific Research “Demokritos”, Greece
Pythagoras Karampiperis, National Center of Scientific Research “Demokritos”, Greece
Martin Labský, University of Economics, Prague, Czech Republic
Marek Růžička, University of Economics, Prague, Czech Republic
Vojtěch Svátek, University of Economics, Prague, Czech Republic
Enrique Amigó Cabrera, ETSI Informática, UNED, Spain
Matti Pöllä, Helsinki University of Technology, Finland
Miquel Angel Mayer, Medical Association of Barcelona (COMB), Spain
Dagmar Villarroel Gonzales, Agency for Quality in Medicine (AquMed), Germany

This chapter deals with the problem of quality assessment of medical Web sites. The so-called “quality labeling” process can benefit from employment of Web mining and information extraction techniques, in combination with flexible methods of Web-based information management developed within the Semantic Web initiative.

Chapter XI
Two Case-Based Systems for Explaining Exceptions in Medicine
Rainer Schmidt, University of Rostock, Germany

In medicine, doctors are often confronted with exceptions, both in medical practice and in medical research. One suitable way to deal with exceptions is to use case-based systems. This chapter presents two such systems. The first one is a knowledge-based system for therapy support. The second one is designed for medical studies or research. It helps to explain cases that contradict a theoretical hypothesis.

Section III
Specific Cases

This part shows results of several case studies of (mostly) data mining applied to various specific medical problems. The problems covered by this part range from discovery of biologically interpretable knowledge from gene expression data, through human embryo selection for the purpose of human in-vitro fertilization treatments, to diagnosis of various diseases based on machine learning techniques.

Chapter XII
Discovering Knowledge from Local Patterns in SAGE Data
Bruno Crémilleux, Université de Caen, France
Arnaud Soulet, Université François Rabelais de Tours, France
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Céline Hébert, Université de Caen, France
Olivier Gandrillon, Université de Lyon, France

Current gene data analysis is often based on global approaches such as clustering. An alternative way is to utilize local pattern mining techniques for global modeling and knowledge discovery. This chapter proposes three data mining methods to deal with the use of local patterns by highlighting the most promising ones or summarizing them. From the case study of the SAGE gene expression data, it is shown that this approach allows generating new biological hypotheses with clinical applications.

Chapter XIII
Gene Expression Mining Guided by Background Knowledge
Jiří Kléma, Czech Technical University in Prague, Czech Republic
Filip Železný, Czech Technical University in Prague, Czech Republic
Igor Trajkovski, Jožef Stefan Institute, Slovenia
Filip Karel, Czech Technical University in Prague, Czech Republic
Bruno Crémilleux, Université de Caen, France
Jakub Tolar, University of Minnesota, USA

This chapter points out the role of genomic background knowledge in gene expression data mining. Its application is demonstrated in several tasks such as relational descriptive analysis, constraint-based knowledge discovery, feature selection and construction, or quantitative association rule mining.

Chapter XIV
Mining Tinnitus Database for Knowledge
Pamela L. Thompson, University of North Carolina at Charlotte, USA
Xin Zhang, University of North Carolina at Pembroke, USA
Wenxin Jiang, University of North Carolina at Charlotte, USA
Zbigniew W. Ras, University of North Carolina at Charlotte, USA
Pawel Jastreboff, Emory University School of Medicine, USA

This chapter describes the process used to mine a database containing data related to patient visits during Tinnitus Retraining Therapy. The presented research focused on analysis of existing data, along with automating the discovery of new and useful features in order to improve classification and understanding of tinnitus diagnosis.

Chapter XV
Gaussian-Stacking Multiclassifiers for Human Embryo Selection
Dinora A. Morales, University of the Basque Country, Spain
Endika Bengoetxea, University of the Basque Country, Spain
Pedro Larrañaga, Universidad Politécnica de Madrid, Spain

This chapter describes a new multi-classification system using Gaussian networks to combine the outputs (probability distributions) of standard machine learning classification algorithms. This multi-classification technique has been applied to a complex real medical problem: the selection of the most promising embryo-batch for human in-vitro fertilization treatments.

Chapter XVI
Mining Tuberculosis Data
Marisa A. Sánchez, Universidad Nacional del Sur, Argentina
Sonia Uremovich, Universidad Nacional del Sur, Argentina
Pablo Acrogliano, Hospital Interzonal Dr. José Penna, Argentina

This chapter reviews current policies of tuberculosis control programs for the diagnosis of tuberculosis. A data mining project that uses WHO’s Direct Observation of Therapy data to analyze the relationship among different variables and the tuberculosis diagnostic category registered for each patient is then presented.

Chapter XVII
Knowledge-Based Induction of Clinical Prediction Rules
Mila Kwiatkowska, Thompson Rivers University, Canada
M. Stella Atkins, Simon Fraser University, Canada
Les Matthews, Thompson Rivers University, Canada
Najib T. Ayas, University of British Columbia, Canada
C. Frank Ryan, University of British Columbia, Canada

This chapter describes how to integrate medical knowledge with purely inductive (data-driven) methods for the creation of clinical prediction rules. To address the complexity of the domain knowledge, the authors have introduced a semio-fuzzy framework, which has its theoretical foundations in semiotics and fuzzy logic. This integrative framework has been applied to the creation of clinical prediction rules for the diagnosis of obstructive sleep apnea, a serious and under-diagnosed respiratory disorder.

Chapter XVIII
Data Mining in Atherosclerosis Risk Factor Data
Petr Berka, University of Economics, Prague, Czech Republic; Academy of Sciences of the Czech Republic, Prague, Czech Republic
Jan Rauch, University of Economics, Prague, Czech Republic; Academy of Sciences of the Czech Republic, Prague, Czech Republic
Marie Tomečková, Academy of Sciences of the Czech Republic, Prague, Czech Republic

This chapter describes goals, current results, and further plans of a long-term activity concerning the application of data mining and machine learning methods to a complex medical data set. The analyzed data set concerns a longitudinal study of atherosclerosis risk factors.

Compilation of References
About the Contributors
Index

Foreword

Current research directions are looking at Data Mining (DM) and Knowledge Management (KM) as complementary and interrelated fields, aimed at supporting, with algorithms and tools, the lifecycle of knowledge, including its discovery, formalization, retrieval, reuse, and update. While DM focuses on the extraction of patterns, information, and ultimately knowledge from data (Giudici, 2003; Fayyad et al., 1996; Bellazzi, Zupan, 2008), KM deals with eliciting, representing, and storing explicit knowledge, as well as keeping and externalizing tacit knowledge (Abidi, 2001; Van der Spek, Spijkervet, 1997). Although DM and KM have stemmed from different cultural backgrounds and their methods and tools are different, too, it is now clear that they are dealing with the same fundamental issues, and that they must be combined to effectively support humans in decision making.

The capacity of DM to analyze data and to extract models, which may be meaningfully interpreted and transformed into knowledge, is a key feature for a KM system. Moreover, DM can be a very useful instrument to transform the tacit knowledge contained in transactional data into explicit knowledge, by making experts’ behavior and decision-making activities emerge.

On the other hand, DM is greatly empowered by KM. The available, or background, knowledge (BK) is exploited to drive data gathering and experimental planning, and to structure the databases and data warehouses. BK is used to properly select the data, choose the data mining strategies, improve the data mining algorithms, and finally evaluate the data mining results (Bellazzi, Zupan, 2008). The output of the data analysis process is an update of the domain knowledge itself, which may lead to new experiments and new data gathering (see Figure 1).

If the interaction and integration of DM and KM is important in all application areas, in medical applications it is essential (Cios, Moore, 2002).
Data analysis in medicine is typically part of a complex reasoning process which largely depends on BK. Diagnosis, therapy, monitoring, and molecular research are always guided by the existing knowledge of the problem domain, on the population of patients or on the specific patient under consideration. Since medicine is a safety critical context (Fox, Das, 2000), decisions must always be supported by arguments, and the explanation of decisions and predictions should be mandatory for an effective deployment of DM models.

[Figure 1. Role of the background knowledge in the data mining process. Diagram labels: Background knowledge; Experimental design; Database design; Case-base definition; Data extraction; Data mining; Patterns interpretation.]

DM and KM are thus becoming of great interest and importance for both clinical practice and research.

As far as clinical practice is concerned, KM can be a key player in the current transformation of healthcare organizations (HCOs). HCOs have evolved into complex enterprises in which managing knowledge and information is a crucial success factor in order to improve efficiency (i.e. the capability of optimizing the use of resources) and efficacy (i.e. the capability to reach the clinical treatment outcome) (Stefanelli, 2004). The current emphasis on Evidence-based Medicine (EBM) is one of the main reasons to utilize KM in clinical practice. EBM proposes strategies to apply evidence gained from scientific studies for the care of individual patients (Sackett, 2004). Such strategies are usually provided as clinical practice guidelines or individualized decision making rules and may be considered as an example of explicit knowledge. Of course, HCOs must also manage the empirical and experiential (or tacit) knowledge mirrored by the day-by-day actions of healthcare providers. An important research effort is therefore to augment the use of the so-called “process data” in order to improve the quality of care (Montani et al., 2006; Bellazzi et al., 2005). These process data include patients’ clinical records, healthcare provider actions (e.g. exams, drug administration, surgeries) and administrative data (admissions, discharge, exam requests). DM may be the natural instrument to deal with this problem, providing the tools for highlighting patterns of actions and regularities in the data, including the temporal relationships between the different events occurring during the HCO activities (Bellazzi et al., 2005).
Biomedical research is another driving force that is currently pushing towards the integration of KM and DM. The discovery of the genetic factors underlying the most common diseases, including for example cancer and diabetes, is enabled by the concurrence of two main factors: the availability of data at the genomic and proteomic scale, and the construction of biological data repositories and ontologies, which accumulate and organize the considerable quantity of research results (Lang, 2006). If we represent the current research process as a reasoning cycle including inference from data, ranking of hypotheses, and experimental planning, we can easily understand the crucial role of DM and KM (see Figure 2).

[Figure 2. Data mining and knowledge management for supporting current biomedical research. Diagram labels: Knowledge-based Ranking; Access to data repositories; Literature Search; Hypothesis; Data Mining; Experiment planning; Data Analysis; Data and evidence.]

In recent years, new enabling technologies have been made available to facilitate a coherent integration of DM and KM in medicine and biomedical research.

First, the growth of Natural Language Processing (NLP) and text mining techniques is allowing the extraction of information and knowledge from medical notes, discharge summaries, and narrative patients’ reports. Interestingly, however, this process always depends on already formalized knowledge, often represented as medical terminologies (Savova et al., 2008; Cimiano et al., 2005). Indeed, medical ontologies and terminologies themselves may be learned (or at least improved or complemented) by resorting to Web mining and ontology learning techniques. Thanks to the large amount of information available on the Web in digital format, this ambitious goal is now at hand (Cimiano et al., 2005).

The interaction between KM and DM is also shown by the current efforts on the construction of automated systems for filtering association rules learned from medical transaction databases. The availability of a formal ontology allows the ranking of association rules by clarifying which rules confirm available medical knowledge, which are surprising but plausible, and which should be filtered out (Raj et al., 2008).

Another area where DM and KM are jointly exploited is Case-Based Reasoning (CBR). CBR is a problem-solving paradigm that utilizes the specific knowledge of previously experienced situations, called cases. It basically consists of retrieving past cases that are similar to the current one and reusing (and, if necessary, adapting) solutions used successfully in the past; the current case can then be retained and put into the case library. In medicine, CBR can be seen as a suitable instrument to build decision support tools able to use tacit knowledge (Schmidt et al., 2001). The algorithms for computing the case similarity are typically derived from the DM field.
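The retrieve, reuse, and retain steps of CBR described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method of any system cited here: the `Case` class, the feature names, the weights, and the "plan A"/"plan B" labels are all made-up assumptions, and similarity is taken to be a simple weighted Euclidean distance over features assumed to be already normalized to [0, 1].

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List


@dataclass
class Case:
    """A past case: hypothetical normalized features plus the solution that worked."""
    features: Dict[str, float]
    solution: str


def distance(a: Dict[str, float], b: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted Euclidean distance; smaller means more similar."""
    return sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))


def retrieve(library: List[Case], query: Dict[str, float],
             weights: Dict[str, float], k: int = 1) -> List[Case]:
    """Retrieve the k past cases closest to the query."""
    return sorted(library, key=lambda c: distance(c.features, query, weights))[:k]


def solve_and_retain(library: List[Case], query: Dict[str, float],
                     weights: Dict[str, float]) -> Case:
    """Reuse the nearest case's solution (no adaptation here) and retain the new case."""
    best = retrieve(library, query, weights, k=1)[0]
    new_case = Case(features=dict(query), solution=best.solution)
    library.append(new_case)
    return new_case


# Toy data, invented for illustration only.
library = [
    Case({"age": 0.30, "marker": 0.20}, "plan A"),
    Case({"age": 0.80, "marker": 0.90}, "plan B"),
]
weights = {"age": 1.0, "marker": 2.0}  # assumed feature importances
result = solve_and_retain(library, {"age": 0.75, "marker": 0.85}, weights)
print(result.solution, len(library))
```

In this sketch the feature weights are one place where formalized background knowledge could plausibly enter the retrieval step; a real system would also need an adaptation step and a policy for deciding which solved cases are worth retaining.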
However, case retrieval and situation assessment can be successfully guided by the available formalized background knowledge (Montani, 2008).

Within the different technologies, some methods seem particularly suitable for fostering DM and KM integration. One of those is represented by Bayesian Networks (BNs), which have now reached maturity and have been adopted in different biomedical application areas (Hamilton et al., 1995; Galan et al., 2002; Luciani et al., 2003). BNs allow the available knowledge to be represented explicitly in terms of a directed acyclic graph structure and a collection of conditional probability tables, and support probabilistic inference (Spiegelhalter, Lauritzen, 1990). Moreover, several algorithms are available to learn both the graph structure and the underlying probabilistic model from the data (Cooper, Herskovits, 1992; Ramoni, Sebastiani, 2001). BNs can thus be considered to lie at the intersection of knowledge representation, automated reasoning, and machine learning. Other approaches, such as association and classification rules, which join the declarative nature of rules with the availability of learning mechanisms including inductive logic programming, have great potential for effectively merging DM and KM (Amini et al., 2007).

At present, the widespread adoption of software solutions that may effectively implement KM strategies in clinical settings is still to be achieved. However, the increasing abundance of data in bioinformatics, in health care insurance and administration, and in clinics is forcing the emergence of clinical data warehouses and data banks. The use of such data banks will require an integrated KM-DM approach. A number of important projects are trying to merge clinical and research objectives with a knowledge management perspective, such as the I2B2 project at Harvard (Heinze et al., 2008), or, on a smaller scale, the Hemostat (Bellazzi et al., 2005) and the Rhene systems in Italy (Montani et al., 2006).
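To make the Bayesian network description above more concrete (a directed acyclic graph plus conditional probability tables, used for probabilistic inference), the following minimal Python sketch computes a posterior by brute-force enumeration. The three-node network Risk -> Disease -> Test and all of its probabilities are invented for illustration and are not taken from any of the cited applications; enumeration is used only because it is easy to read, not because it scales.

```python
from itertools import product

# Hypothetical toy network (all variables binary): Risk -> Disease -> Test.
parents = {"Risk": (), "Disease": ("Risk",), "Test": ("Disease",)}

# Conditional probability tables: P(node = True | parent values). Made-up numbers.
cpt = {
    "Risk": {(): 0.3},
    "Disease": {(True,): 0.2, (False,): 0.05},
    "Test": {(True,): 0.9, (False,): 0.1},
}


def joint(assignment):
    """Joint probability of a full assignment, as the product of the CPT entries."""
    p = 1.0
    for node, pars in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query = True | evidence), summing the joint over all consistent assignments."""
    nodes = list(parents)
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(nodes)):
        a = dict(zip(nodes, values))
        if all(a[k] == v for k, v in evidence.items()):
            totals[a[query]] += joint(a)
    return totals[True] / (totals[True] + totals[False])


prior = posterior("Disease", {})                   # no evidence observed
p_given_test = posterior("Disease", {"Test": True})
print(round(prior, 4), round(p_given_test, 4))
```

With these assumed numbers, observing a positive test raises the probability of disease well above its prior, which is the kind of evidence-driven update the graph and tables are meant to support. Learning the structure or the tables from data, as the text mentions, is a separate problem not shown here.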
Moreover, several commercial solutions for the joint management of information, data, and knowledge are available on the market. It is almost inevitable that in the near future, DM and KM technologies will be an essential part of hospital and research information systems.

The book “Data Mining and Medical Knowledge Management: Cases and Applications” is a collection of case studies in which advanced DM and KM solutions are applied to concrete cases in biomedical research. The reader will find all the peculiarities of the medical field, which require specific solutions to complex problems. The tools and methods applied are therefore much more than a simple adaptation of general-purpose solutions: often they are brand-new strategies, and they always integrate data with knowledge. DM and KM researchers are trying to cope with very interesting challenges, including the integration of background knowledge, the discovery of interesting and non-trivial relationships, the construction and discovery of models that can be easily understood by experts, and the marriage of model discovery and decision support. KM and DM are taking shape and, even more than today, will in the future be part of the set of basic instruments at the core of medical informatics.

Riccardo Bellazzi
Dipartimento di Informatica e Sistemistica, Università di Pavia

References

Abidi, S. S. (2001). Knowledge management in healthcare: towards ‘knowledge-driven’ decision-support services. Int J Med Inf, 63, 5-18.
Amini, A., Muggleton, S. H., Lodhi, H., & Sternberg, M. J. (2007). A novel logic-based approach for quantitative toxicology prediction. J Chem Inf Model, 47(3), 998-1006.
Bellazzi, R., Larizza, C., Magni, P., & Bellazzi, R. (2005). Temporal data mining for the quality assessment of hemodialysis services. Artif Intell Med, 34(1), 25-39.
Bellazzi, R., & Zupan, B. (2007). Towards knowledge-based gene expression data mining. J Biomed Inform, 40(6), 787-802.
Bellazzi, R., & Zupan, B. (2008). Predictive data mining in clinical medicine: current issues and guidelines. Int J Med Inform, 77(2), 81-97.
Cimiano, P., Hotho, A., & Staab, S. (2005). Learning concept hierarchies from text corpora using formal concept analysis. Journal of Artificial Intelligence Research, 24, 305-339.
Cios, K. J., & Moore, G. W. (2002). Uniqueness of medical data mining. Artif Intell Med, 26, 1-24.
Cooper, G. F., & Herskovits, E. (1992). A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9, 309-347.
Dudley, J., & Butte, A. J. (2008). Enabling integrative genomic analysis of high-impact human diseases through text mining. Pac Symp Biocomput, 580-591.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). Data mining and knowledge discovery in databases. Communications of the ACM, 39, 24-26.
Fox, J., & Das, S. K. (2000). Safe and sound: artificial intelligence in hazardous applications. Cambridge, MA: MIT Press.
Galan, S. F., Aguado, F., Diez, F. J., & Mira, J. (2002). NasoNet, modeling the spread of nasopharyngeal cancer with networks of probabilistic events in discrete time. Artif Intell Med, 25(3), 247-264.
Giudici, P. (2003). Applied Data Mining, Statistical Methods for Business and Industry. Wiley & Sons.
Hamilton, P. W., Montironi, R., Abmayr, W., et al. (1995). Clinical applications of Bayesian belief networks in pathology. Pathologica, 87(3), 237-245.
Heinze, D. T., Morsch, M. L., Potter, B. C., & Sheffer, R. E., Jr. (2008). Medical i2b2 NLP smoking challenge: the A-Life system architecture and methodology. J Am Med Inform Assoc, 15(1), 40-3.
Lang, E. (2006). Bioinformatics and its impact on clinical research methods. Findings from the Section on Bioinformatics. Yearb Med Inform, 104-6.
Luciani, D., Marchesi, M., & Bertolini, G. (2003). The role of Bayesian Networks in the diagnosis of pulmonary embolism. J Thromb Haemost, 1(4), 698-707.
Montani, S. (2008). Exploring new roles for case-based reasoning in heterogeneous AI systems for medical decision support. Applied Intelligence, 28(3), 275-285.
Montani, S., Portinale, L., Leonardi, G., & Bellazzi, R. (2006). Case-based retrieval to support the treatment of end stage renal failure patients. Artif Intell Med, 37(1), 31-42.
Raj, R., O’Connor, M. J., & Das, A. K. (2008). An Ontology-Driven Method for Hierarchical Mining of Temporal Patterns: Application to HIV Drug Resistance Research. AMIA Symp.
Ramoni, M., & Sebastiani, P. (2001). Robust learning with Missing Data. Machine Learning, 45, 147-170.
Sackett, D. L., Rosenberg, W. M., Gray, J. A., Haynes, R. B., & Richardson, W. S. (2004). Evidence based medicine: what it is and what it isn’t. BMJ, 312(7023), 71-2.
Savova, G. K., Ogren, P. V., Duffy, P. H., Buntrock, J. D., & Chute, C. G. (2008). Mayo clinic NLP system for patient smoking status identification. J Am Med Inform Assoc, 15(1), 25-8.
Schmidt, R., Montani, S., Bellazzi, R., Portinale, L., & Gierl, L. (2001). Case-based reasoning for medical knowledge-based systems. Int J Med Inform, 64(2-3), 355-367.
Spiegelhalter, D. J., & Lauritzen, S. L. (1990). Sequential updating of conditional probabilities on directed graphical structures. Networks, 20, 579-605.
Stefanelli, M. (2004). Knowledge and process management in health care organizations. Methods Inf Med, 43(5), 525-35.
Van der Spek, R., & Spijkervet, A. (1997). Knowledge management: dealing intelligently with knowledge. In J. Liebowitz & L. C. Wilcox (Eds.), Knowledge Management and its Integrative Elements. Boca Raton, FL: CRC Press.

Riccardo Bellazzi is associate professor of medical informatics at the Dipartimento di Informatica e Sistemistica, University of Pavia, Italy. He teaches medical informatics and machine learning at the Faculty of Biomedical Engineering and bioinformatics at the Faculty of Biotechnology of the University of Pavia. He is a member of the board of the PhD in bioengineering and bioinformatics of the University of Pavia. Dr. Bellazzi is past chairman of the IMIA working group on intelligent data analysis and data mining, program chair of the AIME 2007 conference, and a member of the program committee of several international conferences in medical informatics and artificial intelligence. He is a member of the editorial board of Methods of Information in Medicine and of the Journal of Diabetes Science and Technology. He is affiliated with the American Medical Informatics Association and with the Italian Bioinformatics Society. His research interests are related to biomedical informatics, comprising data mining, IT-based management of chronic patients, mathematical modeling of biological systems, and bioinformatics. Riccardo Bellazzi is the author of more than 200 publications in peer-reviewed journals and international conferences.
Preface

The basic notion of the book “Data Mining and Medical Knowledge Management: Cases and Applications” is knowledge. A number of definitions of this notion can be found in the literature:

• Knowledge is the sum of what is known: the body of truth, information, and principles acquired by mankind.
• Knowledge is human expertise stored in a person’s mind, gained through experience, and interaction with the person’s environment.
• Knowledge is information evaluated and organized by the human mind so that it can be used purposefully, e.g., conclusions or explanations.
• Knowledge is information about the world that allows an expert to make decisions.

There are also various classifications of knowledge. A key distinction made by the majority of knowledge management practitioners is Nonaka's reformulation of Polanyi's distinction between tacit and explicit knowledge. By definition, tacit knowledge is knowledge that people carry in their minds and is, therefore, difficult to access. Often, people are not aware of the knowledge they possess or how it can be valuable to others. Tacit knowledge is considered more valuable because it provides context for people, places, ideas, and experiences. Effective transfer of tacit knowledge generally requires extensive personal contact and trust. Explicit knowledge is knowledge that has been or can be articulated, codified, and stored in certain media. It can be readily transmitted to others. The most common forms of explicit knowledge are manuals, documents, and procedures. We can add a third type of knowledge to this list, the implicit knowledge. This knowledge is hidden in a large amount of data stored in various databases but can be made explicit using some algorithmic approach.

Knowledge can be further classified into procedural knowledge and declarative knowledge. Procedural knowledge is often referred to as knowing how to do something. Declarative knowledge refers to knowing that something is true or false.
In this book we are interested in knowledge expressed in some language (formal, semi-formal) as a kind of model that can be used to support the decision-making process. The book tackles the notion of knowledge (in the domain of medicine) from two different points of view: data mining and knowledge management.

Knowledge Management (KM) comprises a range of practices used by organizations to identify, create, represent, and distribute knowledge. Knowledge Management may be viewed from each of the following perspectives:

• Techno-centric: A focus on technologies, ideally those that enhance knowledge sharing and growth.
• Organizational: How does the organization need to be designed to facilitate knowledge processes? Which organizations work best with what processes?