Knowledge Discovery and interactive Data Mining in Bioinformatics - State-of-the-Art, future challenges and research directions

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination (Einstein never said that [1]).

[1]  I. B. Khriplovich General relativity , 2005 .

[2]  Andreas Holzinger,et al.  Disease-Disease Relationships for Rheumatic Diseases: Web-Based Biomedical Textmining and Knowledge Discovery to Assist Medical Decision Making , 2012, 2012 IEEE 36th Annual Computer Software and Applications Conference.

[3]  M. Boisot,et al.  Data, information and knowledge: have we got it right? , 2004 .

[4]  Gordon Bell,et al.  Beyond the Data Deluge , 2009, Science.

[5]  D. V. van Essen,et al.  Challenges and Opportunities in Mining Neuroscience Data , 2011, Science.

[6]  Andreas Holzinger,et al.  On Using Entropy for Enhancing Handwriting Preprocessing , 2012, Entropy.

[7]  Various Various,et al.  Topological Methods in Data Analysis and Visualization III: Theory, Algorithms, and Applications , 2014 .

[8]  Gerik Scheuermann,et al.  Topology-based Methods in Visualization , 2007, Topology-based Methods in Visualization.

[9]  David Koslicki,et al.  Topological entropy of DNA sequences , 2011, Bioinform..

[10]  Ruoming Jin,et al.  Graph and Topological Structure Mining on Scientific Articles , 2007, 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering.

[11]  Paul J. Kennedy,et al.  The curse of dimensionality: a blessing to personalized medicine. , 2010, Journal of clinical oncology : official journal of the American Society of Clinical Oncology.

[12]  Andreas Holzinger,et al.  Learning performance with interactive simulations in medical education: Lessons learned from results of learning complex physiological models with the HAEMOdynamics SIMulator , 2009, Comput. Educ..

[13]  Vasant Dhar,et al.  Data science and prediction , 2012, Commun. ACM.

[14]  C. E. Shannon,et al.  A mathematical theory of communication , 1948, MOCO.

[15]  Albert Y. Zomaya,et al.  Biological Knowledge Discovery Handbook: Preprocessing, Mining and Postprocessing of Biological Data , 2013 .

[16]  Ernesto Estrada Spectral theory of networks : from biomolecular to ecological systems , 2009 .

[17]  Andreas Holzinger Weakly Structured Data in Health-Informatics: The Challenge for Human-Computer Interaction , 2011 .

[18]  Vibhav Garg,et al.  Cloud computing approaches to accelerate drug discovery value chain. , 2011, Combinatorial chemistry & high throughput screening.

[19]  Facundo Mémoli,et al.  Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition , 2007, PBG@Eurographics.

[20]  Frank Klawonn,et al.  Computational Intelligence: A Methodological Introduction , 2015, Texts in Computer Science.

[21]  Matthias Jarke,et al.  Architecture and Quality in Data Warehouses , 1998, CAiSE.

[22]  A Min Tjoa,et al.  Security aspects of ubiquitous computing in health care , 2006, Elektrotech. Informationstechnik.

[23]  Gerald Quirchmayr,et al.  Multidisciplinary Research and Practice for Information Systems , 2012, Lecture Notes in Computer Science.

[24]  Gerd Gigerenzer,et al.  Heuristic decision making. , 2011, Annual review of psychology.

[25]  Stephan Borgert,et al.  On Entropy-Based Molecular Descriptors: Statistical Analysis of Real and Synthetic Chemical Structures , 2009, J. Chem. Inf. Model..

[26]  Jeremiah Scholl,et al.  Empowering village doctors and enhancing rural healthcare using cloud computing in a rural area of mainland China , 2014, Comput. Methods Programs Biomed..

[27]  Andreas Holzinger Interacting with Information - Challenges in Human–Computer Interaction and Information Retrieval , 2011 .

[28]  Sean Ekins,et al.  Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. , 2012, Drug discovery today.

[29]  Tin Wee Tan,et al.  Towards big data science in the decade ahead from ten years of InCoB and the 1st ISCB-Asia Joint Conference , 2011, BMC Bioinformatics.

[30]  Alfred Inselberg Visualization of concept formation and learning , 2005 .

[31]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..

[32]  Alberto Leardini,et al.  Multimod Data Manager: A tool for data fusion , 2007, Comput. Methods Programs Biomed..

[33]  Andreas Holzinger,et al.  On Knowledge Discovery and Interactive Intelligent Visualization of Biomedical Data - Challenges in Human-Computer Interaction & Biomedical Informatics , 2012, DATA.

[34]  Jan Rauch,et al.  Lessons Learned from the ECML/PKDD Discovery Challenge on the Atherosclerosis Risk Factors Data , 2007, Comput. Informatics.

[35]  Heiko Mueller,et al.  Problems, Methods, and Challenges in Comprehensive Data Cleansing , 2005 .

[36]  Daniel A. Keim,et al.  Pixel-Oriented Visualization Techniques for Exploring Very Large Data Bases , 1996 .

[37]  Andreas Holzinger,et al.  KNODWAT: A scientific framework application for testing knowledge discovery methods for the biomedical domain , 2013, BMC Bioinformatics.

[38]  Katharina Morik,et al.  Combining Statistical Learning with a Knowledge-Based Approach - A Case Study in Intensive Care Monitoring , 1999, ICML.

[39]  Igor Jurisica,et al.  Data Integration in the Life Sciences: 9th International Conference, DILS 2013, Montreal, Canada, July 11-12, 2013, Proceedings , 2013 .

[40]  Alvis Brazma,et al.  Visualization of large microarray experiments with space maps , 2009, BMC Bioinformatics.

[41]  Andreas Holzinger,et al.  Speech Recognition in daily Hospital practice: Human-Computer Interaction Lessons learned , 2004 .

[42]  Igor Jurisica,et al.  Binary tree-structured vector quantization approach to clustering and visualizing microarray data , 2002, ISMB.

[43]  Andreas Holzinger,et al.  Usability of image fusion: optimal opacification of vessels and squamous cell carcinoma in CT scans , 2006, Elektrotech. Informationstechnik.

[44]  Ghassan Hamarneh,et al.  Exploration and Visualization of Segmentation Uncertainty using Shape and Appearance Prior Information , 2010, IEEE Transactions on Visualization and Computer Graphics.

[45]  Markus Kreuzthaler,et al.  Navigating through Very Large Sets of Medical Records: An Information Retrieval Evaluation Architecture for Non-standardized Text , 2011, USAB.

[46]  Gebhard Kirchgässner Alles Leben ist Problemlösen , 2002 .

[47]  Matthias Dehmer Structural Analysis of Complex Networks , 2010 .

[48]  C. Begley,et al.  Drug development: Raise standards for preclinical cancer research , 2012, Nature.

[49]  Gregory Piatetsky-Shapiro,et al.  The KDD process for extracting useful knowledge from volumes of data , 1996, Commun. ACM.

[50]  Penny Rheingans,et al.  NIH-NSF visualization research challenges report summary , 2006, IEEE Computer Graphics and Applications.

[51]  Roberto Marcondes Cesar Junior,et al.  An environment for knowledge discovery in biology , 2004, Comput. Biol. Medicine.

[52]  Timothy M. Kowalewski,et al.  Exploratory Visualization of Surgical Training Databases for Improving Skill Acquisition , 2012, IEEE Computer Graphics and Applications.

[53]  Andreas Holzinger,et al.  On Knowledge Discovery in Open Medical Data on the Example of the FDA Drug Adverse Event Reporting System for Alendronate (Fosamax) , 2013, CHI-KDD.

[54]  Albert Y. Zomaya,et al.  Biological Knowledge Discovery Handbook , 2013 .

[55]  Martin Dugas,et al.  Medizinische Informatik und Bioinformatik , 2003 .

[56]  Harold W. Thimbleby,et al.  Human-Computer Interaction for Medicine and Health Care (HCI4MED): Towards making Information usable , 2010, Int. J. Hum. Comput. Stud..

[57]  H. Simon Studying human intelligence by creating artificial intelligence. , 1981, American scientist.

[58]  Ben Shneiderman,et al.  Inventing Discovery Tools: Combining Information Visualization with Data Mining , 2001, Inf. Vis..

[59]  Christophe Rigotti,et al.  From digital genetics to knowledge discovery: Perspectives in genetic network understanding , 2010, Intell. Data Anal..

[60]  Edward H. Shortliffe,et al.  Biomedical Informatics: Defining the Science and Its Role in Health Professional Education , 2011, USAB.

[61]  W. Bean Personal Knowledge: Towards a Post-Critical Philosophy , 1961 .

[62]  Ian T. Foster,et al.  Software as a service for data scientists , 2012, Commun. ACM.

[63]  Serene W. H. Wong,et al.  Integration, visualization and analysis of human interactome. , 2014, Biochemical and biophysical research communications.

[64]  Wan-Chi Siu,et al.  Improved techniques for automatic image segmentation , 2001, IEEE Trans. Circuits Syst. Video Technol..

[65]  Markus Kreuzthaler,et al.  On the Need for Open-Source Ground Truths for Medical Information Retrieval Systems , 2010 .

[66]  Lee B. Lusted Gamuts in Radiology: Comprehensive Lists of Roentgen Differential Diagnosis , 1976 .

[67]  BMC Bioinformatics , 2005 .

[68]  Hamish Carr,et al.  Topological Methods in Data Analysis and Visualization III, Theory, Algorithms, and Applications , 2011 .

[69]  Paul G Nagy,et al.  Cloud computing in medical imaging. , 2013, Medical physics.

[70]  Lennart Martens,et al.  Toward More Transparent and Reproducible Omics Studies Through a Common Metadata Checklist and Data Publications , 2013, Big Data.

[71]  Gabriele Taentzer,et al.  A Component Concept for Typed Graphs with Inheritance and Containment Structures , 2010, ICGT.

[72]  Oscar Cordón,et al.  A multiobjective evolutionary programming framework for graph-based data mining , 2013, Inf. Sci..

[73]  Andreas Holzinger,et al.  Biomedical Informatics. Computational Sciences meet Life Sciences. Lecture Notes to LV 444.152 , 2012 .

[74]  Christopher A. Badurek,et al.  Review of Information visualization in data mining and knowledge discovery by Usama Fayyad, Georges G. Grinstein, and Andreas Wierse. Morgan Kaufmann 2002 , 2003 .

[75]  Haym Hirsh Data Mining Research: Current Status and Future Opportunities , 2008 .

[76]  André Calero Valdez,et al.  On Graph Entropy Measures for Knowledge Discovery from Publication Network Data , 2013, CD-ARES.

[77]  Usama Fayyad,et al.  Information Visualization in Data Mining and Knowledge Discovery , 2001 .

[78]  Dan R. Olsen Interacting with Information , 1995, DSV-IS.

[79]  John Mylopoulos,et al.  Case-based reasoning in IVF: prediction and knowledge mining , 1998, Artif. Intell. Medicine.

[80]  Markus Kreuzthaler,et al.  A Comparison of Different Retrieval Strategies Working on Medical Free Texts , 2011, J. Univers. Comput. Sci..

[81]  Paula A. Kiberstis All Eyes on Epigenetics , 2012 .

[82]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[83]  Melanie Tory,et al.  Human factors in visualization research , 2004, IEEE Transactions on Visualization and Computer Graphics.

[84]  Andreas Holzinger,et al.  Interactive Visualization for Information Analysis in Medical Diagnosis , 2011, USAB.

[85]  Thorsten Meinl,et al.  Graph based molecular data mining - an overview , 2004, 2004 IEEE International Conference on Systems, Man and Cybernetics.

[86]  Andreas Holzinger,et al.  Human-Computer Interaction and Knowledge Discovery (HCI-KDD): What Is the Benefit of Bringing Those Two Fields to Work Together? , 2013, CD-ARES.

[87]  Stephan Borgert,et al.  Information Inequalities for Graphs , 2008 .

[88]  Felix Naumann,et al.  Introduction to the special issue on data quality , 2013 .

[89]  Andreas Holzinger,et al.  Interactive Analysis and Visualization of Macromolecular Interfaces between Proteins , 2007, USAB.

[90]  Silvia Miksch,et al.  Visualization methods for data analysis and planning in medical applications , 2002, Int. J. Medical Informatics.

[91]  Matthias Dehmer,et al.  A history of graph entropy measures , 2011, Inf. Sci..

[92]  Igor Jurisica,et al.  Functional topology in a network of protein interactions , 2004, Bioinform..

[93]  Matthias Jarke,et al.  Architecture and Quality in Data Warehouses , 1998, CAiSE.

[94]  H. Edelsbrunner,et al.  Topological data analysis , 2011 .

[95]  Gernot R. Müller-Putz,et al.  Computational Sensemaking on Examples of Knowledge Discovery from Neuroscience Data: Towards Enhancing Stroke Rehabilitation , 2012, ITBAM.

[96]  Andreas Holzinger,et al.  On Visual Analytics and Evaluation in Cell Physiology: A Case Study , 2013, CD-ARES.

[97]  Ben Shneiderman Inventing discovery tools: combining information visualization with data mining , 2002, Inf. Vis..

[98]  Geoff Holmes,et al.  Benchmarking Attribute Selection Techniques for Discrete Class Data Mining , 2003, IEEE Trans. Knowl. Data Eng..

[99]  Elena Console,et al.  Data Fusion , 2009, Encyclopedia of Database Systems.

[100]  Richard A. Olshen,et al.  Assessing gene-level translational control from ribosome profiling , 2013, Bioinform..

[101]  Susanne Grabowski,et al.  Human-computer interaction viewed as pseudo-communication , 2001, Knowl. Based Syst..

[102]  Ana L. N. Fred,et al.  On Applying Approximate Entropy to ECG Signals for Knowledge Discovery on the Example of Big Sensor Data , 2012, AMT.

[103]  Helena M. Mentis,et al.  Fieldwork for Healthcare: Guidance for Investigating Human Factors in Computing Systems , 2014, Fieldwork for Healthcare: Guidance for Investigating Human Factors in Computing Systems.

[104]  Sang Joon Kim,et al.  A Mathematical Theory of Communication , 2006 .

[105]  Martina Ziefle,et al.  Informatics as Semiotics Engineering: Lessons Learned from Design, Development and Evaluation of Ambient Assisted Living Applications for Elderly People , 2011, HCI.

[106]  Andreas Holzinger,et al.  Quality-Based Knowledge Discovery from Medical Text on the Web , 2013, Quality Issues in the Management of Web Information.

[107]  Markus Kreuzthaler,et al.  Development of an Interactive Application for Learning Medical Procedures and Clinical Decision Making , 2011, USAB.

[108]  Helena M. Mentis,et al.  Fieldwork for Healthcare: Case Studies Investigating Human Factors in Computing Systems , 2014, Fieldwork for Healthcare: Case Studies Investigating Human Factors in Computing Systems.

[109]  Doheon Lee,et al.  A Taxonomy of Dirty Data , 2004, Data Mining and Knowledge Discovery.

[110]  William F. Punch,et al.  Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm , 2003, IEEE Trans. Syst. Man Cybern. Part B.

[111]  L. Hood,et al.  Predictive, personalized, preventive, participatory (P4) cancer medicine , 2011, Nature Reviews Clinical Oncology.

[112]  Andreas Holzinger Biomedical Informatics: Discovering Knowledge in Big Data , 2014 .

[113]  Russell Beale,et al.  Supporting serendipity: Using ambient intelligence to augment user exploration for data mining and web browsing , 2007, Int. J. Hum. Comput. Stud..

[114]  Andreas Holzinger,et al.  Semantic Information in Medical Information Systems: Utilization of Text Mining Techniques to Analyze Medical Diagnoses , 2008, J. Univers. Comput. Sci..

[115]  Igor Jurisica,et al.  Optimization of experimental design parameters for high-throughput chromatin immunoprecipitation studies , 2008, Nucleic acids research.

[116]  Padhraic Smyth,et al.  From Data Mining to Knowledge Discovery in Databases , 1996, AI Mag..

[117]  Heidrun Schumann,et al.  Visualization of Time-Oriented Data , 2011, Human-Computer Interaction Series.

[118]  Ann Blandford,et al.  Conceptual Design for Sensemaking , 2014, Handbook of Human Centric Visualization.

[119]  Roberto Todeschini,et al.  Handbook of Molecular Descriptors , 2002 .

[120]  Adam Godzik,et al.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences , 2006, Bioinform..

[121]  Haym Hirsh Data Mining Research: Current Status and Future Opportunities , 2008, Stat. Anal. Data Min..

[122]  Hongjun Lu,et al.  Cleansing Data for Mining and Warehousing , 1999, DEXA.

[123]  Milan Randic,et al.  On molecular identification numbers , 1984, J. Chem. Inf. Comput. Sci..

[124]  Christine Clavien Gerd Gigerenzer, Gut Feelings: Short Cuts to Better Decision Making , 2010 .

[125]  Igor Jurisica,et al.  High-throughput protein crystallization on the World Community Grid and the GPU , 2012 .

[126]  Stephan M. Winkler,et al.  On Text Preprocessing for Opinion Mining Outside of Laboratory Environments , 2012, AMT.

[127]  Sandrine Dudoit,et al.  Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments , 2010, BMC Bioinformatics.

[128]  Salvatore J. Stolfo,et al.  Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem , 1998, Data Mining and Knowledge Discovery.

[129]  Daniel B. Carr,et al.  Some visualization challenges , 2001 .

[130]  Vimla L. Patel,et al.  Biomedical Complexity and Error , 2011, J. Biomed. Informatics.

[131]  Hanchuan Peng,et al.  BIOCAT: a pattern recognition platform for customizable biological image classification and annotation , 2013, BMC Bioinformatics.

[132]  Lawrence B. Holder,et al.  Graph-Based Data Mining , 2000, IEEE Intell. Syst..

[133]  Eduardo L. De Vito,et al.  Making it possible to measure knowledge, experience and intuition in diagnosing lung injury severity: a fuzzy logic vision based on the Murray score , 2010, BMC Medical Informatics Decis. Mak..

[134]  Declan Butler,et al.  2020 computing: Everything, everywhere , 2006, Nature.