The TOKEn project: knowledge synthesis for in silico science

OBJECTIVE Investigational studies that involve large-scale data sets present significant challenges in discovering and testing novel hypotheses capable of supporting in silico discovery science. Conceptual Knowledge Discovery in Databases (CKDD) methods provide a potential means of scaling hypothesis discovery and testing to large data sets. Such methods enable the high-throughput generation and evaluation of knowledge-anchored relationships between complexes of variables found in targeted data sets.

METHODS The authors conducted a multipart model formulation and validation process, focusing on the development of a methodological and technical approach to using CKDD to support hypothesis discovery for in silico science. The resulting model is known as the Translational Ontology-anchored Knowledge Discovery Engine (TOKEn). It uses a specific CKDD approach, Constructive Induction, to identify and prioritize potential hypotheses about meaningful semantic relationships between variables found in large-scale, heterogeneous biomedical data sets.

RESULTS The authors verified and validated TOKEn in the context of a translational research data repository maintained by the NCI-funded Chronic Lymphocytic Leukemia Research Consortium. These studies showed that TOKEn is: (1) computationally tractable; and (2) able to generate valid and potentially useful hypotheses concerning relationships between phenotypic and biomolecular variables in that data collection.

CONCLUSIONS The TOKEn model represents a potentially useful and systematic approach to knowledge synthesis for in silico discovery science in the context of large-scale, multidimensional research data sets.
