An Approach to Clinical Proteomics Data Quality Control and Import

The biomedical domain, and proteomics in particular, faces an increasing volume of data. The heterogeneity of data sources implies heterogeneity in both the representation and the content of data. Data may also be incorrect or contain errors, which can compromise the analysis of experimental results. Our approach aims to ensure the initial quality of data during import into an information system dedicated to proteomics. It is based on the joint use of models, which represent the system sources, and ontologies, which are used as mediators between them. The controls we propose ensure the validity of values, semantics, and data consistency during the import process.
