Bioinformatics: biomarkers of early detection.

Capturing, sharing, and publishing cancer biomarker research data are fundamental challenges in enabling new opportunities to research and understand scientific data. Informatics experts from the National Cancer Institute's (NCI) Early Detection Research Network (EDRN) have pioneered a principled informatics infrastructure to capture and disseminate data from biomarker validation studies, in effect providing a successful national-scale, real-world example of how to address these challenges. EDRN is a distributed, collaborative network, and its infrastructure must support research across cancer research institutions and their individual laboratories. The EDRN informatics infrastructure is also referred to as the EDRN Knowledge Environment, or EKE. EKE connects information about biomarkers, studies, specimens, and resulting scientific data, allowing users to search, download, and compare each of these disparate sources of cancer research information. EKE's data is enriched with annotations that describe the research results (biomarkers, protocols, studies) and link those results to the captured information within EDRN (raw instrument datasets, specimens, etc.). In addition, EKE provides external links to public resources related to the research results and captured data. EKE reuses data management software technologies originally developed for planetary and earth science research and applies those capabilities to biomarker research. This paper describes the EDRN Knowledge Environment, its deployment to the EDRN enterprise, and how a number of these challenges have been addressed through the capture and curation of biomarker data results.
