A bioinformatics system for searching Co-Occurrence based on Co-Operational Formation with Advanced Method (COCOFAM)

Literature analysis is a key step in obtaining background information in biomedical research. However, the sheer volume of published biomedical literature makes it difficult for researchers to find the knowledge relevant to their interests efficiently. Efficient and systematic search strategies that give ready access to this literature are therefore required. In this paper, we propose a novel search system, named Co-Occurrence based on Co-Operational Formation with Advanced Method (COCOFAM), which is suited to large-scale literature analysis. COCOFAM integrates Spark on local clusters with a global job scheduler that gathers crowdsourced co-occurrence data across global clusters. It allows users to retrieve the information they are interested in from this large body of literature.
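To make the notion of co-occurrence concrete, the following is a minimal single-machine sketch in Python, not COCOFAM's implementation. It counts how many abstracts mention each pair of query terms. The function name `cooccurrence_counts`, the whitespace tokenization, and the sample abstracts are all assumptions made for illustration; in the system described above, this kind of counting would be distributed with Spark rather than run in a single process.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, terms):
    """Count how many abstracts mention each unordered pair of terms.

    Hypothetical helper for illustration only: matching is done on
    lowercased, whitespace-separated tokens, which is far simpler than
    the entity recognition a real literature-mining system would need.
    """
    wanted = {t.lower() for t in terms}
    counts = Counter()
    for text in abstracts:
        tokens = set(text.lower().split())
        # Terms present in this abstract, sorted so each pair has one canonical order.
        present = sorted(wanted & tokens)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Made-up sample abstracts, not real data.
abstracts = [
    "brca1 mutation linked to breast cancer",
    "tp53 and brca1 in cancer progression",
    "tp53 regulates apoptosis",
]
counts = cooccurrence_counts(abstracts, ["BRCA1", "TP53", "cancer"])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Each abstract contributes at most one count per term pair, so the result measures document-level co-occurrence rather than the number of individual mentions.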
