LARGE SCALE ANNOTATION OF BIOMOLECULAR DATA USING INTEGRATED DATABASE MANAGEMENT TOOLS

[1]  P. Salamon,et al.  Metagenomic Analyses of an Uncultured Viral Community from Human Feces , 2003, Journal of bacteriology.

[2]  Simon Foucart,et al.  WGSQuikr: Fast Whole-Genome Shotgun Metagenomic Classification , 2014, PloS one.

[3]  Adam J. Smith,et al.  The Database of Interacting Proteins: 2004 update , 2004, Nucleic Acids Res..

[4]  Ruiqiang Li,et al.  SOAP: short oligonucleotide alignment program , 2008, Bioinform..

[5]  Enno Ohlebusch,et al.  Replacing suffix trees with enhanced suffix arrays , 2004, J. Discrete Algorithms.

[6]  Qichao Tu,et al.  Strain/species identification in metagenomes using genome-specific markers , 2014, Nucleic acids research.

[7]  Alan F. Scott,et al.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders , 2004, Nucleic Acids Res..

[8]  Ni Li,et al.  Gene Ontology Annotations and Resources , 2012, Nucleic Acids Res..

[9]  Richard Hull,et al.  Managing semantic heterogeneity in databases: a theoretical prospective , 1997, PODS.

[10]  Richard Durbin,et al.  Fast and accurate short read alignment with Burrows-Wheeler transform , 2009, Bioinform..

[11]  P. Pevzner,et al.  De Novo Repeat Classification and Fragment Assembly , 2004 .

[12]  T. Ideker,et al.  A decade of systems biology. , 2010, Annual review of cell and developmental biology.

[13]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[14]  Gaston H. Gonnet,et al.  OMA Browser - Exploring orthologous relations across 352 complete genomes , 2007, Bioinform..

[15]  Monzoorul Haque Mohammed,et al.  SOrt-ITEMS: Sequence orthology based approach for improved taxonomic estimation of metagenomic sequences , 2009, Bioinform..

[16]  Thure Etzold,et al.  SRS - an indexing and retrieval tool for flat file data libraries , 1993, Comput. Appl. Biosci..

[17]  J. L. Griffin,et al.  Metabonomics: its potential as a tool in toxicology for safety assessment and data integration. , 2004 .

[18]  E. Birney,et al.  EnsMart: a generic system for fast and flexible access to biological data. , 2003, Genome research.

[19]  H. Ellegren Genome sequencing and population genomics in non-model organisms. , 2014, Trends in ecology & evolution.

[20]  David S. Wishart,et al.  DrugBank 3.0: a comprehensive resource for ‘Omics’ research on drugs , 2010, Nucleic Acids Res..

[21]  Alejandro A. Schäffer,et al.  Improved BLAST searches using longer words for protein seeding , 2007, Bioinform..

[22]  Yu Zhang,et al.  An Eulerian path approach to local multiple alignment for DNA sequences. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[23]  A. Halpern,et al.  The Sorcerer II Global Ocean Sampling Expedition: Metagenomic Characterization of Viruses within Aquatic Microbial Samples , 2008, PloS one.

[24]  B. V. Bronk,et al.  A review of molecular recognition technologies for detection of biological threat agents. , 2000, Biosensors & bioelectronics.

[25]  Ana Kozomara,et al.  miRBase: annotating high confidence microRNAs using deep sequencing data , 2013, Nucleic Acids Res..

[26]  S. Pongor,et al.  ComQXPA Quorum Sensing Systems May Not Be Unique to Bacillus subtilis: A Census in Prokaryotic Genomes , 2014, PloS one.

[27]  Peter D. Karp,et al.  A Strategy for Database Interoperation , 1995, J. Comput. Biol..

[28]  Stephan Philippi Data and knowledge integration in the life sciences , 2008, Briefings Bioinform..

[29]  J. Kawai,et al.  Direct Metagenomic Detection of Viral Pathogens in Nasal and Fecal Specimens Using an Unbiased High-Throughput Sequencing Approach , 2009, PloS one.

[30]  Arek Kasprzyk,et al.  BioMart: driving a paradigm change in biological data management , 2011, Database J. Biol. Databases Curation.

[31]  Golan Yona,et al.  Hubs of knowledge: using the functional link structure in Biozon to mine for biologically significant entities , 2006, BMC Bioinformatics.

[32]  Peter B. McGarvey,et al.  UniRef: comprehensive and non-redundant UniProt reference clusters , 2007, Bioinform..

[33]  Yasset Perez-Riverol,et al.  A parallel systematic-Monte Carlo algorithm for exploring conformational space. , 2012, Current topics in medicinal chemistry.

[34]  Eugene W. Myers,et al.  An O(ND) difference algorithm and its variations , 1986, Algorithmica.

[35]  R. Staden A strategy of DNA sequencing employing computer programs. , 1979, Nucleic acids research.

[36]  Stéphane Bressan,et al.  Introduction to Database Systems , 2005 .

[37]  Ian T. Foster,et al.  Designing and building parallel programs - concepts and tools for parallel software engineering , 1995 .

[38]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[39]  Robert Longtin An integrated approach: systems biology seeks order in complexity. , 2005, Journal of the National Cancer Institute.

[40]  Daniel H. Huson,et al.  Methods for comparative metagenomics , 2009, BMC Bioinformatics.

[41]  Gonçalo R. Abecasis,et al.  The Sequence Alignment/Map format and SAMtools , 2009, Bioinform..

[42]  Steven L Salzberg,et al.  Fast gapped-read alignment with Bowtie 2 , 2012, Nature Methods.

[43]  D. Huffman A Method for the Construction of Minimum-Redundancy Codes , 1952 .

[44]  Anthony K. H. Tung,et al.  Indexing DNA Sequences Using q-Grams , 2005, DASFAA.

[45]  Giovanni Manzini,et al.  Opportunistic data structures with applications , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[46]  M. Snyder,et al.  Investigating metabolite-protein interactions: an overview of available techniques. , 2012, Methods.

[47]  Anne E. Trefethen,et al.  Toward interoperable bioscience data , 2012, Nature Genetics.

[48]  Adam Godzik,et al.  Tolerating some redundancy significantly speeds up clustering of large protein databases , 2002, Bioinform..

[49]  Serban Nacu,et al.  Fast and SNP-tolerant detection of complex variants and splicing in short reads , 2010, Bioinform..

[50]  E. Birney,et al.  Pfam: the protein families database , 2013, Nucleic Acids Res..

[51]  Livia Perfetto,et al.  MINT, the molecular interaction database: 2012 update , 2011, Nucleic Acids Res..

[52]  Bin Ma,et al.  PatternHunter II: Highly Sensitive and Fast Homology Search , 2004, J. Bioinform. Comput. Biol..

[53]  Qin Zhao,et al.  NCIR: a database of non-canonical interactions in known RNA structures , 2002, Nucleic Acids Res..

[54]  Kara Dolinski,et al.  The BioGRID Interaction Database: 2011 update , 2010, Nucleic Acids Res..

[55]  Steven J. M. Jones,et al.  ABySS: a parallel assembler for short read sequence data , 2009, Genome research.

[56]  Steven J. M. Jones,et al.  Slider—maximum use of probability information for alignment of short sequence reads and SNP detection , 2008, Bioinform..

[57]  Donna R. Maglott,et al.  RefSeq and LocusLink: NCBI gene-centered resources , 2001, Nucleic Acids Res..

[58]  Gaston H. Gonnet,et al.  OMA 2011: orthology inference among 1000 complete genomes , 2010, Nucleic Acids Res..

[59]  A. Gnirke,et al.  ALLPATHS 2: small genomes assembled accurately and with high continuity from short paired reads , 2009, Genome Biology.

[60]  David S. Goodsell,et al.  The RCSB Protein Data Bank: new resources for research and education , 2012, Nucleic Acids Res..

[61]  John C. Wooley,et al.  A Primer on Metagenomics , 2010, PLoS Comput. Biol..

[62]  Yang Li,et al.  HMDD v2.0: a database for experimentally supported human microRNA and disease associations , 2013, Nucleic Acids Res..

[63]  Herbert Thiele,et al.  Bioinformatics Strategies in Life Sciences: from Data Processing and Data Warehousing to Biological Knowledge Extraction , 2022 .

[64]  Claire O'Donovan,et al.  Biocurators and Biocuration: surveying the 21st century challenges , 2012, Database J. Biol. Databases Curation.

[65]  David Fenyö,et al.  The Biopolymer Markup Language , 1999, Bioinform..

[66]  M. Loessner,et al.  Bacteriophage P100 for control of Listeria monocytogenes in foods: genome sequence, bioinformatic analyses, oral toxicity study, and application. , 2005, Regulatory toxicology and pharmacology : RTP.

[67]  Michael Farrar,et al.  Striped Smith-Waterman speeds database searches six times over other SIMD implementations , 2007, Bioinform..

[68]  Susumu Goto,et al.  Data, information, knowledge and principle: back to metabolism in KEGG , 2013, Nucleic Acids Res..

[69]  Michael Y. Galperin,et al.  The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection , 2013, Nucleic Acids Res..

[70]  M. Metzker Sequencing technologies — the next generation , 2010, Nature Reviews Genetics.

[71]  Miguel Ángel Medina,et al.  Systems biology for molecular life sciences and its impact in biomedicine , 2012, Cellular and Molecular Life Sciences.

[72]  Peter D. Karp,et al.  MetaCyc: a multiorganism database of metabolic pathways and enzymes. , 2004, Nucleic acids research.

[73]  Martin Reczko,et al.  DIANA-LncBase: experimentally verified and computationally predicted microRNA targets on long non-coding RNAs , 2012, Nucleic Acids Res..

[74]  C. Ouzounis,et al.  Expansion of the BioCyc collection of pathway/genome databases to 160 genomes , 2005, Nucleic acids research.

[75]  E. Mauceli,et al.  Whole-genome sequence assembly for mammalian genomes: Arachne 2. , 2003, Genome research.

[76]  Mihai Pop,et al.  Genome assembly reborn: recent computational challenges , 2009, Briefings Bioinform..

[77]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..

[78]  B. Palsson,et al.  The model organism as a system: integrating 'omics' data sets , 2006, Nature Reviews Molecular Cell Biology.

[79]  Rodrigo Lopez,et al.  Assembly information services in the European Nucleotide Archive , 2013, Nucleic Acids Res..

[80]  Neil Hall,et al.  Advanced sequencing technologies and their wider impact in microbiology , 2007, Journal of Experimental Biology.

[81]  B. Berger,et al.  ARACHNE: a whole-genome shotgun assembler. , 2002, Genome research.

[82]  Lukas Wagner,et al.  A Greedy Algorithm for Aligning DNA Sequences , 2000, J. Comput. Biol..

[83]  D. Lipman,et al.  A genomic perspective on protein families. , 1997, Science.

[84]  Sándor Pongor,et al.  ProGMap: an integrated annotation resource for protein orthology , 2009, Nucleic Acids Res..

[85]  Martin Hartmann,et al.  Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for Describing and Comparing Microbial Communities , 2009, Applied and Environmental Microbiology.

[86]  Eugene W. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[87]  J. Handelsman,et al.  Metagenomics: genomic analysis of microbial communities. , 2004, Annual review of genetics.

[88]  Christian von Mering,et al.  STRING 8—a global view on proteins and their functional interactions in 630 organisms , 2008, Nucleic Acids Res..

[89]  Hamid Bolouri,et al.  A data integration methodology for systems biology. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[90]  Ewan Birney,et al.  Automated generation of heuristics for biological sequence comparison , 2005, BMC Bioinformatics.

[91]  Qingfeng Chen,et al.  Analyzing Inconsistency Toward Enhancing Integration of Biological Molecular Databases , 2006, APBC.

[92]  Juha Kärkkäinen,et al.  Better Filtering with Gapped q-Grams , 2001, Fundam. Informaticae.

[93]  Karen Eilbeck,et al.  A standard variation file format for human genome sequences , 2010, Genome Biology.

[94]  Toshihisa Takagi,et al.  DDBJ progress report: a new submission system for leading to a correct annotation , 2013, Nucleic Acids Res..

[95]  J. Kast,et al.  Chemical proteomics and its impact on the drug discovery process , 2012, Expert review of proteomics.

[96]  W. Fitch Distinguishing homologous from analogous proteins. , 1970, Systematic zoology.

[97]  Christian von Mering,et al.  eggNOG: automated construction and annotation of orthologous groups of genes , 2007, Nucleic Acids Res..

[98]  Laura M. Haas,et al.  DiscoveryLink: A system for integrated access to life sciences data sources , 2001, IBM Syst. J..

[99]  Heng Li,et al.  A survey of sequence alignment algorithms for next-generation sequencing , 2010, Briefings Bioinform..

[100]  Peter Buneman,et al.  Challenges in Integrating Biological Data Sources , 1995, J. Comput. Biol..

[101]  L. Wong,et al.  Technologies for Integrating Biological Data , 2002, Briefings Bioinform..

[102]  James R. Cole,et al.  Ribosomal Database Project: data and tools for high throughput rRNA analysis , 2013, Nucleic Acids Res..

[103]  Walter V. Sujansky,et al.  Heterogeneous Database Integration in Biomedicine , 2001, J. Biomed. Informatics.

[104]  Alex Bateman,et al.  Databases, data tombs and dust in the wind , 2008, Bioinform..

[105]  D. J. Wheeler,et al.  A Block-sorting Lossless Data Compression Algorithm , 1994 .

[106]  R. Fleischmann,et al.  Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. , 1995, Science.

[107]  T. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2006, Nucleic Acids Res..

[108]  David Eppstein,et al.  Sparse dynamic programming , 1990, SODA '90.

[109]  Chunhui Li,et al.  Exploring the diversity of complex metabolic networks , 2005, Bioinform..

[110]  Christian von Mering,et al.  Genome organization: Teamed up for transcription , 2002, Nature.

[111]  L. Hood,et al.  A data integration methodology for systems biology: experimental verification. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[112]  Fabian Schreiber,et al.  Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information , 2011, Briefings Bioinform..

[113]  Ralf Hofestädt,et al.  BioDWH: A Data Warehouse Kit for Life Science Data Integration , 2008, J. Integr. Bioinform..

[114]  C. Huttenhower,et al.  Metagenomic microbial community profiling using unique clade-specific marker genes , 2012, Nature Methods.

[115]  Lydia E. Kavraki,et al.  Computational challenges in systems biology , 2009, Comput. Sci. Rev..

[116]  Olga G. Troyanskaya,et al.  [Title not recovered; original paper, Data and text mining section] doi:10.1093/bioinformatics/btm332 , Bioinform..

[117]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[118]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[119]  S. Koren,et al.  Assembly algorithms for next-generation sequencing data. , 2010, Genomics.

[120]  Marco Masseroli,et al.  Quality controls in integrative approaches to detect errors and inconsistencies in biological databases , 2010, J. Integr. Bioinform..

[121]  Steven J. M. Jones,et al.  High quality SNP calling using Illumina data at shallow coverage , 2010, Bioinform..

[122]  D. Cowan,et al.  Review and re-analysis of domain-specific 16S primers. , 2003, Journal of microbiological methods.

[123]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[124]  Alon Y. Halevy,et al.  Answering queries using views: A survey , 2001, The VLDB Journal.

[125]  Rafael C. Jimenez,et al.  The IntAct molecular interaction database in 2012 , 2011, Nucleic Acids Res..

[126]  Bin Ma,et al.  PatternHunter: faster and more sensitive homology search , 2002, Bioinform..

[127]  Xiaoqiu Huang,et al.  Generating a Genome Assembly with PCAP , 2005, Current protocols in bioinformatics.

[128]  L. Stein Integrating biological databases , 2003, Nature Reviews Genetics.

[129]  Bobbie-Jo M. Webb-Robertson,et al.  Current trends in computational inference from mass spectrometry-based proteomics , 2007, Briefings Bioinform..

[130]  Tao Xu,et al.  Atlas – a data warehouse for integrative bioinformatics , 2005, BMC Bioinformatics.

[131]  Faraz Hach,et al.  mrsFAST: a cache-oblivious algorithm for short-read mapping , 2010, Nature Methods.

[132]  A. Salamov,et al.  Use of simulated data sets to evaluate the fidelity of metagenomic processing methods , 2007, Nature Methods.

[133]  Darren A. Natale,et al.  The COG database: an updated version includes eukaryotes , 2003, BMC Bioinformatics.

[134]  Henning Hermjakob,et al.  The Reactome pathway Knowledgebase , 2015, Nucleic acids research.

[135]  Esko Ukkonen,et al.  Two Algorithms for Approximate String Matching in Static Texts , 1991, MFCS.

[136]  Bart De Moor,et al.  BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis , 2005, Bioinform..

[137]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.

[138]  Scott Federhen,et al.  The NCBI Taxonomy database , 2011, Nucleic Acids Res..

[139]  Tulika Prakash,et al.  Functional assignment of metagenomic data: challenges and applications , 2012, Briefings Bioinform..

[140]  Markus Müller,et al.  In silico analysis of accurate proteomics, complemented by selective isolation of peptides. , 2011, Journal of proteomics.

[141]  Mark D. Wilkinson,et al.  BioMOBY: An Open Source Biological Web Services Proposal , 2002, Briefings Bioinform..

[142]  S. Gygi,et al.  Proteome analysis: Biological assay or data archive? , 1998, Electrophoresis.

[143]  Alice Carolyn McHardy,et al.  Taxonomic binning of metagenome samples generated by next-generation sequencing technologies , 2012, Briefings Bioinform..

[144]  Haruki Nakamura,et al.  The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data , 2006, Nucleic Acids Res..

[145]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.

[146]  Avi Ma'ayan,et al.  Lists2Networks: Integrated analysis of gene/protein lists , 2010, BMC Bioinformatics.

[147]  Rudolf Bayer,et al.  Organization and maintenance of large ordered indexes , 1972, Acta Informatica.

[148]  Emmanuel Barillot,et al.  XML, bioinformatics and data integration , 2001, Bioinform..

[149]  K. Reinert,et al.  RazerS--fast read mapping with sensitivity control. , 2009, Genome research.

[150]  Wing Hung Wong,et al.  SeqMap: mapping massive amount of oligonucleotides to the genome , 2008, Bioinform..

[151]  Russ B. Altman,et al.  Editorial: Building successful biological databases , 2004, Briefings Bioinform..

[152]  Haixu Tang,et al.  Fragment assembly with double-barreled data , 2001, ISMB.

[153]  Golan Yona,et al.  BIOZON: a system for unification, management and analysis of heterogeneous biological data , 2006, BMC Bioinformatics.

[154]  R. Overbeek,et al.  Overview of the Integrated Genomic Data system (IGD) , 1992 .

[155]  Daniel R. Zerbino,et al.  Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler , 2009, PloS one.

[156]  Yasset Perez-Riverol,et al.  Evaluation of phenylthiocarbamoyl-derivatized peptides by electrospray ionization mass spectrometry: selective isolation and analysis of modified multiply charged peptides for liquid chromatography-tandem mass spectrometry experiments. , 2010, Analytical chemistry.

[157]  Sándor Pongor,et al.  JBioWH: an open-source Java framework for bioinformatics data integration , 2013, Database J. Biol. Databases Curation.

[158]  Eugene W. Myers,et al.  A whole-genome assembly of Drosophila. , 2000, Science.

[159]  James W. Cooper,et al.  Java design patterns , 2000 .

[160]  Alla Lapidus,et al.  A Bioinformatician's Guide to Metagenomics , 2008, Microbiology and Molecular Biology Reviews.

[161]  D. Donoho For most large underdetermined systems of linear equations the minimal 𝓁1‐norm solution is also the sparsest solution , 2006 .

[162]  U. Sauer,et al.  Getting Closer to the Whole Picture , 2007, Science.

[163]  O. Gotoh An improved algorithm for matching biological sequences. , 1982, Journal of molecular biology.

[164]  Michael Brudno,et al.  SHRiMP: Accurate Mapping of Short Color-space Reads , 2009, PLoS Comput. Biol..

[165]  Monzoorul Haque Mohammed,et al.  SPHINX - an algorithm for taxonomic binning of metagenomic sequences , 2011, Bioinform..

[166]  Arun Kumar,et al.  Biological Databases- Integration of Life Science Data , 2012 .

[167]  Knut Reinert,et al.  RazerS 3: Faster, fully sensitive read mapping , 2012, Bioinform..

[168]  D. Lipman,et al.  Rapid and sensitive protein similarity searches. , 1985, Science.

[169]  J. Handelsman,et al.  Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. , 1998, Chemistry & biology.

[170]  R. Aebersold,et al.  Mass spectrometry-based proteomics , 2003, Nature.

[171]  Cole Trapnell,et al.  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome , 2009, Genome Biology.

[172]  James F. Brinkley,et al.  BioMediator Data Integration: Beyond Genomics to Neuroscience Data , 2005, AMIA.