TREC genomics special issue overview

Recent advances in biotechnology have changed the fundamental nature of biological research. Whereas scientists used to be able to manage their modest amount of experimental data in paper notebooks or simple spreadsheets, new tools such as gene chips for measuring gene expression (Mobasheri et al. 2004) or sequence variation (Pennisi 2007) have fundamentally altered their work. Not only do these gene chips generate massive amounts of data (as much as tens of thousands of data points per biological sample), they uncover potential associations and interactions with a wide variety of genes, diseases, and other biological entities. The field devoted to managing, utilizing, and evaluating this data is called bioinformatics (Baxevanis and Ouellette 2005), which is sometimes described as the intersection of biology (or biomedicine) and computer science.

The growth of biological data has resulted in a correspondingly large increase in scientific knowledge in what biologists sometimes call the bibliome or literature of biology. This requires new approaches to dealing with the biomedical literature, which is the main point of intersection between this field and that of information retrieval (IR) and related disciplines such as text mining.

In the early part of this decade, it became apparent that this situation was ripe for a track at the Text REtrieval Conference (TREC, www.trec.nist.gov), a challenge evaluation for IR organized by the U.S. National Institute of Standards and Technology (NIST, http://www.nist.gov/) (Voorhees and Harman 2005). Started in 1992, TREC has provided a series of challenge evaluations and a forum for presentation of their results. TREC is organized as an annual event at which the tasks are specified and queries and documents are provided to participants. While TREC has historically focused most of its research on textual documents, the field has expanded in recent years with the growth of new information needs (e.g., question-answering, cross-lingual), data types (e.g., sequence data, video) and platforms (e.g., the Web) (Hersh 2003). This special issue is devoted to the TREC Genomics Track, which ran from 2003 to 2007.

[1]  Andreas D. Baxevanis,et al.  Bioinformatics - a practical guide to the analysis of genes and proteins , 2001, Methods of biochemical analysis.

[2]  Bruce Rannala Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Andreas D. Baxevanis , B. F. Francis Ouellette , 1999 .

[3]  Paul Over,et al.  Interactivity at the Text Retrieval Conference (TREC) , 2001, Inf. Process. Manag..

[4]  William R. Hersh,et al.  Information Retrieval: A Health and Biomedical Perspective , 2002 .

[5]  Limsoon Wong,et al.  Accomplishments and challenges in literature data mining for biology , 2002, Bioinform..

[6]  HARD Track Overview in TREC 2004 , 2003 .

[7]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology , 2003, Nucleic Acids Res..

[8]  William R. Hersh,et al.  TREC GENOMICS Track Overview , 2003, TREC.

[9]  Ceslovas Venclovas,et al.  Assessment of progress over the CASP experiments , 2003, Proteins.

[10]  Alexander A. Morgan,et al.  Evaluation of text data mining for database curation: lessons learned from the KDD Challenge Cup , 2003, ISMB.

[11]  James Allan,et al.  HARD Track Overview in TREC 2003: High Accuracy Retrieval from Documents , 2003, TREC.

[12]  Joyce A. Mitchell,et al.  Gene Indexing: Characterization and Analysis of NLM's GeneRIFs , 2003, AMIA.

[13]  Gregory D. Schuler,et al.  Database resources of the National Center for Biotechnology Information: update , 2004, Nucleic acids research.

[14]  J. Eppig,et al.  Visualizing the Laboratory Mouse: Capturing Phenotype Information , 2004, Genetica.

[15]  Hongfang Liu,et al.  Knowledge-Intensive and Statistical Approaches to the Retrieval and Annotation of Genomics MEDLINE Citations , 2004, TREC.

[16]  Javed Mostafa,et al.  TREC 2004 Genomics Track Experiments at IUB , 2004, TREC.

[17]  Marti A. Hearst,et al.  TREC 2004 Genomics Track Overview , 2005, TREC.

[18]  A. Mobasheri,et al.  Post-genomic applications of tissue microarrays: basic research, prognostic oncology, clinical genomics and drug discovery. , 2004, Histology and histopathology.

[19]  Sumio Fujita Revisiting Again Document Length Hypotheses TREC 2004 Genomics Track Experiments at Patolis , 2004, TREC.

[20]  Preslav Nakov,et al.  BioText Team Experiments for the TREC 2004 Genomics Track , 2004, TREC.

[21]  Charles L. A. Clarke,et al.  Domain-Specific Synonym Expansion and Validation for Biomedical Information Retrieval (MultiText Experiments for TREC 2004) , 2004, TREC.

[22]  Judith A. Blake,et al.  The mouse Gene Expression Database (GXD): updates and enhancements , 2004, Nucleic Acids Res..

[23]  Alfonso Valencia,et al.  Overview of BioCreAtIvE: critical assessment of information extraction for biology , 2005, BMC Bioinformatics.

[24]  Adam Zemla,et al.  Critical assessment of methods of protein structure prediction (CASP)‐round V , 2005, Proteins.

[25]  Ellen M. Voorhees,et al.  Retrieval System Evaluation , 2005 .

[26]  C. Bult,et al.  THE MOUSE TUMOR BIOLOGY DATABASE: INTEGRATED ACCESS TO MOUSE CANCER BIOLOGY DATA , 2005, Experimental lung research.

[27]  Midori A. Harris,et al.  The Gene Ontology project , 2005 .

[28]  Jimmy J. Lin,et al.  Fusion of Knowledge-Intensive and Statistical Approaches for Retrieving and Annotating Textual Genomics Documents , 2005, TREC.

[29]  Hagit Shatkay,et al.  Applying Probabilistic Thematic Clustering for Classification in the TREC 2005 Genomics Track , 2005, TREC.

[30]  Ellen M. Voorhees,et al.  TREC: Experiment and Evaluation in Information Retrieval (Digital Libraries and Electronic Publishing) , 2005 .

[31]  Mark Dredze,et al.  TREC 2005 Genomics Track Experiments at IBM Watson , 2005, TREC.

[32]  Hagit Shatkay,et al.  Integrating image data into biomedical text categorization , 2006, ISMB.

[33]  William R Hersh,et al.  The TREC 2004 genomics track categorization task: classifying full text biomedical documents , 2006, Journal of biomedical discovery and collaboration.

[34]  William R. Hersh,et al.  Reducing workload in systematic review preparation using automated citation classification. , 2006, Journal of the American Medical Informatics Association : JAMIA.

[35]  William R Hersh,et al.  Enhancing access to the Bibliome: the TREC 2004 Genomics Track , 2006, Journal of biomedical discovery and collaboration.

[36]  D. Harman,et al.  TREC: Experiment and Evaluation in Information Retrieval , 2006 .

[37]  E. Pennisi,et al.  Human Genetic Variation , 2007, Science.

[38]  E. Pennisi Breakthrough of the year. Human genetic variation. , 2007, Science.

[39]  José Luis Vicedo González,et al.  TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[40]  Tatiana A. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2004, Nucleic Acids Res..

[41]  William R. Hersh,et al.  A comparative analysis of retrieval features used in the TREC 2006 Genomics Track passage retrieval task , 2007, AMIA.

[42]  ChengXiang Zhai,et al.  An empirical study of tokenization strategies for biomedical information retrieval , 2007, Information Retrieval.

[43]  Jimmy J. Lin,et al.  PubMed related articles: a probabilistic topic-based model for content similarity , 2007, BMC Bioinformatics.

[44]  Luo Si,et al.  York University at TREC 2007: Genomics Track , 2005, TREC.

[45]  Zhiyong Lu,et al.  Evaluation of query expansion using MeSH in PubMed , 2009, Information Retrieval.

[46]  Yue Lu,et al.  An empirical study of gene synonym query expansion in biomedical information retrieval , 2008, Information Retrieval.

[47]  Michael Y. Galperin The Molecular Biology Database Collection: 2008 update , 2007, Nucleic Acids Res..

[48]  Yi Li,et al.  Exploring criteria for successful query expansion in the genomic domain , 2009, Information Retrieval.

[49]  Giorgio Valle,et al.  The Gene Ontology project in 2008 , 2007, Nucleic Acids Res..

[50]  William R. Hersh,et al.  Tasks, topics and relevance judging for the TREC Genomics Track: five years of experience evaluating biomedical text information retrieval systems , 2009, Information Retrieval.

[51]  Carl Eklund,et al.  National Institute for Standards and Technology , 2009, Encyclopedia of Biometrics.