Towards human-computer synergetic analysis of large-scale biological data

Background: Advances in technology have led to the generation of massive amounts of complex and multifarious biological data in areas ranging from genomics to structural biology. The volume and complexity of such data pose significant challenges for analysis, especially when one seeks to generate hypotheses or explore the underlying biological processes. In the current state of the art, the predominant paradigm for analyzing biological data remains the application of automated algorithms followed by expert perusal and analysis of the results. This paradigm works well in many problem domains. However, it is also limiting, since domain experts can apply their instincts and expertise, such as contextual reasoning, hypothesis formulation, and exploratory analysis, only after the algorithm has produced its results. In many areas where the organization and interaction of the biological processes are poorly understood and exploratory analysis is crucial, what is needed is to integrate domain expertise into the data analysis process and use it to drive the analysis itself.

Results: Against this background, the results presented in this paper describe advances along two methodological directions. First, given the context of biological data, we utilize and extend a design approach called experiential computing from multimedia information system design. This paradigm combines information visualization and human-computer interaction with algorithms for exploratory analysis of large-scale and complex data. The proposed approach emphasizes: (1) allowing users to directly visualize, interact with, experience, and explore the data through interoperable visualization-based and algorithmic components, (2) supporting unified query and presentation spaces to facilitate experimentation and exploration, (3) providing external contextual information by assimilating relevant supplementary data, and (4) encouraging user-directed information visualization, data exploration, and hypothesis formulation. Second, to illustrate the proposed design paradigm and measure its efficacy, we describe two prototype web applications. The first, called XMAS (Experiential Microarray Analysis System), is designed for analysis of time-series transcriptional data. The second, called PSPACE (Protein Space Explorer), is designed for holistic analysis of structural and structure-function relationships using interactive low-dimensional maps of the protein structure space. Both systems promote and facilitate human-computer synergy, in which cognitive elements such as domain knowledge, contextual reasoning, and purpose-driven exploration are integrated with a host of powerful algorithmic operations that support large-scale data analysis, multifaceted data visualization, and multi-source information integration.

Conclusions: The proposed design philosophy combines visualization, algorithmic components, and cognitive expertise into a seamless processing-analysis-exploration framework that facilitates sense-making, exploration, and discovery. Using XMAS, we present case studies that analyze transcriptional data from two highly complex domains: gene expression in the placenta during human pregnancy and the reaction of marine organisms to heat stress. With PSPACE, we demonstrate how complex structure-function relationships can be explored. These results demonstrate the novelty, advantages, and distinctions of the proposed paradigm. They also highlight how domain insights can be combined with algorithms to discover meaningful knowledge and formulate evidence-based hypotheses during the data analysis process. Finally, user studies against comparable systems indicate that both XMAS and PSPACE deliver results with better interpretability while placing lower cognitive loads on the users. XMAS is available at: http://tintin.sfsu.edu:8080/xmas. PSPACE is available at: http://pspace.info/.
