On information organization and information extraction for the study of gene expressions by tissue microarray technique

Genomic expression studies are a means of depicting the molecular profiles that characterize specific disease states. Microarrays allow genome sequences to be tracked and translated into gene functions, leading to the identification of highly informative genes and pathways with a potential impact on understanding disease development and progression. These technologies may also improve diagnostic and treatment modalities and the detection of novel therapeutic targets. Expression array technology is dramatically expanding the amount of data available on many disease states. These studies typically involve many researchers with different backgrounds, each contributing to some steps of the overall process. In particular, Tissue Microarray technology allows high-throughput expression profiling of tumor samples by evaluating potentially interesting candidate genes and proteins on a large number of well-characterized tumors, providing information on a population basis. Producing high-quality experimental data is essential for reliable data analysis. This requires critical assessment of the experimental design and organization, assessment of the reliability of the experimental data, and data preprocessing. A technological approach is also advisable to properly manage data heterogeneity, data quantity, and user diversity. The focus of this thesis is to develop a systematic approach to processing and better understanding data generated by Tissue Microarray technology, overcoming the limitations of other current approaches. The thesis addresses Tissue Microarray data collection and organization, enhancing data sharing, usability, and process automation. We address preprocessing issues, identifying critical points and some solutions. We also focus on a specific issue in data classification, proposing a novel classification model based on a Bayesian hierarchical approach that is able to handle data uncertainty.
Three Tissue Microarray experiments are presented as case studies, providing real-world examples that illustrate some of the critical points made in this thesis.
