An environment for knowledge discovery in biology

This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions applied to primary databases of biomedical information organized in a warehouse architecture. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, and the results are stored in a complex object database for knowledge discovery. The kernel also includes a dedicated high-performance library for designing and running the mining functions on parallel machines. Experimental results obtained by applying the kernel functions are reported.
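
The kernel idea summarized above (pluggable mining functions applied to extracted data, with results kept for later inspection) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the system's actual interface: the `Kernel` class, its `register` and `run` methods, the in-memory `results` dictionary standing in for the results database, and the toy `two_means` clustering function used as an example of unsupervised classification. The parallel library is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical signature: a mining function maps samples to one class label each.
MiningFunction = Callable[[List[Vector]], List[int]]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(data: List[Vector], iterations: int = 10) -> List[int]:
    """Toy unsupervised classifier: k-means with k = 2.

    Centroids start at the first and last samples, so the result is deterministic.
    """
    centroids = [list(data[0]), list(data[-1])]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each sample takes the label of its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: squared_distance(p, centroids[k]))
            for p in data
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(data, labels) if label == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


@dataclass
class Kernel:
    """Hypothetical kernel: a registry of mining functions plus a result store."""

    functions: Dict[str, MiningFunction] = field(default_factory=dict)
    # In-memory stand-in for the database that holds mining results.
    results: Dict[str, List[int]] = field(default_factory=dict)

    def register(self, name: str, function: MiningFunction) -> None:
        self.functions[name] = function

    def run(self, name: str, data: List[Vector]) -> List[int]:
        """Apply a registered mining function and store its labels by name."""
        labels = self.functions[name](data)
        self.results[name] = labels
        return labels


# Usage: two well-separated groups of made-up 2-D samples.
kernel = Kernel()
kernel.register("two_means", two_means)
samples = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
labels = kernel.run("two_means", samples)
```

A supervised classifier would fit the same registry if its training data were bound in advance, for example through a closure that returns a function with the `MiningFunction` signature.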
