Paper information - Genome-Based Population Clustering: Nuggets of Truth Buried in a Pile of Numbers?

National/ethnic population mutation databases (NEMDBs) are online mutation depositories that record extensive information about the genetic heterogeneity described in populations and ethnic groups worldwide. FINDbase (http://www.findbase.org) is a database containing the allele frequencies of causative mutations and pharmacogenomic markers in various populations and ethnic groups. In this paper, we design and apply new automated data mining techniques to the original FINDbase causative mutations data collection in an attempt to identify genomic relationships between populations. Furthermore, we have developed an interactive web-based visualization tool that enables users to correlate the information, determine the relationships, and gain insight into the underlying data collection in a novel and meaningful way.
