Cancer research is basic research carried out to identify the causes of cancer and to develop strategies for its prevention, diagnosis, treatment, and cure. Treating cancer effectively while minimizing toxicity to the patient depends on identifying the exact type of tumor, so a reliable cancer classification system is needed to gain insight into the problem. A systematic analysis of global gene expression is used to pinpoint the problem area. Molecular diagnostics offer a promising route to the systematic classification of human cancers, but such tests are not yet widely applied because characteristic molecular markers have not been identified for most solid tumors. Recently, tumor gene expression profiles obtained from DNA microarrays have been used for cancer diagnosis. In the proposed system, gene expression data are collected from multiple sources and organized into an ontological store. The ant colony optimization technique is used to analyze clusters in the data, and attribute-match association rules are applied to detect cancer from the acquired knowledge.
Keywords: gene expression, cancer cells, ontological store
I. Introduction
Data mining is a widely used technique for extracting knowledge from historical data. Large collections of data sets are stored in databases, and mining is used to extract knowledge from them; a large collection of such databases is known as a data warehouse. Data mining has applications in computer science, statistics, artificial intelligence, and many other fields. The data mining process has three stages: pre-processing, mining, and validation. Pre-processing is the first stage, in which the collected data sets are arranged in a structure or format suitable for mining. Unwanted, ambiguous, and redundant data are removed from the repository, and a properly structured database is created.
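The pre-processing stage can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the record layout, the sample IDs and gene names, and the three cleaning rules (a missing value makes a record unwanted, conflicting class labels for one sample make it ambiguous, an exact repeat makes it redundant) are all hypothetical assumptions chosen for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical record layout: (sample ID, expression value per gene, class label).
Record = Tuple[str, Dict[str, Optional[float]], str]


def clean(records: List[Record]) -> List[Record]:
    """Drop unwanted, ambiguous and redundant records (illustrative rules only)."""
    # Collect every label seen for each sample ID to detect ambiguous samples.
    labels_by_id: Dict[str, Set[str]] = {}
    for sample_id, _, label in records:
        labels_by_id.setdefault(sample_id, set()).add(label)

    cleaned: List[Record] = []
    seen = set()
    for sample_id, values, label in records:
        # Unwanted: at least one expression value is missing.
        if any(v is None for v in values.values()):
            continue
        # Ambiguous: the same sample was recorded with conflicting labels.
        if len(labels_by_id[sample_id]) > 1:
            continue
        # Redundant: an exact copy of a record that has already been kept.
        key = (sample_id, tuple(sorted(values.items())), label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((sample_id, values, label))
    return cleaned


# Toy data with hypothetical sample IDs and gene names.
raw = [
    ("s1", {"g1": 2.1, "g2": 0.4}, "tumor"),
    ("s1", {"g1": 2.1, "g2": 0.4}, "tumor"),    # redundant duplicate
    ("s2", {"g1": None, "g2": 1.0}, "normal"),  # missing value
    ("s3", {"g1": 0.3, "g2": 1.2}, "normal"),
    ("s3", {"g1": 0.3, "g2": 1.2}, "tumor"),    # conflicting label
    ("s4", {"g1": 0.2, "g2": 1.5}, "normal"),
]
result = clean(raw)
print([r[0] for r in result])  # ['s1', 's4']
```

Dropping both copies of an ambiguous sample, rather than guessing which label is correct, is one conservative choice; a real pipeline might instead flag such samples for manual review.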
The removal of unwanted data is known as data cleaning. The second stage is mining, in which the actual analysis is performed. Techniques such as clustering, pattern matching, regression, and classification are used to discover previously unknown knowledge. The third and final stage is validation, in which the results of the mining process are checked. Because mining results are not guaranteed to be correct, they must be validated before use. In human genetics, sequence mining helps address an important goal: understanding how inter-individual variations in human DNA sequence map to variability in disease risk. Put simply, it aims to find out how changes in an individual's DNA sequence affect the risk of developing common diseases such as cancer, which is important for improving methods of detecting, preventing, and treating these diseases.
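The mining and validation stages can be illustrated with the association-rule step mentioned in the abstract. The sketch below is a minimal illustration under stated assumptions, not the authors' system: it discretizes expression values into hypothetical "high"/"low" states with an arbitrary threshold, interprets "attribute match" as a sample containing every item of a rule's antecedent, and leaves out the ontological store and the ant colony optimization clustering entirely. Rules are mined on training samples and then validated by their accuracy on held-out samples.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Item = Tuple[str, str]                # (gene, "high" or "low")
Sample = Tuple[FrozenSet[Item], str]  # (discretized profile, class label)
Rule = Tuple[FrozenSet[Item], str, float, float]  # (antecedent, label, support, confidence)


def discretize(values: Dict[str, float], threshold: float = 1.0) -> FrozenSet[Item]:
    # Assumed rule: a gene is "high" when its expression reaches the threshold.
    return frozenset((g, "high" if v >= threshold else "low") for g, v in values.items())


def mine_rules(train: List[Sample], min_support: float, min_conf: float,
               max_len: int = 2) -> List[Rule]:
    """Mine rules 'gene states -> label' from labelled training samples."""
    n = len(train)
    ante_counts: Counter = Counter()
    rule_counts: Counter = Counter()
    for items, label in train:
        for size in range(1, max_len + 1):
            for ante in combinations(sorted(items), size):
                ante_counts[ante] += 1
                rule_counts[(ante, label)] += 1
    rules: List[Rule] = []
    for (ante, label), count in rule_counts.items():
        support = count / n
        confidence = count / ante_counts[ante]
        if support >= min_support and confidence >= min_conf:
            rules.append((frozenset(ante), label, support, confidence))
    # Most confident rules first, ties broken by support.
    rules.sort(key=lambda r: (r[3], r[2]), reverse=True)
    return rules


def predict(rules: List[Rule], items: FrozenSet[Item], default: str) -> str:
    # Attribute match: the first rule whose antecedent the sample satisfies fires.
    for ante, label, _, _ in rules:
        if ante <= items:
            return label
    return default


def accuracy(rules: List[Rule], test: List[Sample], default: str) -> float:
    # Validation stage: fraction of held-out samples classified correctly.
    hits = sum(predict(rules, items, default) == label for items, label in test)
    return hits / len(test)


# Toy training and held-out data (hypothetical genes and values).
train = [(discretize({"g1": 2.0, "g2": 0.2}), "tumor"),
         (discretize({"g1": 1.8, "g2": 0.5}), "tumor"),
         (discretize({"g1": 0.3, "g2": 1.4}), "normal"),
         (discretize({"g1": 0.1, "g2": 1.9}), "normal")]
test = [(discretize({"g1": 2.5, "g2": 0.1}), "tumor"),
        (discretize({"g1": 0.4, "g2": 1.2}), "normal")]

rules = mine_rules(train, min_support=0.25, min_conf=0.9)
acc = accuracy(rules, test, default="normal")
print(len(rules), acc)  # 6 1.0
```

Scoring the rules only on samples that were not used for mining is what makes the accuracy figure a validation rather than a restatement of the training fit. On real microarray data, with thousands of genes and few samples, cross-validation and stricter support thresholds would likely be preferable.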