Determining the familial risk distribution of colorectal cancer: a data mining approach

This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95 % confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7 % of families (SIR = 7.11; 95 % CI 6.65–7.59) had a strong family history of colorectal cancer; (2) 13 % of families (SIR = 2.94; 95 % CI 2.78–3.10) had a moderate family history of colorectal cancer; (3) 11 % of families (SIR = 1.23; 95 % CI 1.12–1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9 % of families (SIR = 1.06; 95 % CI 0.96–1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60 % of families (SIR = 0.61; 95 % CI 0.57–0.65) had a weak family history of all cancers. Colorectal cancer risk varies widely across categories of family history of cancer, with a strong gradient between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7 % of the population) was about 12 times that for people in the lowest risk category (60 % of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.
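For readers unfamiliar with the statistic reported above, the following is a minimal sketch of how an SIR and an approximate 95 % CI can be computed from observed and expected case counts. The abstract does not state which interval method the authors used; Byar's approximation is assumed here purely for illustration, and the function names and example counts are hypothetical. The last lines check the roughly 12-fold gradient using the two SIRs quoted in the abstract.

```python
import math


def sir(observed: int, expected: float) -> float:
    """Standardized incidence ratio: observed cases / expected cases."""
    return observed / expected


def sir_ci_byar(observed: int, expected: float, z: float = 1.96):
    """Approximate CI for an SIR using Byar's approximation to the
    Poisson interval (an assumed method, not confirmed by the source).
    Requires observed > 0."""
    o = observed
    lower = o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3 / expected
    o1 = o + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
example_sir = sir(100, 50.0)
example_ci = sir_ci_byar(100, 50.0)

# Gradient between the highest and lowest risk categories,
# using the SIRs reported in the abstract (7.11 and 0.61).
risk_gradient = 7.11 / 0.61
```

With the reported values, `risk_gradient` is about 11.7, which is consistent with the "12 times" figure stated in the abstract.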
