Challenges to building a platform for a breast cancer risk score

According to the World Health Organization, cancer has recently become the leading cause of death worldwide. Health authorities therefore acknowledge the need for prevention and screening programs to decrease its incidence. These programs become more efficient when they target the subsets of the population at higher risk, which calls for efficient tools capable of monitoring population risk. This paper presents the constraints on building cancer risk scores and their impact on the platform of tools used to build them. Beyond the predictive performance of a risk score, the major constraints concern the role of domain experts and the acceptability of the score to end users. Readability is therefore an important criterion. It is shown that a simple k-nearest-neighbor algorithm can achieve good performance with the help of the domain expert. To illustrate this, a risk score built from only four attributes is presented for the French population.
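As an illustration of how a k-nearest-neighbor rule can yield a readable risk score, the following minimal sketch scores a profile by the fraction of its k nearest cohort members who are cases. It is not the authors' implementation: the function name `knn_risk_score`, the Euclidean distance, the value of k, and the toy cohort of four pre-scaled attributes are all assumptions made for the example, and the abstract does not say which four attributes the actual score uses.

```python
import math

def knn_risk_score(query, cohort, k=3):
    """Return the fraction of the k nearest cohort members who are cases.

    query  -- tuple of attribute values (assumed already scaled)
    cohort -- list of (attributes, label) pairs, label 1 = case, 0 = non-case
    """
    if not 1 <= k <= len(cohort):
        raise ValueError("k must be between 1 and the cohort size")
    # Rank cohort members by Euclidean distance to the query profile.
    ranked = sorted(cohort, key=lambda row: math.dist(query, row[0]))
    neighbors = ranked[:k]
    return sum(label for _, label in neighbors) / k

# Hypothetical toy cohort: four made-up attributes per woman, not real data.
cohort = [
    ((0.0, 0.0, 0.0, 0.0), 0),
    ((0.1, 0.0, 0.1, 0.0), 0),
    ((0.0, 0.2, 0.0, 0.1), 0),
    ((1.0, 1.0, 1.0, 1.0), 1),
    ((0.9, 1.0, 0.8, 1.0), 1),
    ((1.0, 0.8, 1.0, 0.9), 0),
]

low = knn_risk_score((0.05, 0.05, 0.05, 0.05), cohort, k=3)
high = knn_risk_score((0.95, 0.95, 0.95, 0.95), cohort, k=3)
print(low, high)
```

With so few attributes, a score of this kind can be explained to an end user as "the share of similar women in the cohort who developed the disease", which is one way to read the emphasis on readability.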
