Automated Selection of Relevant Information for Notification of Incident Cancer Cases within a Multisource Cancer Registry

OBJECTIVE The aim of this study was to develop and evaluate an algorithm that selects the relevant records for notifying incident cancer cases, based on the individual data available in a multisource information system.

METHODS The study used data for the year 2008 from the general cancer registry of the Poitou-Charentes region (France). The selection algorithm ranks information by its level of relevance, separately for tumour topography and for tumour morphology. The selected data are combined into composite records, which are then grouped according to the International Agency for Research on Cancer (IARC) rules for multiple primary cancers. The evaluation, based on recall, precision and F-measure, compared the cases validated manually by the registry's physicians with the tumours notified with and without record selection.

RESULTS The analysis involved 12,346 validated tumours in 11,971 individuals. The data sources were hospital discharge data (104,474 records), pathology data (21,851 records), health insurance data (7,508 records) and cancer care centre data (686 records). Record selection improved notification performance for tumour topography (F-measure 0.926 with selection vs. 0.857 without) and for tumour morphology (F-measure 0.805 with selection vs. 0.750 without).

CONCLUSION These results show that selecting information according to its origin effectively reduces the noise generated by imprecise coding. Further research is needed to solve the semantic problems related to integrating heterogeneous data and to make use of unstructured information.
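The selection and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the registry's implementation: the source rankings (`TOPOGRAPHY_RANK`, `MORPHOLOGY_RANK`), the record types and the helper names are all hypothetical, because the abstract does not state the actual order of relevance of each source. Each axis (topography, morphology) is resolved independently by keeping the code supplied by the most relevant source, and the two results form a composite record. Grouping into tumours under the IARC multiple-primary rules is not modelled. The `prf` helper computes precision, recall and F-measure (the harmonic mean of precision and recall) against a manually validated reference set, assumed here to be a set of (patient, code) pairs.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple

# HYPOTHETICAL relevance rankings (lower value = more relevant source).
# The study ranks information by origin, separately per axis, but the real
# ordering is not given in the abstract; these values are illustrative only.
TOPOGRAPHY_RANK: Dict[str, int] = {
    "cancer_centre": 0, "pathology": 1, "hospital": 2, "insurance": 3,
}
MORPHOLOGY_RANK: Dict[str, int] = {
    "pathology": 0, "cancer_centre": 1, "hospital": 2, "insurance": 3,
}


@dataclass
class SourceRecord:
    """One coded record from a single data source (hypothetical structure)."""
    source: str                       # e.g. "pathology", "hospital"
    topography: Optional[str] = None  # site code, e.g. "C50.4"
    morphology: Optional[str] = None  # histology code, e.g. "8500/3"


@dataclass
class CompositeRecord:
    topography: Optional[str]
    morphology: Optional[str]


def select_best(records: List[SourceRecord], field: str,
                rank: Dict[str, int]) -> Optional[str]:
    """Return the value of `field` from the most relevant source that supplies it."""
    candidates = [r for r in records if getattr(r, field) and r.source in rank]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: rank[r.source])
    return getattr(best, field)


def build_composite(records: List[SourceRecord]) -> CompositeRecord:
    """Resolve topography and morphology independently, then combine them."""
    return CompositeRecord(
        topography=select_best(records, "topography", TOPOGRAPHY_RANK),
        morphology=select_best(records, "morphology", MORPHOLOGY_RANK),
    )


def prf(notified: Set[Tuple[str, str]],
        reference: Set[Tuple[str, str]]) -> Tuple[float, float, float]:
    """Precision, recall and F-measure of notified items vs. a validated reference."""
    tp = len(notified & reference)
    precision = tp / len(notified) if notified else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    patient_records = [
        SourceRecord("hospital", topography="C50.9"),
        SourceRecord("pathology", topography="C50.4", morphology="8500/3"),
        SourceRecord("insurance", topography="C50.9"),
    ]
    print(build_composite(patient_records))
    # → CompositeRecord(topography='C50.4', morphology='8500/3')

    notified = {("p1", "C50"), ("p2", "C18"), ("p3", "C61")}
    reference = {("p1", "C50"), ("p2", "C19"), ("p3", "C61"), ("p4", "C34")}
    print(tuple(round(x, 3) for x in prf(notified, reference)))
    # → (0.667, 0.5, 0.571)
```

In this toy example the less specific site code (C50.9) from the hospital and insurance records is overridden by the more specific pathology code, which mirrors the abstract's point that selecting by origin reduces noise from imprecise coding. Because each axis has its own ranking, a source that wins for morphology need not win for topography.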
