Machine learning methods for breast cancer CADx over digital and film mammograms

This work explores the use of machine learning classifiers (MLCs) to support breast cancer diagnosis on digital and film mammograms. Whatever the source, breast cancer datasets are costly to build, since they require the cooperation of specialists in a tedious process. Often, the choice between digital and film mammograms is limited, so we need to understand the implications of using either. Our goal is to apply a similar data analysis methodology to both kinds of mammograms and to understand the behaviour of MLCs on each. We trained several MLC configurations on the Breast Cancer Digital Repository, a comprehensive annotated repository of mammograms built in this collaboration and publicly available. We show that intensive use of computing resources provides sound insights into the behaviour of MLCs, even with small or unbalanced datasets. These insights support further decisions about the generated MLC models, such as the need for larger datasets or their integration into clinical practice.