A Big Data Analytics Framework for Supporting Multidimensional Mining over Big Healthcare Data

Big data analytics in complex healthcare environments is attracting considerable attention. Fetal growth curves, a classical case of big healthcare data, are used in prenatal medicine to detect potential fetal growth problems early, estimate the perinatal outcome, and promptly treat possible complications. However, the currently adopted curves and the related diagnostic techniques have been criticized for their poor precision. New techniques, based on the idea of customized growth curves, have been proposed in the literature. From this perspective, this paper discusses the problem of building customized or personalized fetal growth curves by means of big data techniques. The proposed framework summarizes the massive amounts of input big data via multidimensional views, on top of which well-known Data Mining methods such as clustering and classification are applied. Overall, this defines a multidimensional mining approach targeted at complex healthcare environments. A preliminary analysis of the effectiveness of the framework is also presented.
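
As a purely illustrative sketch of the multidimensional mining idea, the following Python fragment aggregates raw measurements into a small multidimensional view and then clusters the aggregated cells. The records, the dimension names (gestational week, maternal group), the measure (femur length), and the helper functions `build_view` and `kmeans_1d` are all hypothetical and are not taken from the framework itself; a plain one-dimensional k-means stands in for whichever clustering method is actually used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (gestational_week, maternal_group, femur_length_mm).
# The values are invented for illustration only.
records = [
    (20, "A", 32.0), (20, "A", 33.0), (20, "B", 30.0), (20, "B", 31.0),
    (30, "A", 57.0), (30, "A", 58.0), (30, "B", 54.0), (30, "B", 55.0),
]


def build_view(rows):
    """Summarize rows into cells keyed by (week, group), storing the mean measure."""
    cells = defaultdict(list)
    for week, group, value in rows:
        cells[(week, group)].append(value)
    return {key: mean(values) for key, values in cells.items()}


def kmeans_1d(values, k, iterations=20):
    """Plain 1-D k-means (k >= 2), seeded evenly across the sorted values."""
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


# Step 1: multidimensional view over the raw data.
view = build_view(records)

# Step 2: mining (here, clustering) applied to the view rather than the raw rows.
centroids = kmeans_1d(list(view.values()), k=2)
labels = {
    key: min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
    for key, value in view.items()
}
```

The point of the sketch is the ordering of the two steps: the mining step operates on the summarized cells of the view rather than on the raw records, which keeps its input small regardless of how many measurements were collected.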
