Multidimensional Mining over Big Healthcare Data: A Big Data Analytics Framework

Big data analytics in complex healthcare environments is receiving considerable attention. Fetal growth curves, a classical case of big healthcare data, are used in prenatal medicine to detect potential fetal growth problems early, estimate the perinatal outcome, and promptly treat possible complications. However, the currently adopted curves and the related diagnostic techniques have been criticized for their poor precision, and new techniques based on the idea of customized growth curves have been proposed in the literature. In this perspective, this paper discusses the problem of building customized (personalized) fetal growth curves by means of big data techniques. The proposed framework summarizes the massive amounts of input big data via multidimensional views, on top of which well-known Data Mining methods such as clustering and classification are applied. Overall, this defines a multidimensional mining approach targeted at complex healthcare environments. A preliminary analysis of the effectiveness of the framework is also provided.

Keywords: Mining Big Data; Big Healthcare Data; Healthcare Systems.
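The idea of mining on top of a multidimensional view can be illustrated with a minimal sketch. It is not the framework's actual implementation: the record layout, the dimension names (ethnicity, gestational week), the measure (femur length), the data values, and the helper names `build_view` and `kmeans_1d` are all assumptions made for illustration. The sketch first summarizes raw records into one aggregate per cell of the view, then runs a plain 1-D k-means over the aggregated cells instead of the raw data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ultrasound records: (ethnicity, gestational_week, femur_length_mm).
# The values are invented for illustration only.
records = [
    ("A", 20, 31.0), ("A", 20, 33.0), ("A", 30, 56.0), ("A", 30, 58.0),
    ("B", 20, 34.0), ("B", 20, 36.0), ("B", 30, 60.0), ("B", 30, 62.0),
]

def build_view(rows):
    """Summarize raw rows into a view: one mean per (ethnicity, week) cell."""
    cells = defaultdict(list)
    for ethnicity, week, value in rows:
        cells[(ethnicity, week)].append(value)
    return {key: mean(values) for key, values in cells.items()}

def kmeans_1d(values, centers, iterations=10):
    """Plain 1-D k-means (Lloyd's algorithm) starting from the given centers."""
    centers = list(centers)
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the previous center when no value was assigned to it.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

# Mining step runs on the summarized cells, not on the raw records.
view = build_view(records)
centers = kmeans_1d(list(view.values()), centers=[30.0, 60.0])
```

The mining step here only sees four aggregated cells rather than eight raw records; with real data the same reduction is what would make clustering or classification tractable on massive inputs.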
