A UML profile for the conceptual modelling of structurally complex data: Easing human effort in the KDD process

Context: Domains in which data have a complex structure, and therefore require new approaches to knowledge discovery from data, are on the increase. In such domains, the information related to each object under analysis may be composed of a very broad set of interrelated data instead of being represented by a simple attribute table. This further complicates the analysis of such data.

Objective: It is becoming increasingly necessary to model data before analysis to ensure that they are properly understood, stored and later processed. On these grounds, we have proposed a UML extension that is able to represent any set of structurally complex, hierarchically ordered data. Conceptually modelled data are human-comprehensible and constitute the starting point for automating other data analysis tasks, such as comparing items or generating reference models.

Method: The proposed notation has been applied to structurally complex data from the field of stabilometry, a medical discipline concerned with human balance. We have organized the modelled data through an implementation based on XML syntax.

Results: We have applied data mining techniques to the resulting structured data for knowledge discovery. The sound results of modelling a domain with such complex and wide-ranging data confirm the utility of the approach.

Conclusion: The conceptual modelling and analysis of non-conventional data are important challenges. We have proposed a UML profile that has been tested on data from a medical domain, with very satisfactory results. The notation is useful for understanding domain data and for automating knowledge discovery tasks.
