Classification of auditory brainstem responses through symbolic pattern discovery

INTRODUCTION Numeric time series are present in a very wide range of domains, including many branches of medicine. Data mining techniques have proved to be useful for knowledge discovery in this type of data and for supporting decision-making processes. OBJECTIVES The overall objective is to classify time series based on the discovery of frequent patterns. These patterns will be discovered in symbolic sequences obtained from the time series data by means of a temporal abstraction process. METHODS Firstly, we transform numeric time series into symbolic time sequences, where the symbols aim to represent the relevant domain concepts. These symbols can be defined using either public or expert domain knowledge. Then we apply a symbolic pattern discovery technique to the output symbolic sequences. This technique identifies the subsequences frequently found in a population group. These subsequences (patterns) are representative of population groups. Finally, we employ a classification technique based on the identified patterns in order to classify new individuals. Thanks to the inclusion of domain knowledge, the classification results can be explained using domain terminology. This makes the results easier to interpret for the domain specialist (physician). RESULTS This method has been applied to brainstem auditory evoked potentials (BAEPs) time series. Preliminary experiments were carried out to analyse several aspects of the method including the best configuration of the pattern discovery technique parameters. We then applied the method to the BAEPs of 83 individuals belonging to four classes (healthy, conductive hearing loss, vestibular schwannoma-brainstem involvement and vestibular schwannoma-8th-nerve involvement). According to the results of the cross-validation, overall accuracy was 99.4%, sensitivity (recall) was 97.6% and specificity was 100% (no false positives). CONCLUSION The proposed method effectively reduces dimensionality. Additionally, if the symbolic transformation includes the right domain knowledge, the method arguably outputs a data representation that denotes the relevant domain concepts more clearly. The method is capable of finding patterns in BAEPs time series and is very accurate at correctly predicting whether or not new patients have an auditory-related disorder.

[1]  Nada Lavrac,et al.  Selected techniques for data mining in medicine , 1999, Artif. Intell. Medicine.

[2]  Guy Lapalme,et al.  A systematic analysis of performance measures for classification tasks , 2009, Inf. Process. Manag..

[3]  Balaji Padmanabhan,et al.  A Belief-Driven Method for Discovering Unexpected Patterns , 1998, KDD.

[4]  Haixun Wang,et al.  Landmarks: a new model for similarity-based pattern querying in time series databases , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[5]  Christopher C. Yang,et al.  Intelligent healthcare informatics in big data era , 2015, Artif. Intell. Medicine.

[6]  Dimitrios Gunopulos,et al.  A MPAA-Based Iterative Clustering Algorithm Augmented by Nearest Neighbors Search for Time-Series Data Streams , 2005, PAKDD.

[7]  J. Eggermont,et al.  The auditory brainstem response , 1990 .

[8]  Yunhao Liu,et al.  Indexable PLA for Efficient Similarity Search , 2007, VLDB.

[9]  Eamonn J. Keogh,et al.  Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases , 2001, Knowledge and Information Systems.

[10]  Amir-Masoud Eftekhari-Moghadam,et al.  Knowledge discovery in medicine: Current issue and future trend , 2014, Expert Syst. Appl..

[11]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[12]  Juan Pedro Caraça-Valente,et al.  Discovering similar patterns in time series , 2000, KDD '00.

[13]  Ming-Tat Ko,et al.  Discovering time-interval sequential patterns in sequence databases , 2003, Expert Syst. Appl..

[14]  Eamonn J. Keogh,et al.  Locally adaptive dimensionality reduction for indexing large time series databases , 2001, SIGMOD '01.

[15]  Blaz Zupan,et al.  Predictive data mining in clinical medicine: Current issues and guidelines , 2008, Int. J. Medical Informatics.

[16]  Hagit Shatkay,et al.  Approximate queries and representations for large data sequences , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[17]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[18]  Juan Carlos Augusto,et al.  Temporal reasoning for decision support in medicine , 2005, Artif. Intell. Medicine.

[19]  Daniel P. Siewiorek,et al.  Generalized feature extraction for structural pattern recognition in time-series data , 2001 .

[20]  Krzysztof J. Cios,et al.  Uniqueness of medical data mining , 2002, Artif. Intell. Medicine.

[21]  Riccardo Bellazzi,et al.  Predictive data mining in clinical medicine: a focus on selected methods and applications , 2011, WIREs Data Mining Knowl. Discov..

[22]  Dimitrios Gunopulos,et al.  Mining Time Series Data , 2005, Data Mining and Knowledge Discovery Handbook.

[23]  H. Koh,et al.  Data mining applications in healthcare. , 2005, Journal of healthcare information management : JHIM.

[24]  Xindong Wu,et al.  Mining Sequential Patterns across Time Sequences , 2007, New Generation Computing.

[25]  Eamonn J. Keogh,et al.  Time series shapelets: a novel technique that allows accurate, interpretable and fast classification , 2010, Data Mining and Knowledge Discovery.

[26]  Milos Hauskrecht,et al.  Mining recent temporal patterns for event detection in multivariate time series data , 2012, KDD.

[27]  Alfredo Pulvirenti,et al.  Similarity Measures and Dimensionality Reduction Techniques for Time Series Data Mining , 2012 .

[28]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[29]  Li Wei,et al.  Experiencing SAX: a novel symbolic representation of time series , 2007, Data Mining and Knowledge Discovery.

[30]  Giuseppe Psaila,et al.  Querying Shapes of Histories , 1995, VLDB.

[31]  Ting-jie Lv,et al.  A novel pattern extraction method for time series classification , 2009 .

[32]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[33]  Carlos Agón,et al.  Time-series data mining , 2012, CSUR.

[34]  Yuval Shahar,et al.  Classification of multivariate time series via temporal abstraction and time intervals mining , 2015, Knowledge and Information Systems.

[35]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SGMD.

[36]  Cheng Zhang,et al.  Biomedical text mining and its applications in cancer research , 2013, J. Biomed. Informatics.

[37]  Mathieu Guillame-Bert,et al.  Learning Temporal Association Rules on Symbolic Time Sequences , 2012, ACML.

[38]  Pierre Geurts,et al.  Pattern Extraction for Time Series Classification , 2001, PKDD.

[39]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[40]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[41]  Ada Wai-Chee Fu,et al.  Efficient time series matching by wavelets , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[42]  Jiawei Han,et al.  Frequent pattern mining: current status and future directions , 2007, Data Mining and Knowledge Discovery.

[43]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[44]  Fernando Alonso,et al.  Cooperation between expert knowledge and data mining discovered knowledge: Lessons learned , 2012, Expert Syst. Appl..

[45]  Jiawei Han,et al.  Mining Segment-Wise Periodic Patterns in Time-Related Databases , 1998, KDD.

[46]  Aurora Pérez,et al.  Cooperation Between Expert Knowledge and Data Mining Discovered Knowledge , 2011 .

[47]  Henrik André-Jönsson,et al.  Using Signature Files for Querying Time-Series Data , 1997, PKDD.