A Mining Technique Using n-Grams and Motion Transcripts for Body Sensor Network Data Repository

To make efficient use of large amounts of body sensor data, the authors represent human movement data using clustering and propose a technique to analyze sensed physiological signals.

Recent years have witnessed a large influx of applications in the field of cyber-physical systems. An important class of these systems is body sensor networks (BSNs), in which lightweight embedded processors and communication systems are tightly coupled with the human body. BSNs can give researchers, care providers, and clinicians access to highly valuable information extracted from data collected in users' natural environments. With this information, one can monitor the progression of a disease, identify its early onset, or simply assess a user's wellness. One major obstacle is managing the repositories that store the large amount of sensor data. To address this issue, we propose a data mining approach inspired by experience in text processing and natural language processing. We represent sensor readings as a sequence of characters, called a motion transcript. Transcripts significantly reduce the complexity of the data while preserving the morphological and structural properties of the physiological signals. To further exploit the structure of the physiological signals, our data mining technique focuses on their characteristic transitions, which are efficiently captured using n-grams. To enable lightweight and fast mining, we reduce the overwhelmingly large number of n-grams via information gain (IG) feature selection. We report the effectiveness of the proposed approach in terms of mining speed while maintaining acceptable accuracy, measured by the F-score, which combines precision and recall.
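The pipeline summarized above (symbolize sensor samples into a motion transcript, extract n-grams, keep the n-grams with the highest information gain) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the clustering step is assumed to be plain k-means over per-sample feature vectors, each n-gram is treated as a binary presence feature, and all function names and toy data are hypothetical.

```python
# Hypothetical sketch: k-means symbolization + n-gram IG ranking; not the paper's code.
import math
import string
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to point."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (assumed clustering step). Seeding with the first
    k points is a simplification for reproducibility, not a recommendation."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def to_transcript(points, centroids):
    """Replace each sample by the letter of its cluster (up to 26 clusters)."""
    return "".join(string.ascii_uppercase[nearest(p, centroids)] for p in points)


def ngrams(transcript, n):
    """Count every contiguous length-n substring of the transcript."""
    return Counter(transcript[i:i + n] for i in range(len(transcript) - n + 1))


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(transcripts, labels, gram):
    """IG of the binary feature 'gram occurs in the transcript' w.r.t. the labels.
    For a length-n gram, substring membership equals n-gram presence."""
    gain = entropy(labels)
    for present in (True, False):
        subset = [y for t, y in zip(transcripts, labels) if (gram in t) == present]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def top_ngrams(transcripts, labels, n, keep):
    """Rank all n-grams by IG (ties broken alphabetically) and keep the best `keep`."""
    vocab = set()
    for t in transcripts:
        vocab.update(ngrams(t, n))
    return sorted(vocab, key=lambda g: (-information_gain(transcripts, labels, g), g))[:keep]


def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    samples = [[0.0], [5.0], [0.2], [5.1], [0.1], [5.2]]  # toy 1-D feature vectors
    print(to_transcript(samples, kmeans(samples, 2)))  # ABABAB
    transcripts = ["AAAB", "AABB", "ABAB", "BABA"]
    labels = ["sit", "sit", "walk", "walk"]
    print(top_ngrams(transcripts, labels, n=2, keep=2))  # ['AA', 'BA']
```

Ranking by IG keeps the n-grams whose presence best separates the movement classes. In the toy data, `AA` and `BA` each split the two labels perfectly, while `AB` occurs in every transcript and therefore carries no information.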
