An ontology based text mining system for knowledge discovery from the diagnosis data in the automotive domain

Abstract In the automotive domain, an overwhelming volume of textual data is recorded in the form of repair verbatim collected during the fault diagnosis (FD) process. Here, the aim of the knowledge discovery using text mining (KDT) task is to discover best-practice repair knowledge from millions of repair verbatim, enabling accurate FD. However, the KDT problem is complex, largely because a significant amount of relevant knowledge is buried in noisy and unstructured verbatim. In this paper, we propose a novel ontology-based text mining system, which uses the diagnosis ontology to annotate key terms recorded in the repair verbatim. The annotated terms are extracted into different tuples, which are used to identify field anomalies. The extracted tuples are further used by the frequently co-occurring clustering algorithm to cluster the repair verbatim data, so that the best-practice repair actions used to fix commonly observed symptoms associated with faulty parts can be discovered. The performance of our system has been validated using real-world data, and the system has been successfully implemented in a web-based distributed architecture in industry.
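
The pipeline described in the abstract (annotate terms with an ontology, extract tuples, group the tuples to surface common repair actions) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `ONTOLOGY` dictionary, the sample verbatim, the function names, and the use of simple substring matching. The grouping step is plain frequency counting over (part, symptom) pairs, which stands in for, and is not, the paper's frequently co-occurring clustering algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical toy ontology: surface term -> concept class.
# A real diagnosis ontology would be far richer than this.
ONTOLOGY = {
    "battery": "Part",
    "alternator": "Part",
    "dead": "Symptom",
    "no charge": "Symptom",
    "replaced": "Action",
    "recharged": "Action",
}


def annotate(verbatim):
    """Return {concept class: [terms]} for ontology terms found in one verbatim."""
    text = verbatim.lower()
    found = defaultdict(list)
    for term, concept in ONTOLOGY.items():
        if term in text:  # naive substring match, for illustration only
            found[concept].append(term)
    return found


def extract_tuples(verbatim):
    """Build (part, symptom, action) tuples from the annotated terms."""
    found = annotate(verbatim)
    return [
        (part, symptom, action)
        for part in found["Part"]
        for symptom in found["Symptom"]
        for action in found["Action"]
    ]


def best_practice(verbatims):
    """Pick the most frequent action for each (part, symptom) pair."""
    groups = defaultdict(Counter)
    for verbatim in verbatims:
        for part, symptom, action in extract_tuples(verbatim):
            groups[(part, symptom)][action] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


# Made-up sample verbatim, not real repair data.
VERBATIMS = [
    "Customer states battery dead. Replaced battery.",
    "Battery dead on arrival, replaced.",
    "BATTERY DEAD, recharged and tested ok.",
    "Alternator no charge. Replaced alternator.",
]

RESULT = best_practice(VERBATIMS)
for (part, symptom), action in RESULT.items():
    print(f"{part} / {symptom}: {action}")
```

In this toy setting, a verbatim that yields a part and a symptom but no action produces no tuple at all; a fuller treatment might flag such records as candidates for the field anomalies the abstract mentions.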
