Benchmarking for Keyword Extraction Methodologies in Maintenance Work Orders

Maintenance has largely remained a human-knowledge-centered activity, with the primary records of activity being text-based maintenance work orders (MWOs). However, the bulk of maintenance research does not currently attempt to quantify human knowledge, though this knowledge can be rich with useful contextual and system-level information. The underlying quality of data in MWOs often suffers from misspellings, domain-specific (or even workforce-specific) jargon, and abbreviations, all of which prevent its immediate use in computer analyses. Therefore, approaches to making this data computable must translate unstructured text into a formal schema or system; that is, they must map informal technical language to some computable format. Keyword spotting (or extraction) has proven a valuable tool for reducing manual effort while structuring data, by providing a systematic methodology to create computable knowledge. This technique searches a corpus for known vocabulary and maps each match to a designed higher-level concept, shifting the primary effort away from structuring the MWOs themselves and toward creating a dictionary of domain-specific terms and the knowledge that they represent. The presented work compares rules-based keyword extraction to data-driven tagging assistance, through quantitative and qualitative discussion of the key advantages and disadvantages of each. This will enable maintenance practitioners to select an appropriate approach to information encoding that provides the needed functionality at minimal cost and effort.
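
As an illustration of the keyword-spotting idea described above, the following is a minimal sketch in Python, not the implementation evaluated in this work. The dictionary `VOCAB`, its entries, the concept labels (`Item`, `Problem`, `Solution`), and the function `spot_keywords` are all hypothetical names chosen for this example. The sketch assumes that a domain expert has already built the dictionary, including common abbreviations and variants, and that simple lowercase tokenization is adequate.

```python
import re

# Hypothetical dictionary built by a domain expert:
# raw MWO token -> (canonical term, higher-level concept).
# Abbreviations and variants map to the same canonical term.
VOCAB = {
    "hyd": ("hydraulic", "Item"),
    "hydraulic": ("hydraulic", "Item"),
    "hose": ("hose", "Item"),
    "leak": ("leak", "Problem"),
    "leaking": ("leak", "Problem"),
    "repl": ("replace", "Solution"),
    "replaced": ("replace", "Solution"),
}


def spot_keywords(work_order):
    """Return (canonical term, concept) pairs for known tokens, in order."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = re.findall(r"[a-z0-9]+", work_order.lower())
    # Keep only tokens present in the dictionary; unknown tokens are skipped.
    return [VOCAB[tok] for tok in tokens if tok in VOCAB]


if __name__ == "__main__":
    for term, concept in spot_keywords("Hyd hose leaking, repl hose"):
        print(f"{concept}: {term}")
```

In this sketch, tokens that are missing from the dictionary are silently dropped, which mirrors the trade-off noted above: the effort moves from structuring each work order to maintaining the dictionary itself.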
