Controlled English for knowledge representation

Knowledge representation is a long-standing research area of computer science that aims to represent human knowledge in a form that computers can interpret. Most knowledge representation approaches, however, have suffered from poor user interfaces: users find it difficult to learn and use the logic-based languages in which the knowledge has to be encoded. A new approach to designing more intuitive yet reliable user interfaces for knowledge representation systems is the use of controlled natural language (CNL). CNLs are subsets of natural languages that are restricted in a way that allows their automatic translation into formal logic. A number of CNLs have been developed, but so far the resulting tools are mostly prototypes. Furthermore, nobody has yet provided strong evidence that CNLs are indeed easier to understand than other logic-based languages. The goal of this thesis is to shift the perspective of research on CNLs for knowledge representation away from the present exploratory, proof-of-concept approaches and toward a more engineering-focused point of view. To this end, I introduce theoretical and practical building blocks for the design and application of controlled English for the purpose of knowledge representation. I first show how CNLs can be defined adequately and simply by introducing a novel grammar notation, and I describe efficient algorithms to process such grammars. I then demonstrate how these theoretical concepts can be implemented and how CNLs can be embedded in knowledge representation tools so that they provide intuitive and powerful user interfaces that are accessible even to untrained users. Finally, I discuss how the understandability of CNLs can be evaluated. I argue that existing approaches cannot assess the understandability of CNLs reliably, and I therefore introduce a novel testing framework. Experiments based on this framework show that CNLs are not only easier to understand than comparable languages but also take less time to learn and are preferred by users.
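The core idea of a CNL can be illustrated with a very small example: a restricted set of English sentence forms, each paired with a fixed translation into first-order logic, so that every accepted sentence has exactly one formal reading and everything else is rejected. The following is a minimal, hypothetical sketch. The three sentence patterns, the `translate` function, and the ASCII formula syntax (`forall`, `->`, `~`) are illustrative assumptions; they are not the grammar of any existing CNL and not the grammar notation introduced in this thesis.

```python
import re

# Hypothetical toy fragment: each entry pairs a sentence pattern with a
# logic template. This is an illustration only, not ACE or the thesis's notation.
PATTERNS = [
    # "Every man is a human."  ->  forall x. (man(x) -> human(x))
    (re.compile(r"^Every (\w+) is an? (\w+)\.$"), "forall x. ({0}(x) -> {1}(x))"),
    # "No man is a cat."  ->  forall x. (man(x) -> ~cat(x))
    (re.compile(r"^No (\w+) is an? (\w+)\.$"), "forall x. ({0}(x) -> ~{1}(x))"),
    # "John is a man."  ->  man(John)
    (re.compile(r"^(\w+) is an? (\w+)\.$"), "{1}({0})"),
]


def translate(sentence: str) -> str:
    """Translate one controlled sentence into a first-order formula string.

    The first matching pattern wins, so the translation is deterministic.
    Sentences outside the fragment raise ValueError instead of being guessed at.
    """
    text = sentence.strip()
    for pattern, template in PATTERNS:
        match = pattern.match(text)
        if match:
            return template.format(*match.groups())
    raise ValueError(f"not in the controlled fragment: {sentence!r}")


if __name__ == "__main__":
    for s in ["Every man is a human.", "No man is a cat.", "John is a man."]:
        print(s, "=>", translate(s))
```

Real CNLs such as Attempto Controlled English rely on full grammars rather than a flat pattern table; the sketch only shows the principle of a restricted input language with a fixed, unambiguous mapping to logic.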