Towards Evaluating the Impact of Semantic Support for Curating the Fungus Scientific Literature

We present our ongoing development of a semantic infrastructure supporting biofuel research. Part of this effort is the automatic curation of knowledge from the massive amount of information on fungal enzymes that is available in genomics. Working closely with biologists who manually curate the existing literature, we developed ontological NLP pipelines, integrated through Web-based interfaces, to help them in two ways: reducing the time they spend mining the literature for facts, and providing them with richer, semantically linked information. An ongoing challenge is to measure precisely how much the developed semantic technologies benefit the end users and what their overall impact on the quality of the curated data is. We present preliminary evaluation results that show a significant reduction in manual curation time.
