Semantic Systems. In the Era of Knowledge Graphs: 16th International Conference on Semantic Systems, SEMANTiCS 2020, Amsterdam, The Netherlands, September 7–10, 2020, Proceedings

Since its inception in 2007, DBpedia has continuously released open data in RDF, extracted from various Wikimedia projects using a complex software system called the DBpedia Information Extraction Framework (DIEF). Over the past 12 years, the software has received a plethora of extensions from the community, which positively affected the size and quality of the data. As size and complexity grew, the release process faced long delays (release cycles of 12 to 17 months), which reduced the agility of development. In this paper, we describe the new DBpedia release cycle, including our innovative release workflow, which allows development teams (in particular those who publish large, open data) to implement agile, cost-efficient processes and scale up productivity. The DBpedia release workflow has been re-engineered; its new primary focus is on productivity and agility, to address the challenges of size and complexity. At the same time, quality is assured by implementing a comprehensive testing methodology. We run an experimental evaluation and argue that the implemented measures increase agility and allow for cost-effective quality control and debugging, and thus achieve a higher level of maintainability. As a result, DBpedia now publishes regular (i.e., monthly) releases with over 21 billion triples with minimal publishing effort.
