ACM: Article Content Miner for Assessing the Quality of Scientific Output

This paper presents the Article Content Miner (ACM), a method for processing the research papers in PDF format made available for the 2016 edition of the Semantic Publishing Challenge, extracting relevant semantic data from them, and publishing those data in an RDF triplestore according to the Semantic Publishing And Referencing (SPAR) Ontologies (http://www.sparontologies.net). In particular, ACM extracts all the information needed to answer the queries of the second task of the challenge (https://github.com/ceurws/lod/wiki/SemPub16_Task2) by combining techniques based on Natural Language Processing (i.e., Combinatory Categorial Grammar, Discourse Representation Theory, Linguistic Frames), Semantic Web technologies, and good Ontology Design practices (i.e., Content Analysis, Ontology Design Patterns, Discourse Referent Extraction and Linking, Topic Extraction).
