Shallow and deep learning for event relatedness classification

Abstract. Over the past two decades, security authorities around the world have acknowledged the importance of exploiting the ever-growing amount of information published on the web about various types of events for early threat detection, situation monitoring, and risk analysis. Since the information related to a particular real-world event may be scattered across sources and reported on different dates, an important task is to link together all interrelated event mentions. This article studies the application of statistical and machine learning techniques to a new application-oriented variation of the event pair relatedness classification task, which merges different fine-grained event relation types reported elsewhere into a single concept. The task focuses on linking event templates automatically extracted from online news by an existing event extraction system; these templates contain only short text snippets and potentially erroneous and incomplete information. We compare shallow learning methods, namely decision tree-based random forest and gradient boosted tree ensembles (XGBoost) and kernel-based support vector machines (SVM), against both simpler shallow learners and a deep learning approach based on a long short-term memory (LSTM) recurrent neural network. Our experiments focus on linguistically lightweight features (some of which have not been reported elsewhere) that are easily portable across languages. Evaluated on a newly created event linking corpus, the F1 scores range from 92% (simplest shallow learner) to 96.4% (LSTM-based recurrent neural network).
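To make the task setup concrete, the following is a minimal sketch of how a pair of event templates might be turned into linguistically lightweight features, and how the F1 score used for evaluation is computed. The `EventTemplate` class, the two features (token overlap and date gap), and the function names are illustrative assumptions; the abstract does not specify the actual feature set or template format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EventTemplate:
    # Hypothetical, simplified stand-in for an automatically extracted event template.
    snippet: str
    event_date: date


def pair_features(a: EventTemplate, b: EventTemplate) -> dict:
    """Compute two illustrative, language-independent features for an event pair."""
    tokens_a = set(a.snippet.lower().split())
    tokens_b = set(b.snippet.lower().split())
    union = tokens_a | tokens_b
    # Jaccard overlap of whitespace-separated tokens (assumed feature).
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    # Absolute distance in days between the two event dates (assumed feature).
    day_gap = abs((a.event_date - b.event_date).days)
    return {"token_jaccard": jaccard, "day_gap": day_gap}


def f1_score(gold: list, predicted: list) -> float:
    """F1 for the positive ("related") class from parallel lists of booleans."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Feature dictionaries of this kind could be fed to any of the shallow learners named above; the sketch deliberately avoids language-specific resources such as parsers or lexicons, in keeping with the portability goal.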
