Semi-supervised Deep Embedded Clustering with Anomaly Detection for Semantic Frame Induction

Although FrameNet is recognized as one of the most fine-grained lexical databases, its coverage of lexical units is still limited. To address this, we propose a two-step frame induction process for lexical units not yet present in Berkeley FrameNet data release 1.7: first, remove those that cannot fit into any existing semantic frame in FrameNet; then, assign the remaining lexical units to their correct frames. We also present the Semi-supervised Deep Embedded Clustering with Anomaly Detection (SDEC-AD) model, an algorithm that maps high-dimensional contextualized vector representations of lexical units to a low-dimensional latent space for better frame prediction and uses reconstruction error to identify lexical units that cannot evoke frames in FrameNet. SDEC-AD outperforms the state-of-the-art methods in both steps of the frame induction process. Empirical results also show that definitions provide contextual information for representing and characterizing the frame membership of lexical units.
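The two-step process described above can be illustrated with a small sketch. This is not the SDEC-AD model itself. It assumes a linear, PCA-style autoencoder in place of the deep autoencoder, nearest-centroid assignment in the latent space in place of the semi-supervised clustering objective, and a quantile of the training reconstruction errors as the anomaly threshold. All function names, the `quantile` parameter, and the use of `-1` to mark rejected lexical units are hypothetical choices made for illustration.

```python
import numpy as np

def fit_linear_autoencoder(X, latent_dim):
    """Fit a PCA-style linear encoder/decoder (stand-in for a deep autoencoder)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions; keep the top latent_dim of them.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:latent_dim]

def encode(X, mean, components):
    """Project vectors into the low-dimensional latent space."""
    return (X - mean) @ components.T

def reconstruction_error(X, mean, components):
    """Squared error between each vector and its decoded reconstruction."""
    X_hat = encode(X, mean, components) @ components + mean
    return np.sum((X - X_hat) ** 2, axis=1)

def induce_frames(X_known, y_known, X_new, latent_dim=2, quantile=0.95):
    """Step 1: flag poorly reconstructed vectors; step 2: assign the rest.

    y_known holds integer frame ids (assumption); -1 marks a rejected unit.
    """
    mean, comps = fit_linear_autoencoder(X_known, latent_dim)

    # Step 1: a new unit is anomalous if its reconstruction error exceeds
    # the chosen quantile of the errors seen on known lexical units.
    threshold = np.quantile(reconstruction_error(X_known, mean, comps), quantile)
    is_anomaly = reconstruction_error(X_new, mean, comps) > threshold

    # Step 2: assign each remaining unit to the nearest frame centroid
    # computed in the latent space.
    frames = np.unique(y_known)
    Z_known = encode(X_known, mean, comps)
    centroids = np.stack([Z_known[y_known == f].mean(axis=0) for f in frames])
    Z_new = encode(X_new, mean, comps)
    dists = np.linalg.norm(Z_new[:, None, :] - centroids[None, :, :], axis=2)
    return np.where(is_anomaly, -1, frames[dists.argmin(axis=1)])
```

In this sketch, `X_known` would hold vector representations of lexical units already in FrameNet, `y_known` their frame ids, and `X_new` the candidate lexical units. A unit that lies far from the subspace spanned by known units reconstructs poorly and is rejected before any frame assignment is attempted.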
