Context-aware Entity Morph Decoding

People create morphs, a special type of fake alternative name, to achieve certain communication goals such as expressing strong sentiment or evading censors. For example, “Black Mamba”, the name of a highly venomous snake, is a morph that Kobe Bryant created for himself because of his agility and aggressiveness in playing basketball. This paper presents the first end-to-end context-aware entity morph decoding system that can automatically identify, disambiguate, and verify morph mentions based on specific contexts, and resolve them to target entities. Our approach is based on an absolute “cold-start”: it does not require any candidate morph or target entity lists as input, nor any manually constructed morph-target pairs for training. We design a semi-supervised collective inference framework for morph mention extraction, and compare various deep learning based approaches for morph resolution. Our approach achieved significant improvement over the state-of-the-art method (Huang et al., 2013), which used a large amount of training data.

[1]  Houfeng Wang,et al.  Learning Entity Representation for Entity Disambiguation , 2013, ACL.

[2]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[3]  Somesh Jha,et al.  Static Analysis of Executables to Detect Malicious Patterns , 2003, USENIX Security Symposium.

[4]  Sampo Pyysalo,et al.  Open-domain Anatomical Entity Mention Detection , 2012, ACL.

[5]  Zellig S. Harris,et al.  Distributional Structure , 1954.

[6]  Rada Mihalcea,et al.  Using Wikipedia for Automatic Word Sense Disambiguation , 2007, NAACL.

[7]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[8]  Heng Ji,et al.  Be Appropriate and Funny: Automatic Entity Morph Encoding , 2014, ACL.

[9]  Jason Weston,et al.  Question Answering with Subgraph Embeddings , 2014, EMNLP.

[10]  Qun Liu,et al.  HHMM-based Chinese Lexical Analyzer ICTCLAS , 2003, SIGHAN.

[11]  Heng Ji,et al.  Incremental Joint Extraction of Entity Mentions and Relations , 2014, ACL.

[12]  Heng Ji,et al.  Collective Tweet Wikification based on Semi-supervised Graph Regularization , 2014, ACL.

[13]  Yoshua Bengio,et al.  A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res.

[14]  Shiwen Yu,et al.  Chinese Noun Phrase Metaphor Recognition with Maximum Entropy Approach , 2006, CICLing.

[15]  Zoubin Ghahramani,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML.

[16]  Bernhard Schölkopf,et al.  Learning with Local and Global Consistency , 2003, NIPS.

[17]  Alexander J. Smola,et al.  Kernels and Regularization on Graphs , 2003, COLT.

[18]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[19]  Rada Mihalcea,et al.  Wikify!: linking documents to encyclopedic knowledge , 2007, CIKM.

[20]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[21]  Jason Weston,et al.  Joint Learning of Words and Meaning Representations for Open-Text Semantic Parsing , 2012, AISTATS.

[22]  Yulia Tsvetkov,et al.  Cross-Lingual Metaphor Detection Using Common Semantic Features , 2013.

[23]  Dong-Hong Ji,et al.  Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning , 2005, ACL.

[24]  Ralph Weischedel,et al.  Automatic Extraction of Linguistic Metaphors with LDA Topic Modeling , 2013.

[25]  Dong-Hong Ji,et al.  Relation Extraction Using Label Propagation Based Semi-Supervised Learning , 2006, ACL.

[26]  Heng Ji,et al.  Joint bilingual name tagging for parallel corpora , 2012, CIKM.

[27]  Christopher Meek,et al.  Semantic Parsing for Single-Relation Question Answering , 2014, ACL.

[28]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[29]  Varun Chandola,et al.  Anomaly detection: A survey , 2009, CSUR.

[30]  Hae-Chang Rim,et al.  Joint Relational Embeddings for Knowledge-based Question Answering , 2014, EMNLP.

[31]  Imed Zitouni,et al.  Mention Detection Crossing the Language Barrier , 2008, EMNLP.

[32]  Roberto Navigli,et al.  Word sense disambiguation: A survey , 2009, CSUR.

[33]  Alva Erwin,et al.  Analysis of Machine Learning Techniques Used in Behavior-Based Malware Detection , 2010, Second International Conference on Advances in Computing, Control, and Telecommunication Technologies.

[34]  Heng Ji,et al.  Resolving Entity Morphs in Censored Data , 2013, ACL.

[35]  Christopher D. Manning,et al.  Optimizing Chinese Word Segmentation for Machine Translation Performance , 2008, WMT@ACL.

[36]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[37]  Heng Ji,et al.  Overview of the TAC 2010 Knowledge Base Population Track , 2010.

[38]  Ben Hachey,et al.  Overview of TAC-KBP2014 Entity Discovery and Linking Tasks , 2015.

[39]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res.