Learning Turkish Hypernymy Using Word Embeddings

Recently, neural network language models have been applied effectively to many types of Natural Language Processing (NLP) tasks. One popular type of task is the discovery of semantic and syntactic regularities, which supports researchers in building a lexicon. Word embedding representations are notably good at discovering such linguistic regularities. We argue that two supervised learning approaches based on word embeddings can be applied successfully to the hypernymy problem: utilizing embedding offsets between word pairs, and learning a semantic projection that links the words. The offset-based model classifies offsets as hypernym or not. The semantic projection approach trains a transformation matrix that ideally maps a hyponym to its hypernym. A semantic projection model can learn such a matrix provided there is a sufficient number of training word pairs; however, we argue that such models tend to learn an is-a-particular-hypernym relation rather than a generalized is-a relation. The embeddings are trained with both the Continuous Bag-of-Words and the Skip-gram models on a large corpus of Turkish text. The main contribution of this study is a novel and efficient architecture well suited to applying word embedding approaches to the Turkish language domain. We report that both the projection and the offset classification models give promising, novel results for Turkish.
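To make the offset-based model concrete, the sketch below trains Skip-gram embeddings and fits a binary classifier on offsets of the form v(hypernym) - v(hyponym). This is a minimal illustration, not the paper's pipeline: the toy corpus, the labeled word pairs, and the choice of logistic regression are all assumptions (gensim >= 4 and scikit-learn are assumed available).

```python
# Minimal sketch of offset-based hypernymy classification (illustrative only).
import numpy as np
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

# Toy tokenized Turkish corpus; the paper uses a large corpus instead.
sentences = [["kedi", "bir", "hayvan"], ["elma", "bir", "meyve"]]
# sg=1 selects the Skip-gram training model.
model = Word2Vec(sentences, vector_size=100, sg=1, min_count=1)

def offset(hyponym, hypernym):
    """Embedding offset v(hypernym) - v(hyponym) for a candidate pair."""
    return model.wv[hypernym] - model.wv[hyponym]

# Hypothetical candidate pairs labeled 1 (is-a holds) or 0 (it does not).
pairs = [("kedi", "hayvan", 1), ("elma", "meyve", 1),
         ("kedi", "elma", 0), ("hayvan", "meyve", 0)]
X = np.array([offset(hypo, hyper) for hypo, hyper, _ in pairs])
y = np.array([label for _, _, label in pairs])

# Classify offsets as hypernym or not; any binary classifier would do here.
clf = LogisticRegression().fit(X, y)
```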
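The semantic projection approach can likewise be sketched as a least-squares problem: given stacked hyponym vectors X and hypernym vectors Y from training pairs, learn a matrix Phi minimizing ||X Phi^T - Y||^2. The closed-form solution and the random stand-in vectors below are assumptions for illustration; the paper's actual training procedure may differ.

```python
# Minimal sketch of learning a semantic projection matrix (illustrative only).
import numpy as np

d = 100                      # embedding dimensionality
n = 500                      # number of (hyponym, hypernym) training pairs
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))  # stand-in hyponym vectors, one per row
Y = rng.normal(size=(n, d))  # stand-in hypernym vectors, one per row

# Solve min_Phi ||X Phi^T - Y||_F^2 via least squares.
Phi_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
Phi = Phi_T.T

def project(x):
    """Map a hyponym embedding to its predicted hypernym embedding."""
    return Phi @ x
```

At test time, the projected vector would be compared (e.g., by cosine similarity) against the vocabulary to retrieve candidate hypernyms; the abstract's caveat is that a matrix fit this way tends to capture the specific hypernyms seen in training rather than a general is-a relation.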
