GIANT: Scalable Creation of a Web-scale Ontology

Understanding what online users may pay attention to on the web is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of online users. Neither is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology, containing a large number of natural language phrases conforming to user attentions at various granularities, mined from the vast volume of web documents and search click logs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present our detailed techniques used in GIANT, and evaluate the proposed models and methods as compared to a variety of baselines, as well as deploy the resulted Attention Ontology in real-world applications, involving over a billion users, to observe its effect on content recommendation. The online performance of the ontology built by GIANT proves that it can significantly improve the click-through rate in news feeds recommendation.

[1]  Dennis V. Lindley,et al.  Principles of Random Walk. , 1965 .

[2]  Heng Ji,et al.  Refining Event Extraction through Cross-Document Inference , 2008, ACL.

[3]  Mihai Surdeanu,et al.  A Domain-independent Rule-based Framework for Event Extraction , 2015, ACL.

[4]  Bin Liang,et al.  CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System , 2017, IEA/AIE.

[5]  Joseph A. Konstan,et al.  Introduction to recommender systems , 2008, SIGMOD Conference.

[6]  Xiang Zhang,et al.  Automatically Labeled Data Generation for Large Scale Event Extraction , 2017, ACL.

[7]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[8]  Jun'ichi Tsujii,et al.  Event Extraction with Complex Event Classification Using Rich Features , 2010, J. Bioinform. Comput. Biol..

[9]  Jun Zhao,et al.  Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks , 2015, ACL.

[10]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[11]  Christopher De Sa,et al.  DeepDive: Declarative Knowledge Base Construction , 2016, SGMD.

[12]  Sergey Brin,et al.  Extracting Patterns and Relations from the World Wide Web , 1998, WebDB.

[13]  Gediminas Adomavicius,et al.  Context-aware recommender systems , 2008, RecSys '08.

[14]  Guillaume Lample,et al.  Neural Architectures for Named Entity Recognition , 2016, NAACL.

[15]  Ricardo A. Baeza-Yates,et al.  Extracting semantic relations from query logs , 2007, KDD '07.

[16]  Dan Roth,et al.  The Use of Classifiers in Sequential Inference , 2001, NIPS.

[17]  Aron Culotta,et al.  Dependency Tree Kernels for Relation Extraction , 2004, ACL.

[18]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[19]  Ralph Grishman,et al.  Modeling Skip-Grams for Event Detection with Convolutional Neural Networks , 2016, EMNLP.

[20]  Mark A. Przybocki,et al.  The Automatic Content Extraction (ACE) Program – Tasks, Data, and Evaluation , 2004, LREC.

[21]  Rahul Gupta,et al.  Biperpedia: An Ontology for Search Applications , 2014, Proc. VLDB Endow..

[22]  Jing Li,et al.  Directional Skip-Gram: Explicitly Distinguishing Left and Right Context for Word Embeddings , 2018, NAACL.

[23]  Benjamin Van Durme,et al.  Weakly-Supervised Acquisition of Open-Domain Classes and Class Attributes from Web Documents and Query Logs , 2008, ACL.

[24]  KhreichWael,et al.  A Survey of Techniques for Event Detection in Twitter , 2015, CI 2015.

[25]  Kunfeng Lai,et al.  A User-Centered Concept Mining System for Query and Document Understanding at Tencent , 2019, KDD.

[26]  Nick Craswell,et al.  Learning to Match using Local and Distributed Representations of Text for Web Search , 2016, WWW.

[27]  Clare R. Voss,et al.  Scalable Topical Phrase Mining from Text Corpora , 2014, Proc. VLDB Endow..

[28]  Jiawei Han,et al.  Mining Quality Phrases from Massive Text Corpora , 2015, SIGMOD Conference.

[29]  Ralph Grishman,et al.  Joint Event Extraction via Recurrent Neural Networks , 2016, NAACL.

[30]  Wei Xu,et al.  Bidirectional LSTM-CRF Models for Sequence Tagging , 2015, ArXiv.

[31]  Oren Etzioni,et al.  Identifying Relations for Open Information Extraction , 2011, EMNLP.

[32]  Ian H. Witten,et al.  Thesaurus based automatic keyphrase indexing , 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).

[33]  Wael Khreich,et al.  A Survey of Techniques for Event Detection in Twitter , 2015, Comput. Intell..

[34]  Hsin-Hsi Chen,et al.  Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and its Automatic Evaluation , 1994, ACL.

[35]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[36]  Max Welling,et al.  Modeling Relational Data with Graph Convolutional Networks , 2017, ESWC.

[37]  Heng Ji,et al.  CoType: Joint Extraction of Typed Entities and Relations with Knowledge Bases , 2016, WWW.

[38]  Ziqi Zhang,et al.  A Comparative Evaluation of Term Recognition Algorithms , 2008, LREC.

[39]  Ellen Riloff,et al.  Bootstrapped Training of Event Extraction Classifiers , 2012, EACL.

[40]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[41]  Ralph Grishman,et al.  NYU's English ACE 2005 System Description , 2005 .

[42]  Haixun Wang,et al.  Automatic taxonomy construction from keywords , 2012, KDD.

[43]  Philippe Cudré-Mauroux,et al.  Relation Extraction Using Distant Supervision , 2018, ACM Comput. Surv..

[44]  Kazufumi Watanabe,et al.  Jasmine: a real-time local-event detection system based on geolocation information propagated to microblogs , 2011, CIKM '11.

[45]  Branimir Boguraev,et al.  Automatic Glossary Extraction: Beyond Terminology Identification , 2002, COLING.

[46]  Jiawei Han,et al.  Automated Phrase Mining from Massive Text Corpora , 2017, IEEE Transactions on Knowledge and Data Engineering.

[47]  Hideki Mima,et al.  Automatic recognition of multi-word terms:. the C-value/NC-value method , 2000, International Journal on Digital Libraries.

[48]  Tom M. Mitchell,et al.  Joint Extraction of Events and Entities within a Document Context , 2016, NAACL.

[49]  Aditya G. Parameswaran,et al.  Towards the web of concepts , 2010, Proc. VLDB Endow..

[50]  Schubert Foo,et al.  Design and Usability of Digital Libraries: Case Studies in the Asia Pacific , 2004 .

[51]  Brian M. Sadler,et al.  TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering , 2018, KDD.

[52]  Georgia Koutrika,et al.  Modern Recommender Systems: from Computing Matrices to Thinking with Neurons , 2018, SIGMOD Conference.

[53]  Carl Gutwin,et al.  KEA: practical automatic keyphrase extraction , 1999, DL '99.

[54]  Oren Etzioni,et al.  Named Entity Recognition in Tweets: An Experimental Study , 2011, EMNLP.

[55]  Jun Zhao,et al.  Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks , 2015, EMNLP.

[56]  Angli Liu,et al.  Effective Crowd Annotation for Relation Extraction , 2016, NAACL.

[57]  J. Bobadilla,et al.  Recommender systems survey , 2013, Knowl. Based Syst..

[58]  Zhifang Sui,et al.  Jointly Extracting Event Triggers and Arguments by Dependency-Bridge RNN and Tensor-Based Argument Interaction , 2018, AAAI.

[59]  Mila Ramos-Santacruz,et al.  REES: A Large-Scale Relation and Event Extraction System , 2000, ANLP.

[60]  Jun Zhao,et al.  Leveraging FrameNet to Improve Automatic Event Detection , 2016, ACL.

[61]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[62]  Mirella Lapata,et al.  Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics , 1999, ACL 1999.

[63]  Oren Etzioni,et al.  Open domain event extraction from twitter , 2012, KDD.

[64]  Jian Su,et al.  Exploring Various Knowledge in Relation Extraction , 2005, ACL.

[65]  Gediminas Adomavicius,et al.  Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.

[66]  João Gama,et al.  Online Social Networks Event Detection: A Survey , 2016, Solving Large Scale Learning Tasks.

[67]  Mihai Surdeanu,et al.  Event Extraction as Dependency Parsing , 2011, ACL.

[68]  Naoaki Okazaki,et al.  Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web , 2009, ACL.

[69]  Stefano Faralli,et al.  A Graph-Based Algorithm for Inducing Lexical Taxonomies from Scratch , 2011, IJCAI.

[70]  Keld Helsgaun,et al.  An effective implementation of the Lin-Kernighan traveling salesman heuristic , 2000, Eur. J. Oper. Res..

[71]  Xiao Liu,et al.  Jointly Multiple Events Extraction via Attention-based Graph Information Aggregation , 2018, EMNLP.

[72]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[73]  Yu Xu,et al.  Growing Story Forest Online from Massive Breaking News , 2017, CIKM.

[74]  Pushpak Bhattacharyya,et al.  Relation Extraction : A Survey , 2017, ArXiv.

[75]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[76]  Pedro M. Domingos,et al.  Unsupervised Ontology Induction from Text , 2010, ACL.

[77]  Katerina T. Frantzi,et al.  Automatic recognition of multi-word terms , 1998 .

[78]  Dong-Hong Ji,et al.  Relation Extraction Using Label Propagation Based Semi-Supervised Learning , 2006, ACL.

[79]  Jens Lehmann,et al.  DBpedia - A large-scale, multilingual knowledge base extracted from Wikipedia , 2015, Semantic Web.

[80]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.