Organizing the Web's Information Explosion to Discover Unknown Unknowns

This paper introduces the TORISHIKI-KAI project, which aims to construct a million-word-scale semantic network from the Web using state of the art knowledge acquisition methods. The resulting network can be browsed as a Web search directory, and we show that the directory is useful for finding “unknown unknowns” — in the infamous words of D.H. Rumsfeld: things “we don't know we don't know.” Because typically we have no way to look for information we don't even know is missing, a crucial characteristic of unknown unknowns is that they are very difficult to discover through keyword-based Web search. Some examples of the unknown unknowns we have found include unexpected troubles associated with commercial products, surprising new combinations of ingredients in new recipes, unexpected tools or methods for commiting suicide, and so on. We expect such information to be useful for risk management, innovation support, and the detection of harmful information on the Web.

[1]  Ellen Riloff,et al.  Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping , 1999, AAAI/IAAI.

[2]  S J Alger Looking for trouble. , 1986, Group practice journal.

[3]  Simone Paolo Ponzetto,et al.  Deriving a Large-Scale Taxonomy from Wikipedia , 2007, AAAI.

[4]  Ido Dagan,et al.  Similarity-Based Models of Word Cooccurrence Probabilities , 1998, Machine Learning.

[5]  Daniel Jurafsky,et al.  Semantic Taxonomy Induction from Heterogenous Evidence , 2006, ACL.

[6]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[7]  Jong-Hoon Oh,et al.  Bilingual Co-Training for Monolingual Hyponymy-Relation Acquisition , 2009, ACL.

[8]  Kentaro Torisawa,et al.  Boosting Precision and Recall of Hyponymy Relation Acquisition from Hierarchical Layouts in Wikipedia , 2008, LREC.

[9]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[10]  Olfa Nasraoui,et al.  Mining search engine query logs for query recommendation , 2006, WWW '06.

[11]  Marius Pasca,et al.  Acquisition of categorized named entities for web search , 2004, CIKM '04.

[12]  Marti A. Hearst Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[13]  Doug Downey,et al.  Unsupervised named-entity extraction from the Web: An experimental study , 2005, Artif. Intell..

[14]  Kentaro Torisawa,et al.  Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations , 2008, ACL.

[15]  Avrim Blum,et al.  The Bottleneck , 2021, Monopsony Capitalism.

[16]  Kentaro Torisawa,et al.  Acquiring Hyponymy Relations from Web Documents , 2004, NAACL.

[17]  Patrick Pantel,et al.  The Domain Restriction Hypothesis: Relating Term Similarity and Semantic Consistency , 2007, NAACL.

[18]  Ellen Riloff,et al.  Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs , 2008, ACL.

[19]  Ricardo A. Baeza-Yates,et al.  Query Recommendation Using Query Logs in Search Engines , 2004, EDBT Workshops.

[20]  Daisuke Kawahara,et al.  TSUBAKI: An Open Search Engine Infrastructure for Developing New Information Access Methodology , 2008, IJCNLP.

[21]  Kentaro Torisawa An Unsupervised Method for Canonicalization of Japanese Postpositions , 2001, NLPRS.

[22]  Patrick Pantel,et al.  Automatically Labeling Semantic Classes , 2004, NAACL.

[23]  Yuji Matsumoto,et al.  Two-Phased Event Relation Acquisition: Coupling the Relation-Oriented and Argument-Oriented Approaches , 2008, COLING.

[24]  Masaki Murata,et al.  Hypernym Discovery Based on Distributional Similarity and Hierarchical Structures , 2009, EMNLP.

[25]  石崎 俊,et al.  Automatic Extraction of Hyponyms from Newspaper Using Lexicosyntactic Patterns , 2003 .

[26]  Masaki Murata,et al.  Large Scale Relation Acquisition Using Class Dependent Patterns , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[27]  Kentaro Torisawa,et al.  Exploiting Wikipedia as External Knowledge for Named Entity Recognition , 2007, EMNLP.

[28]  Kentaro Torisawa Automatic Acquisition of Expressions Representing Preparation and Utilization of an Object Kentaro Torisawa Japan , 2005 .

[29]  Qing Zeng-Treitler,et al.  Research Paper: Assisting Consumer Health Information Retrieval with Query Recommendations , 2006, J. Am. Medical Informatics Assoc..

[30]  Sharon A. Caraballo Automatic construction of a hypernym-labeled noun hierarchy from text , 1999, ACL.

[31]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[32]  Patrick Pantel,et al.  Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations , 2006, ACL.

[33]  Zellig S. Harris,et al.  Distributional Structure , 1954 .