
Organizing the Web's Information Explosion to Discover Unknown Unknowns

This paper introduces the TORISHIKI-KAI project, which aims to construct a million-word-scale semantic network from the Web using state-of-the-art knowledge acquisition methods. The resulting network can be browsed as a Web search directory, and we show that the directory is useful for finding “unknown unknowns,” which in the infamous words of D. H. Rumsfeld are things “we don't know we don't know.” A crucial characteristic of unknown unknowns is that they are very difficult to discover through keyword-based Web search, because we typically have no way to search for information we do not even know is missing. Examples of the unknown unknowns we have found include unexpected troubles associated with commercial products, surprising new combinations of ingredients in recipes, and unexpected tools or methods for committing suicide. We expect such information to be useful for risk management, innovation support, and the detection of harmful information on the Web.
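To make the idea of "browsing a semantic network as a directory" concrete, here is a toy sketch. It assumes the network is stored as hypernym-to-hyponym edges (category to member term) and that browsing means walking down from a chosen category. The edge data and the helper names `build_directory` and `browse` are hypothetical illustrations, not TORISHIKI-KAI's actual data model or interface.

```python
from collections import defaultdict

# Hypothetical toy edges (hypernym, hyponym). In a real system these would be
# acquired automatically from Web text; the terms here are invented examples.
EDGES = [
    ("kitchen appliance", "microwave oven"),
    ("kitchen appliance", "rice cooker"),
    ("microwave oven", "convection microwave"),
]


def build_directory(edges):
    """Index each hyponym under its hypernym so the network can be browsed top-down."""
    children = defaultdict(list)
    for hypernym, hyponym in edges:
        children[hypernym].append(hyponym)
    return children


def browse(children, node, depth=0, max_depth=2):
    """Return an indented listing of the subtree under `node`, cut off at `max_depth`.

    The depth limit also guards against cycles, which automatically
    acquired relations can contain.
    """
    lines = ["  " * depth + node]
    if depth < max_depth:
        # .get() does not insert missing keys into the defaultdict.
        for child in sorted(children.get(node, [])):
            lines.extend(browse(children, child, depth + 1, max_depth))
    return lines


directory = build_directory(EDGES)
print("\n".join(browse(directory, "kitchen appliance")))
```

Each level of indentation corresponds to one step down the hypernym hierarchy, which is what lets a user explore categories and terms they would not have known to type as keywords.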

Yasunori Kakizawa | Masaki Murata | Kentaro Torisawa | Stijn De Saeger | Ichiro Yamada | Jun'ichi Kazama | Asuka Sumida | Kow Kuroda | Daisuke Noguchi