Mining Structures of Factual Knowledge from Text: An Effort-Light Approach

Abstract: Real-world data, though massive, is largely unstructured and takes the form of natural-language text. It is challenging but highly desirable to mine structures from massive text data, without ext...
