Unsupervised Biographical Event Extraction Using Wikipedia Traffic

“What is Julian Assange known for?” Can we define his importance in terms of key events?
• Some people are famous for what they did, rather than for what they are.
• “Julian Assange founded WikiLeaks in 2006” and “In 2012 he was granted political asylum by Ecuador” are sentences that quickly convey Assange’s notability.
• Current approaches to extracting such events need a large annotated data set to train on, which is expensive.
• There are many ways for someone to be famous, so such a data set must be large to cover them.
• A spike in a Wikipedia page’s traffic signals that something interesting happened.
• Find spikes → find the edits made to the page at the same time → match each edit to a sentence in the current-day article.
• No annotation needed!
Example sentences: “Obama was re-elected president in November 2012.” “On November 4, Obama won the presidency.”
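The poster does not say how traffic spikes are detected. As a minimal sketch, assuming daily page-view counts are already available as a list of integers, one simple rule flags a day whose views exceed the mean of a trailing window by k standard deviations. The function name `find_spikes` and its parameters (`window`, `k`, `min_views`) are hypothetical choices for illustration, not the authors' method.

```python
from statistics import mean, pstdev

def find_spikes(daily_views, window=30, k=3.0, min_views=100):
    """Return indices of days whose views exceed the trailing mean by k std devs.

    Assumption: a simple trailing-window z-score rule; the source does not
    specify which spike detector is used.
    """
    spikes = []
    for i in range(window, len(daily_views)):
        history = daily_views[i - window:i]  # the `window` days before day i
        mu = mean(history)
        sigma = pstdev(history)
        # Floor sigma at 1.0 so a perfectly flat history still needs a real jump.
        threshold = mu + k * max(sigma, 1.0)
        # Ignore low-traffic days entirely to avoid flagging noise.
        if daily_views[i] >= min_views and daily_views[i] > threshold:
            spikes.append(i)
    return spikes
```

For example, `find_spikes([50] * 40 + [5000] + [60] * 10)` returns `[40]`, the index of the single burst day. The `min_views` floor keeps low-traffic pages from producing spikes out of noise, and flooring the standard deviation at 1.0 means a perfectly flat history still requires an increase of more than `k` views.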

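The remaining two steps, finding an edit made at the same time as a spike and matching it to a sentence in the current-day article, can be sketched as follows. This assumes each revision is available as a `(date, added_text)` pair, accepts edits made on the spike day or up to two days after it, and scores candidate sentences by Jaccard overlap of lowercased word sets. The window, the 0.3 threshold, the helper names, and the similarity measure are all hypothetical stand-ins, not taken from the source; the example reuses the Assange sentences with an invented revision history.

```python
import re
from datetime import date, timedelta

def tokens(text):
    """Lowercased alphanumeric word set (no stemming or stop-word removal)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def edits_near(spike_day, revisions, max_lag_days=2):
    """Added text of revisions made on spike_day or up to max_lag_days later.

    Assumption: `revisions` is a list of (date, added_text) pairs.
    """
    lo, hi = timedelta(0), timedelta(days=max_lag_days)
    return [text for day, text in revisions if lo <= day - spike_day <= hi]

def best_sentence(edit_text, current_sentences, min_overlap=0.3):
    """Current-article sentence most similar to the edit, or None if below min_overlap."""
    edit_toks = tokens(edit_text)
    best = max(current_sentences, key=lambda s: jaccard(edit_toks, tokens(s)), default=None)
    if best is None or jaccard(edit_toks, tokens(best)) < min_overlap:
        return None
    return best

def match_spike(spike_day, revisions, current_sentences):
    """Sentences of the current article that the edits around one spike map onto."""
    matches = (best_sentence(e, current_sentences) for e in edits_near(spike_day, revisions))
    return [s for s in matches if s is not None]

# Hypothetical example data: two current-article sentences and two invented revisions.
sentences = [
    "Julian Assange founded WikiLeaks in 2006.",
    "In 2012 he was granted political asylum by Ecuador.",
]
revisions = [
    (date(2012, 8, 16), "In 2012 Assange was granted political asylum by Ecuador."),
    (date(2012, 3, 1), "Fixed a typo in the infobox."),
]
print(match_spike(date(2012, 8, 16), revisions, sentences))
# → ['In 2012 he was granted political asylum by Ecuador.']
```

Keeping only edits inside a short window after the spike is what ties a traffic burst to a specific change in the article; the similarity threshold then discards edits, such as copy-edits, that do not correspond to any sentence still present. A real system would likely add proper sentence splitting and stemming before comparing text.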