Making Biographical Data in Wikipedia Readable: A Pattern-based Multilingual Approach

In this paper we present Biografix, a pattern-based tool that simplifies parenthetical structures containing biographical information in order to create simple, readable and accessible sentences. To that end, we analysed the parenthetical structures that appear in the first paragraphs of Basque Wikipedia articles and concentrated on biographies. Although the tool was designed and developed for Basque, we adapted it to five other languages and evaluated it on them. We also perform an extrinsic evaluation with a question generation system to see whether Biografix improves its results.
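To illustrate the general idea of pattern-based simplification of a biographical parenthetical, the following is a minimal sketch in Python. It is not Biografix's implementation: the single regular expression, the function name `simplify`, the English output templates and the example sentence are all assumptions made for illustration, and the sketch covers only one hypothetical pattern of the form "Name (place, year - place, year) rest".

```python
import re

# Hypothetical pattern (an assumption, not taken from Biografix):
#   "Name (BirthPlace, BirthYear - DeathPlace, DeathYear) rest of sentence."
PATTERN = re.compile(
    r"^(?P<name>[^()]+?)\s*"
    r"\((?P<bplace>[^,()]+),\s*(?P<byear>\d{4})\s*[-–]\s*"
    r"(?P<dplace>[^,()]+),\s*(?P<dyear>\d{4})\)\s*"
    r"(?P<rest>.+)$"
)


def simplify(sentence):
    """Split a sentence with a biographical parenthetical into simple sentences.

    Returns a list of sentences. If the hypothetical pattern does not match,
    the input sentence is returned unchanged as a one-element list.
    """
    text = sentence.strip()
    match = PATTERN.match(text)
    if match is None:
        return [text]

    name = match.group("name").strip()
    return [
        # Main clause with the parenthetical removed.
        f"{name} {match.group('rest').strip()}",
        # Birth and death data rewritten as explicit sentences.
        f"{name} was born in {match.group('bplace').strip()} in {match.group('byear')}.",
        f"{name} died in {match.group('dplace').strip()} in {match.group('dyear')}.",
    ]


if __name__ == "__main__":
    example = "Ada Lovelace (London, 1815 - London, 1852) was an English mathematician."
    for line in simplify(example):
        print(line)
```

A real system would need many more patterns (missing death data, full dates, language-specific morphology such as Basque case marking), so this sketch should be read only as an indication of the kind of rewrite involved.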
