Paper information - Measuring Relevance with Named Entity Based Patterns in Topic-Focused Document Summarization

Measuring Relevance with Named Entity Based Patterns in Topic-Focused Document Summarization

In this paper, the role of named entity based patterns is emphasized in measuring the relevance between document sentences and the topic for topic-focused extractive summarization. Patterns are defined as informative, semantic-sensitive text bi-grams consisting of at least one named entity or the semantic word class of a named entity. They are extracted automatically according to eight pre-specified templates. Question types are also taken into consideration if they are available when dealing with topic questions. To alleviate problems with coverage, the pattern and uni-gram models are integrated so that they compensate for each other in similarity calculation. Automatic ROUGE evaluations indicate that the proposed idea can produce a very good system that tops the best-performing system at the Document Understanding Conference (DUC) 2005.

summarization approaches are based on extracting the salient sentences in the documents that are supposed to be relevant to the given topic. Thereby, the most critical issue for them is how to measure the relevance between the document sentences and the topic. In earlier studies, the sentences are represented as bags of words, as in the classical vector space model. This representation has at least two drawbacks. First, a single word (i.e. a uni-gram) is not informative enough to represent the underlying information in the sentences. For example, the meaning "the residence of the US president" is lost when "White House" is represented by "White" and "House" separately. As a result, named entities, unlike other words, should be treated as meaningful text units when measuring relevance. Second, the ordering information, especially the underlying semantic information and the sentence structure, cannot be taken into account by uni-gram models. N-gram models, e.g. bi-gram models, provide a means to capture the structural information by combining two words in the sentence. But meanwhile, any N-gram model will more or less suffer from the bottleneck of low
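The "White House" example can be made concrete with a minimal sketch that compares a plain bag-of-words cosine similarity against one that keeps a named entity as a single text unit. This is an illustration only, not the paper's pattern model: the lexicon `NAMED_ENTITIES`, the helpers `tokenize` and `cosine`, and the two sample strings are all assumptions introduced here, and a real system would obtain entities from a named entity tagger rather than a hand-written lookup.

```python
import math
import re
from collections import Counter

# Hypothetical entity lexicon, for illustration only (not from the paper).
NAMED_ENTITIES = {("white", "house"): "white_house"}


def tokenize(text, merge_entities=False):
    """Lowercase and split into words; optionally merge known entity word pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    if not merge_entities:
        return words
    units, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in NAMED_ENTITIES:
            # Keep the named entity as one meaningful unit.
            units.append(NAMED_ENTITIES[pair])
            i += 2
        else:
            units.append(words[i])
            i += 1
    return units


def cosine(a, b):
    """Cosine similarity between two token lists under term-frequency weighting."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


topic = "the White House"
sentence = "She painted the house white"

# Uni-gram view: "white" and "house" both match, although the sentence
# has nothing to do with the residence of the US president.
uni = cosine(tokenize(topic), tokenize(sentence))

# Entity-aware view: only "the" is shared, so the score drops.
ent = cosine(tokenize(topic, merge_entities=True), tokenize(sentence, merge_entities=True))

print(f"uni-gram similarity:     {uni:.3f}")
print(f"entity-aware similarity: {ent:.3f}")
```

In this toy case the uni-gram score is inflated by the accidental matches on "white" and "house", while the entity-aware score is lower because the sentence never mentions the entity itself.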

Yanxiang He