Tracking Information Flow in Financial Text

Information is fundamental to finance, and understanding how it flows from official sources to news agencies is a central problem. Readers need to digest information rapidly from high-volume news feeds, which often contain duplicate and irrelevant stories, to gain a competitive advantage. We propose a text categorisation task over pairs of official announcements and news stories to identify whether the story repeats announcement information and/or adds value. Using features based on the intersection of the texts and relative timing, our system identifies information flow at 89.5% F-score and three types of journalistic contribution at 73.4% to 85.7% F-score. Evaluated against the majority annotator decision, the system performs 13% better than a bag-of-words baseline.
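The abstract names two feature families, text intersection and relative timing, without giving their definitions. The sketch below is one plausible reading and not the authors' feature set: the function names, the whitespace tokenizer, the Jaccard and coverage measures, and the example texts and timestamps are all assumptions made for illustration.

```python
from datetime import datetime


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return {tok for tok in tokens if tok}


def pair_features(announcement, story, announced_at, published_at):
    """Return illustrative overlap and timing features for one text pair.

    These three features are hypothetical stand-ins for the paper's
    "intersection" and "relative timing" features.
    """
    a_tokens = tokenize(announcement)
    s_tokens = tokenize(story)
    shared = a_tokens & s_tokens
    union = a_tokens | s_tokens
    return {
        # Share of all distinct tokens that appear in both texts.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Share of the story's tokens that also appear in the announcement.
        "story_coverage": len(shared) / len(s_tokens) if s_tokens else 0.0,
        # Delay between the announcement and the story, in minutes.
        "lag_minutes": (published_at - announced_at).total_seconds() / 60.0,
    }


# Made-up announcement/story pair, not taken from the paper's corpus.
features = pair_features(
    "Acme Ltd reports full year profit of $10 million",
    "Acme profit rises to $10 million, analysts surprised",
    datetime(2009, 2, 1, 9, 0),
    datetime(2009, 2, 1, 9, 45),
)
print(features)
```

A high `story_coverage` with a short `lag_minutes` would suggest a story that mostly repeats the announcement, while low coverage would hint at added material. In practice such features would be passed to a trained classifier rather than thresholded by hand.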

James R. Curran | Will Radford | Ben Hachey | Maria Milosavljevic
