Paper Information - High Value Media Monitoring With Machine Learning

High Value Media Monitoring With Machine Learning

The Gorkana Group provides high-quality media monitoring services to its clients. This paper describes an ongoing project aimed at increasing the amount of automation in Gorkana Group’s workflow through the application of machine learning and language processing technologies. Gorkana Group’s clients must have a very high level of confidence that, if an article is relevant to one of their briefs, they will be shown that article. However, delivering this high-quality media monitoring service requires humans to read through very large quantities of data, only a small portion of which is typically deemed relevant. The challenge addressed by the work reported in this paper is how to achieve such high-quality media monitoring efficiently in the face of huge increases in the amount of data that needs to be monitored. We show that, while machine learning can be applied successfully to this real-world business problem, the constraints of the task give rise to a number of interesting challenges.

David J. Weir | Daoud Clarke | Jeremy Reffin | Hamish Morgan | Matti Lyra
