Paper information - We’re Not in Kansas Anymore: Detecting Domain Changes in Streams

Domain adaptation, the problem of adapting a natural language processing system trained in one domain to perform well in a different domain, has received significant attention. This paper addresses an important problem for deployed systems that has received little attention: detecting when such adaptation is needed by a system operating in the wild, i.e., performing classification over a stream of unlabeled examples. Our method combines A-distance, a metric for detecting shifts in data streams, with classification margins to detect domain shifts. We empirically show effective domain shift detection on a variety of data sets and shift conditions.
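
To make the idea in the abstract concrete, the following is a minimal sketch of how A-distance over classification margins might be computed on a stream. It is not the authors' implementation. It uses the standard definition d_A(P, P') = 2 sup_{A in 𝒜} |P(A) - P'(A)|, with 𝒜 taken to be intervals whose endpoints lie on a fixed grid of bin edges. The function names (`bin_fractions`, `a_distance`, `detect_shift`), the fixed reference window, the sliding current window, the bin edges, and the threshold are all assumptions chosen for illustration; the abstract does not specify any of them.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values falling in each bin defined by sorted `edges`.

    There are len(edges) + 1 bins; the first and last are unbounded.
    """
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    n = len(values)
    return [c / n for c in counts]


def a_distance(ref: Sequence[float], cur: Sequence[float],
               edges: Sequence[float]) -> float:
    """A-distance between two samples over intervals built from `edges`.

    Every contiguous run of bins is an interval A; we take the largest
    absolute difference in empirical probability mass and double it.
    """
    p = bin_fractions(ref, edges)
    q = bin_fractions(cur, edges)
    diffs = [a - b for a, b in zip(p, q)]
    best = 0.0
    for i in range(len(diffs)):
        running = 0.0
        for j in range(i, len(diffs)):
            running += diffs[j]
            best = max(best, abs(running))
    return 2.0 * best


def detect_shift(margins: Sequence[float], window: int,
                 edges: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first example at which a shift is flagged, else None.

    Assumption: the first `window` margins form a fixed reference sample,
    and the most recent `window` margins form the current sample.
    """
    if len(margins) < 2 * window:
        return None
    ref = margins[:window]
    for end in range(2 * window, len(margins) + 1):
        cur = margins[end - window:end]
        if a_distance(ref, cur, edges) > threshold:
            return end - 1
    return None
```

In use, `margins` would hold the absolute classifier margins of the unlabeled stream in arrival order; a drop in typical margin after a domain change moves mass between bins, which raises the distance between the reference and current windows. The quadratic loop over bin runs is acceptable for a handful of bins and keeps the sketch easy to read.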
