Recognizing Extended Spatiotemporal Expressions by Actively Trained Average Perceptron Ensembles

Precise geocoding and time normalization for text requires that location and time phrases be identified. Many state-of-the-art geoparsers and temporal parsers suffer from low recall. Categories commonly missed by parsers are: nouns used in a non- spatiotemporal sense, adjectival and adverbial phrases, prepositional phrases, and numerical phrases. We collected and annotated data set by querying commercial web searches API with such spatiotemporal expressions as were missed by state-of-the- art parsers. Due to the high cost of sentence annotation, active learning was used to label training data, and a new strategy was designed to better select training examples to reduce labeling cost. For the learning algorithm, we applied an average perceptron trained Featurized Hidden Markov Model (FHMM). Five FHMM instances were used to create an ensemble, with the output phrase selected by voting. Our ensemble model was tested on a range of sequential labeling tasks, and has shown competitive performance. Our contributions include (1) an new dataset annotated with named entities and expanded spatiotemporal expressions; (2) a comparison of inference algorithms for ensemble models showing the superior accuracy of Belief Propagation over Viterbi Decoding; (3) a new example re-weighting method for active ensemble learning that 'memorizes' the latest examples trained; (4) a spatiotemporal parser that jointly recognizes expanded spatiotemporal expressions as well as named entities.

[1]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[2]  Andrew McCallum,et al.  Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.

[3]  Tiejun Zhao,et al.  Biomedical Named Entity Recognition Based on Classifiers Ensemble , 2008, Int. J. Comput. Sci. Appl..

[4]  Andrew McCallum,et al.  Reducing Labeling Effort for Structured Prediction Tasks , 2005, AAAI.

[5]  Gernot A. Fink,et al.  Active Learning of Ensemble Classifiers for Gesture Recognition , 2012, DAGM/OAGM Symposium.

[6]  Raymond J. Mooney,et al.  Diverse ensembles for active learning , 2004, ICML.

[7]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[8]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[9]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[10]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[11]  Christian Kohlschein An introduction to Hidden Markov Models , 2007 .

[12]  Fredric C. Gey,et al.  The crucial role of semantic discovery and markup in geo-temporal search , 2010, ESAIR '10.

[13]  Burr Settles,et al.  Biomedical Named Entity Recognition using Conditional Random Fields and Rich Feature Sets , 2004, NLPBA/BioNLP.

[14]  Byoung-Tak Zhang,et al.  Ensemble Learning with Active Example Selection for Imbalanced Biomedical Data Classification , 2011, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[15]  Xindong Wu,et al.  Active Learning through Adaptive Heterogeneous Ensembling , 2015, IEEE Transactions on Knowledge and Data Engineering.

[16]  Michael I. Jordan,et al.  Loopy Belief Propagation for Approximate Inference: An Empirical Study , 1999, UAI.

[17]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..

[18]  Dan Roth,et al.  Design Challenges and Misconceptions in Named Entity Recognition , 2009, CoNLL.

[19]  Chenliang Li,et al.  Fine-grained location extraction from tweets with temporal awareness , 2014, SIGIR.

[20]  Sven Hartrumpf,et al.  Interpretation and Normalization of Temporal Expressions for Question Answering , 2006, CLEF.

[21]  Mark Craven,et al.  An Analysis of Active Learning Strategies for Sequence Labeling Tasks , 2008, EMNLP.

[22]  Jüri Lember,et al.  Bridging Viterbi and posterior decoding: a generalized risk approach to hidden path inference based on hidden Markov models , 2014, J. Mach. Learn. Res..

[23]  Frank Schilder,et al.  From Temporal Expressions To Temporal Information: Semantic Tagging Of News Messages , 2001, The Language of Time - A Reader.

[24]  Fernando Pereira,et al.  Shallow Parsing with Conditional Random Fields , 2003, NAACL.

[25]  Yecheng Yuan Extracting spatial relations from document for geographic information retrieval , 2011, 2011 19th International Conference on Geoinformatics.

[26]  X. Jin Factor graphs and the Sum-Product Algorithm , 2002 .

[27]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[28]  Bernhard Schölkopf,et al.  Regularization Networks and Support Vector Machines , 2000 .

[29]  Rüdiger L. Urbanke,et al.  The capacity of low-density parity-check codes under message-passing decoding , 2001, IEEE Trans. Inf. Theory.

[30]  Judith Gelernter,et al.  Geocoding location expressions in Twitter messages: A preference learning method , 2014, J. Spatial Inf. Sci..

[31]  Alex Bateman,et al.  An introduction to hidden Markov models. , 2007, Current protocols in bioinformatics.

[32]  Hong Shen,et al.  Voting Between Multiple Data Representations for Text Chunking , 2005, Canadian AI.

[33]  Thomas G. Dietterich An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.

[34]  Malvina Nissim,et al.  Exploiting Context for Biomedical Entity Recognition: From Syntax to the Web , 2004, NLPBA/BioNLP.

[35]  Jian Su,et al.  Named Entity Recognition using an HMM-based Chunk Tagger , 2002, ACL.

[36]  Maksim Tkatchenko,et al.  Named entity recognition: Exploring features , 2012, KONVENS.

[37]  Joachim Hagenauer,et al.  A Viterbi algorithm with soft-decision outputs and its applications , 1989, IEEE Global Telecommunications Conference, 1989, and Exhibition. 'Communications Technology for the 1990s and Beyond.

[38]  Paloma Martínez,et al.  Temporal Semantics Extraction for Improving Web Search , 2009, 2009 20th International Workshop on Database and Expert Systems Application.

[39]  Raymond J. Mooney,et al.  Creating diversity in ensembles using artificial data , 2005, Inf. Fusion.

[40]  Angel X. Chang,et al.  SUTime: A library for recognizing and normalizing time expressions , 2012, LREC.

[41]  Pietro Perona,et al.  Entropy-based active learning for object recognition , 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[42]  Sven Hartrumpf,et al.  GikiP at GeoCLEF 2008: Joining GIR and QA Forces for Querying Wikipedia , 2008, CLEF.

[43]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[44]  Christopher D. Manning,et al.  Joint Parsing and Named Entity Recognition , 2009, NAACL.

[45]  Diego Marcheggiani,et al.  An Experimental Comparison of Active Learning Strategies for Partially Labeled Sequences , 2014, EMNLP.

[46]  Lutz Prechelt,et al.  Automatic early stopping using cross validation: quantifying the criteria , 1998, Neural Networks.

[47]  John Langford,et al.  Search-based structured prediction , 2009, Machine Learning.

[48]  Judea Pearl,et al.  Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach , 1982, AAAI.

[49]  Xu Sun,et al.  Modeling Latent-Dynamic in Shallow Parsing: A Latent Conditional Model with Improved Inference , 2008, COLING.