Geographically aware Web text mining

Doctoral thesis in Informatics (Informatics Engineering), presented to the Universidade de Lisboa through the Faculdade de Ciências, 2009

[1]  Matthew Richardson,et al.  The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank , 2001, NIPS.

[2]  Ellen M. Voorhees,et al.  The effect of topic set size on retrieval experiment error , 2002, SIGIR '02.

[3]  Marti A. Hearst,et al.  Automating Creation of Hierarchical Faceted Metadata Structures , 2007, NAACL.

[4]  Tom Fawcett,et al.  ROC Graphs: Notes and Practical Considerations for Data Mining Researchers , 2003 .

[5]  Jochen L. Leidner Toponym resolution in text: annotation, evaluation and applications of spatial grounding , 2007, SIGF.

[6]  Dave Kolas,et al.  Geospatial semantic Web: architecture of ontologies , 2006, 2006 IEEE Aerospace Conference.

[7]  Yahiko Kambayashi,et al.  Models for Conceptual Geographical Prepositions Based on Web Resource , 2001 .

[8]  Gene H. Golub,et al.  Exploiting the Block Structure of the Web for Computing PageRank , 2003 .

[9]  Vipin Kumar,et al.  Text Categorization Using Weight Adjusted k-Nearest Neighbor Classification , 2001, PAKDD.

[10]  K. Clarke Getting Started with Geographic Information Systems , 1996 .

[11]  Stan Matwin,et al.  Text Classification Using WordNet Hypernyms , 1998, WordNet@ACL/COLING.

[12]  Ellen M. Voorhees,et al.  Evaluating Evaluation Measure Stability , 2000, SIGIR 2000.

[13]  Doug Downey,et al.  KnowItNow: Fast, Scalable Information Extraction from the Web , 2005, HLT.

[14]  José Luis Vicedo González,et al.  TREC: Experiment and evaluation in information retrieval , 2007, J. Assoc. Inf. Sci. Technol..

[15]  Ana Paula Afonso,et al.  Handling Locations in Search Engine Queries , 2006, GIR.

[16]  Keith C. Clarke,et al.  Interactive Visual Exploration of a Large Spatio-temporal Dataset: Reflections on a Geovisualization Mashup , 2007, IEEE Transactions on Visualization and Computer Graphics.

[17]  Trystan Upstill,et al.  Document ranking using web evidence , 2005 .

[18]  Beth Sundheim Resources to facilitate progress in place name identification and reference resolution , 2002 .

[19]  Xing Xie,et al.  Detecting geographic locations from web resources , 2005, GIR '05.

[20]  Andrew Tomkins,et al.  How to build a WebFountain: An architecture for very large-scale text analytics , 2004, IBM Syst. J..

[21]  Marie-Francine Moens,et al.  Information Extraction: Algorithms and Prospects in a Retrieval Context , 2006, The Information Retrieval Series.

[22]  Cheng Niu,et al.  Location Normalization for Information Extraction , 2002, COLING.

[23]  Kalina Bontcheva,et al.  Experiments with geographic knowledge for information extraction , 2003, HLT-NAACL 2003.

[24]  Hwee Tou Ng,et al.  A Machine Learning Approach to Coreference Resolution of Noun Phrases , 2001, CL.

[25]  S. Milgram Psychological maps of Paris , 1976 .

[26]  Marko Grobelnik,et al.  Text mining as integration of several related research areas: report on KDD's workshop on text mining 2000 , 2000, SKDD.

[27]  Harith Alani,et al.  Augmenting Thesaurus Relationships: Possibilities for Retrieval , 2001, J. Digit. Inf..

[28]  Xing Xie,et al.  Hybrid index structures for location-based web search , 2005, CIKM '05.

[29]  Jong-Hyeok Lee,et al.  Text categorization based on k-nearest neighbor approach for Web site classification , 2003, Inf. Process. Manag..

[30]  Alberto H. F. Laender,et al.  The role of gazetteers in geographic knowledge discovery on the Web , 2005, Third Latin American Web Congress (LA-WEB'2005).

[31]  Rajeev Motwani,et al.  The PageRank Citation Ranking: Bringing Order to the Web , 1999, WWW 1999.

[32]  Michael Worboys,et al.  Metrics and topologies for geographic space , 2001 .

[33]  Miguel Costa,et al.  Indexação Distribuída de Colecções Web de Larga Escala , 2005 .

[34]  Vibhu O. Mittal,et al.  Stemming and its effects on TFIDF ranking , 2000, SIGIR 2000.

[35]  George Karypis,et al.  Evaluation of hierarchical clustering algorithms for document datasets , 2002, CIKM '02.

[36]  Kevin S. McCurley,et al.  Geospatial mapping and navigation of the web , 2001, WWW '01.

[37]  David M. Mark,et al.  Naive Geography , 1995, COSIT.

[38]  Robert Lloyd,et al.  Systematic Distortions in Urban Cognitive Maps , 1987 .

[39]  Rui Xu,et al.  Survey of clustering algorithms , 2005, IEEE Transactions on Neural Networks.

[40]  Ingemar J. Cox,et al.  A comparison of dimensionality reduction techniques for text retrieval , 2005, Fourth International Conference on Machine Learning and Applications (ICMLA'05).

[41]  Rick Bennett,et al.  Trends in the Evolution of the Public Web: 1998 - 2002 , 2003, D Lib Mag..

[42]  Diana Santos,et al.  Reconhecimento de entidades mencionadas em português: Documentação e actas do HAREM, a primeira avaliação conjunta na área , 2007 .

[43]  Djoerd Hiemstra,et al.  The Importance of Prior Probabilities for Entry Page Search , 2002, SIGIR '02.

[44]  Hideo Joho,et al.  Judging the Spatial Relevance of Documents for GIR , 2006, ECIR.

[45]  Paul Clough,et al.  Geographic IR Systems: Requirements and Evaluation , 2005 .

[46]  Irena V. Marshakova-Shaikevich System of Document Connections Based on References , 2009 .

[47]  David Raggett Clean Up Your Web Pages with HP's HTML Tidy , 1998, Comput. Networks.

[48]  Paul Clough,et al.  A proposal for comparative evaluation of automatic annotation for geo-referenced documents , 2005 .

[49]  Mark Goadrich,et al.  The relationship between Precision-Recall and ROC curves , 2006, ICML.

[50]  Piotr Indyk,et al.  Enhanced hypertext categorization using hyperlinks , 1998, SIGMOD '98.

[51]  Wei-Ying Ma,et al.  Learning to cluster web search results , 2004, SIGIR '04.

[52]  Jeffrey P. Bigham,et al.  Names and Similarities on the Web: Fact Extraction in the Fast Lane , 2006, ACL.

[53]  Ray R. Larson,et al.  Spatial Ranking Methods for Geographic Information Retrieval (GIR) in Digital Libraries , 2004, ECDL.

[54]  Donna K. Harman,et al.  Results and Challenges in Web Search Evaluation , 1999, Comput. Networks.

[55]  Taher H. Haveliwala Efficient Computation of PageRank , 1999 .

[56]  Mor Naaman,et al.  HT06, tagging paper, taxonomy, Flickr, academic article, to read , 2006, HYPERTEXT '06.

[57]  Mário J. Silva,et al.  The WebCAT framework automatic generation of meta-data for Web resources , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[58]  Steve Draper,et al.  Questionnaires as a software evaluation tool , 1983, CHI '83.

[59]  Falk Scholer,et al.  User performance versus precision measures for simple search tasks , 2006, SIGIR.

[60]  Mário J. Silva,et al.  Challenges and resources for evaluating geographical IR , 2005, GIR '05.

[61]  M. Andrea Rodríguez,et al.  Defining and Comparing Content Measures of Topological Relations , 2004, GeoInformatica.

[62]  Gavriel Salvendy,et al.  A proposed index of usability: A method for comparing the relative usability of different software systems , 1997, Behav. Inf. Technol..

[63]  Linda L. Hill,et al.  Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints , 2000, ECDL.

[64]  Yi Li,et al.  Exploring Probabilistic Toponym Resolution for Geographical Information Retrieval , 2006, GIR.

[65]  Luis Gravano,et al.  Categorizing web queries according to geographical locality , 2003, CIKM '03.

[66]  Kevin S. McCurley,et al.  Analysis of anchor text for web search , 2003, SIGIR.

[67]  Henry G. Small,et al.  Co-citation in the scientific literature: A new measure of the relationship between two documents , 1973, J. Am. Soc. Inf. Sci..

[68]  Steven Skiena,et al.  Spatial Analysis of News Sources , 2006, IEEE Transactions on Visualization and Computer Graphics.

[69]  Oliver Günther,et al.  Multidimensional access methods , 1998, CSUR.

[70]  Yi-fang Brook Wu,et al.  Generating better concept hierarchies using automatic document classification , 2005, CIKM '05.

[71]  Mark Sanderson,et al.  Building, Testing, and Applying Concept Hierarchies , 2002 .

[72]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[73]  M. Andrea Rodríguez,et al.  Querying Heterogeneous Spatial Databases: Combining an Ontology with Similarity Functions , 2004, ER.

[74]  Jochen L. Leidner,et al.  Grounding spatial named entities for information extraction and question answering , 2003, HLT-NAACL 2003.

[75]  Paul D. Clough Extracting metadata for spatially-aware information retrieval on the internet , 2005, GIR '05.

[76]  Carol Peters,et al.  Cross-Language Evaluation Forum: Objectives, Results, Achievements , 2004, Information Retrieval.

[77]  Stephen E. Robertson,et al.  Simple BM25 extension to multiple weighted fields , 2004, CIKM '04.

[78]  Fabrizio Sebastiani,et al.  A scalable algorithm for high-quality clustering of web snippets , 2006, SAC.

[79]  Ying Li,et al.  Detecting dominant locations from search queries , 2005, SIGIR '05.

[80]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[81]  Wolfgang May Information Extraction and Integration with Florid: The MONDIAL Case Study , 1999 .

[82]  Xing Xie,et al.  Web resource geographic location classification and detection , 2005, WWW '05.

[83]  Tomás Soler,et al.  A note on frame transformations with applications to geodetic datums , 2003 .

[84]  Nancy Chinchor,et al.  Overview of MUC-7 , 1998, MUC.

[85]  Dekang Lin,et al.  An Information-Theoretic Definition of Similarity , 1998, ICML.

[86]  Max J. Egenhofer,et al.  Reasoning about Binary Topological Relations , 1991, SSD.

[87]  Stefano Mizzaro Relevance: the whole history , 1997 .

[88]  Tong Zhang,et al.  Text Mining: Predictive Methods for Analyzing Unstructured Information , 2004 .

[89]  Guoray Cai,et al.  GeoVSM: An Integrated Retrieval Model for Geographic Information , 2002, GIScience.

[90]  Katsumi Tanaka,et al.  Landmark Extraction: A Web Mining Approach , 2005, COSIT.

[91]  Torsten Suel,et al.  Efficient query processing in geographic web search engines , 2006, SIGMOD Conference.

[92]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[93]  David M. Mark,et al.  Natural-Language Spatial Relations Between Linear and Areal Objects: The Topology and Metric of English-Language Terms , 1998, Int. J. Geogr. Inf. Sci..

[94]  Jennifer Widom,et al.  Scaling personalized web search , 2003, WWW '03.

[95]  Francois Yergeau,et al.  UTF-8, a transformation format of ISO 10646 , 1998, RFC.

[96]  Bing Liu,et al.  Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.

[97]  José Borbinha,et al.  The DIGMAP Geo-Temporal Web Gazetteer Service , 2009 .

[98]  Jochen L. Leidner Towards a Reference Corpus for Automatic Toponym Resolution Evaluation , 2004 .

[99]  Taher H. Haveliwala,et al.  Adaptive methods for the computation of PageRank , 2004 .

[100]  Gerard Salton,et al.  The SMART Retrieval System—Experiments in Automatic Document Processing , 1971 .

[101]  Avi Arampatzis,et al.  Distributed Ranking Methods for Geographic Information Retrieval , 2004, SDH.

[102]  H. Simon Rational Decision Making in Business Organizations , 1978 .

[103]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[104]  Luis Gravano,et al.  Exploiting Geographical Location Information of Web Pages , 1999, WebDB.

[105]  Nuno Francisco Pereira Freire Cardoso Avaliação de Sistemas de Reconhecimento de Entidades Mencionadas , 2006 .

[106]  Kent L. Norman,et al.  Development of an instrument measuring user satisfaction of the human-computer interface , 1988, CHI '88.

[107]  Mário J. Silva,et al.  Language identification in web pages , 2005, SAC '05.

[108]  Diana Santos,et al.  What Kinds of Geographical Information Are There in the Portuguese Web? , 2006, PROPOR.

[109]  Max J. Egenhofer,et al.  Toward the semantic geospatial web , 2002, GIS '02.

[110]  S. P. Lloyd,et al.  Least squares quantization in PCM , 1982, IEEE Trans. Inf. Theory.

[111]  José Luis Borbinha,et al.  Extracting and Exploring the Geo-Temporal Semantics of Textual Resources , 2008, 2008 IEEE International Conference on Semantic Computing.

[112]  Gregory R. Crane,et al.  Disambiguating Geographic Names in a Historical Digital Library , 2001, ECDL.

[113]  Mário J. Silva,et al.  A Statistical Study of the WPT-03 Corpus , 2004, EsTAL.

[114]  Marvin V. Zelkowitz,et al.  Experimental Models for Validating Technology , 1998, Computer.

[115]  Hanan Samet,et al.  STEWARD: architecture of a spatio-textual search engine , 2007, GIS.

[116]  Lynette Hirschman,et al.  The Evolution of evaluation: Lessons from the Message Understanding Conferences , 1998, Comput. Speech Lang..

[117]  Declan Butler,et al.  Virtual globes: The web-wide world , 2006, Nature.

[118]  Ian Soboroff On evaluating web search with very few relevant documents , 2004, SIGIR '04.

[119]  Kevin S. McCurley,et al.  Ranking the web frontier , 2004, WWW '04.

[120]  Paola Velardi,et al.  Automatic adaptation of proper noun dictionaries through cooperation of machine learning and probabilistic methods , 2000, SIGIR '00.

[121]  Alberto Simões,et al.  The XLDB Group at the CLEF 2005 Ad-Hoc Task , 2005, CLEF.

[122]  Harith Alani,et al.  Associative and Spatial Relationships in Thesaurus-Based Retrieval , 2000, ECDL.

[123]  Dan Wu,et al.  On assigning place names to geography related web pages , 2005, Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '05).

[124]  Vincent Ng Machine Learning for Coreference Resolution: Recent Successes and Future Challenges , 2003 .

[125]  Marc Moens,et al.  Named Entity Recognition without Gazetteers , 1999, EACL.

[126]  Mark Gahegan,et al.  Proximity Operators for Qualitative Spatial Reasoning , 1995, COSIT.

[127]  Oren Etzioni,et al.  Clustering web documents: a phrase-based method for grouping search engine results , 1999 .

[128]  Kentaro Toyama,et al.  Robust location search from text queries , 2007, GIS.

[129]  Anthony G. Cohn,et al.  Qualitative Spatial Representation and Reasoning: An Overview , 2001, Fundam. Informaticae.

[130]  Mark Sanderson,et al.  Spatio-textual Indexing for Geographical Search on the Web , 2005, SSTD.

[131]  Mário J. Silva,et al.  NOAH: A CSP-based language for describing the behaviour of coupled models , 2008 .

[132]  Erik Rauch,et al.  A confidence-based framework for disambiguating geographic terms , 2003, HLT-NAACL 2003.

[133]  Nuno Seco,et al.  HAREM: An Advanced NER Evaluation Contest for Portuguese , 2006, LREC.

[134]  Xavier Carreras,et al.  Named Entity Extraction using AdaBoost , 2002, CoNLL.

[135]  Robert R. Korfhage,et al.  Information Storage and Retrieval , 1963 .

[136]  Jean Tague-Sutcliffe,et al.  The Pragmatics of Information Retrieval Experimentation Revisited , 1997, Inf. Process. Manag..

[137]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[138]  Joachim Kohler Analyzing search engine queries for the use of geographic terms , 2003 .

[139]  Marty Himmelstein Local Search: The Internet Is the Yellow Pages , 2005, Computer.

[140]  Daniel Gomes,et al.  Web modelling for web warehouse design , 2007 .

[141]  Andrew Turpin,et al.  Why batch and user evaluations do not give the same results , 2001, SIGIR '01.

[142]  Stuart Weibel The State of the Dublin Core Metadata Initiative , 1999 .

[143]  D. Wolfe,et al.  Nonparametric Statistical Methods , 1974 .

[144]  Wentian Li,et al.  Random texts exhibit Zipf's-law-like word frequency distribution , 1992, IEEE Trans. Inf. Theory.

[145]  James F. Allen Time and time again: The many ways to represent time , 1991, Int. J. Intell. Syst..

[146]  Eugene Agichtein Scaling Information Extraction to Large Document Collections , 2005, IEEE Data Eng. Bull..

[147]  M. Sester,et al.  Derivation of Implicit Information from Spatial Data Sets with Data Mining , 2004 .

[148]  Alia I. Abdelmoty,et al.  Ontology-Based Spatial Query Expansion in Information Retrieval , 2005, OTM Conferences.

[149]  Ellen M. Voorhees,et al.  Retrieval evaluation with incomplete information , 2004, SIGIR '04.

[150]  Max J. Egenhofer,et al.  Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure , 2004, Int. J. Geogr. Inf. Sci..

[151]  James Pustejovsky,et al.  Temporal and Event Information in Natural Language Text , 2005, Lang. Resour. Evaluation.

[152]  Fabrizio Sebastiani,et al.  Machine learning in automated text categorization , 2001, CSUR.

[153]  Peter Siniakov,et al.  An Overview and Classification of Adaptive Approaches to Information Extraction , 2005, J. Data Semant..

[154]  Stephen E. Robertson,et al.  Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval , 1994, SIGIR '94.

[155]  Shumeet Baluja,et al.  Pagerank for product image search , 2008, WWW.

[156]  James R. Lewis,et al.  IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use , 1995, Int. J. Hum. Comput. Interact..

[157]  Min-Yen Kan,et al.  Fast webpage classification using URL features , 2005, CIKM '05.

[158]  Luis Gravano,et al.  Computing Geographical Scopes of Web Resources , 2000, VLDB.

[159]  Yiming Yang,et al.  An Evaluation of Statistical Approaches to Text Categorization , 1999, Information Retrieval.

[160]  Marios Hadjieleftheriou,et al.  R-Trees - A Dynamic Index Structure for Spatial Searching , 2008, ACM SIGSPATIAL International Workshop on Advances in Geographic Information Systems.

[161]  Mário J. Silva,et al.  O sistema CaGE no HAREM – reconhecimento de entidades geográficas em textos em língua portuguesa , 2007 .

[162]  Pasi Tapanainen,et al.  What is a word, What is a sentence? Problems of Tokenization , 1994 .

[163]  Alistair Moffat,et al.  Exploring the similarity space , 1998, SIGF.

[164]  Eckhard Bick,et al.  Floresta Sintá(c)tica: A treebank for Portuguese , 2002, LREC.

[165]  J. Kruskal On the shortest spanning subtree of a graph and the traveling salesman problem , 1956 .

[166]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[167]  Stephen E. Robertson,et al.  Gatford Centre for Interactive Systems Research Department of Information , 1996 .

[168]  Serge Fdida,et al.  Constraint-Based Geolocation of Internet Hosts , 2004, IEEE/ACM Transactions on Networking.

[169]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[170]  Susan T. Dumais,et al.  Hierarchical classification of Web content , 2000, SIGIR '00.

[171]  Alistair Moffat,et al.  Recommended reading for IR research students , 2005, SIGF.

[172]  Kalina Bontcheva,et al.  A Light-weight Approach to Coreference Resolution for Named Entities in Text , 2002 .

[173]  Donald E. Knuth,et al.  The TeXbook , 1984 .

[174]  Inderjeet Mani,et al.  Disambiguating Toponyms in News , 2005, HLT/EMNLP.

[175]  David O'Sullivan,et al.  Geographic Information Analysis , 2002 .

[176]  Einat Amitay,et al.  Hypertext: The Importance of being Different , 1997 .

[177]  Dimitrios Papadias,et al.  Acquiring, Representing and Processing Spatial Relations , 1994 .

[178]  Martin Porter,et al.  Snowball: A language for stemming algorithms , 2001 .

[179]  Sophia Ananiadou,et al.  Extracting Nested Collocations , 1996, COLING.

[180]  Dayne Freitag,et al.  Machine Learning for Information Extraction in Informal Domains , 2000, Machine Learning.

[181]  Rob Malouf,et al.  Markov Models for Language-independent Named Entity Recognition , 2002, CoNLL.

[182]  Erhard Rahm,et al.  Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..

[183]  Allison Woodruff,et al.  GIPSY: automated geographic indexing of text documents , 1994 .

[184]  Doug Downey,et al.  Web-scale information extraction in KnowItAll: (preliminary results) , 2004, WWW '04.

[185]  Weiguo Fan,et al.  A generic ranking function discovery framework by genetic programming for information retrieval , 2004, Inf. Process. Manag..

[186]  Fakhri Karray,et al.  A concept-based model for enhancing text categorization , 2007, KDD '07.

[187]  Tong Zhang,et al.  Named Entity Recognition through Classifier Combination , 2003, CoNLL.

[188]  M. M. Kessler Bibliographic coupling between scientific papers , 1963 .

[189]  Nuno Cardoso,et al.  The University of Lisbon at GeoCLEF 2007 , 2007, CLEF.

[190]  Katsumi Tanaka,et al.  Toward tighter integration of web search with a geographic information system , 2006, WWW '06.

[191]  Dan Shen,et al.  Performance and Scalability of a Large-Scale N-gram Based Information Retrieval System , 2000, J. Digit. Inf..

[192]  Craig A. Knoblock,et al.  From Text to Geographic Coordinates: The Current State of Geocoding , 2007 .

[193]  Gerard de Melo,et al.  Multilingual Text Classification Using Ontologies , 2007, ECIR.

[194]  George Karypis,et al.  A Comparison of Document Clustering Techniques , 2000 .

[195]  Rada Mihalcea,et al.  Random Walk Term Weighting for Improved Text Classification , 2007, Int. J. Semantic Comput..

[196]  Olga Uryupina Semi-supervised learning of geographical gazetteers from the internet , 2003, HLT-NAACL 2003.

[197]  Vipin Kumar,et al.  Partitioning-based clustering for Web document categorization , 1999, Decis. Support Syst..

[198]  George Kingsley Zipf,et al.  Human Behaviour and the Principle of Least Effort: an Introduction to Human Ecology , 2012 .

[199]  Martin J. Conyon,et al.  Ranking the Importance of Boards of Directors , 2004 .

[200]  Avi Arampatzis,et al.  Multi-Dimensional Scattered Ranking Methods for Geographic Information Retrieval , 2005, GeoInformatica.

[201]  Ramanathan V. Guha,et al.  SemTag and seeker: bootstrapping the semantic web via automated semantic annotation , 2003, WWW '03.

[202]  Lise Getoor,et al.  Entity resolution in geospatial data integration , 2006, GIS '06.

[203]  Mário J. Silva,et al.  The XLDB Group at GeoCLEF 2005 , 2005, CLEF.

[204]  Gideon S. Mann,et al.  Bootstrapping toponym classifiers , 2003, HLT-NAACL 2003.

[205]  Paolo Ferragina,et al.  A personalized search engine based on Web‐snippet hierarchical clustering , 2008, Softw. Pract. Exp..

[206]  Malvina Nissim,et al.  Recognising Geographical Entities in Scottish Historical Documents , 2003 .

[207]  Robert J. Gaizauskas,et al.  Evaluation in language and speech technology , 1998, Comput. Speech Lang..

[208]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[209]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[210]  Mário J. Silva,et al.  Indexing and ranking in Geo-IR systems , 2005, GIR '05.

[211]  Ian H. Witten,et al.  Managing Gigabytes: Compressing and Indexing Documents and Images , 1999 .

[212]  Mário J. Silva,et al.  Using Geographic Signatures as Query and Document Scopes in Geographic IR , 2007, CLEF.

[213]  Krishna Bharat,et al.  Improved algorithms for topic distillation in a hyperlinked environment , 1998, SIGIR '98.

[214]  Amit P. Sheth,et al.  Geospatial Ontology Development and Semantic Analytics , 2006, Trans. GIS.

[215]  William Sugar User-Centered Perspective of Information Retrieval Research and Analysis Methods , 1995 .

[216]  Gerard Salton,et al.  Term-Weighting Approaches in Automatic Text Retrieval , 1988, Inf. Process. Manag..

[217]  Sérgio Augusto Sousa Freitas User interfaces for geographic information retrieval systems , 2007 .

[218]  David M. Mark,et al.  Cognitive models of geographical space , 1999, Int. J. Geogr. Inf. Sci..

[219]  Iryna Gurevych,et al.  Towards Enhanced Interoperability for Large HLT Systems: UIMA for NLP , 2008 .

[220]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[221]  Chris Buckley,et al.  Improving automatic query expansion , 1998, SIGIR '98.

[222]  Jakob Nielsen,et al.  A mathematical model of the finding of usability problems , 1993, INTERCHI.

[223]  Dimitris Papadias,et al.  Spatial Relations, Minimum Bounding Rectangles, and Spatial Data Structures , 1997, Int. J. Geogr. Inf. Sci..

[224]  Bernhard Seeger,et al.  Geographic Information Retrieval , 2004, WebDyn@WWW.

[225]  Fredric C. Gey,et al.  An Evaluation Resource for Geographic Information Retrieval , 2008, LREC.

[226]  Bernardo A. Huberman,et al.  The Structure of Collaborative Tagging Systems , 2005, ArXiv.

[227]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[228]  Bob Carpenter,et al.  Phrasal Queries with LingPipe and Lucene: Ad Hoc Genomics Text Retrieval , 2004, TREC.

[229]  Mário J. Silva,et al.  GKB - Geographic Knowledge Base , 2005 .

[230]  Yannick Versley,et al.  Extracting spatial information: grounding, classifying and linking spatial expressions [Extended Abstract] , 2022 .

[231]  David Yarowsky,et al.  One Sense Per Discourse , 1992, HLT.

[232]  Paul Clough,et al.  Identifying imprecise regions for geographic information retrieval using the web , 2005 .

[233]  Chaomei Chen,et al.  Mining the Web: Discovering knowledge from hypertext data , 2004, J. Assoc. Inf. Sci. Technol..

[234]  Xing Xie,et al.  Query Parsing Task for GeoCLEF2007 Report , 2007, CLEF.

[235]  Fredric C. Gey,et al.  GeoCLEF: the CLEF 2005 Cross-Language Geographic Information Retrieval Track , 2005, CLEF.

[236]  Fredric C. Gey,et al.  The Relationship between Recall and Precision , 1994, J. Am. Soc. Inf. Sci..

[237]  Dell Zhang,et al.  Semantic, Hierarchical, Online Clustering of Web Search Results , 2004, APWeb.

[238]  Lakshminarayanan Subramanian,et al.  Determining the geographic location of Internet hosts , 2001, SIGMETRICS '01.

[239]  Michael W. Berry,et al.  Survey of Text Mining: Clustering, Classification, and Retrieval , 2007 .

[240]  Gene H. Golub,et al.  Extrapolation methods for accelerating PageRank computations , 2003, WWW '03.

[241]  Mark Sanderson,et al.  Visualising the South Yorkshire floods of '07 , 2007, GIR '07.

[242]  E. Forgy,et al.  Cluster analysis of multivariate data: efficiency versus interpretability of classifications , 1965 .

[243]  C. J. van Rijsbergen,et al.  Information Retrieval , 1979, Encyclopedia of GIS.

[244]  Ray R. Larson,et al.  Geographic information retrieval (GIR) ranking methods for digital libraries , 2004, JCDL.

[245]  Luís Sarmento,et al.  O projecto AC/DC: acesso a corpora/disponibilização de corpora , 2003 .

[246]  Mário J. Silva,et al.  Spelling Correction for Search Engine Queries , 2004, EsTAL.

[247]  Ron Sivan,et al.  Web-a-where: geotagging web content , 2004, SIGIR '04.

[248]  Bernard J. Jansen,et al.  A review of Web searching studies and a framework for future research , 2001, J. Assoc. Inf. Sci. Technol..

[249]  Stefan M. Rüger,et al.  Geographic co-occurrence as a tool for GIR , 2007, GIR '07.

[250]  Mário J. Silva,et al.  A graph-ranking algorithm for geo-referencing documents , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[251]  David Nadeau,et al.  Semi-supervised named entity recognition: learning to recognize 100 entity types with little supervision , 2007 .

[252]  Eric Brill,et al.  A Simple Rule-Based Part of Speech Tagger , 1992, HLT.

[253]  Justin Zobel,et al.  How reliable are the results of large-scale information retrieval experiments? , 1998, SIGIR '98.

[254]  Yiming Yang,et al.  Hypertext Categorization using Hyperlink Patterns and Meta Data , 2001, ICML.

[255]  John Maindonald,et al.  Data Analysis and Graphics Using R: Contents , 2006 .

[256]  W. Tobler A Computer Movie Simulating Urban Growth in the Detroit Region , 1970 .