From knowledge acquisition to information retrieval

Abstract: We introduce a proposal for information retrieval based on complex syntactic and semantic resources that are generated automatically from the document collection itself. The paper describes a strategy in which the process is independent of the language and the domain of the documents.

Keywords: knowledge acquisition, syntactic analysis, term extraction, information retrieval, knowledge representation
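The abstract does not say how the collection-derived resources are built. To illustrate what "generated from the document collection itself, independent of language and domain" can mean in practice, here is a minimal sketch. It substitutes a deliberately simple technique, document-level co-occurrence scored with pointwise mutual information (PMI), used for query expansion. It is not the authors' syntactic and semantic method. The function names `build_association_resource` and `expand_query` and the parameters `min_count` and `k` are hypothetical. The only language-dependent step is a generic Unicode word tokenizer.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokenize(text):
    # Generic tokenization: lowercase Unicode word characters, no stemmer,
    # stoplist or other language-specific resource (an assumption of this sketch).
    return re.findall(r"\w+", text.lower())


def build_association_resource(documents, min_count=2):
    """Hypothetical resource: PMI between terms co-occurring in a document.

    Everything is derived from the collection itself; this stands in for,
    and is NOT, the syntactic/semantic resources described in the paper.
    """
    doc_freq = Counter()   # number of documents containing each term
    pair_freq = Counter()  # number of documents containing each term pair
    for doc in documents:
        terms = set(tokenize(doc))
        doc_freq.update(terms)
        pair_freq.update(combinations(sorted(terms), 2))

    n_docs = len(documents)
    resource = defaultdict(dict)
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_count:  # ignore pairs seen too rarely to be reliable
            continue
        pmi = math.log((n_ab * n_docs) / (doc_freq[a] * doc_freq[b]))
        if pmi > 0:  # keep only positively associated pairs
            resource[a][b] = pmi
            resource[b][a] = pmi
    return resource


def expand_query(query, resource, k=2):
    """Append up to k strongest associates of each query term."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        # Sort by descending PMI, breaking ties alphabetically for determinism.
        neighbours = sorted(resource.get(term, {}).items(),
                            key=lambda kv: (-kv[1], kv[0]))
        for associate, _ in neighbours[:k]:
            if associate not in expanded:
                expanded.append(associate)
    return expanded


if __name__ == "__main__":
    docs = [
        "syntactic parsing of spanish text",
        "dependency parsing and syntactic analysis",
        "semantic resources for retrieval",
        "syntactic parsing improves retrieval",
    ]
    resource = build_association_resource(docs)
    print(expand_query("parsing", resource))
```

With this toy collection, only "parsing" and "syntactic" co-occur often enough to pass `min_count=2`, so the query "parsing" expands to `['parsing', 'syntactic']`. A real system following the paper would replace raw co-occurrence with relations extracted by syntactic analysis. This sketch shares only the collection-driven, language-agnostic construction.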
