Information extraction from the web using a search engine
