QEAVis: Quantitative Evaluation of Academic Websites Visibility TIN 2007-67581-C02

Applying Human Language Technologies (HLT) to the web raises new technological challenges. First, the page structure and textual content of websites are not comparable to those of the domains traditionally addressed in text processing, such as news repositories. Second, processing large portions of the web poses a scalability problem and calls for new methodologies, techniques and algorithms for text processing. The project plans to apply HLT to an important problem, the measurement of academic visibility on the web, providing the basis for a quantitative evaluation of university departments' commitment to public access to their information. Web indicators (cybermetrics) must be developed and applied to the study of academic website visibility, with a special focus on the presence of the Spanish language (of strategic importance) and on the academic areas related to the humanities (which need special help with their web positioning). First, we will determine the main web mediators of academic content at the web subdomain level. These subdomains should be crawled to download, store and manage their web pages, so that the pages are ready for automatic classification and extraction. Web subdomains should be classified by language, academic category (Humanities, Science, etc.) and discipline (Philosophy, Philology, etc.). Furthermore, the information needed to create the profile of each subdomain should be extracted automatically. All this information will be used to build a profile and a description of each university department. A series of web indicators will be applied to the subdomain information in order to quantify presence, visibility, impact and popularity. The resulting quantitative values will be used to rank subdomains/departments within each academic category. In the ranking, the top positions will go to the departments with the greatest commitment to the visibility of their information. The rankings, together with the criteria used to build them and the recommendations and resources for improving the results, will be publicly available. Finally, we expect that the application of HLT will allow the development of new cybermetric
