CLARIN’s virtual language observatory (VLO) under scrutiny - the VLO taskforce of the CLARIN-D centres

Finding language resources poses a challenge for researchers from the humanities and social sciences using the CLARIN infrastructure. The challenges concern the usability of language resources, the completeness of knowledge about existing resources, and the level of detail of information about potentially useful resources. The question of detail relates both to the level of description and to the granularity of a resource, i.e. whether resources are grouped as one or their parts are separated and treated as individual resources. CLARIN provides language resources including corpora, lexical resources, software tools, web services, etc. The number of resources is huge, and getting an overview is virtually impossible without technical assistance. For this reason, a specialised search and discovery service was created, the VLO, which allows a faceted search for language resources whose descriptive metadata are provided by the CLARIN centres and by other registered institutions in accepted formats via OAI-PMH. The content of the facets is based on the content of the metadata files and on mappings of data categories onto predefined facets. Due to the variability of the CMDI metadata framework (see Broeder et al., 2010), various resource objects and types can be described, and though the descriptions may be similar in some details, they differ in as many other respects as there are types of resources. At present (June 2014), about 650 000 resources are searchable via the VLO, claiming to belong to about 250 resource types. This huge variety of types, regardless of whether it is justified, makes searching complex: individual search terms may not partition the search space significantly if they are too general, while if they are too specific they are useless for faceted search or for guiding users to resources they are not aware of but for which they have some characteristic features in mind.
Thus, huge numbers of resource descriptions are brought together within the VLO, and queries across this stock should be possible not only via string search but also via filtering, i.e. via lists of searchable categories provided by the facet browser. An internal review process made it apparent that these challenges had not been fully met: resources were not easy to find, the facet values were inconsistent and confusing to users, the descriptions were problematic, and the usability of the search interface fell short of expectations.
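
The mapping of data categories onto facets, and the way inconsistent values fragment a facet, can be illustrated with a minimal Python sketch. It is not the VLO implementation: the category names, the facet names, the mapping table and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical mapping of metadata data categories onto predefined facets.
# Illustrative only; this is not the mapping used by the VLO.
FACET_MAP = {
    "languageName": "language",
    "lang": "language",
    "resourceType": "resource_type",
    "type": "resource_type",
}


def to_facets(record):
    """Project one metadata record (category -> value) onto facets."""
    facets = {}
    for category, value in record.items():
        facet = FACET_MAP.get(category)
        if facet is not None:  # unmapped categories are ignored
            facets.setdefault(facet, set()).add(value)
    return facets


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    counts = Counter()
    for record in records:
        for value in to_facets(record).get(facet, ()):
            counts[value] += 1
    return counts


def filter_by(records, facet, value):
    """Keep only the records whose facet contains exactly this value."""
    return [r for r in records if value in to_facets(r).get(facet, ())]


# Invented sample records from three imaginary providers.
RECORDS = [
    {"languageName": "German", "resourceType": "corpus"},
    {"lang": "German", "type": "Corpus"},
    {"languageName": "Dutch", "type": "lexicon"},
]

language_counts = facet_counts(RECORDS, "language")
type_counts = facet_counts(RECORDS, "resource_type")
corpora = filter_by(RECORDS, "resource_type", "corpus")
```

In this sketch the two differently named language categories end up in one facet, whereas the spellings "corpus" and "Corpus" remain separate facet values, so filtering on one of them misses the other record. This is the kind of inconsistency in facet values that the review identified.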