Web services at the Center of Estonian Language Resources

Neeme Kahusk, Kadri Vider
University of Tartu, Center of Estonian Language Resources

To make digital resources for the Estonian language and language technologies accessible, many services are being planned and implemented at the Center of Estonian Language Resources (CELR).

The registry of Estonian Language Resources (https://metashare.ut.ee) draws together information about the language resources available for research on the Estonian language and for building tools to process it. The registry provides public access to freely available data and single sign-on (SSO) login for data with restricted access. The registry has joined two international networks: META-SHARE and CLARIN. It assigns resources Digital Object Identifiers (DOIs) registered at DataCite, which makes the data as easy to cite as publications.

To provide access to various language corpora, the KORP tool developed at Språkbanken has been installed at CELR. It gives access to both publicly available and restricted corpora.

To help researchers build their own language resources from various texts, a new tool called Keeleliin is under development. This tool will make it easy to process text-only data with several language processing tools, such as a morphological analyser and a syntactic parser.

Although designed with the needs of language research and natural language processing in mind, the services at CELR are also useful for researchers in other areas of digital humanities and in language learning. We are open to all language resource providers and to every kind of resource: language corpora, lexical or conceptual resources, tools and services, and language descriptions. DOIs provided by DataCite will enable users of the resources to give proper references and credit.