The Digital Archives of Estonian Dialects

Grethe Juhkason

The Archives of Estonian Dialects and Kindred Languages at the University of Tartu holds sound recordings of Estonian dialects and Finno-Ugric languages collected since the 1950s, as well as unpublished manuscripts and photographic and video materials related to the collection and research of linguistic data.

The digitization of the archive collections has been under way since 2000 under the guidance of Liina Lindström and Pärtel Lippus. So far, about 919 hours of tape recordings, about 191,000 pages of manuscript material, and almost 800 photos have been digitized. The archives continue to grow as field research materials and student theses are added to the collections every year.

In 2012, the digital database of the archives was made available online for public use. An independent part of the archives is the Corpus of Estonian Dialects, a database of transcribed texts of the same type from all Estonian dialects. The main aim of the corpus is to make authentic data on all Estonian dialects, gathered and handled according to the same principles, digitally available to researchers. In addition, the dialectal texts are indexed and morphologically tagged. The corpus enables users to compare Estonian dialects at the phonetic, morphological, and syntactic levels. It currently contains more than 1.5 million running words.