Applying spatial data in linguistics

Kristel Uiboaed, Siim Antso, Liina Lindström, Maarja-Liisa Pilvik, Mirjam Ruutma
University of Tartu

Geographical information and the spatial representation of linguistic data have always been an inseparable part of dialectology. Today, the possibilities for manipulating and presenting this kind of data have become highly diverse. For Estonian dialects, many maps that visualize the data exist (e.g. Andrus Saareste’s (1955) printed dialect atlas and a large collection of Saareste’s unpublished maps in the archives of the University of Uppsala), but these maps are not digitally available and are therefore unsuitable for computational analysis. Nor is it possible to contrast these data with contemporary research findings.

In this presentation we give an overview of our project, which aims to digitize older linguistic map data and make these data publicly available. This material is of interest not only to linguists but also to other researchers in the humanities and to a wider audience interested in dialects. We outline the workflow of our project (data entry, mapping, processing). We also focus on possible applications that incorporate these data. For instance, digitized dialectal data can be mapped to the EMK data, which enables us to analyse older data with new technical applications and to contrast this information with corpus data. In this format we can also observe lexical bundles geographically and inspect, for instance, how lexical distances correlate with geographical distances.
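As a rough illustration of the last point, the following minimal sketch compares pairwise lexical and geographical distances between survey sites. The site names, coordinates and variant labels are invented for illustration and are not taken from the project data. Lexical distance is assumed here to be the share of map items on which two sites differ, and geographical distance is the great-circle (haversine) distance.

```python
import math
from itertools import combinations

# Hypothetical survey sites: name -> (latitude, longitude) in degrees.
SITES = {
    "A": (58.38, 26.72),
    "B": (59.44, 24.75),
    "C": (58.25, 22.50),
    "D": (57.84, 27.00),
}

# Hypothetical variants recorded at each site for the same four map items.
VARIANTS = {
    "A": ["x1", "y1", "z1", "w1"],
    "B": ["x2", "y1", "z2", "w1"],
    "C": ["x2", "y2", "z2", "w2"],
    "D": ["x1", "y1", "z1", "w2"],
}


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def lexical_distance(u, v):
    """Share of map items on which two sites use different variants."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# One geographical and one lexical distance per unordered pair of sites.
pairs = list(combinations(sorted(SITES), 2))
geo = [haversine_km(SITES[a], SITES[b]) for a, b in pairs]
lex = [lexical_distance(VARIANTS[a], VARIANTS[b]) for a, b in pairs]

r = pearson(geo, lex)
print(f"{len(pairs)} site pairs, r = {r:.2f}")
```

Because pairwise distances are not independent observations, the coefficient is only descriptive; assessing significance would require a permutation procedure such as a Mantel test.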


  • EMK = Eesti murrete korpus (2015). [Corpus of Estonian Dialects]
  • Saareste, Andrus (1955). Petit atlas des parlers estoniens: Väike eesti murdeatlas. [Estonian dialect atlas] Uppsala: Almqvist & Wiksell.