Special Panel – DH Estonia

Estonian Language and Culture in the Digital Age 2019–2027

Reili Argus (Tallinn University), Margit Langemets, Jelena Kallas (Institute of the Estonian Language), Liina Lindström (University of Tartu), Pärtel Lippus (University of Tartu), Maarja-Liisa Pilvik (University of Tartu), Külli Prillop (University of Tartu), Mari Sarv (Estonian Literary Museum), Mari Tõrv (University of Tartu), Virve-Anneli Vihman (University of Tartu).

RESEARCH AND DEVELOPMENT PROGRAMME OF THE ESTONIAN MINISTRY OF EDUCATION AND RESEARCH: ESTONIAN LANGUAGE AND CULTURE IN THE DIGITAL AGE 2019–2027

The general purpose of the programme is to support the preservation of the Estonian language and culture and to increase and strengthen R&D capacity in the field of Estonian language and culture. This will support Estonian research and, more broadly, the development of society as a whole in accordance with the sectoral strategies. Particular emphasis is placed on strengthening the digital dimension, cross-disciplinarity and applicability of the research, and on using its results in further studies and in other fields. As a result of the programme, the digital dimension should become an everyday part of research on Estonian language and culture, one that provides added value.

During the panel, nine projects funded by this programme will give an overview of the topics and activities planned for the next three years.

Reili Argus (Tallinn University)
Data and corpora of Estonian children and youth multilingual communication

The aims of the project are 1) to create a data set of the multilingual communication of Estonian children and youth (with several sub-corpora), which would form a basis for multilingualism research in Estonia, and 2) to conduct research on the collected data.

The sub-corpora are: an early bilingual communication sub-corpus (recordings of everyday spontaneous speech); data on the use of Estonian as L2 among pre-school and primary school children (spontaneous speech and experimental data); multilingual communication of young bilinguals on blogs, Facebook and other social media; and multilingual vlogs. Unlike monolingual corpora (of the standard language or of oral communication) or learner language corpora, data and corpora of multilingual communication are rather diverse, and much depends on the particular language contact situation. For that reason, such corpora are few, and there is no unified set of principles for creating them.

Margit Langemets, Jelena Kallas (Institute of the Estonian Language)
Lexis and planning of Estonian: descriptive and prescriptive aspects

The project focuses on lexicological research on vocabulary and language planning. Its topics are new words and meanings, linguistic trends that require linguistic assessment (such as changes in morphology), the historical stages of the language standard, and semantic relations between words. The principles of language planning theory will be updated, taking into account actual language use and new digital capabilities for analyzing information: renewed corpora, corpus query systems, and Neoveille, a web platform for identifying new words. Lexis is described on the basis of corpus analysis and the principles of language management, and vocabulary, morphology and language advice are analyzed from both the descriptive and the prescriptive point of view. New information will be added to the Ekilex dictionary system and to the language portal Sõnaveeb.

The results of the study will be published in the next edition of the Dictionary of Standard Estonian (ÕS 2025). The reasoning and explanations will be given explicitly and in clear language, so that language users understand the recommendations and, ideally, want to follow them.

Liina Lindström (University of Tartu)
Interdisciplinary corpus of Seto

The project aims to compile an interdisciplinary corpus of modern Seto. It will draw on interviews conducted during earlier fieldwork trips (2010–2016, in the eastern part of the Seto-speaking area, in Russia) and on interviews to be conducted during the current project. About 50 hours of recordings will be transcribed and annotated on at least two levels (morphological and thematic annotation). The audio and video recordings, together with the transcribed and annotated texts, form a corpus in which all these levels of analysis are available. The corpus includes data of interest to researchers in various disciplines, such as linguistics, folkloristics, ethnology, anthropology, history and religious studies. In addition to compiling the corpus, the project members study Seto language and culture, using the corpus as their main data source.

Pärtel Lippus (University of Tartu)
The prosody and information structure of surprise questions in Estonian in comparison with other languages

The project focuses on the prosody and semantics of surprise questions in Estonian in comparison with French, Hungarian and German. Surprise questions differ from information-seeking and rhetorical questions in that their main function is to express the speaker’s reaction to an unexpected situation or piece of information. The aim of the project is to pinpoint the prosodic characteristics of surprise questions and to study how their information structure and modality differ from those of other question types.

In addition to its importance for general linguistics, the project will fill several gaps in the description of Estonian. Its results will provide new knowledge about surprise questions as a separate category in Estonian and other languages. The project will significantly advance the study of the semantic and pragmatic functions of Estonian utterance prosody, using up-to-date digital methods, and will also promote international cooperation.

Maarja-Liisa Pilvik (University of Tartu)
Possibilities of automatic analysis of historical texts by the example of 19th-century Estonian communal court minutes

Estonian communal court minute books from the 19th century form an important source for studying Estonian linguistic and cultural history because of their systematic structure and topical content. The books reflect the life, economic status and general mentality of the peasantry at the time, as well as cases of misconduct and disorder, and the development of the Estonian literary language. In spring 2019, the National Archives of Estonia launched a crowdsourcing project to transcribe the texts of thousands of minute books, thereby making this rich historical source more accessible. The current project aims to expand the ways in which these minute books can be used by developing methods for the automatic processing of historical texts, including text normalization, automatic morphological analysis and named entity recognition.

Külli Prillop (University of Tartu)
Digi-OWLDI: Five centuries of written Estonian vocabulary, morphology and phonology

Written Estonian is five centuries old. The written language has undergone many lexical, morphological and orthographic changes, and as a result, old texts are not fully understandable to most contemporary readers.

Many old texts have been digitized, but their audience is limited because their language is difficult to understand.

Digi-OWLDI (Old Written Language Dictionary) helps to preserve the valuable information stored in old Estonian texts. The dictionary documents the development of written Estonian words, describing changes in meaning as well as sound changes in word forms.

Digi-OWLDI is an essential tool for everyone who wants to delve into the fascinating old Estonian texts.

Mari Sarv (Estonian Literary Museum)
Source documents in the cultural process: Estonian materials in the collections and databases of the Estonian Literary Museum

Research within the project focuses on sources (or their creation) that, according to researchers of the Estonian Literary Museum, reflect important phenomena in Estonian culture. The two main foci of the research are (1) manifestations of societal tensions and bottlenecks in cultural expression, including those found in the ego-documents of key figures in cultural history, and (2) reflections of changes in worldview in cultural texts, together with the variability of folklore texts.

The project supports the implementation of digital methods and international standards in the management, publication and research of archival sources. The use of existing and emerging research corpora and databases, together with the possibilities of computational analysis, will allow for an increasingly comprehensive and evidence-based overview of the information stored in the collections of the Estonian Literary Museum (ELM). It will also clarify changes in society, culture and mindsets.

Mari Tõrv (University of Tartu)
The ethnic history of Estonian peoples in the light of new research

The project “EREA II” addresses a core question of the humanities: who we are and where we come from. “EREA II” will launch a unique digital platform to present the newest scientific results and a critical synthesis of the ethnic history of Estonian peoples. This multidisciplinary project assembles data from archaeology, history, genetics, linguistics, folkloristics, ethnology and geography. This enables more precise and versatile conclusions and generalisations about the processes behind the formation of Estonian peoples, from the first inhabitants to the rapid changes of the 21st century. “EREA II” brings together data from various research collections, databases and archives and presents new interpretations of the ethnic history of Estonian peoples. It makes this material accessible to the general public in Estonia and to the international scientific community.

Virve-Anneli Vihman (University of Tartu)
Teenspeak in Estonia

Teenagers are innovative language users and can inspire broader language change, although their role in language change is still unclear. It is therefore vital to investigate teenagers’ language usage in order to describe youth language and culture, research language innovation, and better understand longer-lasting linguistic and social processes. Today, a large proportion of youth interaction and language practices takes place in online environments. Unfortunately, we have little information on teenagers’ spoken language and ‘netspeak’ in Estonia. This project (Teenspeak in Estonia, TeKE) will investigate youth language, aiming to produce the first systematic Estonian corpus of teenagers’ spoken and Internet language and thereby to inspire further research on teenagers’ language use in Estonia. The project will investigate language use and code-switching among teenagers, explore language variation by age, gender and region, and compare netspeak with spoken language.