Navigating poetic texts in different languages, dialects and orthographies

Kati Kallio
Finnish Literature Society

My aim is to discuss the possibilities and challenges of a large, heterogeneous text corpus of Finnic oral poetry. The paper is based on work done at the Finnish Literature Society by Senni Timonen, Jukka Saarinen, Lauri Harvilahti and others on the corpus of so-called Kalevala-metric or runic poetry, “Suomen kansan vanhat runot” (SKVR), in the Finnish, Karelian and Ingrian languages. The project is connected to work at the Estonian Literary Museum on the Estonian runic songs database.

The SKVR corpus consists of 89 247 texts from Karelia, Ingria and Finland, recorded between 1564 and 1947, mostly by hand. It thus represents various languages, dialects, orthographies and personal writing styles, and carries traces of different modes of performance. The naming of places and persons and the amount of contextual information vary considerably. The poems were published as books (1908–1948, 1997) and compiled into a database with a thematic index. Besides the material in the database, the archive still holds a considerable number of unpublished poems and sound recordings.

As a large, structured corpus, the SKVR database might be suitable for various big-data analyses. The problem is the highly heterogeneous character of the material. For analyses of the poetics, metrics, themes or uses of the poems to be valid, sufficient metadata on linguistic and cultural variation and on the different recording strategies needs to be produced. Ideally, the corpus should also be made comparable to similar corpora of oral poetry in Europe and worldwide.