Versification and Authorship Recognition
Contemporary stylometry has developed extremely accurate and sophisticated methods of authorship recognition. The logic behind them is to identify the author by measuring the degree of stylistic similarity between the text in question and particular texts written by candidate authors. Various style markers are taken into account for this purpose: frequencies of words, parts of speech, character n-grams, collocations, and so on. Yet one important aspect of style (at least of one important form of literature) seems to be almost completely disregarded: versification.
The talk will present an ongoing project investigating whether characteristics such as frequencies of stress patterns, frequencies of rhyme types, etc. may be useful in the process of authorship recognition. Pilot experiments comparing various classification methods (the Delta family, SVM, Random Forest), evaluated on Czech, German, Spanish, and English poetry, will be presented.
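To illustrate the general logic of Delta-family attribution mentioned above, here is a minimal sketch of Burrows' Delta applied to versification-style features. All feature values and author names are invented for illustration; in the project itself the features would be frequencies of stress patterns, rhyme types, etc. extracted from real poems.

```python
from statistics import mean, stdev

# Hypothetical relative frequencies of three stress-pattern features
# in reference texts by two candidate authors (invented numbers).
profiles = {
    "author_A": [0.42, 0.31, 0.27],
    "author_B": [0.55, 0.20, 0.25],
}

# Feature vector of a disputed poem (also invented).
disputed = [0.44, 0.29, 0.27]

def delta_attribute(profiles, disputed):
    """Burrows' Delta: mean absolute difference of z-scored features.

    The candidate with the lowest Delta is the most stylistically
    similar to the disputed text.
    """
    n_features = len(disputed)
    scores = {}
    for author, vec in profiles.items():
        diffs = []
        for i in range(n_features):
            column = [p[i] for p in profiles.values()]
            mu, sigma = mean(column), stdev(column)
            z_author = (vec[i] - mu) / sigma
            z_disputed = (disputed[i] - mu) / sigma
            diffs.append(abs(z_author - z_disputed))
        scores[author] = mean(diffs)
    return min(scores, key=scores.get)

print(delta_attribute(profiles, disputed))  # -> author_A
```

With these toy numbers the disputed poem is attributed to author_A, whose stress-pattern profile it resembles most closely; SVM or Random Forest classifiers would consume the same feature vectors but learn a decision boundary instead of computing a distance.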
Petr Plecháč specializes in quantitative and corpus verse studies. He has participated in building the Corpus of Czech Verse (http://versologie.cz/v2/web_content/corpus.php?lang=en) and in the project POSTDATA maintained by the Laboratorio de Innovación en Humanidades Digitales, UNED Madrid (http://postdata.linhd.es), and currently leads a project on using versification characteristics for authorship attribution (http://versologie.cz/v2/web_content/projects.php?lang=en).
Detecting language change for the digital humanities: challenges and opportunities
For the last decade, automatic detection of word sense change has primarily focused on detecting the main changes in the meaning of a word. Most current methods rely on new, powerful embedding technologies, but do not differentiate between the different senses of a word, a distinction that is needed in many digital humanities applications. Ignoring individual senses radically reduces the complexity of the task, but often fails to answer questions such as: what changed, how, and when did the change occur?
In this talk, I will present methods for automatically detecting sense change from large amounts of diachronic data. I will focus on a study of a historical Swedish newspaper corpus, the Kubhist dataset of digitized Swedish newspapers from 1749–1925. I will present our work on detecting and correcting OCR errors, normalizing spelling variation, and creating representations for individual words using a popular neural embedding method, Word2Vec.
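The basic idea behind embedding-based change detection can be sketched as follows. The tiny three-dimensional vectors below stand in for Word2Vec embeddings trained separately on two time slices of a corpus such as Kubhist; they are invented for illustration, and in practice the vectors would have hundreds of dimensions and the two embedding spaces would first need to be aligned (e.g. with orthogonal Procrustes) before being compared.

```python
import math

# Invented toy embeddings for two time slices; "awful" is given a
# deliberately rotated vector to mimic semantic change.
slice_1800s = {"awful": [0.9, 0.1, 0.2], "house": [0.2, 0.8, 0.1]}
slice_1900s = {"awful": [0.1, 0.2, 0.9], "house": [0.25, 0.75, 0.1]}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def changed_words(old, new, threshold=0.5):
    """Flag words whose cross-slice cosine similarity drops below threshold."""
    return [w for w in old if w in new and cosine(old[w], new[w]) < threshold]

print(changed_words(slice_1800s, slice_1900s))  # -> ['awful']
```

A single similarity score per word is exactly the coarse-grained signal the talk problematizes: it says that something changed, but not which sense emerged, retreated, or shifted.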
Neural word embedding methods are the state of the art in sense change detection and many other areas of study, but they have mainly been studied on English corpora, where datasets are sufficiently large. I will discuss the limitations of such methods in this particular context: fairly small datasets with a high error rate, as is common for historical material in most languages. In addition, I will discuss the particularities of text mining methods for the digital humanities and what is needed to bridge the gap between computer science and the digital humanities.
Automatic analysis of large literary corpora
Automatic analysis of large literary corpora gave rise to a new direction of research within the digital humanities, for which Franco Moretti coined the term 'distant reading'. This lecture will sketch the use of an additional, potentially fruitful signal for the analysis of literary works: readers' behaviour in its various forms, including book sales figures, online reviews and ratings, and e-book reading logs.
Joris van Eijnatten – Topic to be announced
Andra Siibak – Topic to be announced