Workshops – DH Estonia

Versification and Authorship Recognition Workshop

Petr Plecháč

28.09.

This workshop will be a hands-on session on the topics addressed in the keynote lecture of the same name. The theoretical background will be given in the lecture. Elementary knowledge of the Python programming language may be useful but is not required.

Machine learning methods and presentation of results with Microsoft Azure Machine Learning Studio and Jupyter Notebook, using language data

Jaagup Kippar, Annika Loor, Kaisa Norak

27.09., 90 minutes

The workshop goes through the steps needed to obtain results from a machine learning system: creating an account and uploading data; the machine learning options available and how to choose between algorithms; regression for predicting numerical values and classification for predicting categorical values; technical examples with simple data; linguistic data from the Estonian Interlanguage Corpus; machine learning with labeled data; using n-grams to compare texts; and data manipulation and presentation of results with the R language.

Corpus Query Tutorial based on the KORP corpus tool

Olga Gerassimenko, Neeme Kahusk, Kadri Vider (Centre of Estonian Language Resources)

90 minutes; date to be announced.

Large language corpora provide data for various research domains in the digital humanities. Written language corpora can serve as a source of data on gender attitudes, the popularity of proverbs, or the correspondence contacts of historical figures, even if the corpora were not originally collected with those research goals in mind. To plan and conduct research using a corpus, we need to know what data our search queries actually retrieve and what they fail to find, how to narrow or broaden a search, and how to document a search so that it can be reproduced when needed. Our tutorial is aimed at non-linguists who want to improve the quality of their corpus searches by making smart use of the linguistic information available for the corpus data. The tutorial will be based on searches of the Estonian Web Corpus 2013 and the Estonian National Corpus in the KORP corpus tool, demonstrating the use of logical operators on linguistic data. We will also introduce the Corpus Query Processor (CQP), which can be used both to make intricate searches in linguistic data and to document searches precisely and conveniently for future reference and reproduction.