A hands-on data exploration & challenge to become a derived data-set author on the British Library’s open data-set platform (

Speaker: Mahendra Mahey

  • Do you want to understand some of the challenges of working with cultural heritage data in a large national library such as the British Library?
  • Do you want to explore and get some ‘hands-on’ experience of working with the British Library’s digital collections and data?
  • Do you want to leave a ‘legacy’ of being a data-set author/creator/curator on the British Library’s data-set platform?
  • Do you have some digital literacy in using familiar data exploration tools such as Microsoft Excel (see ‘GUIDANCE FOR THIS WORKSHOP’ below)?

If the answer is ‘Yes’ to any of these, then this workshop could be for you!

Mahendra Mahey, manager of British Library Labs (BL Labs), will examine some of the BL’s digital collections and data and discuss the challenges he has faced in making the BL’s cultural heritage data available, either openly or onsite at the British Library.

Mahendra will invite delegates to explore data-sets at their leisure and will set a challenge for those who are interested in, and skilled at, exploring data, finding patterns and grouping data. They could become authors/creators of derived data-sets based on pre-existing digital collections/data provided on the day or already available on the British Library’s data-set platform.
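As an illustration of the kind of derived data-set the challenge has in mind, here is a minimal Python sketch that groups catalogue records by decade of publication and writes the counts out as a new CSV. It assumes a metadata CSV with a hypothetical `date` column and uses made-up sample rows; real BL metadata files will have their own column names and date conventions, so treat this as a starting point rather than a recipe.

```python
import csv
import io
from collections import Counter


def decade_counts(rows, date_field="date"):
    """Count records per decade from a (hypothetical) date column.

    Only values starting with a four-digit year are counted; values such as
    '[1870?]' or an empty cell are skipped rather than guessed.
    """
    counts = Counter()
    for row in rows:
        value = (row.get(date_field) or "").strip()
        if len(value) >= 4 and value[:4].isdigit():
            counts[int(value[:4]) // 10 * 10] += 1
    return counts


def write_derived(counts, out):
    """Write the grouped counts as a small derived CSV data-set."""
    writer = csv.writer(out)
    writer.writerow(["decade", "records"])
    for decade in sorted(counts):
        writer.writerow([decade, counts[decade]])


# Hypothetical sample records standing in for a real metadata file.
SAMPLE = """title,date
A Tour in Wales,1851
Poems,1859-60
Letters from Italy,1862
Untitled pamphlet,
Sermons,[1870?]
"""

if __name__ == "__main__":
    counts = decade_counts(csv.DictReader(io.StringIO(SAMPLE)))
    out = io.StringIO()
    write_derived(counts, out)
    print(out.getvalue())
```

Skipping uncertain dates instead of guessing them keeps the derived data-set honest; recording how many records were dropped would be a sensible addition before publishing it.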

The workshop will conclude with reflections from the delegates and may highlight a number of derived data-sets generated by participants on the day that could then be published on the British Library’s data-set platform. If selected, these new derived data-sets will be attributed to their creators/authors, and each will have its own citable Digital Object Identifier (DOI). These new data-sets would then be available for reuse by any researcher in the world.

GUIDANCE FOR THIS WORKSHOP

We strongly recommend that you come to this workshop with a suitable device, such as a laptop, pre-installed with appropriate tools to analyse different kinds of data-sets; e.g. Microsoft Excel may work with smaller data-sets such as metadata (see other data exploration tools below). If you don’t have one and would still like to attend, please request to ‘pair up’ with someone who has already signed up and is willing to share.

Other data exploration tools include: Notepad++ (e.g. for viewing text and XML); OpenRefine (e.g. for cleaning data); Tableau Public (e.g. for visualising data); Google Sheets (e.g. for visualising geo-spatial data); spaCy (e.g. for text and data mining); RStudio (an open source environment for statistics in R); MATLAB (a data analysis tool); and NLTK (for natural language processing).

Please note that this workshop is NOT about training you to use any of these tools; it is about using tools you may already be familiar with to explore and find patterns in our data.

Data types you may be examining in this workshop could include: .ZIP, .PDF, .TXT, .CSV, .TSV, .XLS, .XLSX, RDF (.nt), XML (TEI, ALTO and bespoke), .JSON, .JPG, .JPEG, .TIFF and .WARC.

Please ensure you are able to read these files on your device before the workshop if you are interested in exploring them during our session.
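One quick way to check this is a short script that simply tries to open each file. The Python sketch below uses only the standard library and covers only the plainer formats (ZIP, CSV/TSV, JSON, XML, TXT); it assumes text files are UTF-8, and PDF, spreadsheet, image and WARC files would need other tools. The `peek` helper and the folder argument are illustrative assumptions, not part of any BL tooling.

```python
import csv
import json
import sys
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path


def peek(path):
    """Return a one-line summary if the file opens; raise an error otherwise.

    Hypothetical helper: only a few plain formats are checked, assuming UTF-8.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            return f"ZIP archive with {len(zf.namelist())} member(s)"
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as fh:
            header = next(csv.reader(fh, delimiter=delimiter), [])
        return f"table with columns {header}"
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        return f"JSON {type(data).__name__}"
    if suffix == ".xml":
        return f"XML with root element {ET.parse(path).getroot().tag}"
    if suffix == ".txt":
        with path.open(encoding="utf-8") as fh:
            return f"text starting {fh.readline().strip()!r}"
    return "not checked by this sketch"


if __name__ == "__main__":
    # Usage (assumed): python peek.py path/to/downloaded/folder
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for item in sorted(folder.iterdir()):
        if item.is_file():
            try:
                print(f"{item.name}: {peek(item)}")
            except (OSError, ValueError, csv.Error,
                    zipfile.BadZipFile, ET.ParseError) as err:
                print(f"{item.name}: could not read ({err})")
```

A file that reports "could not read" is worth sorting out before the session, for example by installing a suitable viewer or checking its text encoding.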

Cadaver.exe (Forays into New Digital Poetry)

Speaker: Justice (Ruby) Thélot

New ideas exist in front of us, in the web of context; they are dormant and we need to unearth them. Borrowing from the practices of Surrealist and Automatist writers and artists, this workshop seeks to utilize the collective unconscious (accessed through the “feed”) in order to come up with funny, exciting new associations. What we will be doing, basically, is a digital exquisite cadaver with our timelines: copying and pasting screen-grabs onto a digital canvas at random. We will utilize our own likes, follows and cookies to pierce through and bring forth the unconscious language of the timeline.

Participants will come out of the workshop with digital collages, poems and pot-pourris of posts, which they can assemble into mini-zines, either digital or even analog by printing them. The goal is to broaden the scope of what can be considered “digital literature”.

I want participants to PLAY! An essential part of Dadaist philosophy was the notion that we could bring a childlike energy and fun to art, poetry and writing. This workshop is sure to create uncanny juxtapositions and engender laughter in the group.

This is essentially an exercise in new-automatic-writing. We are composing new texts (poems) or collages from posts that already exist. Scrolling has become a modern ritual: we do it religiously, five times a day, at dusk and at dawn. We will unearth the unspoken text of our timelines, see our scrolling under a new lens, and transform this often passive process (looking at the feed) into an active creative one (making something with the feed).

Python for Digital Humanities

Speaker: Sree Ganesh Thotempudi

The first part of the workshop will be spent learning the basics of the Python programming language. We will start from the assumption that the students have never used Python and move them through the basics of the language. The second part will focus on using these newly gained programming skills to automatically manipulate XML data. Using the lxml and Beautiful Soup Python libraries, we will take data from the Perseus repository and convert the TEI-XML in the repository to CTS-compliant XML and then feed the data back into the Perseus repository. In this second part, the students will design and implement their own XML manipulation pipeline that they will then be able to use later to automatically manipulate or convert large corpora of XML texts from one format to another.
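To give a flavour of one step in such a pipeline, here is a minimal sketch that rewrites TEI's numbered divisions (`div1`, `div2`, ...) into the generic `<div type="textpart" subtype="...">` form used by CTS-oriented TEI/EpiDoc editions. This is a simplified assumption about what one conversion step might look like, not the workshop's actual pipeline, and the sample document is made up. It uses the standard library's `xml.etree.ElementTree`, whose API lxml largely mirrors, so the same logic should carry over to `lxml.etree` with little change.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI without an ns0: prefix


def q(tag):
    """Qualify a tag name with the TEI namespace, e.g. 'div' -> '{...}div'."""
    return f"{{{TEI_NS}}}{tag}"


def numbered_divs_to_textparts(root):
    """Rename <div1>...<div7> to <div type="textpart">, moving the old
    @type (e.g. 'book', 'chapter') into @subtype and keeping @n as is."""
    for level in range(1, 8):  # TEI defines div1 to div7
        # Materialise the matches first so renaming cannot disturb iteration.
        for el in list(root.iter(q(f"div{level}"))):
            old_type = el.get("type")
            el.tag = q("div")
            el.set("type", "textpart")
            if old_type:
                el.set("subtype", old_type)
    return root


# Made-up sample document for illustration.
SAMPLE = f"""<TEI xmlns="{TEI_NS}"><text><body>
<div1 type="book" n="1"><div2 type="chapter" n="1"><p>Sample paragraph.</p></div2></div1>
</body></text></TEI>"""

if __name__ == "__main__":
    root = numbered_divs_to_textparts(ET.fromstring(SAMPLE))
    print(ET.tostring(root, encoding="unicode"))
```

Writing each conversion as a small function over an element tree makes it easy to chain several such steps and to run the same pipeline over a whole directory of files later.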

Besides the basic skills in Python programming, the students will also learn to manipulate text using regular expressions, to extend the capabilities of Python by installing external libraries, and to use Python to make basic API calls, using the Perseus API as an example case. We expect you to have some familiarity with XML and, preferably, some experience with a scripting or programming language. We also expect you to have either a Linux or Mac computer; Windows users will be expected to install a Linux virtual machine before the start of the workshop.
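As a small example that combines regular expressions with a basic API call, the sketch below splits a CTS URN into its parts with a named-group regular expression and builds a CTS `GetPassage` request URL. `GetPassage` is a request type defined by the CTS protocol, but `BASE_URL` is a placeholder assumption, not the real Perseus endpoint: check the Perseus documentation before calling `fetch`. The example URN refers to Homer's Iliad (textgroup `tlg0012`, work `tlg001`); the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint: an assumption, not the real Perseus API address.
BASE_URL = "https://example.org/api/cts"

# urn:cts:<namespace>:<textgroup>[.<work>[.<version>]][:<passage>]
CTS_URN = re.compile(
    r"^urn:cts:(?P<namespace>[^:]+):"
    r"(?P<textgroup>[^.:]+)(?:\.(?P<work>[^.:]+))?(?:\.(?P<version>[^.:]+))?"
    r"(?::(?P<passage>.+))?$"
)


def parse_urn(urn):
    """Split a CTS URN into its named components, or raise ValueError."""
    match = CTS_URN.match(urn)
    if match is None:
        raise ValueError(f"not a CTS URN: {urn!r}")
    return match.groupdict()


def get_passage_url(urn, base_url=BASE_URL):
    """Build a CTS GetPassage request URL (query string is URL-encoded)."""
    parse_urn(urn)  # fail early on malformed input
    return f"{base_url}?{urlencode({'request': 'GetPassage', 'urn': urn})}"


def fetch(url, timeout=30):
    """Fetch the response body as text; only works against a real endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    urn = "urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1"
    print(parse_urn(urn))
    print(get_passage_url(urn))
```

Validating the URN before building the request keeps typos from turning into confusing server errors, and the parsed parts are handy for naming output files in a conversion pipeline.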

Accessing texts and data in the collections of the National Library of Estonia

Speaker: Peeter Tinits

The workshop presents a practical introduction to the use of textual data and metadata within the collections of the National Library of Estonia.

For example, the collections currently house a digital collection of ~3.4M pages (~6.0M articles) from ~2,000 periodicals (1821–2019) and ~25,000 book-length publications, as well as a metadata registry of ~300,000 printed publications published in Estonian or in connection with Estonia, among other sources. These materials are the result of decades of digitization and data collection, and steps have always been taken to make the collections useful for researchers as well. As the technological toolkit of researchers in the social sciences and digital humanities develops, these collections can find new value within research communities.

The workshop will offer a practical introduction to accessing the textual data and metadata in the library collections, explain to what degree and how this can be done, and offer some use cases. It will introduce the library’s ongoing efforts to keep improving this access and discuss future plans. In particular, the workshop will look to its participants to understand which features would make this access most useful for researchers and interesting for the general public. Participants should expect to walk away with a practical understanding of how the data can be accessed, as well as ideas on what it could be used for.

Language Resources for Content Search and Metadata Search

Speakers: Kadri Vider, Neeme Kahusk, Olga Gerassimenko

The Center of Estonian Language Resources offers a 3-hour tutorial on the main types of European language resources for a broad audience of DH researchers who might be interested in using language resources.

Digital Humanities researchers (max. 20 participants) interested in using language resources are invited to learn how to perform content and metadata searches in Estonian and European language resources. We will demonstrate the main types of language resources of CLARIN (the European Research Infrastructure for Language Resources and Technology) and of the Center of Estonian Language Resources, the CLARIN national center.

1. KORP is a corpus query system that allows users to perform flexible and intricate corpus searches based on all the features tagged or systematically appearing in content and metadata. KORP was created at Språkbanken, the language bank of Göteborg University, and is being developed in several other countries besides Sweden and Estonia: Finland (Kielipankki, the Language Bank of Finland), Norway (Giellatekno, the Centre for Saami language technology), Denmark and Iceland.

The tutorial (90 minutes) will teach participants how to use the corpora in the Estonian and Finnish KORP, how to make the best use of the simple, extended and advanced search interfaces, and how to export results for further work in statistics programs. We will demonstrate the restricted-access (text and speech) corpora that are available to academic users through Single Sign-On technology.

2. RABA is an Estonian Federated Content Search system that searches text and speech corpora, as well as lexical resources such as dictionaries, to perform quick and efficient content search across differently annotated and differently organised resources.

The tutorial (60 minutes) will teach participants how to use the simple and advanced query interfaces and how to make use of the different data collections included in the content search. We will also demonstrate CLARIN Federated Content Search, which covers other CLARIN data resources and collections.

3. The Virtual Language Observatory (VLO) is CLARIN’s most comprehensive search and browsing system, built to automatically harvest the language resources, tools and services of CLARIN centers and to let users explore the large number of resources from various domains and providers in a uniform way.

The tutorial (30 minutes) will show how to narrow and broaden a search in the VLO to find specific or similar resources. We will demonstrate smart use of search facets and of the uniform, visualized information about resources. The Estonian META-SHARE register, which supplies data to the VLO, will also be demonstrated, and participants will be shown how to register any new resources they might wish to add to the database.

The tutorial will be held in English with a focus on Estonian resources, but knowledge of Estonian is not required to participate. Participants need to bring their own laptops, and a stable internet connection is necessary.