Peter Verhaar (Centre for Digital Scholarship / Leiden University Centre for Arts in Society)

Forging Active Partnerships: Academic Libraries As Collaborators In Computational Research

As more and more scholars try to harness the innovative affordances that can emanate from the use of digital technologies, it becomes increasingly important for academic libraries to develop appropriate forms of support for digital humanities research. At Leiden University in the Netherlands, this challenge is addressed actively by the Centre for Digital Scholarship (CDS), which is located physically and organisationally within Leiden University Libraries. Since its foundation in 2016, the CDS has aimed to stimulate researchers to experiment with computational methods and with the numerous possibilities associated with open science.

In addition to offering consultancy and training on topics such as open access, data management and digital preservation, the CDS has been exploring new services in the fields of data science and the digital humanities. Supporting scholarship based on data science is a new task for academic libraries, and, as is the case at many other institutions,1 the CDS is still evaluating the types of services that can productively be offered within this context. This exploration is complicated by the fact that data science is an exceedingly broad and dynamic research area, covering diverse topics such as text and data mining, natural language processing and machine learning.

During the last two years, the CDS has developed valuable insights into how to support digital humanities scholars, mainly by conducting a number of projects in close collaboration with researchers. Through such scholarly partnerships, it has become clear that there are at least two important ways in which academic libraries can be of value to digital humanities research. A first important observation is that, while many humanities scholars suspect that computational and algorithmic methods can enrich their research, few of them have the skills and the knowledge that are needed to get started at a practical level. Academic libraries which have developed expertise in this field can familiarise such relative novices with the central tools, methods and approaches. As a first important service, librarians can choose to participate in research projects, and to enable humanities researchers to answer their central questions via computational methods. Secondly, since many academic libraries have invested in developing knowledge about digital preservation and technical interoperability, librarians can often help scholars to ensure that their outputs continue to be accessible after the completion of their projects, in such a way that these results can be reused by colleagues.

This paper discusses two projects that have been instrumental in shaping the CDS’s vision on how to support the digital humanities. The first of these centred on historical data about the military intervention by the Dutch government in Indonesia after the Second World War. The project involved researchers affiliated with the Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV), and the Netherlands Institute of Military History (NIMH).2 The researchers based their analyses predominantly on egodocuments such as letters, diaries and interviews. Many of these egodocuments are part of the collection of Leiden University Libraries. In 2017, the CDS began a project to explore whether the insights that were arrived at using traditional scholarly methods could be replicated, and perhaps even enhanced, using digital methods. The egodocuments were first scanned and converted into machine-readable text via OCR. The texts in the corpus were subsequently subjected to a range of computational analyses. During a first phase of the project, the aim was primarily to identify the words that were used within the corpus to characterise the native Indonesian soldiers fighting for the Dutch army, and to trace relevant historical developments in the attitudes that were expressed. Following the approaches taken by Benjamin Schmidt3 and by Kutuzov et al.,4 the words that were associated with the indigenous soldiers were investigated on the basis of word embeddings. The attitudes of Dutch soldiers were additionally investigated via sentiment analysis techniques. A second objective of the collaborative project was to build software for the recognition of specific acts of violence within the digitised texts, in order to assess both the nature and the frequency of particular war crimes. Using word embeddings, topic modelling and collocation analysis, a lexicon of relevant terms was produced, and this lexicon was used to quantify occurrences of war crimes and to visualise their dispersion, both within individual documents and within the corpus as a whole. The project enabled the CDS to acquire practical experience with some of the central methods in the field of data science.
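The lexicon-based quantification described above can be illustrated with a minimal sketch. It is not the software built in the project: the lexicon terms, the function names and the equal-width binning are all assumptions made for illustration, and the sketch uses only the Python standard library. It counts lexicon terms per document and records in which segment of each document the hits fall, which is the kind of information a dispersion plot would display.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicon; the project's actual lexicon is not given in the text.
LEXICON = {"fire", "prisoner"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def dispersion(text, lexicon=LEXICON, bins=4):
    """Count lexicon hits in `bins` equal-width segments of one document."""
    tokens = tokenize(text)
    counts = [0] * bins
    for i, tok in enumerate(tokens):
        if tok in lexicon:
            # Map the token index to a segment of the document.
            counts[min(i * bins // len(tokens), bins - 1)] += 1
    return counts


def corpus_counts(docs, lexicon=LEXICON):
    """Map each document name to a Counter of the lexicon terms it contains."""
    return {
        name: Counter(tok for tok in tokenize(text) if tok in lexicon)
        for name, text in docs.items()
    }
```

Calling `dispersion` on a single text yields one count per segment, and `corpus_counts` on a dictionary of texts yields per-document term frequencies that can be summed for the corpus as a whole.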

In addition to this first project, the CDS has also carried out a project which aimed to enhance the reusability of research data. The project was initiated by a professor of book history at Leiden University who, over the course of his academic career, had produced a large number of born-digital text documents containing semi-structured research annotations. The archive contains detailed descriptions of people, organisations, and events related to the Leiden book industry in the early modern period. The archive was created based on a logic that evolved organically over the course of several decades. All the descriptions are based on a consistent system, but without an explicit explanation of the codes, abbreviations, and notational conventions defined by the archive’s creator, it can be difficult for researchers and archivists to make productive use of this rich source of information. To improve the reusability of the data, the CDS has converted the semi-structured research annotations into a searchable database, based on a well-considered data model. The entries in this database were connected to entries in Wikidata, so that the new data set could be integrated more effectively into existing data sets. The data set has also been archived at a data repository, and it has been made available for reuse in full compliance with the FAIR data management principles.5 In the spring of 2019, a university course was also organised which focused specifically on this book-historical archive. Students were given the opportunity to learn more about the historical research underlying the data, and they were actively involved in the process of restructuring, enhancing and analysing the data within individual research projects.
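The conversion of semi-structured annotations into a searchable database can be sketched as follows. Everything specific here is an assumption for illustration: the notation "Name (birth-death); code; note", the abbreviation table, the made-up sample entry and the single-table schema are hypothetical and do not reproduce the archive's actual conventions or the CDS data model. The sketch shows the general idea of making implicit codes explicit while loading entries into a queryable store.

```python
import re
import sqlite3

# Hypothetical notation: "Name (birth-death); occupation code; free-text note".
ENTRY = re.compile(
    r"^(?P<name>[^(;]+?)\s*\((?P<born>\d{4})-(?P<died>\d{4})\)"
    r"\s*;\s*(?P<code>\w+)\s*;\s*(?P<note>.*)$"
)

# Hypothetical abbreviations, spelled out so that the codes become self-explanatory.
CODES = {"bs": "bookseller", "pr": "printer"}


def parse_entry(line):
    """Turn one annotation line into a record, or None if it does not match."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["born"] = int(record["born"])
    record["died"] = int(record["died"])
    record["occupation"] = CODES.get(record.pop("code"), "unknown")
    return record


def build_database(lines, path=":memory:"):
    """Load all parseable lines into an SQLite table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE person "
        "(name TEXT, born INTEGER, died INTEGER, occupation TEXT, note TEXT)"
    )
    records = [r for r in map(parse_entry, lines) if r is not None]
    conn.executemany(
        "INSERT INTO person VALUES (:name, :born, :died, :occupation, :note)",
        records,
    )
    conn.commit()
    return conn
```

Lines that do not fit the expected pattern are skipped rather than guessed at, so that they can be reviewed by someone who knows the archive. In practice a further column could hold an identifier linking each entry to an external data set.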

The various projects that have been conducted in the past few years have enabled the CDS to develop a clearer insight into the types of services that may be offered to humanities scholars. Among other things, it has become clear that librarians who have developed expertise in the field of data science can help to build the right kinds of conditions and terms for computer-assisted research. They can effectively connect researchers to the tools and resources they need, and they can explain to scholars how data science can be applied.

Another important finding is that it is not sufficient only to offer advice or to organise consultancy. The collaboration between librarians and researchers can be much more fruitful when librarians become active research partners and when they are enabled to carry out parts of the research as well. Because of these active partnerships with researchers, the CDS was also able to develop a better understanding of the caveats and the potential shortcomings of digital tools on a more principled level. In evaluating the results emerging from scholarly work based on digital tools, it is crucially important to recognise that human programmers consciously or unconsciously take decisions on the types of results that can be produced by tools. As research instruments, tools almost inevitably introduce a certain theoretical, practical or methodological bias. To explore these crucial issues in more detail, the CDS is currently co-editing a special issue of Digital Humanities Quarterly dedicated to the topic of tools criticism.6 Research in tools criticism, in short, aims to recognise the bias that exists in scholarly tools, and seeks to evaluate the potential impact of these assumptions on research outcomes.7 In addition to the special issue for DHQ, the CDS is also organising a symposium about tools criticism in November 2019.

In addition to reporting on the experiences acquired during collaborative projects, this paper discusses best practices and requirements that can help to bring about a productive symbiosis between academic librarianship and the digital humanities.

1 Posner, Miriam. “No Half Measures: Overcoming Common Challenges to Doing Digital Humanities in the Library”. Journal of Library Administration, vol. 53, no. 1: Digital Humanities in Libraries: New Models for Scholarly Engagement, 2013.
2 Initial results have been published in Gert Oostindie. Soldaat in Indonesië. Amsterdam: Prometheus, 2015.
3 Schmidt, Benjamin. “Vector Space Models for the Digital Humanities”, http://bookworm.benschmidt.org/posts/2015-10-25-Word-Embeddings.html
4 Kutuzov, A., Velldal, E. and Øvrelid, L. “Tracing armed conflicts with diachronic word embedding models”. Proceedings of the Events and Stories in the News Workshop, 2017.
5 Wilkinson, M. D. et al. “The FAIR Guiding Principles for scientific data management and stewardship”. Scientific Data, vol. 3, 160018, 2016.
6 The call for papers can be found at http://www.digitalhumanities.org/dhq/submissions/cfps.html
7 Koolen, Marijn, Jasmijn van Gorp and Jacco van Ossenbruggen. “Toward a model for digital tool criticism: Reflection as integrative practice”. Digital Scholarship in the Humanities, Volume 34, Issue 2, pp. 368–385, 2018. https://doi.org/10.1093/llc/fqy048