October 19-20, Estonian Literary Museum, Tartu, Estonia
When working with various (archival) materials, researchers are often confronted with a problem: although both quantitative and qualitative methods in the humanities are quickly becoming more versatile, efficient and numerous, tools for translingual and transcultural analysis remain underdeveloped. Most tools for data mining and analysis are available only for the largest languages, which makes it difficult to study smaller or extinct languages or to compare them with, for example, English. In fact, researchers sometimes cannot even access the source texts of geographically close cultures because of the language barrier (e.g. Estonian and Latvian are mutually incomprehensible). In the case of non-textual material, the limitations are not linguistic. Here, the important issues are access to metadata and related information, cultural interpretation, and the challenges that big data poses for non-textual transcultural research. As a result, it may not be possible to get an adequate overview of the material or to gain accurate insight, because technical difficulties blur the results. At the same time, translingual and transcultural analysis has a long tradition in several disciplines (history, religious studies, linguistics, etc.), and there is a great need to introduce the possibilities and advances of digital tools, systems and standards, as well as the results of such research, to the academic community.
The conference is the third in a series of annual digital humanities conferences in Estonia and includes a special panel on ongoing projects and developments in Estonian digital humanities.
Throughout the week, the University of Tartu Library hosts events of the global Open Access Week, including FOSTER training courses on October 15 and October 23.
More information is available on the conference homepage: http://www.folklore.ee/dh/en/events/dh_conference_estonia_2015/
Kalev H. Leetaru (via Skype)
Looking Across Languages: Mass Translation of the World’s News
Imagine a world where language is no longer a barrier to information access, where anyone can access real-time information from anywhere in the world in any language, seamlessly translated into their native tongue, and where their voice is equally accessible to speakers of all the world’s languages …
Kalev H. Leetaru is one of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, a Senior Fellow at the George Washington University Center for Cyber & Homeland Security and a member of its Counterterrorism and Intelligence Task Force. In his talk, he introduces his GDELT Project, supported by Google Ideas, which each day monitors a considerable cross-section of the world’s broadcast, print and web news media from nearly every country in over 100 languages, and identifies the people, organizations, locations, themes and events driving global society.
Jan Rybicki
Signals in Stylometry: What Numbers Tell Us About Literary Works
Believe it or not, a computer can tell authors apart by counting the frequencies of some of the most frequent words they use. But the very same authorship attribution methods, based on multivariate nearest-neighbor analysis of vocabulary statistics, can also group authors by chronology, genre, or gender. While the exact mechanism behind this phenomenon remains unknown, it is worthwhile to observe how it persists in a variety of literary text collections in several languages.
Jan Rybicki is Assistant Professor at the Institute of English Studies, Jagiellonian University in Kraków; he has also taught at Rice University in Houston. His research combines translation studies, comparative literature and computational stylistics to produce quantitative and qualitative analyses of literary language in the original and in translation. He has also translated some 30 novels into Polish, by authors such as Amis, Coupland, Fitzgerald, Golding, Gordimer, Ishiguro, le Carré and Winterson.
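To make the idea in the stylometry abstract above a little more concrete, here is a minimal sketch of one well-known measure of this kind, Burrows's Delta, used as a nearest-neighbour classifier. It is an illustration under stated assumptions, not the speaker's method or the implementation used in the Stylo package. The relative frequencies of the most frequent words are converted to z-scores across the candidate texts, and an unknown text is attributed to the candidate with the smallest mean absolute z-score difference. The function names, the whitespace tokenisation and the `n_mfw` setting are hypothetical choices for this example.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_freqs(text):
    """Lowercase, split on whitespace (a simplifying assumption) and
    return a dict mapping each word to its relative frequency."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def burrows_delta(corpus, unknown, n_mfw=50):
    """Return (closest_author, deltas) for an unknown text.

    corpus: dict mapping a (hypothetical) author name to a reference text.
    n_mfw:  number of most frequent words to use (an assumed setting).
    """
    profiles = {a: relative_freqs(t) for a, t in corpus.items()}
    unknown_profile = relative_freqs(unknown)

    # Most frequent words across the whole reference corpus.
    all_tokens = Counter()
    for text in corpus.values():
        all_tokens.update(text.lower().split())
    mfw = [w for w, _ in all_tokens.most_common(n_mfw)]

    # Mean and (population) standard deviation of each word's
    # relative frequency across the reference texts.
    stats = {}
    for w in mfw:
        values = [p.get(w, 0.0) for p in profiles.values()]
        stats[w] = (mean(values), pstdev(values))

    def z(profile, w):
        mu, sigma = stats[w]
        return 0.0 if sigma == 0 else (profile.get(w, 0.0) - mu) / sigma

    # Delta: mean absolute difference of z-scores over the chosen words.
    deltas = {
        a: mean(abs(z(p, w) - z(unknown_profile, w)) for w in mfw)
        for a, p in profiles.items()
    }
    # Nearest neighbour: the candidate with the smallest Delta.
    return min(deltas, key=deltas.get), deltas
```

For example, `burrows_delta({"A": "the cat sat on the mat the cat", "B": "a dog ran in a park a dog barked"}, "the cat sat on the mat the cat")` attributes the unknown text to `"A"`, whose Delta is `0.0` because its word profile is identical. Real studies use far longer texts, hundreds of frequent words and more careful tokenisation.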
Call for papers
We welcome contributions from the following areas:
- data mining (incl. visual, multimedia and other data)
- working with data in various languages, incl. small or extinct languages
- translingual analysis
- computational ontologies
- cross-linguistic and -cultural research in the field of digital humanities
- applications targeted at (or usable with) various languages
- the role of English as the lingua franca of digital humanities: assets and drawbacks
- compiling multilingual data collections (e.g. by crowdsourcing)
- ideas, outlooks, projects and developments in Estonian digital humanities
The language of the conference is English. The conference participation fee is €50 (please contact the organisers if you would like to apply for a waiver).
Please submit a proposal that contains your full name, institutional and disciplinary affiliation with a very brief academic CV, the title of your paper and an abstract of 200-250 words by September 15, 2015. Send your proposals to: firstname.lastname@example.org
Mari Sarv & Liisi Laineste
Estonian Literary Museum
Workshop “Stylo: a Tool for Computational Text Analysis” on October 20 (afternoon)
by Jan Rybicki
Stylo is a package written for R, the open-source and cross-platform statistical programming environment. It performs a number of text-analytical workflows useful in authorship attribution and computational stylistics, from text input through token recognition to graphing the results. Thanks to its powerful graphical user interface, it can be used by beginners and non-programmers after brief instruction. In this workshop, participants will be able to conduct their first attributive or stylistic analysis, perhaps even on their own text corpora.
The workshop can accommodate 15 participants, who should bring their own laptops (preferably running Windows).
Please register by email: email@example.com
Text Reuse Workshop on October 21 (full day)
by eTRAP team: Marco Büchler, Emily Franzini, Greta Franzini, Maria Moritz
(GCDH, University of Göttingen)
In addition to the regular conference program, the eTRAP Team from the Göttingen Centre for Digital Humanities (Germany) is offering candidates the opportunity to participate in a workshop on Text Reuse.
If you are interested in exploring text reuse between two or among multiple texts (written in the same language) and would like to learn how to identify reuse semi-automatically yourself, this workshop is for you. The workshop aims to teach participants to independently understand, use and run the TRACER tool, created by Dr. Marco Büchler, in order to detect and visualize text reuse in diverse corpora, whether prose or poetry, in Arabic or in Estonian.
To provide everyone with adequate (technical) assistance, the workshop can only accommodate 10 participants. If you would like to apply, please send your CV and a motivation letter to firstname.lastname@example.org by September 15, 2015.