Abstracts 2023 Edition – Computational Approaches to Ancient Greek and Latin Workshop

March 14th, 2023

17.00-18.00 Martina Rodda (Oxford) Should philologists think computationally? Some Homeric thoughts about what AI can bring us (plus discussion)
Computational thinking and Homeric studies have often walked hand in hand, with regular cycles of excitement and disillusion. The diffusion of Parry’s ideas and the understanding of their linguistic implications led to attempts to use formular analysis to work out the chronology of the epic language, solve the problem of dialectal phases, or even reconstruct the whole relative chronology of Greek epic; but these soon gave way to a more nuanced understanding of what formulae do and what they bring to the aesthetics of oral performance, with some amount of side-eye to the rigid quantitative approaches of the past. In the past couple of decades, a dramatic increase in access to both digitised materials and the technology to process them has given rise to a different question: what can digital approaches do for us Homer scholars, and can we perhaps even give something back? Recent work has highlighted, among other key points, how ancient Greek (and Latin), with their small, highly curated corpora, can offer an unusual testing ground for methods that were developed for the massive scales of living languages. My talk will reflect on how we can develop a third outlook – not just new computational tools for old philological questions, or philology as a test for computational approaches, but a philological outlook that will grow together with the advance of digital technologies.

March 15th, 2023

09.00-09.45 Francesco Mambrini (Milan, online) The syntax of the Homeric heroes. A treebank based investigation
In my talk I discuss how we can explore the characterization of the heroes in the Iliad by focusing on the syntax of the poem. The work applies the methods pioneered by J.F. Burrows (1987) in his seminal work on the idiolects of the characters in the novels of Jane Austen, but tries to extend them in two directions. Firstly, by pursuing the same methodology that I have already applied to the Sophoclean characters and choruses, I investigate whether and to what extent these quantitative analyses can be applied to Greek literary texts, and to Homer in particular.
Secondly, instead of focusing on the lexicon, I investigate the distribution of the most frequent syntactic structures among the characters. The data are taken from the complete morpho-syntactic annotation encoded in my Daphne treebank, a revision of Perseus’ Ancient Greek Dependency Treebank based on the “Universal Dependencies” formalism. I propose to investigate the distribution of the 30 most frequent syntactic arcs (represented as triples: part of speech of head, part of speech of dependent, label of the relation) in the direct speeches of the Iliad. Following Burrows, I then create correlation matrices and apply Principal Component Analysis to investigate the relations and differences among the characters. This program entails a series of crucial methodological questions, concerning both practical and theoretical aspects, which I will discuss during the presentation.
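As a rough illustration of the pipeline described above (arc triples, relative frequencies, correlation matrix, first principal component), the following minimal sketch uses only the Python standard library. The speaker labels and counts are invented toy values, not data from the Daphne treebank, and all function names are hypothetical; the power iteration stands in for whatever PCA routine is actually used.

```python
from collections import Counter
from math import sqrt

# Invented toy counts of arcs (head POS, dependent POS, relation) per speaker.
# Labels and numbers are placeholders, NOT data from the treebank.
ARC_COUNTS = {
    "speaker_A": Counter({("VERB", "NOUN", "nsubj"): 30, ("VERB", "NOUN", "obj"): 10, ("NOUN", "ADJ", "amod"): 10}),
    "speaker_B": Counter({("VERB", "NOUN", "nsubj"): 28, ("VERB", "NOUN", "obj"): 12, ("NOUN", "ADJ", "amod"): 10}),
    "speaker_C": Counter({("VERB", "NOUN", "nsubj"): 10, ("VERB", "NOUN", "obj"): 25, ("NOUN", "ADJ", "amod"): 15}),
    "speaker_D": Counter({("VERB", "NOUN", "nsubj"): 12, ("VERB", "NOUN", "obj"): 22, ("NOUN", "ADJ", "amod"): 16}),
}

def top_arcs(arc_counts, n):
    """Return the n most frequent arc triples over all speakers."""
    overall = Counter()
    for counts in arc_counts.values():
        overall.update(counts)
    return [arc for arc, _ in overall.most_common(n)]

def standardize(rows):
    """Turn each column (one arc type) into z-scores."""
    columns = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        columns.append([(x - mean) / sd if sd else 0.0 for x in col])
    return [list(row) for row in zip(*columns)]

def correlation_matrix(z):
    """Correlation between arc types, computed from z-scored rows."""
    n, p = len(z), len(z[0])
    return [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]

def first_component(matrix, iterations=200):
    """Approximate the leading eigenvector by power iteration."""
    v = [1.0 + 0.1 * i for i in range(len(matrix))]
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pc1_scores(arc_counts, n_arcs=30):
    """Project every speaker onto the first principal component."""
    speakers = sorted(arc_counts)
    arcs = top_arcs(arc_counts, n_arcs)
    rows = [[arc_counts[s][a] / sum(arc_counts[s].values()) for a in arcs]
            for s in speakers]
    z = standardize(rows)
    axis = first_component(correlation_matrix(z))
    return {s: sum(x * w for x, w in zip(row, axis))
            for s, row in zip(speakers, z)}

scores = pc1_scores(ARC_COUNTS)
for speaker, score in sorted(scores.items()):
    print(speaker, round(score, 3))
```

In this toy setup speakers with similar arc profiles land on the same side of the first component; the sign of the axis itself is arbitrary.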

09.45-10.30 Barbara McGillivray (King’s College London) Semantic change and semantic variation in Latin: lessons learnt from computational methods
The past decade has seen a growing interest in automatic methods for semantic change detection from large corpus data in Natural Language Processing (NLP) research. Such systems perform well on large datasets spanning the last few centuries and have been tested on various modern languages, enabling preliminary quantitative studies on lexical semantic change processes at scale. Despite these encouraging results, there is still a lot of work to do to gain new in-depth insights into the mechanisms of semantic change using these methods, particularly on ancient languages.
In this talk I will present my research on developing computational models for semantic change detection, with particular focus on Latin. I will share my experience of working at different scales and in a range of interdisciplinary projects and the lessons learnt. I will touch on some of the open questions in this research area, including accounts for semantic variation alongside change and the relation between NLP research in this area and historical linguistics research.

11.00-11.45 Marco Passarotti (Milan) and Rachele Sprugnoli (Parma) Interoperability and Sentiment Analysis in the LiLa Knowledge Base
Although the main applications of resources and tools for sentiment analysis typically fall in fields such as social media and customer experience monitoring, there is an increasing interest in extending their range to texts written in ancient and historical languages. Such interest mirrors the substantial growth of the area dedicated to building and using linguistic resources for these languages, which are essential for accessing and understanding the Classical tradition. In this talk, we will present the development and publication as Linked Open Data of interoperable sentiment lexicons for Latin, carried out in the context of the ERC project “LiLa: Linking Latin”. We will also describe different use cases and report on the results of a few preliminary experiments in automatic sentiment analysis of Horace’s poems.

11.45-12.30 Evelien de Graaf (RUG/ KU Leuven) Evaluation of a Method for Automated Sentiment Analysis for Latin epic
Sentiment analysis, a method to mine texts for sentiment, is used to determine the opinion of certain groups on a given subject across large collections of texts, such as reviews or tweets. In the study of history, and more specifically of historical literature, the method has recently been receiving increased attention, often with a similar goal: to track (public) sentiment towards a certain topic or person over time. In this talk, I will discuss the development and evaluation of one possible method of applying sentiment analysis to Latin literature in order to examine the sentiment towards concepts and characters. This method is indebted to the creation of the LatinAffectus Sentiment Lexicon in the context of the Linking Latin ERC project.

13.45-14.30 Paschalis Agapitos and Andreas van Cranenburgh (Groningen) A Stylometric Analysis of Seneca’s Disputed Plays: Authorship Verification of Octavia and Hercules Oetaeus
Contemporary scholars claim that the plays Octavia and Hercules Oetaeus may not have been written by Seneca but by an imitator. We address this disputed authorship problem using three computational stylometric methods, provided by the Stylo package (Eder et al., 2016). In order to capture information below the word level, we use character n-grams rather than word frequencies.
First, we explore the stylistic similarities and differences within the Senecan corpus. A visualization based on Principal Component Analysis (Burrows and Craig, 2001) shows that Octavia and Phoenissae differ stylistically from other Senecan texts.
Second, we probe this potential anomaly further using a more robust method: the Bootstrap Consensus Tree (Eder, 2012). We compare the Senecan texts with several contemporary distractor authors of the Neronian literature, including Statius. Based on the results we cannot rule out Seneca as the author of the disputed texts, since (a) they are clustered within the Senecan branch, and (b) they are stylistically more different from the distractor authors than from Seneca.
Finally, we apply the General Imposters method (Koppel and Winter, 2014), in which the authorship of a disputed text is verified by repeatedly picking the stylistically most similar text from a line-up of texts by the candidate author (Seneca) and imposters (distractors). The success rate of picking the candidate author from the line-up indicates the confidence of the authorship verification. To increase the robustness and reliability of the final result, we use a larger number of distractor authors (31) and repeat the procedure many times. Our results suggest that Seneca the Younger can be verified as the author for both disputed plays with a high degree of confidence, which contradicts the current philological consensus.
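The line-up procedure described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the Stylo implementation and not the exact algorithm of Koppel and Winter: it uses character trigram profiles, Manhattan distance, a random half of the features and of the imposters per round, and hypothetical function names. The texts are meaningless placeholders.

```python
import random
from collections import Counter

def char_ngrams(text, n=3):
    """Relative frequencies of character n-grams (whitespace normalised)."""
    text = " ".join(text.lower().split())
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()}

def distance(p, q, features):
    """Manhattan distance restricted to the sampled features."""
    return sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in features)

def imposters_score(disputed, candidate_texts, imposter_texts,
                    iterations=100, feature_share=0.5, seed=0):
    """Share of rounds in which the nearest text belongs to the candidate."""
    rng = random.Random(seed)
    target = char_ngrams(disputed)
    candidates = [char_ngrams(t) for t in candidate_texts]
    imposters = [char_ngrams(t) for t in imposter_texts]
    vocabulary = sorted(set(target).union(*candidates, *imposters))
    k = max(1, int(len(vocabulary) * feature_share))
    hits = 0
    for _ in range(iterations):
        features = rng.sample(vocabulary, k)             # random feature subset
        sampled = rng.sample(imposters, max(1, len(imposters) // 2))
        lineup = [(p, True) for p in candidates] + [(p, False) for p in sampled]
        # min() keeps the first of equally distant texts, so ties favour
        # the candidate here; a real implementation should treat ties explicitly.
        _, is_candidate = min(
            lineup, key=lambda item: distance(target, item[0], features))
        hits += is_candidate
    return hits / iterations

# Placeholder strings only; real use would load full plays.
candidate = ["aaa bbb aaa ccc aaa bbb", "aaa ccc bbb aaa aaa"]
distractors = ["xyz xyz zzz yyy", "qrs qrs ttt uuu", "mno pqr mno mno"]
print(imposters_score("aaa bbb ccc aaa bbb aaa", candidate, distractors))
```

A score near 1 means the candidate author was picked from the line-up in almost every round; how high a score must be to count as verification is a separate calibration question that this sketch does not address.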

14.30-15.15 Lukas Fischer (Zürich) Nunc profana tractemus. Detecting Code-Switching in a Large Corpus of 16th Century Letters
Our research is based on a collection of 16th century letters from and to the Zurich reformer Heinrich Bullinger. Around 12,000 letters of this exchange have been preserved, out of which 3100 have been professionally edited, and another 5500 are available as provisional transcriptions. We have investigated code-switching in these 8600 letters, first at the sentence level and then at the word level. We give an overview of the corpus and its language mix (mostly Early New High German and Latin, but also French, Greek, Italian and Hebrew). We report on our experiences with a popular language identifier and present our results when training an alternative identifier on a very small training corpus of only 150 sentences per language. We use the automatically labeled sentences in order to bootstrap a word-based language classifier which works with high accuracy. Our research around the corpus building and annotation involves automatic handwritten text recognition, text normalisation for ENH German, and machine translation from medieval Latin into modern German.
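The two-stage idea (a sentence-level identifier trained on very little data, whose automatic labels then seed a word-level classifier) might look roughly like the sketch below. It assumes a character-trigram model with add-one smoothing and a simple majority-vote word lexicon; neither is claimed to be the identifier or classifier used in the project. The training sentences are invented toy examples in Latin and modern German, not material from the Bullinger corpus, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def trigrams(sentence):
    """Character trigrams of a lower-cased, space-padded sentence."""
    padded = f" {' '.join(sentence.lower().split())} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

def train(samples):
    """One trigram count table per language."""
    return {lang: Counter(g for s in sentences for g in trigrams(s))
            for lang, sentences in samples.items()}

def identify(sentence, model):
    """Pick the language with the highest smoothed log-likelihood."""
    vocab = len(set().union(*model.values()))
    def score(counts):
        total = sum(counts.values())
        return sum(math.log((counts[g] + 1) / (total + vocab))
                   for g in trigrams(sentence))
    return max(model, key=lambda lang: score(model[lang]))

def bootstrap_word_lexicon(sentences, model):
    """Label sentences automatically, then let each word inherit the
    language it most often co-occurs with."""
    votes = defaultdict(Counter)
    for sentence in sentences:
        lang = identify(sentence, model)
        for word in sentence.lower().split():
            votes[word][lang] += 1
    return {word: c.most_common(1)[0][0] for word, c in votes.items()}

# Invented toy training data (the abstract mentions 150 sentences per language).
SAMPLES = {
    "latin": ["nunc profana tractemus", "gratia et pax a deo patre",
              "literas tuas accepi frater carissime"],
    "german": ["der herr sei mit euch allen", "ich habe euren brief empfangen",
               "wir bitten gott um seine gnade"],
}
model = train(SAMPLES)
lexicon = bootstrap_word_lexicon(
    ["accepi literas a patre", "gott sei mit euch"], model)
print(identify("literas tuas accepi", model), lexicon["gott"])
```

Because every word simply inherits its sentence's label, a sketch like this cannot yet detect a switch inside a sentence; it only shows how sentence labels can seed word-level evidence.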

15.45-16.30 Alek Keersmaekers and Wouter Mercelis (KU Leuven) Lemmatization for Ancient Greek: recent experiments to improve the state-of-the-art
While lemmatization is a crucial task for many applications in Classics, the current state of the art is not entirely satisfactory, as shown by the relatively low accuracies reported by Vatri & McGillivray (2020). The aim of this paper is therefore to give a systematic overview of categories of words that are particularly difficult to lemmatize with current methods and suggest possible ways to tackle these problems, based on a series of experiments. For this purpose, we will start from the system described in Keersmaekers (2021) that we had been using in the past (viz. a combination of the Morpheus analyzer and a log-linear model of lemmatization using edit tree classification), and assess which specific components of this system should be replaced in order to maximize lemmatization accuracy for Ancient Greek.
References:
Vatri, Alessandro & Barbara McGillivray (2020): Lemmatization for Ancient Greek: An experimental assessment of the state of the art. Journal of Greek Linguistics. Brill. 20(2). 179–196.
Keersmaekers, Alek (2021): The GLAUx corpus: methodological issues in designing a long term, diverse, multi-layered corpus of Ancient Greek. Proceedings of the 2nd International Workshop on Computational Approaches to Historical Language Change 2021, 39–50. Online: Association for Computational Linguistics. doi:10.18653/v1/2021.lchange-1.6.
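To give a rough idea of what treating lemmatization as classification over edit operations means, the sketch below derives a rule from a form/lemma pair and reapplies it to an unseen form. It is a deliberate simplification: real edit trees also cover prefix and infix changes, and a trained classifier (log-linear in the system mentioned above) chooses among candidate rules, a step omitted here. Function names are hypothetical.

```python
import unicodedata

def nfc(text):
    """Normalise to NFC so accented Greek letters compare reliably."""
    return unicodedata.normalize("NFC", text)

def derive_rule(form, lemma):
    """Return (suffix to strip, suffix to add) after the shared prefix."""
    form, lemma = nfc(form), nfc(lemma)
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return (form[i:], lemma[i:])

def apply_rule(form, rule):
    """Apply a suffix rule; return None when the rule does not fit."""
    strip, add = rule
    form = nfc(form)
    if strip and not form.endswith(strip):
        return None
    return form[:len(form) - len(strip)] + add

# The genitive singular λόγου maps to the lemma λόγος ...
rule = derive_rule("λόγου", "λόγος")
# ... and the same rule generalises to another second-declension genitive.
print(rule, apply_rule("νόμου", rule))
```

A form can match several rules at once, which is why the choice among them is left to a classifier that can use context and morphological analysis.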

16.30-17.15 Vojtěch Kaše (West Bohemia) A Distributional Semantic Approach to the Religious and Moral Dynamics in the Ancient Greek Texts
In this paper, I employ the methods of distributional semantics to analyze changes in the semantic association between two subsets of words (one from the domain of religion, another from the domain of morality) in ancient Greek texts from the 8th c. BCE to the 3rd c. CE. The strength of the association between the two groups of words and how it changes over time is here measured by means of cosine similarity of vector representations of individual words within models trained on subcorpora of texts from different periods (e.g. archaic, classical etc.). The results of these measurements serve here as a proxy for the extent of moralization of the religious ideology from the respective periods. Drawing on recent theorizing concerning cultural evolution of moralizing religions in the ancient Mediterranean, I relate the textual findings to the socio-economic development of the ancient Mediterranean and interpret them in terms of the so-called Affluence Hypothesis.
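The measurement described above can be illustrated with a minimal sketch: for each period model, average the cosine similarity over all pairs of one religious and one moral term. The two-dimensional vectors, the period labels and the two example words are invented for illustration only; they are not the paper's word lists, models or results, and averaging over pairs is an assumption about how the group-level score is aggregated.

```python
from itertools import product
from math import sqrt

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def group_association(vectors, group_a, group_b):
    """Mean pairwise cosine similarity between two word groups,
    skipping words missing from this period's model."""
    pairs = [(a, b) for a, b in product(group_a, group_b)
             if a in vectors and b in vectors]
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)

# Hypothetical word groups (one illustrative term each).
RELIGION = ["θεός"]
MORALITY = ["δίκαιος"]

# Invented toy vectors standing in for models trained on period subcorpora.
PERIOD_MODELS = {
    "period_1": {"θεός": [1.0, 0.0], "δίκαιος": [0.0, 1.0]},
    "period_2": {"θεός": [1.0, 0.0], "δίκαιος": [1.0, 1.0]},
}

for period, vectors in PERIOD_MODELS.items():
    print(period, round(group_association(vectors, RELIGION, MORALITY), 3))
```

Since vector spaces trained separately are not directly comparable, a within-model measure such as this one has the practical advantage that only similarities inside each period's model are compared across time.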