Abstracts 2021 Edition
1 – Alek Keersmaekers, Wouter Mercelis Improving morphological analysis of Greek with Transformer-based approaches: first results with Electra
Transformer-based deep learning models (e.g. BERT, XLNet) have been shown to significantly improve results on various NLP tasks. A BERT model was also successfully applied to Latin by Bamman and Burns (2020). In this paper, we apply ELECTRA (“Efficiently Learning an Encoder that Classifies Token Replacements Accurately”), a recent transformer-based approach, to Ancient Greek, and discuss how this model can yield a marked increase in labeling accuracy for morphological analysis.
2 – Margherita Fantoli Parsing Latin mathematical texts: concepts embedded in structures
This talk presents a joint project, still in its early stages, by Miryam de Lhoneux and Margherita Fantoli. The final goal of the project is to study the evolution of the syntax of mathematical texts from Greek to Latin, by focussing on the Latin humanistic translations of the Archimedean corpus. The first step of the project consists in automatically parsing the Greek and Latin texts so that their syntactic structures can be compared. Here I will discuss the application by Miryam de Lhoneux of the dependency parsing model to the Latin translation of the Quadrature of the Parabola and comment on the first linguistic features emerging from this step.
3 – Martina Rodda Can Distributional Semantics tell us something about Homer?
Distributional Semantics (DS) is currently one of the most promising approaches to the study of meaning and language change in Computational Linguistics. The wider availability of big data and of reproducible algorithms for analysis has boosted its application to living languages in recent years. But can we use DS to study a dead language with a limited corpus like ancient Greek? And can this approach tell us something about such vexed questions in classical studies as the language and composition of the Homeric poems? My talk explores some answers to these questions, highlighting both promising results and current challenges.
4 – Rachele Sprugnoli and Marco Passarotti Latin Embeddings and the LiLa Knowledge Base of Interlinked Resources for Latin
In our talk, we present the structure and the linguistic resources currently included in the LiLa Knowledge Base, i.e. a collection of multifarious textual and lexical resources for Latin described with a shared vocabulary for knowledge description and interlinked according to the principles of the Linked Data paradigm. We also present a set of lemma embeddings for Latin and a couple of experiments using such embeddings for inducing sentiment lexicons and for analyzing diachronic language change in two Latin corpora.
5 – Silvia Stopponi, Evelien de Graaf, Malvina Nissim and Saskia Peels (RUG) Creating an Open-Access Benchmark for the Evaluation of Distributional Semantic Models of Ancient Greek & Automatic lemmatization of Ancient Greek Inscriptions
The first part of the talk presents work in progress on building, from human judgments, an open-access benchmark for the evaluation of semantic models of ancient Greek. We address the theoretical and practical problems encountered throughout the process, from the design of the two surveys used for data collection to the post-processing and analysis of the results. We focus in particular on the results of the first survey and discuss some issues that arose during their analysis. We show how these data can be used to evaluate semantic models, but also to compare the knowledge of humans and machines about semantic relatedness.
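As an illustration only, one common way to score a semantic model against human relatedness judgments is to rank-correlate the model's cosine similarities with the human ratings. The sketch below is not taken from the talk: the toy vectors, word pairs, ratings and function names are all hypothetical, and Spearman correlation is an assumed choice of metric.

```python
import math

# Hypothetical 2-dimensional "embeddings"; a real model would have many more dimensions.
VECTORS = {
    "θάλασσα": [1.0, 0.0],
    "πόντος": [0.9, 0.1],
    "ναῦς": [0.6, 0.4],
    "λόγος": [0.0, 1.0],
}

# Hypothetical human relatedness ratings for word pairs (invented for this sketch).
HUMAN_RATINGS = [
    ("θάλασσα", "πόντος", 4.5),
    ("θάλασσα", "ναῦς", 3.0),
    ("θάλασσα", "λόγος", 0.5),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


human = [rating for _, _, rating in HUMAN_RATINGS]
model = [cosine(VECTORS[a], VECTORS[b]) for a, b, _ in HUMAN_RATINGS]
score = spearman(human, model)
```

A score near 1 means the model orders the pairs as the human raters do; with real survey data, ties and rater disagreement would need more careful handling than this sketch provides.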
The second part of our talk focuses on our project on the automatic lemmatization of Ancient Greek inscriptions. Most existing lemmatizers for Ancient Greek have been trained on literary Greek and will thus perform worse on the highly varied Greek of inscriptions (regional dialects, spelling mistakes, etc.). We present our analysis of the performance of three available lemmatizers for Ancient Greek: GLEM (Bary et al., 2017), the Classical Language Toolkit lemmatizer (Burns, 2020) and the UDPipe lemmatizer (Straka, 2019), evaluated on the manually lemmatized corpus of the Collection of Greek Ritual Norms project (CGRN, Carbon et al., 2016). In addition, we present our efforts to create a lemmatizer specifically trained for Ancient Greek inscriptions.
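A comparison of this kind can be sketched as token-level accuracy against a gold-standard lemmatization. The example below is a minimal illustration under stated assumptions: the gold pairs and lookup tables are toy data invented here, the two "lemmatizers" are hypothetical dictionary lookups standing in for real tools, and accuracy is an assumed metric rather than the one reported in the talk.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical gold data: (token, gold lemma) pairs, with the iota adscript
# spellings (τῆι, βουλῆι) that are typical of inscriptions.
GOLD = [
    ("θεοῖς", "θεός"),
    ("ἔδοξεν", "δοκέω"),
    ("τῆι", "ὁ"),
    ("βουλῆι", "βουλή"),
]


def make_lookup_lemmatizer(table):
    """Build a toy lemmatizer: look the token up, fall back to the token itself."""
    normalized = {nfc(form): nfc(lemma) for form, lemma in table.items()}
    return lambda token: normalized.get(nfc(token), nfc(token))


def accuracy(lemmatize, gold):
    """Share of gold tokens for which the predicted lemma matches the gold lemma."""
    if not gold:
        return 0.0
    correct = sum(1 for token, lemma in gold if lemmatize(token) == nfc(lemma))
    return correct / len(gold)


# Hypothetical stand-in for a lemmatizer that only knows literary spellings.
literary = make_lookup_lemmatizer({"θεοῖς": "θεός", "ἔδοξεν": "δοκέω", "τῇ": "ὁ"})

# Hypothetical stand-in for a lemmatizer that also covers epigraphic spellings.
epigraphic = make_lookup_lemmatizer(
    {"θεοῖς": "θεός", "ἔδοξεν": "δοκέω", "τῆι": "ὁ", "βουλῆι": "βουλή"}
)

results = {
    "literary": accuracy(literary, GOLD),
    "epigraphic": accuracy(epigraphic, GOLD),
}
```

On this toy data the literary lookup misses both adscript forms, which mirrors in miniature the kind of gap that spelling variation in inscriptions can open up.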