ELEXIS: Technical and social infrastructure for lexicography

Since the European Network of e-Lexicography (ENeL) COST action brought the European lexicographic community together in 2013–2017, several needs have become apparent: a broader and more systematic exchange of expertise, the establishment of common standards and solutions, the development and integration of lexicographic resources, and the wide-scale application of these high-quality resources in other research communities. This led to the launch of the four-year H2020 infrastructure project ELEXIS (European Lexicographic Infrastructure) in February 2018; the project was later extended by six months, until July 2022.

ELEXIS brings together research and industrial partners from various fields, such as the Semantic Web, Artificial Intelligence, Natural Language Processing (NLP) and Digital Humanities, thus supporting developments in (e-)lexicography in order to open up dictionary data and enable access to lexicographic standards, methods, data and tools.

Among the most obvious outputs of the project are the tools and services it offers. In its first two years, ELEXIS has been enriched by seven different tools which were either developed as part of the project or made freely accessible through its infrastructure, and by the end of the project, the ELEXIS infrastructure is planned to enable and support the whole dictionary creation process. The tools and services already available include:

Sketch Engine. This corpus query system, which existed prior to the project, was one of the first tools made freely accessible to academics and observer institutions in ELEXIS. It includes over 500 preloaded corpora and analysis functions, such as concordancing, building wordlists, compiling word sketches, thesauri and automatic dictionary drafting. https://sketchengine.eu/elexis/ 

Lexonomy. Another infrastructure component which already existed before ELEXIS but whose further comprehensive development continues within the project. This is a cloud-based dictionary-writing and online-publishing system that interacts closely with Sketch Engine. For example, Sketch Engine can push lexicographic data into Lexonomy to create automatically generated dictionary drafts and Lexonomy can pull data from Sketch Engine’s corpora during the entry editing process. https://lexonomy.eu/ 

Elexifier. A brand new cloud-based dictionary conversion service, using advanced XML parsing and machine learning techniques to help convert PDF and XML dictionary data into a standardized machine-readable format. Users can upload PDF and custom XML dictionaries, define mapping rules for XML transformation or create a machine learning training set for PDF conversion, and download the transformed dictionary in a TEI-compliant file format based on the ELEXIS Data Model. https://elexifier.elex.is/

VerbAtlas. A novel large-scale, manually crafted semantic resource for wide-coverage, intelligible and scalable semantic role labeling. VerbAtlas manually clusters WordNet synsets that share similar meanings into semantically coherent frames. It is available both for download and via a RESTful API, and integrates resources such as PropBank and BabelNet. http://verbatlas.org/

SyntagNet. A manually curated large-scale lexical-semantic combination database which associates pairs of concepts with pairs of co-occurring words. The goal of SyntagNet is to capture sense distinctions evoked by syntagmatic relations, thereby providing information which complements the essentially paradigmatic knowledge shared by currently available lexical knowledge bases such as WordNet. http://syntagnet.org/

Elexifinder. A search tool dedicated to helping lexicographers and researchers find scientific output in lexicography and related fields. Elexifinder enables users to search through papers and videos using concepts (words or sets of words that have a Wikipedia page) and various other conditions, such as source (for example, a conference), author or language. Each paper or video is linked to a page where the user can download or view it. https://elex.is/tools-and-services/elexifinder/ and http://er.elex.is/

Lexicographic news feed. A service using the Event Registry API to extract recent news articles related to lexicography. Articles are extracted from 30,000 news sources, supporting over 35 languages. https://elex.is/tools-and-services/lexicographic-news/. 
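
The rule-based XML mapping described for Elexifier above can be pictured with a small sketch. This is not Elexifier's implementation or interface: the source element names, the target element names and the `apply_rules` helper are all hypothetical, chosen only to show what it means to map a custom dictionary structure onto a standardized one.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping rules: custom source element -> standardized target element.
# These names are illustrative assumptions, not Elexifier's actual rule format.
RULES = {
    "dictionary": "body",
    "article": "entry",
    "hw": "orth",
    "pos": "gram",
    "meaning": "def",
}


def apply_rules(source_xml, rules):
    """Rename every element that has a rule; leave all other elements untouched."""
    root = ET.fromstring(source_xml)
    for element in root.iter():  # iter() also visits the root element
        element.tag = rules.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")


SOURCE = (
    "<dictionary><article><hw>cat</hw><pos>noun</pos>"
    "<meaning>a small domesticated feline</meaning></article></dictionary>"
)

converted = apply_rules(SOURCE, RULES)
print(converted)
```

A real conversion would also need to handle attributes, nesting differences and ambiguous structures, which is where interactively defined rules and machine learning come in.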

Image 1. ELEXIS offers a user-friendly way to create dictionaries or edit and publish existing ones

Besides this extensive technical infrastructure, ELEXIS provides a social infrastructure to foster cooperation and support knowledge exchange among lexicographic communities. It also helps bridge the gap between lesser-resourced languages and those with more advanced e-lexicographic expertise. One aspect of this social infrastructure is the organization of training sessions and workshops at conferences, as well as summer and spring schools all over Europe. Due to COVID-19 restrictions, several events had to be cancelled this year, but we have compensated for the lack of face-to-face interaction by moving several community-building activities online. As part of the GlobaLex 2020 Workshop on Linked Lexicography at the Language Resources and Evaluation Conference (LREC 2020), ELEXIS organized the first shared task on monolingual word-sense alignment (MWSA). The goal was to find senses in two monolingual dictionaries of the same language that describe the same concept. The MWSA task made use of data in 15 languages from ELEXIS partners and observers. The participants developed strong systems, with the best overall system scoring 84% accuracy in sense alignment. While the workshop itself had to be cancelled, the papers and the results are available as part of the proceedings.
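
To give a feel for what word-sense alignment involves, here is a deliberately naive baseline. It is a sketch under stated assumptions, not a method used by any participant and not the task's official scoring: it treats alignment as a yes/no decision per sense pair, compares definitions by word overlap (Jaccard similarity), and uses invented definitions and an arbitrary threshold of 0.3.

```python
def tokens(definition):
    """Lower-cased word set of a definition, with surrounding punctuation stripped."""
    return {w.strip(".,;:()").lower() for w in definition.split()} - {""}


def jaccard(def_a, def_b):
    """Share of words the two definitions have in common (0.0 to 1.0)."""
    ta, tb = tokens(def_a), tokens(def_b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def align(senses_a, senses_b, threshold=0.3):
    """Return index pairs (i, j) whose definitions overlap at least `threshold`."""
    return {
        (i, j)
        for i, da in enumerate(senses_a)
        for j, db in enumerate(senses_b)
        if jaccard(da, db) >= threshold
    }


def accuracy(predicted, gold, n_a, n_b):
    """Fraction of all sense pairs labelled correctly as aligned / not aligned."""
    pairs = [(i, j) for i in range(n_a) for j in range(n_b)]
    return sum((p in predicted) == (p in gold) for p in pairs) / len(pairs)


# Invented definitions of "bank" in two hypothetical dictionaries.
dict_a = [
    "a financial institution that accepts deposits",
    "the sloping land beside a river",
]
dict_b = [
    "sloping land along the side of a river",
    "an institution for keeping and lending money",
]
gold = {(0, 1), (1, 0)}  # hand-made reference alignment for this toy example

predicted = align(dict_a, dict_b)
score = accuracy(predicted, gold, len(dict_a), len(dict_b))
print(predicted, score)
```

On this toy data the baseline finds the "river" pair but misses the "financial institution" pair, because those two definitions share almost no words. That gap is why word-sense alignment needs more than surface overlap.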

Furthermore, ELEXIS supports individual researchers and research teams via trans-national access, enabling them to reach facilities and lexicographic resources which are not fully or easily accessible online or where professional on-site expertise is needed. Researchers, scholars and students are invited to apply for a fully-funded short- or long-term research visit to leading lexicographic institution partners. Calls for visiting grants are launched twice a year, in summer and in winter, amounting to seven calls in total during the project period. The travel grant reports as well as mini-interviews with the respective winners from various countries all over Europe are available at https://elex.is/travel-grant-reports/.  

While individual researchers can participate through travel grants, institutions are invited to join the ELEXIS network via observer status. Observing institutions may request new customized lexicographic data or have their existing data enriched and expanded with both monolingual and multilingual information. Moreover, they can access the ELEXIS cloud, tools and open-access resources, as well as resources in the partners' and observers' area of the cloud. Observers are notified about newly developed tools, services and activities (e.g. hackathons and tool demo sessions) aimed at improving and enriching their own lexicographic data. To keep the infrastructure sustainable after the end of the project in 2022, observer status guarantees the possibility of participating actively in the post-project stage.

The first Lexonomy Hackathon took place in Brno on 23–25 April 2019

To this end, ELEXIS organized an Observer Event in early 2019, dedicated to informing representatives of various lexicographic institutions about its activities. Institutions from all over Europe (and beyond) have been joining the network: as of June 2020, the ELEXIS community is made up of 17 partner and 50 observer institutions from 35 different countries (cf. Image 2). In addition, ELEXIS is running a campaign on social media describing the characteristics of each observing institution; all portraits are collected in the #elexisobserver moment on Twitter.

Image 2. Overview of the ELEXIS network in June 2020

Since community building is a key factor for ELEXIS, it is important to assess the community's experience of, and opinions on, the project's intermediate outcomes. This is a way to reflect on the work done so far as well as to fine-tune the final outcomes so that they respond best to the needs of the community. To this end, the ELEXIS impact survey was launched in May 2020, containing 16 questions on different aspects of the technical and social infrastructures. The results showed that 79% of the 123 respondents already knew ELEXIS or were actively following its activities. For most respondents, the most important aspects of ELEXIS are the tools and services as well as open access and open data, followed by training and education, knowledge exchange and community building (cf. Image 3).

Although some respondents did not know ELEXIS before, we were interested to find out how useful specific aspects of the infrastructure might be to them. These turned out to include access to the corpus query tool Sketch Engine, open data and open access, as well as knowledge exchange, training and education (cf. Image 4).

Image 3. Usefulness of ELEXIS services for those who are familiar with the network (Q12, N=97)
Image 4. Potential usefulness of ELEXIS services for those who don’t know the network (Q6, N=26)

The full survey as well as other project reports are available at https://elex.is/deliverables/. 

Additionally, all conference papers, peer-reviewed articles and journal articles published in the course of the ELEXIS project are available on Zenodo.


A Horizon 2020 project
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731015.


Anna Woldrich

Anna Woldrich is a communication expert at the Austrian Academy of Sciences (ACDH-CH) and previously worked as a social editor and campaign manager. She graduated from Universität Wien/Università degli Studi di Siena in mass media and communication studies, focusing on radio, broadcasting, marketing, communication research and communication theory. As part of the ELEXIS project, she is responsible for planning, managing and monitoring online and offline communication activities, and manages the social media channels and content management on the ELEXIS website.

Teja Goli

Teja Goli is an assistant at the Artificial Intelligence Laboratory at the Jožef Stefan Institute and at the University of Ljubljana, where she completed her master's degree in Translation at the Faculty of Arts. Her research interests include translation, corpus linguistics and lexicography. In the ELEXIS project, she is mainly responsible for contact with observers and for managing website content.

Iztok Kosem

Iztok Kosem (PhD) is a Research Associate at the Jožef Stefan Institute and at the University of Ljubljana. His main areas of research are lexicography and lexicogrammar, corpus linguistics, crowdsourcing, and computer-aided language learning and teaching. In ELEXIS, he has the role of Community Manager, and he is heavily involved in the development of Elexifinder, Lexonomy and games with a purpose (gamification).

Ondřej Matuška

Ondřej Matuška oversees sales and marketing activities and external communication at Lexical Computing, and is the main point of contact for information about and user support for Sketch Engine.

Tanja Wissik

Tanja Wissik is a senior scientist and project leader at the Austrian Centre for Digital Humanities and Cultural Heritage of the Austrian Academy of Sciences, and she teaches at the University of Graz and the University of Vienna. She holds a PhD in Translation Studies from the University of Vienna. In recent years she has worked on a number of European and national research projects in the field of language resources, text technologies and DH methods.