GIVE metadata project

Much of the content in meemoo’s archive is currently not adequately described. The fourth project in GIVE, Gecoördineerd Initiatief voor Vlaamse Erfgoeddigitalisering (Coordinated Initiative for Flemish Heritage Digitisation), is therefore completely dedicated to metadata. In this project, we’re investigating the possibilities of an automatic description process – a crucial step to improve findability and re-use.

This project is part of the Flemish Government’s 'Resilience Recovery Plan' and has been made possible thanks to support from the European Regional Development Fund (ERDF) [links in Dutch].

Video: The project in 4 minutes

Challenge

At meemoo, we archive a vast number of audio and video files from cultural, media and heritage organisations. At the end of 2022, the counter was at over 6.5 million items in total, with 2 million items consisting of audiovisual content. Where do all these files come from? We’ve successfully digitised a large proportion of the audiovisual carriers in Flemish cultural archives over recent years, and meemoo’s archive system also accommodates born-digital content.

However, much of this content has been annotated only partially or not at all. That makes it hard to search, which in turn discourages re-use: a file that isn’t described cannot be found, and so cannot be re-used.

The solution is to add and expand metadata, but catching up on all this work manually is a hopeless task: describing content by hand simply takes too long. That’s why we’re focusing on an automatic description process using techniques such as artificial intelligence (AI), machine learning (link in Dutch) and computer vision.

Our role

Meemoo is responsible for organising and coordinating the GIVE metadata project. Wherever possible, we use services and algorithms that have already been developed, and we cooperate with external suppliers for the implementation. This means we will train or roll out new models only when no other option is available, and then only a limited number.

Approach

What are we planning?

Because of how it is funded, this project covers all the collections stored by meemoo except those from our media partners, whose collections will be enriched in the Shared AI project. To add metadata to the collections of our culture and government partners, we’re launching three activities around metadata creation over two and a half years (July 2021-2023). We are focusing on mature techniques and developing workflows that can continue to be used after the project.

Activity 1: speech recognition

In this first activity, we’re focusing on recognising the Dutch spoken in some 130,000 audio and video files. That amounts to more than 170,000 hours of content. The speech in the audio and video files will be converted into searchable text with time stamps using existing, commercially available tooling. We rely on tooling from Speechmatics for this.
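To illustrate what "searchable text with time stamps" can mean in practice, here is a minimal Python sketch. It assumes a transcript is available as a list of (word, start time in seconds) pairs. That structure, the function names `build_index` and `find`, and the sample words are all illustrative assumptions; they are not the output format of Speechmatics or a description of meemoo's actual workflow.

```python
from collections import defaultdict

# Hypothetical transcript: (word, start time in seconds) pairs.
# Real speech recognition output is richer; this shape is an assumption.
transcript = [
    ("de", 0.0), ("nieuwe", 0.4), ("televisiezender", 0.9),
    ("in", 1.8), ("Lopik", 2.0), ("de", 3.1), ("zender", 3.3),
]


def build_index(words):
    """Map each lower-cased word to the time stamps at which it is spoken."""
    index = defaultdict(list)
    for word, start in words:
        index[word.lower()].append(start)
    return dict(index)


def find(index, term):
    """Return the time stamps (in seconds) at which `term` occurs."""
    return index.get(term.lower(), [])


index = build_index(transcript)
print(find(index, "Lopik"))  # [2.0]
print(find(index, "de"))     # [0.0, 3.1]
```

With an index like this, a search hit can point a user straight to the moment in the recording where the word is spoken, rather than only to the file as a whole.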

Photo: Nieuwe televisiezender te Lopik, Jack de Nijs / Anefo, CC0

Activity 2: entity recognition in text

Next, we will apply named entity recognition (NER) to the texts generated in the speech recognition activity. This lets us search for names of people, organisations or locations, for example. Where possible, some of these entities will be linked to existing entries in linked open data sources. The underlying technology used in entity recognition is natural language processing (NLP): software that ‘understands’ written texts.
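The idea of finding entities in a transcript and linking them to an external identifier can be sketched with a simple dictionary lookup. Note that this is a deliberately simplified stand-in: production NER relies on trained language models, not a fixed word list. The names, entity types and `example:` identifiers below are made up for illustration and do not refer to real linked open data records.

```python
import re

# Hypothetical gazetteer: surface form -> (entity type, identifier).
# All names and identifiers are invented placeholders.
GAZETTEER = {
    "Anna Peeters": ("PERSON", "example:person/1"),
    "Erfgoedhuis Voorbeeld": ("ORGANISATION", "example:org/1"),
    "Gent": ("LOCATION", "example:place/1"),
}


def tag_entities(text):
    """Return (offset, name, type, identifier) tuples, sorted by offset."""
    found = []
    for name, (entity_type, identifier) in GAZETTEER.items():
        # \b keeps "Gent" from matching inside a longer word.
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), name, entity_type, identifier))
    return sorted(found)


sentence = "Anna Peeters spoke at Erfgoedhuis Voorbeeld in Gent."
for offset, name, entity_type, identifier in tag_entities(sentence):
    print(offset, name, entity_type, identifier)
```

Combined with time-stamped transcripts, each tagged entity could in principle be traced back to the moment in the recording where it is mentioned.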


Photo: Gebouw Arbeiderspers Hekelveld, Rob Croes / Anefo, CC0

Activity 3: face detection and face recognition

In the third and final activity, we’re enriching some 88,000 video files, or 124,000 hours of video material. We want to start by detecting faces without immediately naming them; after all, not every face that appears in a video needs a name attached to it. Building on this, we’ll apply face recognition to the detected faces, using a fixed set of faces that we will link to existing public figures. Where possible, we will link to existing data sources such as VIAF, Wikidata and ODIS. In this activity, we build on the insights gained in the FAME project, which ensures that we can scale up the processing of the video content.
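The two-step approach described here (detect first, then name only faces that match a fixed reference set) can be sketched as follows. The sketch assumes that a separate detection step has already turned each face into a numeric vector (an embedding) and compares it to reference vectors using cosine similarity. The reference names, the toy three-dimensional vectors and the 0.9 threshold are all hypothetical; they do not describe the models or settings used in the FAME or GIVE projects.

```python
import math

# Hypothetical reference set: label -> face embedding.
# Real embeddings have many more dimensions; these are toy values.
REFERENCE_FACES = {
    "public figure A": [0.9, 0.1, 0.0],
    "public figure B": [0.0, 0.8, 0.6],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(embedding, threshold=0.9):
    """Return the best-matching reference label, or None if no match
    reaches the (arbitrarily chosen) similarity threshold."""
    best_name, best_score = None, threshold
    for name, reference in REFERENCE_FACES.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


print(identify([0.88, 0.12, 0.02]))  # close to public figure A
print(identify([0.5, 0.5, 0.5]))     # matches nobody: stays unnamed
```

Returning `None` for faces below the threshold mirrors the principle above: a detected face stays anonymous unless it clearly matches someone in the fixed reference set.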

Photo: The process of face recognition applied to a photo of actor Josse De Pauw and dancer Fumiyo Ikeda (ca. 1979), Michiel Hendryckx, CC0

Need for legal and ethical framework

We must not lose sight of privacy and a proper legal and ethical framework in this process, especially for face detection and recognition. That’s why we took a first step back in 2021 by carrying out a Data Protection Impact Assessment (DPIA) [link in Dutch]. In addition, we built a sound ethical framework together with the Knowledge Centre Data & Society and several stakeholders.

We fully acknowledge that technologies like face recognition need to be handled with care. Meemoo colleagues Bart Magnus and Rutger Goeminne wrote a tech blog about the legal and ethical challenges within the FAME project and the first phase of the GIVE metadata project.

Ready for re-use

A final, essential step is to make the acquired metadata accessible. The metadata gained in the three activities will be shared and made usable through our content partners’ applications and by meemoo, and we will also store this metadata in our metadata infrastructure. This will make the content more searchable for the general public. Furthermore, we’re getting started with data mining – an automatic analysis technique to extract information and knowledge from metadata.

Meemoo is also contributing to other elements in the Flemish Government’s 'Resilience Recovery Plan', in particular for Flemish heritage databases, supervising cultural organisations in their digital collection registration projects and the digital leap in education.

Partners

We’re working with some 120 content partners from the cultural sector in the GIVE metadata project. Go to our partner page to find out which organisations are involved.

Do you have a question?
Contact Matthias Priem
Manager Archiving