GIVE metadata project

Access & re-use Metadata

Partners

EFRO

Period

July 2021 - December 2023

Related publications

Facial recognition: what are the legal and ethical aspects?

Much of the content in meemoo’s archive is currently not adequately described. The fourth project in GIVE, Gecoördineerd Initiatief voor Vlaamse Erfgoeddigitalisering (Coordinated Initiative for Flemish Heritage Digitisation), is therefore completely dedicated to metadata. In this project, we’re investigating the possibilities of an automatic description process – a crucial step to improve findability and re-use.

This project is part of the Flemish Government’s 'Resilience Recovery Plan' and has been made possible thanks to support from the European Regional Development Fund (ERDF) [links in Dutch].


Challenge

At meemoo, we archive a vast number of audio and video files from cultural, media and heritage organisations. At the end of 2022, the archive held over 6.5 million items, 2 million of which were audiovisual. Where do all these files come from? Over recent years, we have digitised a large proportion of the audiovisual carriers in Flemish cultural archives, and meemoo’s archive system also accommodates born-digital content.

However, much of this content has not been annotated properly, or not consistently, which makes it hard to search and discourages re-use. A file that isn’t described cannot be found, and therefore cannot be re-used.

The solution is to add and expand metadata, but catching up on this work manually is a hopeless task: describing files by hand takes a long time. That’s why we’re focusing on an automatic description process using techniques such as artificial intelligence (AI), machine learning (link in Dutch) and computer vision.

Our role

Meemoo is responsible for organising and coordinating the GIVE metadata project. As much as possible, we’re opting for services and algorithms that have already been developed, and we’re cooperating with external suppliers for the implementation. This means we will train or roll out few or no new models, and only where no other option is available.

Approach

What are we planning?

Given its funding, this project covers all the collections stored by meemoo, except for those from our media partners. Their collections will be enriched in the Shared AI project. In order to add metadata to the collections of our culture and government partners, we’re launching three activities around metadata creation over two and a half years (July 2021 to December 2023). We are focusing on mature techniques and developing workflows that can continue to be used after the project.

Activity 1: speech recognition

In this first activity, we’re focusing on recognising spoken Dutch in some 130,000 audio and video files. This means providing metadata for more than 170,000 hours of content. The speech in the audio and video files will be converted into searchable text with time stamps using existing, commercially available tooling from Speechmatics.
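To illustrate what "searchable text with time stamps" can mean in practice, here is a minimal Python sketch. It assumes a simple, hypothetical transcript format (one record per word with start and end times in seconds); this is not the actual output format of Speechmatics or of meemoo's archive system, and the sample sentence is made up.

```python
from collections import defaultdict
from dataclasses import dataclass

PUNCTUATION = ".,!?;:"


@dataclass(frozen=True)
class Word:
    """One recognised word. Hypothetical format, not a vendor schema."""
    text: str
    start: float  # seconds from the start of the recording
    end: float


def build_index(words):
    """Map each normalised word to the start times at which it is spoken."""
    index = defaultdict(list)
    for word in words:
        key = word.text.lower().strip(PUNCTUATION)
        if key:
            index[key].append(word.start)
    return dict(index)


def find(index, term):
    """Return the time stamps (in seconds) where `term` occurs."""
    return index.get(term.lower(), [])


# Made-up sample transcript, for illustration only.
transcript = [
    Word("De", 0.0, 0.2),
    Word("nieuwe", 0.2, 0.6),
    Word("zender", 0.6, 1.1),
    Word("in", 1.1, 1.2),
    Word("Lopik.", 1.2, 1.8),
    Word("De", 2.5, 2.7),
    Word("zender", 2.7, 3.2),
]

index = build_index(transcript)
print(find(index, "zender"))
```

With an index like this, a search hit can point straight to the moment in the recording where the word is spoken, rather than only to the file as a whole.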

Photo: Nieuwe televisiezender te Lopik, Jack de Nijs / Anefo, CC0

Activity 2: entity recognition in text

We will then apply named entity recognition (NER) to the texts generated in the speech recognition activity. This lets us find names of people, organisations or locations, for example. Where possible, some of these entities will be linked to existing entries in linked open data sources. The underlying technology is natural language processing (NLP): software that ‘understands’ written texts.
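The Python sketch below gives a rough idea of what entity recognition on a time-stamped transcript produces. It is a deliberately simplified stand-in: it uses a small lookup list (a gazetteer) instead of a trained NLP model, and the project's actual tooling is not described here. The sample sentence, the labels and the `example:person-1` identifier are hypothetical placeholders, not real linked open data identifiers.

```python
from dataclasses import dataclass
from typing import Optional

PUNCTUATION = ".,!?;:"


@dataclass(frozen=True)
class Entity:
    surface: str         # the name as it appears in the transcript
    label: str           # PERSON, ORGANISATION or LOCATION
    start: float         # time stamp of the first word, in seconds
    link: Optional[str]  # placeholder identifier in an external source


# Hypothetical lookup list; a real NER model would not need one.
GAZETTEER = {
    ("josse", "de", "pauw"): ("PERSON", "example:person-1"),
    ("lopik",): ("LOCATION", None),  # no link available for this entry
}


def tag_entities(tokens):
    """Find gazetteer names in a list of (text, start_seconds) tokens."""
    normalised = [text.lower().strip(PUNCTUATION) for text, _ in tokens]
    longest = max(len(key) for key in GAZETTEER)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word names win.
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(normalised[i:i + n])
            if key in GAZETTEER:
                label, link = GAZETTEER[key]
                surface = " ".join(
                    text.strip(PUNCTUATION) for text, _ in tokens[i:i + n]
                )
                found.append(Entity(surface, label, tokens[i][1], link))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return found


# Made-up sample sentence, for illustration only.
tokens = [
    ("Acteur", 0.0), ("Josse", 0.5), ("De", 0.9), ("Pauw", 1.1),
    ("bezocht", 1.6), ("Lopik.", 2.2),
]
entities = tag_entities(tokens)
for entity in entities:
    print(entity)
```

Keeping the time stamp and the optional link on each entity means a name can be searched, located in the recording, and connected to an external data source where one exists.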

Photo: Gebouw Arbeiderspers Hekelveld, Rob Croes / Anefo, CC0

Activity 3: face detection and face recognition

In the third and final activity, we’re enriching some 88,000 video files, or 124,000 hours of video material. We want to start by detecting faces without immediately naming them. After all, not every face that appears in a video is one we need to attach a name to. Building on this, we’ll apply face recognition to the detected faces, using a fixed set of faces that we will link to existing public figures. Where possible, we will link to existing data sources such as VIAF, Wikidata and ODIS. In this activity, we will build upon the insights gained in the FAME project, which allows us to scale up the processing of the video content.
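The two-step idea (detect first, name only when a face matches a fixed reference set) can be sketched in Python as follows. This is an assumption about how such matching is commonly done, namely by comparing numeric face embeddings with cosine similarity against a threshold; it is not a description of the project's actual method. The three-dimensional vectors, the names `person-A` and `person-B`, and the threshold of 0.8 are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has no length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fixed reference set: one made-up embedding per public figure.
REFERENCE_SET = {
    "person-A": [1.0, 0.0, 0.0],
    "person-B": [0.0, 1.0, 0.0],
}


def identify(embedding, threshold=0.8):
    """Return the best-matching reference name, or None to stay anonymous."""
    best_name, best_score = None, threshold
    for name, reference in REFERENCE_SET.items():
        score = cosine(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# A detected face close to person-A is named; an unknown face is not.
print(identify([0.9, 0.1, 0.0]))
print(identify([0.5, 0.5, 0.7]))
```

Returning `None` for anything below the threshold reflects the principle in the text: a face that does not match the fixed set is detected but never given a name.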

Photo: The process of face recognition applied to a photo of actor Josse De Pauw and dancer Fumiyo Ikeda (ca. 1979), Michiel Hendryckx, CC0

Need for legal and ethical framework

We must not lose sight of privacy and a proper legal and ethical framework in this process, especially for face detection and recognition. That’s why we took the first step of carrying out a Data Protection Impact Assessment (DPIA) [link in Dutch] as early as 2021. We also built a sound ethical framework together with the Knowledge Centre Data & Society and several stakeholders.

We fully acknowledge that technologies like face recognition need to be handled with care. Meemoo colleagues Bart Magnus and Rutger Goeminne wrote a tech blog about the legal and ethical challenges within the FAME project and the first phase of the GIVE metadata project.

Read it here

Ready for re-use

A final, essential step is to make the acquired metadata accessible. The metadata gained in the three activities will be shared and made usable through our content partners’ applications and by meemoo, and we will also store this metadata in our metadata infrastructure. This will make the content more searchable for the general public. Furthermore, we’re getting started with data mining – an automatic analysis technique to extract information and knowledge from metadata.

More GIVE projects?

The GIVE metadata project is one of four within the GIVE initiative. In addition to metadata enrichment, the digitisation of newspapers (Primeur), glass plates and Flemish masterpieces is also on the agenda. You can read how we selected these four projects here.

Meemoo is also contributing to other elements in the Flemish Government’s 'Resilience Recovery Plan', in particular for Flemish heritage databases, supervising cultural organisations in their digital collection registration projects and the digital leap in education.

Partners

We’re working with some 120 content partners from the cultural sector in the GIVE metadata project. Go to our partner page to find out which organisations are involved.

Do you have a question?

Contact Matthias Priem

Manager Archiving

T • +32 9 298 05 01
M • matthias.priem@meemoo.be

Related items

SIRDUKE: pioneering and innovative project to digitise lacquer disks

Enriching performing arts collections with metadata