Shared AI: metadata enrichment for the media sector

Metadata

Period

October 2023 - December 2024

The digital meemoo archive contains a wealth of audiovisual content, but much of it is not easily searchable because descriptive metadata is lacking. For the government and cultural sector, we are addressing this in the Gecoördineerd Initiatief voor Erfgoeddigitalisering (GIVE, ‘Coordinated Initiative for Heritage Digitisation’). Building on the systems established there, the Shared AI project sees us joining forces with various players from the media sector to automatically annotate media content in a regional context.

This project is part of the Flemish Government’s Resilience Recovery Plan.

Challenge

Shared AI grew out of the difficulty of searching archival content in the meemoo archive system. This content is often described poorly or not at all, which makes it hard or even impossible to find and, in turn, harder to reuse. The solution lies in adding and expanding metadata. But catching up on all this work manually is not feasible: manual metadata entry is extremely time-consuming. We are therefore focusing on an automated description process using artificial intelligence (AI) and machine learning techniques.

Deploying artificial intelligence on your own is challenging for a small organisation, which is why collaboration is one of the core elements of Shared AI. By combining the archives of regional broadcasters with the VRT archive, we are ensuring:

the scale necessary to employ AI technology efficiently;
the required technical expertise;
discussions around editorial processes, e.g. who do we want to recognise, and who not?

Pictured: Dutch lessons at Arlington House for English girls and women working in the offices of the Dutch government, unknown/public domain, licence: CC0 1.0, via Wikimedia Commons.

Our role

We are collaborating with VRT and regional broadcasters in this project. Given the knowledge gained from the GIVE project – and because meemoo has been digitising, archiving and making the archival materials of these media partners accessible for years – we are taking the lead in this collaborative initiative. We are responsible for the organisation and coordination, as well as for processing the audiovisual materials.

Project partner VRT has already built up considerable expertise in metadata enrichment via artificial intelligence (AI) and will therefore also focus on implementing shared authorities or sources. Regional broadcasters including AVS, BRUZZ, De Buren, RING TV, RMM and RTV will contribute to discussions about the editorial process and the related privacy aspects, e.g. who do we want to recognise or not, and who decides this.

Approach

This project encompasses the collections of all meemoo media partners. We will work with a large portion of the regional broadcasters’ archives and process at least 65,000 hours of audio and video from the VRT archive. To provide this vast amount of archival content with metadata, we will carry out the same AI activities as in the GIVE metadata project, running from October 2023 to the end of 2024. These workflows will remain usable after the project, building a solid foundation for uniform metadata across media players.

Three AI activities

Speech recognition (speech to text)

Using a third-party system, we’re converting Flemish audio and video content into searchable, high-quality transcripts.

Named entity recognition (NER)

Following on from the quality transcripts produced by speech recognition, we’re using a third-party system to extract personal names, places and organisations – linking them to Wikidata and other authentic sources (also known as authorities) where possible.
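As an illustration of linking "where possible", the sketch below attaches an authority identifier to each extracted mention when a match exists and leaves it empty otherwise. It is a minimal sketch and not the third-party system used in the project: the lookup table, the `EXAMPLE:` identifiers and the function names are all hypothetical.

```python
def normalise(label):
    """Lower-case a label and collapse whitespace so variants compare equal."""
    return " ".join(label.casefold().split())


def link_entities(mentions, authority_lookup):
    """Attach an authority ID to each mention, or None when no match exists."""
    linked = []
    for mention in mentions:
        authority_id = authority_lookup.get(normalise(mention["text"]))
        linked.append({**mention, "authority_id": authority_id})
    return linked


# Hypothetical authority table; keys are already normalised labels.
authority_lookup = {
    "example person": "EXAMPLE:0001",
    "example town": "EXAMPLE:0002",
}

# Hypothetical NER output for one transcript.
mentions = [
    {"text": "Example  Person", "type": "person"},
    {"text": "Example Town", "type": "place"},
    {"text": "Unlisted Organisation", "type": "organisation"},
]

linked = link_entities(mentions, authority_lookup)
```

Mentions without a match keep `authority_id` set to `None`, so they remain available as plain text for search and for later manual review.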

Face detection and recognition

Comparing detected faces in video files with a reference set allows us to identify known and relevant individuals. Recognised individuals then become part of the descriptive metadata for the video, making the video easier to find!
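The comparison with a reference set can be pictured as follows. This is a minimal sketch under stated assumptions: it supposes that each face is represented as a numeric vector (an embedding) and that similarity is measured with cosine similarity against a threshold. The vectors, names and the threshold of 0.8 are invented for illustration and do not describe the system used in the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_face(embedding, reference_set, threshold=0.8):
    """Return the best-matching reference name, or None below the threshold."""
    best_name, best_score = None, threshold
    for name, reference in reference_set.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical reference set: only people approved for recognition are listed.
reference_set = {
    "Person A": [0.9, 0.1, 0.0],
    "Person B": [0.0, 0.2, 0.95],
}

match = identify_face([0.88, 0.15, 0.02], reference_set)   # close to Person A
unknown = identify_face([0.5, 0.5, 0.5], reference_set)    # no close match
```

In a design like this, a face that is not in the reference set is never named, so the editorial decision about who may be recognised is expressed by who is added to the set.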

Bridge builders thanks to authorities

Describing the items of different partners simultaneously is unique and extremely valuable. This approach makes it possible, for example, to identify individuals, locations and organisations across content and even across partners. Combining the metadata and linking it to external sources such as VIAF, Wikidata and the VRT thesaurus also makes the content even more searchable. VRT will use its expertise to oversee this final step.
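To show why shared authorities act as a bridge, the sketch below groups items from different partners under the authority identifier they have in common. It is a hedged illustration only: the record layout, the item codes and the `EXAMPLE:` identifiers are hypothetical and do not reflect the actual meemoo or VRT data model.

```python
from collections import defaultdict


def build_authority_index(records):
    """Map each authority ID to the (partner, item) pairs that mention it."""
    index = defaultdict(list)
    for record in records:
        for authority_id in record["entities"]:
            index[authority_id].append((record["partner"], record["item"]))
    return dict(index)


# Hypothetical enriched records from two partners.
records = [
    {"partner": "VRT", "item": "vrt-001", "entities": ["EXAMPLE:0001", "EXAMPLE:0002"]},
    {"partner": "BRUZZ", "item": "bruzz-042", "entities": ["EXAMPLE:0001"]},
]

index = build_authority_index(records)
shared = index["EXAMPLE:0001"]  # items from both partners about the same entity
```

Because both partners point to the same identifier rather than to their own spelling of a name, a single query on that identifier returns items from every archive.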

Privacy and ethics

Building on the GIVE project, we will also consider legal and ethical issues with our partners in this project. For instance, who do we want to recognise, and who not? And who makes these decisions? The regional aspect also brings new challenges: is someone who is regionally relevant also relevant at the Flemish level for identification? The different organisations face the same dilemmas and challenges, which is why we will also be tackling this step together.

Reuse

The generated metadata will not only be preserved sustainably at meemoo; the participating broadcasters will also be able to integrate the metadata into their own media management systems and platforms – the ideal basis for further enrichment projects! We are also looking at whether and how to make the metadata we obtain accessible to the general public.