FAME: facial recognition as a tool for metadata creation
What if metadata were so complete that you could find all the images of someone in your archive if you knew that person’s name? Doing all the necessary metadata creation and enrichment work manually would be a painstaking task, which is why we’re researching how we can use facial recognition to help.
Digital photo and video collections managed by cultural organisations often lack good-quality substantive metadata. There are various reasons for this:
describing these collections manually is very time-consuming;
these are often large collections which aren’t part of the organisation’s main collection, and so aren’t a priority;
the nature of the content brings additional technical requirements: videos have to be played before the people shown can be described manually, which adds to the time needed;
videos can be difficult or impossible to play if they have not been digitised, making it impossible to describe their content.
Limited metadata makes these digital collections difficult to find and search through online, so they are harder to look up and re-use. At the start of 2021, we received a project grant from the Minister of Culture Jan Jambon for the FAME project, in which we're developing best practices for identifying people in photos and videos using (semi-)automated facial recognition. We're also investigating how we can use existing metadata to improve facial recognition accuracy.
Metadata and linked (open) data are two of the five knowledge domains we have identified at meemoo. In 2023, we’re aiming to set up a proof-of-concept for the large-scale application of facial recognition technology in the meemoo archive system, for the benefit of metadata creation and enrichment. The pilot projects we’re running as part of FAME will help us prepare for this.
Following our previous collaborations with FOMU (Photo Museum of Antwerp) and MoMu (Fashion Museum of Antwerp) on visual recognition projects, we’re now researching the possibilities of one specific type of visual recognition, facial recognition, together with several partners.
We’re running three pilot projects in the FAME project:
pilot project for performing artists in collaboration with Flanders Arts Institute
8,300 pictures of productions and 700 (group) photographs
pilot project for sportspeople in collaboration with KOERS (the Museum of Cycle Racing)
a selection of 41,000 photographic negatives by Maurice Terryn from 1969-1978
pilot project for politicians and activists in collaboration with ADVN (Archive for National Movements) and the Flemish Parliament Archive
ADVN: 8,300 digitised pictures from the magazine WIJ and 533 digitised videos by VNOS
Flemish Parliament Archive: 5,300 pictures from the Flemish Parliament and 300 videos of their sessions from 1992-1994 and 2001-2021
Our technical partner for each of these projects is IDLab from Ghent University.
To prepare for the FAME project, Liesbeth Bogaert worked with us as a trainee in the summer of 2020. She is a student of Information Engineering Technology at Ghent University and had just completed the third year of her bachelor’s degree. During her work placement at meemoo, she was supervised by meemoo colleagues Nastasia and Miel, and by Prof. Steven Verstockt from Ghent University. She worked with a set of around 9,100 digitised photos and 19,500 born-digital photos from Flanders Arts Institute.
Metadata-driven facial recognition
We’re using metadata-driven facial recognition in this project. To improve and enrich the facial recognition results, we’re researching how we can use our content partners’ metadata and linked open data sources to train the facial recognition algorithms and to post-process the software output. In the process, we make use of reference sets.
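To illustrate the idea of post-processing with metadata, here is a minimal sketch in Python. It is not the pipeline used in FAME: the function names, the cosine-similarity matching, the threshold value and the toy three-number "embeddings" are all assumptions made for this example. It shows how the names found in a photo's existing metadata could narrow down the candidates from a reference set before the best match is picked.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_face(face_embedding, reference_set, metadata_names=None, threshold=0.6):
    """Return the best-matching name from the reference set, or None.

    reference_set:  dict mapping a person's name to a reference embedding.
    metadata_names: optional set of names taken from the photo's metadata
                    (hypothetical input); if any of them occur in the
                    reference set, only those people are considered.
    threshold:      minimum similarity for a match (illustrative value).
    """
    candidates = reference_set
    if metadata_names:
        restricted = {
            name: emb for name, emb in reference_set.items() if name in metadata_names
        }
        if restricted:  # fall back to the full set if the metadata matches nobody
            candidates = restricted

    best_name, best_score = None, threshold
    for name, emb in candidates.items():
        score = cosine_similarity(face_embedding, emb)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy data: real face embeddings have hundreds of dimensions.
reference_set = {
    "Person A": [0.9, 0.1, 0.0],
    "Person B": [0.8, 0.3, 0.1],
    "Person C": [0.0, 0.2, 0.9],
}
face = [0.85, 0.25, 0.05]

without_metadata = identify_face(face, reference_set)
with_metadata = identify_face(face, reference_set, {"Person A", "Person C"})
print(without_metadata, with_metadata)
```

In this toy example, two reference portraits look very similar to the detected face. Knowing from the metadata that only Person A and Person C took part in the production changes the outcome, which is the kind of correction metadata can offer.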
What's happened so far?
We’ve collected metadata sources for the relevant partners:
File names and Wikidata for Kunstenpunt (Flanders Arts Institute).
For the Flemish Parliament Archive, we used the metadata of the photos they created themselves. We also collected descriptions of videos from the meemoo archive system and from the Flemish Parliament’s open API, which we can use to retrieve information about Flemish MPs and parliamentary meetings.
For ADVN (Archive for National Movements), we used descriptions of photos and the (limited) metadata from videos from the meemoo archive system.
For KOERS (Museum of Cycle Racing), we used a batch inventory with limited metadata.
We also created reference sets.
Reference photos are portraits of people we can identify on the basis of the metadata sources collected. Because we know who has been a Flemish MP or who was involved in a production, we know which people are likely to appear in the photos, and we used these names as the basis for collecting portrait photos. This was done partly by hand and partly automatically, using APIs from open data sources such as Wikimedia Commons. Institutions that manage collections, such as Amsab (Institute for Social History), KADOC (Documentation and Research Centre on Religion, Culture and Society from KU Leuven) and Liberas (Liberal Archive), supplied us with their own reference sets. Note: a lack of metadata meant it wasn’t always possible to create a good reference set for every collection.
IDLab at Ghent University developed a face recognition pipeline and validation tool.
This combination of software makes it possible to identify people in photos. All photos have been processed through the pipeline. The partners involved will validate the results in January 2022 before the pipeline is refined further.
Alongside its many benefits, using metadata also brings some challenges. Metadata comes from different sources, so it exists in many different forms. Spelling errors, abbreviations and inconsistencies occur regularly, and sometimes the metadata is unstructured or missing altogether. Several manual steps are needed to resolve these issues.
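Some of this clean-up can be supported by simple scripts. The sketch below, in Python, is only an illustration and not the approach taken in FAME: the function names, the 0.85 similarity cut-off and the made-up names are assumptions. It brings differently written names into one form (trimming whitespace, reordering "Surname, First name", removing accents and capitals) and then uses the standard library's `difflib.SequenceMatcher` to link a misspelt name to the closest known one.

```python
import unicodedata
from difflib import SequenceMatcher


def normalise_name(raw):
    """Bring a personal name into a comparable form.

    Handles 'Surname, First name' order, accents, capitals and extra spaces.
    """
    name = raw.strip()
    if "," in name:
        surname, _, given = name.partition(",")
        name = f"{given.strip()} {surname.strip()}"
    # Split accented characters into base letter + accent, then drop the accents.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def best_match(raw, known_names, min_ratio=0.85):
    """Return the known name most similar to `raw`, or None if nothing is close enough.

    min_ratio is an illustrative cut-off; a real project would tune it
    and have a person check doubtful matches.
    """
    target = normalise_name(raw)
    best, best_ratio = None, min_ratio
    for known in known_names:
        ratio = SequenceMatcher(None, target, normalise_name(known)).ratio()
        if ratio >= best_ratio:
            best, best_ratio = known, ratio
    return best


# Made-up names, for illustration only.
known_names = ["Jan Janssens", "Marie Dupont"]
print(normalise_name("  Janssens,  Jan "))
print(best_match("Jan Jansens", known_names))
print(best_match("DUPONT, Marié", known_names))
```

Automatic matching like this only reduces the manual work. Abbreviations and missing names still need a human decision.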
Technology such as facial recognition should be treated with caution. We’re taking this into account in the project, among other things by making sure we follow specialist legal advice. We also have a legal obligation to process the personal data of the people depicted responsibly. On January 18, we organised the first FAME study day, at which we took a closer look at the legal and ethical aspects of facial recognition technology.
The second FAME study day took place on February 22. On that day, we looked into scaling up the infrastructure for automatic metadata creation.
The third FAME study day is just around the corner. It takes place on March 29 and will look into the impact of face recognition technology on collection registration.