Metadata and more: a second major GIVE update
30 Jun 2022
Together with a whole host of partners, we’ve spent two years getting behind GIVE - Gecoördineerd Initiatief voor Vlaamse Erfgoeddigitalisering (Coordinated Initiative for Flemish Heritage Digitisation). This project is all about taking steps to automate metadata creation and digitising masterpieces, glass plates and newspapers. We last provided a detailed update in April. Please read on to find out where we are as we enter summer.
The metadata sub-project focuses on how we can use artificial intelligence and machine learning to help create and enrich metadata. We’re getting started with content from 120 cultural partners and delving into:
speech recognition in audio and video files
entity recognition in text
facial detection and recognition in video
Where are we with these three aspects?
We’re using speech recognition (also called speech-to-text or STT) on more than 160,000 hours of audio in this project, and are currently busy selecting an external partner. We published the tendering procedure in March 2022, with 12 candidates submitting applications. We’ll be making a decision this summer.
How? Over recent months, we’ve been collecting all sorts of audio and video files that we’ve rubber-stamped as ground truth or reference materials. An external agency has already transcribed these clips, and we will now use a benchmarking tool, which we’ve developed on the basis of the EBU benchmark STT tool, to compare these transcriptions with the speech recognition generated by the candidates in the tendering procedure. Other quality characteristics and the price will also help to determine our ultimate decision.
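To make the comparison step more concrete, here is a minimal sketch of how a reference transcription can be scored against a candidate's output. It uses word error rate (WER), a common metric for this kind of benchmark; whether our benchmarking tool weighs the results in exactly this way is an assumption, and the function names and sample sentences below are purely illustrative.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[\w']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length.

    Substitutions, insertions and deletions each count as one error.
    """
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # Classic Levenshtein dynamic programme, keeping only two rows in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def rank_candidates(reference, candidates):
    """Sort hypothetical candidate outputs from lowest to highest WER."""
    scores = [(name, word_error_rate(reference, text))
              for name, text in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Invented example data, not taken from the actual tender.
    reference = "the archive holds many old newspapers"
    candidates = {
        "candidate_a": "the archive holds many old newspapers",
        "candidate_b": "the archive hold many newspapers",
    }
    for name, score in rank_candidates(reference, candidates):
        print(f"{name}: WER = {score:.2f}")
```

A lower score means the automatic transcription is closer to the human reference. In practice the score would be averaged over many clips and weighed alongside the other quality characteristics and the price.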
We will start the named entity recognition (NER) aspect with an internal assessment. We’re currently working on an exploratory phase, with the real start getting underway in autumn 2022.
Metadata working group gets started
In the meantime, the working group which we briefed you about last time met for the first time on 7 June. Our aim for this working group is to achieve good interaction with the relevant content partners and to align the project activities as closely as possible with their needs and concerns, e.g. with regard to privacy, ethical issues and reuse. The second of these topics was the subject of a special tech blog, which you can read here. We’ve also appointed an external party (IFORI) to outline the legal framework within which we can and need to work.
The grants for all the sub-projects in the GIVE masterpieces project (2D, gigapixel, 3D and masterpieces on paper or parchment) have now been awarded. They’ve been allocated to Cedric Verhelst (2D), Rik Klein Gotink (gigapixel), De Logi & Hoorne - Erfgo3D (3D) and GMS (masterpieces on paper or parchment). You can find all the tender applications in the knowledge database.
The grant for the GIVE glass plates project was also awarded to GMS at the start of June. Read the tender application here.
Digitisation (test) phase gets started
The actual production phase of the Primeur newspapers project will start next month! Following the awarding of the grant in April, and the test and pilot phase in May and June, we’re getting started with digitising the first newspaper pages.
In the GIVE masterpieces project, we make a distinction between 2D masterpieces, 3D masterpieces, gigapixel photography and masterpieces on paper or parchment. The 2D and gigapixel aspects have been awarded, and work has officially started so shooting can begin.
The other two aspects of the GIVE masterpieces project – 3D masterpieces and masterpieces on paper or parchment – and the GIVE glass plates project need just a bit more patience. The test and pilot phases are scheduled to start this summer.