GIVE update: digitisation and high-speed metadata creation

26 Apr 2023

The GIVE project (Coordinated Initiative for Flemish Heritage Digitisation) has made rapid progress over recent months. The project, which involves mass digitisation and metadata enrichment, will end later this year. Read about how we’re preparing thousands of newspapers, glass plates, masterpieces and hours of archival content for future (re)use.

To find out what’s happening behind the scenes, follow the project on our social media channels. You can expect updates on milestones and digitisation processes, and learn about how we’re adding and enriching metadata automatically.

Have you already used our identification tool, knowyourcarrier.com? This website has been up and running for identifying old video and audio content since 2018. But now you can also go there to find out about your photographic materials, get tips on how to preserve and digitise them, and discover whether they have any heritage value. This expansion to include photographs has been made possible with help from photographic experts and is part of the GIVE project.

The process of immortalising Flemish masterpieces in 2D, 3D and gigapixel has been underway for some time. At the end of last year, we let you know that the digitisation phase for newspapers, glass plates and masterpieces on paper and parchment was about to begin. Four months later, our digitisation partners are now busy digitising, photographing and scanning. We have made great progress already, but there is still a lot more work to do.

Newspapers on track

Around a quarter of the total number of newspapers have already been digitised. We’re consistently monitoring the quality of each page to ensure it meets our strict requirements, and so far, this has always been the case.

Usability is also a focus in our Primeur newspapers project: we’re using optical character recognition (OCR) to make the digitised content machine-readable. This means that not only will you be able to browse the newspapers from your computer, but also easily search for relevant passages.

Pictured: OCR applied to the Vooruit socialist newspaper, 25/09/1914, via nieuwsvandegrooteoorlog.hetarchief.be/en

Flemish masterpieces: a big challenge

Digitisation partner GMS finished configuring a custom-made digitisation set-up for the Flemish masterpieces on paper and parchment last week. This set-up ensures we can digitise a wide range of valuable artefacts as safely as possible. The handwriting of Guilliam Caudron from the Aalst city archives was the first to go in front of the lens. We will then relocate the set-up to digitise the remaining 39 valuable masterpieces on paper and parchment in ten other locations.

The photography and 3D scanning of paintings, prints and sculptures from museums and churches is no longer in its infancy: we have already digitised over 80% of all the works. But it remains a challenging task. Some valuable artefacts are suspended as high as five metres, while others require assistance from a professional art handling firm.

Pictured: project manager Lobke with Caudron’s handwriting on the digitisation set-up, Aalst city archives collection, photo by meemoo, licence: CC BY-SA 4.0

Pictured: gigapixel capture in St. Salvator’s Cathedral in Bruges, with colour calibration chart, photo by meemoo, licence: CC BY-SA 4.0

We ventured into unfamiliar territory in this project: how to make a 3D copy of a sculpture? We are now happy to be in a position to share the knowledge we gained.

Glass plates on location

We started digitising over 170,000 glass plates in March. Now, after months of preparatory work, GMS is digitising images in all colours and sizes at a rapid pace. Two-thirds of all the glass plates are going to Sliedrecht, with our digitisation partner taking a mobile studio to two further locations for the remaining carriers. They are currently spending several months at the Boekentoren (Ghent University Library’s ‘Book Tower’), before moving on to the Fotomuseum collection.

Pictured: the digitisation at work in the Boekentoren, GMS, photo by meemoo, licence: CC BY-SA 4.0

We’re digitising an impressively large volume of content in our Primeur newspaper project (together with Flanders Heritage Library) and the GIVE glass plates project. Carefully preparing and registering all these thousands of newspapers and glass plates is an important intermediate step in ensuring a smooth mass digitisation process later on. We started this process in February last year, and are now on the final stretch. We are right on track to safely transport the final newspapers and photos.

We would once again like to thank all of our content partners and colleagues who have dedicated months to this painstaking work!

Did you know: damage registration

Despite being stored in good conditions, paper and photographic materials often suffer from various diseases and ailments. Broken, torn, mouldy, acidic, discoloured, dirty, with a separating layer of emulsion... all these blemishes are due to the self-destructive nature of the materials, and pose significant challenges when digitising. We need to take note of any signs of damage in order to ensure a smooth process and gain a good overview of the condition of the various organisations’ carrier materials.

Pictured: glass plate with separating layer of emulsion, from Vereniging Ons Tehuis Coulembier, Ieper city archives collection, licence: CC BY-SA 4.0

Interested in damage registration? Flanders Heritage Library conducted research into the state of historical newspapers in Flemish institutions. They confirm the importance of newspaper digitisation on the basis of five case studies.

We store a vast amount of digitised and born-digital content in the meemoo archive system, but missing or inadequate annotations make it difficult to search, so it cannot easily be reused. Manually adding metadata to thousands of hours of video and audio is not realistic, which is why we’re turning to methods within the realm of artificial intelligence and machine learning.

Detecting and recognising faces in videos

With input from our content partners, we are currently refining facial detection and recognition software – testing the new face recognition model and adjusting parameters to recognise the people depicted as accurately as possible.

Pictured: facial recognition applied to a video from KADOC KU Leuven

Many ethical issues arise when you let an application such as this loose on a large amount of content. We are therefore building a robust framework together with Knowledge Centre Data & Society and several stakeholders. A second session in January provided a lot more relevant input, including about the role of our content partners. More details to follow at the end of the project!

Converting audio and video into searchable transcripts

In addition to matching faces to names, speech recognition (speech-to-text or STT) and named-entity recognition (NER) are also on the agenda as ways to create metadata. What does this entail? An external service converts audio files into text (transcripts), from which we can later extract relevant names of places, people, organisations and other entities.

For speech recognition, we decided to purchase an existing service and launched a public tender. We assessed all the proposed solutions that we received in terms of price and quality, and compared the different options with content that we transcribed manually in order to remain objective. We ultimately selected Speechmatics as our partner and are currently putting the finishing touches on integrating their service into our architecture.
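For readers curious about what comparing a service with manually transcribed content can look like in practice, here is a minimal sketch of one common measure, the word error rate (WER). It is an illustration only: the article does not say which metric was used in the tender, and the function name, the service names and the sample sentences below are all hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing and splitting on whitespace is enough
    normalisation for this illustration; real evaluations usually also
    handle punctuation and number formats.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one manual transcript, two made-up service outputs.
reference = "the newspaper was printed in ghent"
candidates = {
    "service_a": "the newspaper was printed in ghent",
    "service_b": "the newspaper printed in kent",
}
scores = {name: word_error_rate(reference, text) for name, text in candidates.items()}
best = min(scores, key=scores.get)  # lowest error rate wins
```

A lower score means the automatic transcript is closer to the manual one. In a real comparison, a score like this would be averaged over many recordings and weighed alongside price.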

For the next step, we are diligently researching the best model for named-entity recognition (NER) on the transcripts – comparing various open source and commercially available options – and will make a choice before the start of summer.

Pictured: application of NER to an excerpt from a cassation ruling, © IT Daily

A few final adjustments, and the three applications will be able to start enriching 160,000 hours of audio and 120,000 hours of video files from the meemoo archive system...

What next?

The GIVE project will be completed by the end of 2023. We will then gradually make the digitised masterpieces, newspapers and glass plates accessible on our own platforms, and on our participating partners’ platforms if they wish. We’re also preparing the generated metadata so that it can be re-used easily and efficiently.
