About digitisation and automated metadata creation: a first major GIVE update

29 Apr 2022

Together with a host of partners, we're spending two years working on the Coordinated Initiative for Flemish Heritage Digitisation (Gecoördineerd Initiatief voor Vlaamse Erfgoeddigitalisering - GIVE). Our aim is to digitise a mountain of historical newspapers, glass plates and masterpieces, and to take some big steps forward in automated metadata creation. We've already done a lot of work since we got started in 2021, which you can read all about in this first update.

The GIVE projects have been made possible thanks to support from the European Regional Development Fund (ERDF) and are part of the Flemish Government’s ‘Resilience Recovery Plan’ (links in Dutch).

Image: 'Men reading newspaper in Nanjing, China', Stougard, 2008, CC BY-SA 3.0

We joined forces with Flanders Heritage Library to digitise more than 600,000 newspaper pages over the space of two years. This is what we’ve done in recent months:

  • In November 2021, Flanders Heritage Library completed the selection procedure for the titles to be digitised in this project. The titles had to meet three basic criteria: they had to be (1) newspapers, (2) published in Flanders and (3) not previously digitised. Newspapers at risk of damage, which therefore urgently need to be digitised, were given priority.

  • Last month, we found the right digitisation partner. We announced the public tender in January, and made our official selection three months later. For this newspaper digitisation we selected the Dutch service provider Picturae, with whom we have previously collaborated on digitisation projects 2, 4, 6 and 9.

  • In February and March, together with Flanders Heritage Library, we ran an intensive registration round with the content partners involved. Every one of the roughly 100,000 newspapers needs to be registered, which means each one is given its own barcode and all its technical characteristics are recorded. Everything ran smoothly, and with help from some BIS interns the first registration deadline was met well ahead of time.

Now what?

The next step is a very exciting one: we’re putting our heads together with Picturae to get the digitisation underway this month. We’ll start testing in May and June, with the actual production beginning in July.

Image: newspaper from the collection of ADVN

The need to digitise photographic materials has been on our radar for a while, and with this glass plates project we're taking the next step. Over recent months, we've been diligently gathering knowledge, drawing up a logistical process with help from advisory working groups, and fine-tuning our tools and processes. We've also finalised the number of glass plates to be digitised and the list of content partners involved.

Image: cart with glass plates ready for digitisation, UGent.

Smooth preparation

These content partners are working hard to prepare their content. They started registering and packaging their glass plates in February with help from BIS interns Marthe and Ingrid, and this will keep them busy at least until August. In the meantime, the registrars will get to learn all about glass plates at a workshop which we’re organising in collaboration with the Photo Museum in Antwerp.

Tender published again

Because the cost of digitisation was higher than anticipated, we published the tender for a second time last month. As a result, the awarding of the contract, originally planned for the end of March, has been moved to the start of June. This means the digitisation will take place a bit later than expected, but fortunately this has no impact on the overall project schedule. In the meantime, we're using this slight delay to finalise everything for the subsequent phases, while our content partners continue to package and register the glass plates.

In the masterpieces project, we distinguish between 2D masterpieces, 3D masterpieces, gigapixel photography and, finally, Flemish masterpieces on paper or parchment, each of which requires its own approach and timing.

The following stages are already complete for 2D, 3D and gigapixel photography:

  • First, we set up a database giving an overview of all masterpieces per category.

  • In consultation with the Flemish Masterpieces Board and the managers of the works involved, we made a final selection for digitisation. We're digitising 11 3D masterpieces, one of which, the impressive Van Herck Collection, comprises 116 terracotta sculptures.

  • As always, external photographers will take care of the digitisation. In February, we sent the tender for 2D and gigapixel photography to four potential candidates. The contract will be awarded very soon. The tender for the 3D masterpieces is also being sent out this month, and the contract is scheduled to be awarded at the end of June.

We're digitising 2D works by photographing them. 3D objects can be captured using either 3D scanning or photogrammetry, two techniques that require very different approaches. Photogrammetry involves photographing the object from all angles, with up to 200 photos per object, which special software then uses to create a 3D model. Scanning takes less time: the object is captured in situ with a scanning device. Following discussions with a consultant and meemoo's team of experts, we opted for scanning in this project, which is also a learning process for us.

Image: photography in Memorial Museum Passchendaele 1917, www.artinflanders.be

Now what?

We're currently finalising the planning for the photography and scanning, both of which will start in May. This isn't easy, because the shoots can normally only take place on Mondays, when most museums are closed. Some of the works are also out on loan, and we need to take scheduled restoration work into account as well. The objects in the permanent exhibition of KMSKA (Royal Museum of Fine Arts Antwerp) are the priority, so that we can scan and photograph them this summer, before the museum's grand reopening.

Image: manuscript from the masterpiece 'De middeleeuwse poortersboeken van Oudenaarde en Pamele', Oudenaarde City Archive

Final preparations are in sight for the Flemish masterpieces on paper or parchment. We've put the final touches on the selection: 40 paper or parchment objects, belonging to 26 masterpieces, will be digitised. Various city archives, heritage libraries and a number of church institutions are taking part in this project alongside eight of our content partners.

We’ve also brought a consultant on board. As an expert in conservation and management, restorer Martine Eeckhout is drawing up condition reports for each masterpiece. She is advising us on:

  • the general condition of the pieces;

  • whether they can be digitised in their current condition, and whether this is permitted;

  • any points of concern during digitisation, so that the risk of damage to the masterpieces is kept to an absolute minimum;

  • which restoration or conservation work could be carried out outside the GIVE project.

All of this is being done with a view to improving the digitisation quality as much as possible.

In the meantime, we're working hard on the tender, which will be sent out at the end of the month. Because the Metamorfoze guidelines (link in Dutch) are the quality standard we want to apply in this project, we worked closely with Hans van Dormolen, the driving force behind those guidelines, to define the technical requirements for the tender.

Now what?

While the remaining condition reports are being drawn up, we're making further arrangements with the masterpiece managers. In the meantime, we're also improving the metadata model with a view to sustainable archiving and access. We will know who our digitisation partner is just before the summer, and they will begin testing at the end of June. The actual digitisation will start in September.

Good metadata is essential for accessing and searching digitised content. In the GIVE metadata project, we’re investigating how artificial intelligence and machine learning can help us make progress in creating and enriching metadata. Specifically, we’re getting to work on:

  • speech recognition in audio and video files (also called speech-to-text or STT);

  • named entity recognition in text (NER);

  • face detection and recognition in video.

Image: 'Facial Recognition applied to the production ‘Ik wist niet dat Engeland zo mooi was’ from theatre company Radeis', photograph by Michiel Hendryckx, CC BY-SA 4.0

It's quite an undertaking, and we've been very busy over recent months. Among other things, we've carried out a Data Protection Impact Assessment (DPIA) and prepared the tender procedure. You can read more about all this next time. We're also currently setting up a working group of content partners to serve as a sounding board in this stage of the GIVE project. Through this group, we want to foster good communication, align the project activities with our content partners' needs, and address their concerns as far as possible.

Data Protection Impact Assessment (DPIA)

Activities such as face recognition raise a whole host of ethical and legal issues, which is why we're proceeding cautiously. To comply with personal data protection regulations, we carried out a Data Protection Impact Assessment (DPIA) for the GIVE metadata project in close consultation with a Data Protection Officer (DPO) (link in Dutch). We look at this in detail in our tech blog on the legal and ethical aspects of face recognition. There, we also share insights and issues that arose in the FAME project, in which we've been looking at how face recognition can contribute to automatic metadata creation and enrichment.
