Putting image recognition into registration practice
This project investigated the usability of automated image recognition as an alternative or complement to the manual descriptions of cultural heritage objects. The results showed that automatic tagging or categorisation by visual recognition services can enhance descriptions that have been added manually, but not (yet) fully replace them.
Manually registering objects and archive items is a very labour-intensive process. Registrations are often limited to a number of formal and administrative characteristics, even though unlocking the content in more detail can make collections more searchable and easier to find. Artificial intelligence (AI) offers powerful solutions for automatically recognising objects, people and even emotions, and this technology is already available on several online platforms (including Google Vision, Clarifai and Microsoft Azure) in the form of Visual Recognition Services (VRS).
This project investigated whether heritage institutions can use this software for the basic registration and content descriptions of heritage objects. An earlier project, VR4CH, run by MoMu (Fashion Museum of Antwerp), Datable and meemoo, showed that Visual Recognition Services were very good at creating basic descriptions for images, but we wanted to find out if they could also be used for other applications.
Meemoo took care of the substantive management of this project initiated by FOMU (Photo Museum of Antwerp). We developed a methodology for comparing VRS in registration processes, which we implemented and supported in collaboration with IT partner Datable. The participating content partners then evaluated the test results before meemoo helped to publish FOMU’s final report on CEST.
We formulated four use cases based on our partner requirements:
recognising people in photos from the post-CoBrA art movement (FOMU - Photo Museum of Antwerp)
recognising types of documents and forms from World War II (Netwerk Oorlogsbronnen - War Resources Network)
classifying photos by Beeldbank Brugge (Bruges Image Database) categories (Erfgoedcel Brugge en Stadsarchief Brugge - Bruges Heritage Centre and City Archive)
identifying aesthetically ‘attractive’ photos for communication purposes (MoMu - Fashion Museum of Antwerp)
The steps per use case were:
Collecting a set of test images.
images available via a URL were used as much as possible;
if this wasn’t the case, the images were placed on a temporary server;
metadata that were already available were collated and structured.
Training the VRS for some use cases.
Setting up an architecture with various software components for automating the methodology, i.e.
tagging or categorising the images by one or more VRS (training & classification): images and metadata from DAM and registration systems, among others, are made available temporarily on an FTP server to various Visual Recognition Services (Google Vision, Microsoft Azure, Clarifai, Everypixel);
collating, structuring and validating the results: interim storage of the results in MongoDB, communication between the various components and data processing via KNIME, evaluating results in a viewer (Fotorama), validation (evaluating and manually processing the data) in Google Sheets;
importing the results into the registration systems: validated results of automatic classification are imported into the existing registration systems using KNIME.
The content partners compare the automatic tagging results with their own metadata to assess the tag relevancy and accuracy.
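The collating and comparison steps above can be illustrated with a minimal sketch. It is not the project's actual KNIME workflow: the tuple layout, the service names, the confidence threshold and the sample tags are all assumptions made for illustration. It merges tags returned by several services for one image and then measures how far they overlap with the manually recorded metadata.

```python
from collections import defaultdict


def merge_tags(results, min_confidence=0.5):
    """Collate tags per image across services.

    `results` is a list of (service, image_id, tag, confidence) tuples.
    This layout is hypothetical; real services each return their own format.
    """
    merged = defaultdict(dict)
    for _service, image_id, tag, confidence in results:
        if confidence < min_confidence:
            continue  # drop low-confidence tags before validation
        tag = tag.strip().lower()  # normalise so "Portrait" == "portrait"
        best = merged[image_id].get(tag, 0.0)
        merged[image_id][tag] = max(best, confidence)
    return {image_id: set(tags) for image_id, tags in merged.items()}


def precision_recall(auto_tags, manual_tags):
    """Share of automatic tags that are relevant, and share of manual tags found."""
    auto = {t.strip().lower() for t in auto_tags}
    manual = {t.strip().lower() for t in manual_tags}
    if not auto or not manual:
        return 0.0, 0.0
    hits = len(auto & manual)
    return hits / len(auto), hits / len(manual)


# Invented sample data, for illustration only.
results = [
    ("service_a", "img-001", "Portrait", 0.92),
    ("service_b", "img-001", "portrait", 0.81),
    ("service_a", "img-001", "hat", 0.40),
    ("service_b", "img-001", "street", 0.75),
]
manual = {"img-001": {"portrait", "man"}}

merged = merge_tags(results)
precision, recall = precision_recall(merged["img-001"], manual["img-001"])
print(merged["img-001"], precision, recall)
```

Exact string matching is a deliberately crude measure of relevance. In practice a human validator would still need to judge synonyms and near-matches, which is what the manual validation step is for.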
The added value of VRS compared to manual registration lies in its efficiency at processing large, uniform volumes, such as the documents of Netwerk Oorlogsbronnen (War Resources Network). In that use case, the VRS produced perfect results with only minimal training, and was therefore much cheaper and faster than manual registration.
The VRS had to be trained in other cases, and setting up workflows per use case and validating still required a lot of human intervention. Full automation is therefore not a suitable solution here, but the combination of man and machine does provide added value. For example, FOMU managed to save time because the VRS grouped photos of the same person together, after which volunteers were able to indicate who that person was. This made it possible to integrate useful results in the registration systems. You can find the results per pilot project in our final report on CEST.
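The combination of automatic grouping and human identification described for FOMU can be sketched as follows. This is a hypothetical illustration, not the project's implementation: it assumes the recognition service has already assigned each photo to a cluster of similar faces, and that a volunteer names each cluster once. The photo IDs, cluster IDs and names are invented.

```python
def propagate_labels(photo_clusters, cluster_names):
    """Apply one volunteer-supplied name per cluster to every photo in it.

    photo_clusters: photo ID -> cluster ID (assumed output of a VRS).
    cluster_names:  cluster ID -> person name entered by a volunteer.
    Photos in clusters that nobody has named yet are left out.
    """
    return {
        photo: cluster_names[cluster]
        for photo, cluster in photo_clusters.items()
        if cluster in cluster_names
    }


# Invented sample data, for illustration only.
photo_clusters = {"p1": 0, "p2": 0, "p3": 1, "p4": 2}
cluster_names = {0: "Person A", 1: "Person B"}

labelled = propagate_labels(photo_clusters, cluster_names)
unlabelled = sorted(set(photo_clusters) - set(labelled))
print(labelled, unlabelled)
```

Here two volunteer decisions label three photos, and the remaining photo stays flagged for manual review.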
The technology used is publicly available and affordable, although there’s quite a steep learning curve. Lots of museums will therefore need to recruit external help to develop a model per use case in a trial-and-error process. Nonetheless, the technology is very promising and institutions that manage collections could benefit from looking at how it can be embedded in their activities.
FOMU (initiator), MoMu, Netwerk Oorlogsbronnen, Erfgoedcel Brugge & Stadsarchief Brugge, Datable bvba. The project was made possible thanks to a cultural heritage project grant from the Flemish Government.