Best practices for archiving social media in Flanders and Brussels

What sources will future scientists consult when researching our times? Social media have become an integral part of the answer to this question. But hardly any of this – eminently fleeting – social media content is currently being archived, mainly because there’s not much knowledge available on how to go about it. What's more, social media and existing archiving tools are constantly evolving, platforms themselves create technical obstacles that hinder archiving or even make it impossible, and the legal framework is complex and inadequate.

In 2020, meemoo conducted a survey on archiving content about the coronavirus crisis on websites and social media. From 2021 onwards, we took this one step further in this project by building more expertise on capturing and archiving individual social media accounts and archiving hashtags.

Screenshot of the PACKED Twitter account's web archive

Challenge

Expertise on archiving social media is currently very limited – both in archive institutions and at meemoo itself. However, social media play a very important role in communication about a variety of social topics, and are used by a wide range of actors. As such, the channels are a valuable source for research, but are still rarely archived.

The same observation about websites was the starting point for the PROMISE project on website archiving (2017-2019), and from 2019 until September 2022, the BESOCIAL project focused on archiving and preserving social media in Belgium. Our major national institutions such as the Royal Library have been appointed, alongside universities, to develop strategies and practices on the subject.

Our role

Meemoo is well placed in the cultural heritage sector in Flanders and Brussels to gather and share knowledge on this subject. In collaboration with archive institutions, we’re helping build expertise throughout the entire sector.

KADOC (Documentation and Research Centre on Religion, Culture and Society from KU Leuven), as the applicant for the project grant, has involved 13 other partners alongside us in this project.

The first phase (1/9/2020 - 31/8/2021) of the project focused on developing best practices for capturing and archiving individual social media accounts. Preliminary research for phase two was also conducted. We were instrumental in:

  • preparing the capture of individual accounts;

  • capturing individual accounts in a series of pilot projects;

  • the ingest and preservation of captured accounts;

  • preliminary research into capturing social media using APIs and paid tools, which was developed further in a subsequent phase;

  • project management, communication and dissemination.

After a lot of preparatory research during the first project phase, the focus for meemoo in the second phase (1/9/2021 - 31/8/2022) was on:

  • the further preparation and implementation of the capture of (individual) accounts and of social media streams based on a keyword (event-based activity streams), using APIs and third-party paid tools;

  • the preparation and development of a sustainable legal model for the capture and archiving of social media;

  • the evaluation and comparison of methodologies, including preparation;

  • the preparation of phase three: access to and re-use of archived social media;

  • project management, communication and dissemination.

In the third and final phase (1/9/2022 - 31/8/2023), we shifted our focus from archiving alone to providing access, through:

  • research on how to achieve access to and reuse of archived social media content;

  • user research and investigation into tools to meet expectations;

  • research on legal aspects;

  • pilot projects on access and reuse, including testing the use of social media account profile pictures as a reference set for facial recognition;

  • project management, communication, and dissemination.

Finally, the project concluded with a conference where we shared the findings of the three phases.

Approach

Specification

The technical and legal challenges are complex, and there isn’t much practical experience available in Belgium. We therefore decided to only research a selection of the most commonly used channels in this project – Facebook, Instagram, Twitter/X and YouTube. The results form a basis for expanding to other channels in the future.

Pilot projects

The project was developed using specific use cases. Based on the needs of various target audiences, data was captured from a selection of social media channels before being archived and ultimately made available in the different phases of the project. We selected different pilot projects to capture the widest possible variety of requirements and ensure that the project results are relevant for the widest possible group of stakeholders. As well as the pilot project partners, other partners with additional expertise were involved in a focus group for the project, namely KBR (Royal Library of Belgium) and Ghent University.

Workflows

In this project we've developed workflows for:

  • capturing individual social media profiles;

  • ingesting and preserving individual social media profiles;

  • archiving hashtags.

For this, we tested the practicality of several tools (both free and paid). We also made use of the export functionalities of the social media platforms themselves. We chose the tools that gave the best results, built approaches around them, and tested those approaches in the pilot projects. Our expertise is now available online on CEST in the shape of workflows (only in Dutch) and manuals for the open source tools (only in Dutch).

The content partners active in this project used our workflows and evaluated them. Furthermore, an intern at meemoo tested the various workflows using the old Twitter and YouTube accounts of PACKED, and the artinflanders.be Facebook page. You can read about the results of her internship here (only in Dutch).

Conclusions? Within the range of free open source tools, no single tool can capture all the different types of social media. However, there are some viable options, depending on specific user needs. Further tests with commercial tools revealed that even paid versions do not provide a comprehensive solution. Therefore, we looked for a suitable tool tailored to each social media platform, rather than one tool for all platforms. Additionally, we noticed that the platforms themselves, and the way they operate behind the scenes, affect how well capture tools function. The paid tools Webpreserver and Archive-It emerged as two viable options. But in practice, open source tools were preferred, as the cost of paid tools far exceeded the partners' budget.

Best practices

The pilot project reporting formed the foundation for developing good practices which were then published on CEST, TRACKS and the relevant partners’ knowledge platforms. We also published a report with results from the preliminary research into capturing social media using APIs and paid tools. This report formed the starting point for executing phase 2.

In the second project phase, additional attention was given to the capture of social media streams by keyword. In addition to KADOC and meemoo, project partners ADVN, Amsab-ISG, AMVB, CAVA, Liberas, M HKA and VAi conducted pilot projects in which social media streams were captured based on keywords. They paid attention to the usability of the tested tools and to the quality of the capture itself.

Throughout the course of the project, we also researched the legal framework. This resulted in a couple of recommendations and model documents.

Getting started with archiving social media?

This practical guide (only in Dutch) brings together several practical manuals and workflows.

Partners

  • KADOC – Documentation and Research Centre on Religion, Culture and Society from KU Leuven (project coordinator, together with meemoo);

  • KBR – Royal Library of Belgium (library expert, and partner in the BESOCIAL research project);

  • State Archives of Belgium (public archive expert, and partner in the BESOCIAL research project);

  • Ghent University (digital humanities expert, and partner in the BESOCIAL research project);

  • The Flemish Centre for Art Archives / Museum of Contemporary Art Antwerp (content provider);

  • partners from Overleg Landelijke Archieven Vlaanderen (content providers):

    • ADVN – Archive for National Movements;

    • Amsab-ISG – Institute for Social History;

    • AMVB – Archive and Museum for the Flemish Living in Brussels;

    • CAVA – Centre for Academic and Liberal Archives;

    • Letterenhuis – ‘House of Literature’;

    • Liberas;

    • VAi – Flanders Architecture Institute;

  • IMS – KU Leuven Institute for Media Studies (content provider and digital humanities expert).

Do you have a question?
Contact Rony Vissers
Manager Expertise