Best practices for archiving social media in Flanders and Brussels
What sources will future scientists consult when researching our times? Social media has become an integral part of the answer to this question. But hardly any of this eminently fleeting social media content is currently being archived, mainly because there is little knowledge available on how to go about it.
In 2020, meemoo conducted a survey on archiving content about the coronavirus crisis on websites and social media. From 2021 onwards, we’re taking this one step further in this project by building more expertise on capturing and archiving individual social media accounts.
Expertise on archiving social media is currently very limited, both in archive institutions and at meemoo itself. Yet social media is being used more and more for sharing audiovisual content. This primary content comes with context, mainly textual, that is created around it on the platforms, and this forms a new type of born-digital heritage.
Moreover, social media plays a very important role in how a wide range of actors communicate about various social topics. As such, these channels are a valuable source for (scientific) research, but they are still rarely archived.
The same observation about websites was the starting point for the PROMISE project on website archiving (2017-2019), and the BESOCIAL project (2019-2022) is now continuing this work. Major national institutions such as the State Archives and the Royal Library have been appointed, alongside universities, to develop strategies and practices on the subject.
Meemoo is well placed in the cultural heritage sector in Flanders and Brussels to gather and share knowledge on this subject. In collaboration with archive institutions, we’re helping build expertise throughout the entire sector.
KADOC (Documentation and Research Centre on Religion, Culture and Society from KU Leuven), as the applicant for the project grant, has involved 13 other partners alongside us in this project. The project has links to four of meemoo’s five knowledge domains: preserving different and new formats, metadata, rights and privacy, and digital strategy.
The first phase (1/9/2020 - 31/8/2021) of the project focused on developing best practices for capturing and archiving individual social media accounts. Preliminary research for phase two was also conducted. We were instrumental in:
preparing the capture of individual accounts;
capturing individual accounts in a series of pilot projects;
the ingest and preservation of captured accounts;
preliminary research into capturing social media using APIs and third-party tools, which will be developed further in a subsequent phase;
project management, communication and dissemination.
After a lot of preparatory research during the first project phase, the focus for meemoo in the second phase (1/9/2021 - 31/8/2022) is on:
the further preparation and implementation of the capture of (individual) accounts and of keyword-based social media streams (event-based activity streams), using APIs and paid third-party tools;
the preparation and development of a sustainable legal model for the capture and archiving of social media;
the evaluation and comparison of methodologies, including preparation;
the preparation of phase three: access to and re-use of archived social media;
project management, communication and dissemination.
The technical and legal challenges are complex, and there isn’t much practical experience available in Belgium. We therefore decided to only research a selection of the most commonly used channels in this project – Facebook, Instagram and Twitter. The results will form a basis for expanding to other channels in the future.
The project was developed using specific use cases. Based on the needs of various target audiences, data is captured from a selection of social media channels before being archived and ultimately made available in the different phases of the project. We selected eleven different pilot projects to capture the widest possible variety of requirements and ensure that the project results are relevant for the widest possible group of stakeholders. As well as the pilot project partners, other partners with additional expertise are also involved in a focus group for the project, i.e. State Archives of Belgium, KBR (Royal Library of Belgium) and Ghent University.
We've developed workflows in the pilot projects. A workflow is a protocol with a description of:
the successive steps for executing a certain process (e.g. capture, ingest or preservation);
the tools and systems used for this;
stakeholders with a task or responsibility in the workflow.
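Purely as an illustration, a workflow described in these terms could also be recorded as a small data structure. The Python sketch below is not how the project stores its workflows, which were published on CEST. The class names, field names, and the example tool and roles are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


def _unique(items: Iterable[str]) -> List[str]:
    """Return items in order of first appearance, without duplicates."""
    seen: List[str] = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


@dataclass
class Step:
    """One successive step in a process (hypothetical model)."""
    description: str
    tools: List[str] = field(default_factory=list)         # tools and systems used
    stakeholders: List[str] = field(default_factory=list)  # who has a task or responsibility


@dataclass
class Workflow:
    """A protocol for one process, e.g. capture, ingest or preservation."""
    process: str
    steps: List[Step] = field(default_factory=list)

    def all_tools(self) -> List[str]:
        """Every tool named in any step, in order of first use."""
        return _unique(t for step in self.steps for t in step.tools)

    def all_stakeholders(self) -> List[str]:
        """Every stakeholder named in any step, in order of first mention."""
        return _unique(s for step in self.steps for s in step.stakeholders)


# Hypothetical example: the tool and role names are placeholders,
# not the tools or roles used in the pilot projects.
capture = Workflow(
    process="capture",
    steps=[
        Step("Select the account to capture", stakeholders=["archivist"]),
        Step("Run the capture", tools=["example-capture-tool"],
             stakeholders=["archivist"]),
        Step("Check the captured files", tools=["example-capture-tool"],
             stakeholders=["collection manager"]),
    ],
)
```

Keeping tools and stakeholders on each step, rather than on the workflow as a whole, makes it possible to see who is responsible for what at every point in the process.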
Specifically, in the first phase we've developed workflows for:
capturing individual social media profiles;
ingesting and preserving individual social media profiles.
At the start of 2021, we spent three months testing the practicality of several free open source tools. We also made use of the export functionalities of the social media platforms themselves. We selected the tools with the best results and built approaches around them, which were then tested in the pilot projects. Our expertise is now available online on CEST in the form of workflows and manuals (both only in Dutch).
In March 2021, the pilot projects were initiated. The content partners active in this project used our workflows and evaluated them. Afterwards, an intern at meemoo tested the various workflows using the old Twitter and YouTube accounts of PACKED. The results of her internship are available online (only in Dutch).
Conclusions after the first project phase? Within the range of free open source tools, no single tool can capture all the different types of social media. However, there are some viable options, depending on specific user needs. For phase 2, we selected several paid tools. Further testing will reveal whether any of them can solve some of the issues encountered with free open source tools.
The pilot project reporting formed the foundation for developing good practices, which were then published on CEST, TRACKS and the relevant partners’ knowledge platforms. We also published a report with results from the preliminary research into capturing social media using APIs and third-party tools, which will be developed further in a subsequent phase. This report forms the starting point for executing phase 2. In this second project phase, additional attention is given to the capture of social media streams by keyword.
At the end of the process, we will hold a concluding study day to share the project results, with particular attention to further specific steps for collaboration within the sector. The results from phase 1 are a first step towards this study day. The interim results from phases 1 and 2 will be shared at relevant study days held by third parties in the sector.
KADOC – Documentation and Research Centre on Religion, Culture and Society from KU Leuven (project coordinator, together with meemoo);
KBR – Royal Library of Belgium (library expert, and partner in the BESOCIAL research project);
State Archives of Belgium (public archive expert, and partner in the BESOCIAL research project);
Ghent University (digital humanities expert, and partner in the BESOCIAL research project);
The Flemish Centre for Art Archives / Museum of Contemporary Art Antwerp (content provider);
partners from Overleg Landelijke Archieven Vlaanderen (content providers):
ADVN – Archive for National Movements;
Amsab-ISG – Institute for Social History;
AMVB – Archive and Museum for the Flemish Living in Brussels;
CAVA – Centre for Academic and Liberal Archives;
Letterenhuis – ‘House of Literature’;
VAi – Flanders Architecture Institute;
IMS – KU Leuven Institute for Media Studies (content provider and digital humanities expert).