In the aftermath of the Capitol siege, journalists, researchers, police, and archivists are racing to gather evidence as platforms purge content and accounts in record numbers. Although the scramble is reducing the capacity of Trump supporters to stage a second attack, it is also preventing others from identifying and collecting evidence for the trials of those involved in the first one.

This moment shows the need for international data preservation laws that would require technology companies to create processes and protocols that make information accessible for journalists, civil society organizations, law enforcement, and researchers. As platform companies delete vast amounts of content while the FBI calls on these companies to hold onto the information, it is evident that the absence of clear regulations benefits those who tried to overthrow the U.S. government, and serves authoritarians who use social media to misinform the public.

The internet is a crime scene in the specific sense that its major platforms were used to connect, organize, and coordinate #StopTheSteal. As such, the technologies were not just reimagined, but also took on new meanings last week, even though their features remained largely unchanged. Social media on a lazy Sunday afternoon is of course different from social media during an insurrection. That’s why the context of use—who, what, when, where—is so important to identifying when it is being used for actionable offenses.

For many years, OSINT (open-source intelligence) researchers and journalists have developed methods for analyzing networked data that have led to a better understanding of the identities of criminals and their motives. Police and journalists are increasingly using social media as a platform for investigations, gathering potential evidence, witness accounts, and other clarifying information, hoping the digital traces they find there can provide clues for both legal action and rapid-response reporting. Over the past week, some members of the public on social media have become active participants in these investigations, engaging in crowdsourced research and using both verified facts and misinformation to theorize narratives and sort evidence.

Crowdsourced investigations require swarm intelligence, which relies on a particular design feature: threaded conversations, where groups can gather intel and verify it over time. Forums such as Reddit, Twitter threads, Facebook groups, and anonymous message boards allow large groups of individuals to gather evidence and marshal resources during a breaking news incident, communally building a single narrative about an event. Popular posts on these forums attract increased participation from users, and thus greater visibility on these sites, enrolling more and more individuals in the process. Using that and other intelligence, investigators working in parallel can tie together very different pieces of an event or crime.

Users on the subreddit r/DataHoarder, for example, began archiving and uploading content related to the siege for public access soon after the attack. Bellingcat, an investigative journalism outlet known for its open-source investigations, shared a spreadsheet organizing user-submitted videos and images. Even private companies like Intelligence X, which specializes in archiving, created their own publicly accessible datasets. The FBI put out a call to the public asking for any digital media that may help in its investigation. And researchers, the authors of this article included, have saved countless terabytes of media, taken screenshots, and relied on the archival work of others to ensure that disinformation campaigns are preserved.

While the siege on the Capitol has focused attention on this kind of work, it was far from an isolated event. Globally, evidence scraped, archived, verified, and analyzed from social media has aided investigations into alleged war crimes, human-rights abuses, and other criminal activity, providing the evidentiary basis for advocacy work, legal proceedings, and social science research. In 2017, when the International Criminal Court issued an arrest warrant for Mahmoud al-Werfalli, a member of the Libyan Arab Armed Forces, for the war crime of murder, it did so largely based on seven videos of the killings that were posted to social media.

Amnesty’s Digital Verification Corps, meanwhile, has used verified video footage and other open-source data collection techniques to document the deaths of 304 men, women, and children in the crackdown on the November 2019 protests in Iran. The Syrian Archive has a dedicated project for preserving content removed from platforms as part of its effort to document human-rights violations in Syria. And within academia, organizations like UC Berkeley’s Human Rights Center have used OSINT techniques to verify, investigate, and document human-rights violations and potential war crimes in Morocco, Myanmar, and Syria.

Despite the value of archiving and sifting through data, balancing preservation with the need for technology companies to remove content that may be illegal, dehumanizing, or otherwise potentially harmful has remained a challenge. This tension is borne out in Myanmar, where the sheer volume of hate speech targeting the Rohingya population on Facebook forced the platform to admit in 2018 that its services were used to “foment division and incite offline violence.” As such, social media posts have become crucial pieces of evidence for investigators. A statement by the International Fact-Finding Mission recommended that all platforms “retain indefinitely copies of material removed for use by judicial bodies and other credible accountability mechanisms addressing serious human rights violations committed in Myanmar.” However, when Gambia, which brought the genocide case against Myanmar to the International Court of Justice, filed a suit against Facebook to compel the company to hand over documents and communications from Myanmar officials’ profiles and posts that the platform had previously removed, Facebook filed an objection.

One idea for addressing this challenge is the “human-rights locker” (also known as a “digital locker” or “evidence locker”), where publicly shared content—including content and accounts that have been removed by the platform—is collected, preserved, and verified for future research and investigation by select individuals and groups, like social scientists, researchers, advocacy organizations, historians, journalists, and human-rights investigators. Although many platforms have specific procedures for data requests, they are inconsistent, can take a long time, may be costly, and may differ by jurisdiction.

A locker would try to remedy some of this, while continuing to allow platforms to do the necessary work of taking hateful and dangerous content out of circulation, where it could otherwise be amplified by trending or recommendation algorithms. Ideally, a set of standards would apply across platforms to address how digital information is stored, how to preserve a digital chain of custody, who can access the information, a credentialing process for those wanting access, and what safeguards should be in place to prevent potential abuse of data. This dataset would contain only public posts and accounts, not private messages, and pertain to significant events. Furthermore, social media companies should provide information on why the content was removed, whether it was manually or automatically flagged for removal, and whether appeals were made to reinstate the accounts or content.
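To make the idea of a digital chain of custody more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how any platform or archive actually works. It assumes that each handling event (a removal, a transfer to the locker, an access by a credentialed researcher) is logged with a fingerprint of the content, and that each log entry also carries the fingerprint of the entry before it, so that later tampering with the record becomes detectable. The class name `CustodyLog`, its methods, and all field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_body(body: dict) -> str:
    """Fingerprint a log entry using a stable (sorted-key) JSON encoding."""
    return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))


@dataclass
class CustodyLog:
    """Hypothetical append-only log; each entry is chained to the one before it."""
    entries: list = field(default_factory=list)

    def append(self, actor: str, action: str, content: bytes, timestamp: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        body = {
            "actor": actor,                         # who handled the item
            "action": action,                       # e.g. "removed", "preserved"
            "content_sha256": sha256_hex(content),  # fingerprint of the item itself
            "timestamp": timestamp,
            "prev_hash": prev_hash,                 # link to the previous entry
        }
        entry = {**body, "entry_hash": hash_body(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every link; any edited or reordered entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash:
                return False
            if hash_body(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


if __name__ == "__main__":
    log = CustodyLog()
    post = b"example public post"  # stand-in for a removed public post
    log.append("platform", "removed", post, "2021-01-08T00:00:00Z")
    log.append("archive", "preserved", post, "2021-01-08T00:05:00Z")
    print(log.verify())                  # chain is intact
    log.entries[0]["action"] = "edited"  # simulate tampering with the record
    print(log.verify())                  # tampering is detected
```

A real standard would need far more than this, including trusted timestamps, signatures tied to credentialed identities, and access controls. The sketch shows only why a chained log lets an outside auditor check that a record of who handled what, and when, has not been quietly rewritten.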

A comprehensive approach to managing hate speech and disinformation is desperately needed to protect communities from the harms caused by large-scale abuse of social media. A human-rights locker, in addition to consistent and transparent enforcement of company content moderation policies, would be part of this. It would obviate the need for some search warrants, which have often resulted in stalemates between law enforcement agencies and platforms. And further, it would provide equal access to researchers studying disinformation campaigns and violent incitement, and their many impacts on elections, public health, and safety. Finally, such a system would allow independent audits of removed accounts and content, so that tech companies can be held accountable for their mistakes or for not being thorough enough.

As the human-rights scholar Jay Aronson puts it, “Archives are not neutral. They exert social and political power.” Depending on who has access and how they’re used, archival data can hold people to account, reveal crucial information about how our society works, and become a tool for advocacy. Following news that social media platforms would begin taking down misinformation related to COVID-19, more than 40 organizations including the Committee to Protect Journalists, WITNESS, Article19, and AccessNow signed a letter asking these companies to preserve all data on content removal so that researchers and journalists could study how online information flows ultimately affected health outcomes. Law enforcement agencies shouldn’t be the only ones with access, nor should only those who have brokered backdoor deals. Researchers, journalists, advocates, and civil-society organizations also play a role in ensuring a just future, and they should be given the means to do so.