Software Heritage,
the persistent source-code archive:
What is it and how to use it?

13 November 2023
Atelier Data Univ Eiffel - Platforms for developing, sharing, and archiving software.

Joenio Marques da Costa

Research Software Engineer at LISIS lab
CorTexT platform
Research Infrastructure for STI (RISIS)

Software Heritage Ambassador - November 22, 2022
Introducing our newest ambassador, Joenio Marques da Costa

Universal software source code archive

🛈 More than 150M projects, almost 10 billion unique source files as of January 2021

Software Heritage archive

17 billion unique source files as of November 2023


  • 2015, Google Code and Gitorious shut down
  • 2019, BitBucket announces Mercurial VCS sunset
  • 2020, BitBucket erases 250,000+ repositories
  • 2021, Inria’s old forge was shut down
  • 2022, GitLab considers erasing all projects that are inactive for a year
    GitLab U-turns on deleting dormant projects after backlash

Software Heritage


GitHub, GitLab, BitBucket, Google Code … ?

Version Control with Git

Why not?

Zenodo, HAL, figshare… ?

Source: Image from imgflip meme generator


Software is not Data

Source: Image from imgflip meme generator


Software Heritage archive

The long-term source code archive.

SoftWare Hash IDentifier (SWHID)

Intrinsic identifiers for digital objects.

One type of identifier can’t cover all use cases; we need both intrinsic and extrinsic identifiers for software research outputs.
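
To make "intrinsic" concrete, here is a minimal Python sketch that derives the SWHID of a single file's content from its bytes alone. It relies on the fact that content SWHIDs use the same hashing as Git blobs (SHA-1 over a "blob <size>\0" header followed by the bytes). The function name content_swhid is illustrative only and is not part of any Software Heritage library.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID (version 1, type 'cnt') for raw file content."""
    # Same framing as a Git blob: header "blob <size>\0" followed by the bytes.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The identifier depends only on the bytes, not on where the file lives.
    print(content_swhid(b""))
```

Because the identifier is computed from the object itself, anyone holding the same bytes obtains the same SWHID without consulting a registry. An extrinsic identifier such as a DOI, by contrast, needs a resolver to map it to the object.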

What can be identified with a SWHID?
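
As a rough illustration of the question above: a core SWHID has the form swh:1:<type>:<40 hex digits>, where the type tag names one of five kinds of object (content, directory, revision, release, snapshot). The Python sketch below splits an identifier into those parts. The names parse_swhid and OBJECT_TYPES are hypothetical, chosen for this example, and any qualifiers after ";" are kept as plain strings rather than validated.

```python
import re

# The five object types a core SWHID can designate.
OBJECT_TYPES = {
    "cnt": "content (a file's bytes)",
    "dir": "directory",
    "rev": "revision (a commit)",
    "rel": "release (a tag)",
    "snp": "snapshot (state of a whole repository)",
}

_CORE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


def parse_swhid(swhid: str) -> dict:
    """Split a SWHID into its type, hash, and optional qualifiers."""
    core, *qualifiers = swhid.split(";")
    match = _CORE.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    object_type, object_hash = match.groups()
    return {
        "type": object_type,
        "description": OBJECT_TYPES[object_type],
        "hash": object_hash,
        # Qualifiers are key=value pairs such as origin=... or lines=...
        "qualifiers": dict(q.split("=", 1) for q in qualifiers),
    }


if __name__ == "__main__":
    # The origin URL is a placeholder, not a real archived repository.
    example = (
        "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
        ";origin=https://example.org/repo.git"
    )
    print(parse_swhid(example))
```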

SWHID howtos:

Examples of SWHID use:

Source code:

SWHID Approved Specification

Source: team mailing list
The SWHID Specification Version 1.1.
The next step is submission to ISO so that it can become a standard.

Extra references:


This presentation is available at:


Creative Commons License

Presentation history

Where and when this presentation was given