Software Heritage,
the persistent source-code archive:
What is it and how to use it?


13 November 2023
Atelier Data Univ Eiffel - Platforms for developing, sharing and archiving software.

Joenio Marques da Costa

Research Software Engineer at LISIS lab
CorTexT platform
Research Infrastructure for STI (RISIS)

Software Heritage Ambassador - November 22, 2022
Introducing our newest ambassador, Joenio Marques da Costa

Universal software source code archive softwareheritage.org



🛈 More than 150M projects, almost 10 billion unique source files as of January 2021

Software Heritage archive

17 billion unique source files as of November 2023

why?

  • 2015, Google Code and Gitorious.org shut down
  • 2019, BitBucket announces Mercurial VCS sunset
  • 2020, BitBucket erases 250,000+ repositories
  • 2021, Inria’s old gforge.inria.fr was shut down
  • 2022, GitLab.com considers erasing all projects that are inactive for a year
    GitLab U-turns on deleting dormant projects after backlash

Software Heritage

=

GitHub, GitLab, BitBucket, Google Code … ?

Version Control with Git

Why not?

Zenodo, HAL, figshare… ?


Source: Image from imgflip meme generator

Cause

Software is not Data


Source: Image from imgflip meme generator

Features

Software Heritage archive

The long term source code archive.

archive.softwareheritage.org
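One way to use the archive programmatically is to ask whether a given repository (an "origin") has already been archived. The sketch below assumes the public Web API exposes an origin lookup at `/api/1/origin/<origin_url>/get/`; the helper names `origin_lookup_url` and `is_archived` are hypothetical and not part of any official client.

```python
import json
import urllib.error
import urllib.request

# Assumed root of the public Software Heritage Web API.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def origin_lookup_url(origin_url: str) -> str:
    """Build the API URL describing an archived origin (hypothetical helper)."""
    return f"{API_ROOT}/origin/{origin_url}/get/"


def is_archived(origin_url: str, timeout: float = 10.0) -> bool:
    """Return True if the archive knows this origin. Needs network access."""
    request = urllib.request.Request(
        origin_lookup_url(origin_url),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            json.load(response)  # a JSON description means the origin exists
            return True
    except urllib.error.HTTPError as error:
        if error.code == 404:  # unknown origin
            return False
        raise  # rate limiting or server errors are left to the caller


if __name__ == "__main__":
    # Repository of this presentation, as given on the closing slide.
    print(is_archived("https://gitlab.com/joenio/joenio.gitlab.io"))
```

Anonymous API requests are rate limited, so a real script would likely cache results or authenticate.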

Software Heritage Hash identifier (SWHID)

Intrinsic identifiers for digital objects.

www.swhid.org
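"Intrinsic" means the identifier is computed from the object itself, not assigned by a registry. As a minimal sketch for a single file: a content SWHID is `swh:1:cnt:` followed by the same SHA-1 that Git computes for a blob, that is, the hash of the header `blob <length>\0` followed by the file bytes. The function name `swhid_for_content` is hypothetical; directories, revisions, releases and snapshots are hashed differently and are not covered here.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Compute the SWHID of a file's content (Git-compatible blob hash)."""
    header = b"blob %d\0" % len(data)  # Git blob header: type, size, NUL
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The empty file always gets the same identifier, wherever it is stored.
    print(swhid_for_content(b""))
    # → swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the identifier depends only on the bytes, anyone holding a copy of the file can recompute it and check that the copy matches what was cited.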

One type of identifier can’t cover all use cases: we need both intrinsic and extrinsic identifiers for software research outputs.

softwareheritage.org/2020/07/09/intrinsic-vs-extrinsic-identifiers

What can be identified with a SWHID?

softwareheritage.org/faq/#32_What_can_be_identified_with_a_SWHID
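As a rough illustration, a SWHID can point at five kinds of objects, encoded in a three-letter tag: `cnt` (file content), `dir` (directory), `rev` (revision, i.e. commit), `rel` (release) and `snp` (snapshot). Optional qualifiers such as `origin`, `path` or `lines` may follow after `;`. The parser below is a simplified sketch with hypothetical names (`Swhid`, `parse_swhid`); it does not percent-decode qualifier values or check which qualifier keys are allowed.

```python
import re
from typing import Dict, NamedTuple

# Object types that a SWHID can identify.
OBJECT_TYPES = {
    "cnt": "content (file)",
    "dir": "directory",
    "rev": "revision (commit)",
    "rel": "release",
    "snp": "snapshot",
}

# Core identifier: scheme, version 1, object type, 40 hex digits.
CORE_PATTERN = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


class Swhid(NamedTuple):
    object_type: str
    object_id: str
    qualifiers: Dict[str, str]


def parse_swhid(text: str) -> Swhid:
    """Split a SWHID into its object type, hash and qualifiers."""
    core, *raw_qualifiers = text.split(";")
    match = CORE_PATTERN.match(core)
    if match is None:
        raise ValueError(f"not a valid SWHID core identifier: {core!r}")
    qualifiers = {}
    for item in raw_qualifiers:
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"qualifier without '=': {item!r}")
        qualifiers[key] = value
    return Swhid(match.group(1), match.group(2), qualifiers)


if __name__ == "__main__":
    parsed = parse_swhid(
        "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391;lines=1-5"
    )
    print(OBJECT_TYPES[parsed.object_type], parsed.qualifiers)
```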

SWHID howtos:

Examples of SWHID use:

Source code: github.com/joenio/swhid-citation-example

SWHID Approved Specification


Source: team mailing list
The SWHID Specification Version 1.1.
The next step is submission to ISO so that it can become a standard.

Extra references:

Thanks!

joenio@joenio.me


This presentation is available at:

http://joenio.me/software-heritage-uge-data

(source-code: https://gitlab.com/joenio/joenio.gitlab.io)

Creative Commons License

Presentation history

Where and when this presentation was given