CurateCamp 2010: Version Control Breakout

The Version Control session at CurateCamp focused mainly on using version control systems for managing data instead of code. The primary approaches to version control for archival data that were discussed included:
  • the Reverse Directory Deltas (ReDD) microservice specification, and its implementation in the CDL Storage Service.
  • using systems like Git, Mercurial and Subversion to version data.
  • versioning by copying, and then linking the copies together in metadata (METS, etc.).
  • archiving received content, and then allowing the data to change in external systems which were backed up as business data.
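As a rough illustration of the reverse-delta idea behind the first approach, the newest version of a directory can be kept whole while each older version is stored only as the changes needed to step back to it. The following is a minimal in-memory sketch under that assumption; it does not follow the ReDD on-disk layout, and the names `ReverseDeltaStore`, `reverse_delta` and `apply_delta` are hypothetical.

```python
from typing import Dict, List, Optional

Tree = Dict[str, bytes]              # relative path -> file content
Delta = Dict[str, Optional[bytes]]   # path -> older content, or None if absent in the older version


def reverse_delta(newer: Tree, older: Tree) -> Delta:
    """Record only what must change to turn `newer` back into `older`."""
    delta: Delta = {}
    for path, content in older.items():
        if newer.get(path) != content:
            delta[path] = content    # modified or deleted in the newer version
    for path in newer:
        if path not in older:
            delta[path] = None       # added in the newer version, so drop it
    return delta


def apply_delta(tree: Tree, delta: Delta) -> Tree:
    """Return a copy of `tree` stepped back by one reverse delta."""
    result = dict(tree)
    for path, content in delta.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


class ReverseDeltaStore:
    """Toy store: the head version is complete, older versions are deltas."""

    def __init__(self) -> None:
        self.head: Tree = {}
        self.deltas: List[Delta] = []   # deltas[i] turns version i+2 into version i+1
        self.count = 0

    def commit(self, tree: Tree) -> int:
        """Store `tree` as the new head and return its version number."""
        if self.count > 0:
            self.deltas.append(reverse_delta(tree, self.head))
        self.head = dict(tree)
        self.count += 1
        return self.count

    def checkout(self, version: int) -> Tree:
        """Rebuild a version by walking backwards from the head."""
        if not 1 <= version <= self.count:
            raise ValueError(f"no such version: {version}")
        tree = dict(self.head)
        for delta in reversed(self.deltas[version - 1:]):
            tree = apply_delta(tree, delta)
        return tree
```

In this arrangement the current version is always readable without any tooling, and only retrieval of older versions requires applying deltas, which may be part of why such formats were described as simple.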
The general issues discussed during the session included:
  • simplicity of repository formats like ReDD compared with those of Git/Mercurial
  • ability to leverage existing tools that are under active development and have many eyes on them looking for bugs and improvements
  • ability of "server-less" distributed revision control systems (Git, Mercurial, etc.) to move repositories easily to tape and the like
  • scalability of version control systems that are designed for textual code instead of binary "big data"
  • persistence of repository formats over time, including backwards-incompatible changes that require upgrading the repository format to work with the latest tools
  • the potential idea of writing plugins that export a Git or Mercurial repository as ReDD for preservation purposes (similar to the existing `git archive` command, but with full version history instead of just the tip).
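The export idea in the last point could be approximated by running `git archive` once per commit instead of only for the tip. The sketch below assumes a local repository with the `git` command on the PATH; the `v001`, `v002`, ... directory layout and the function names are hypothetical and are not the ReDD format, and a branching history is simply flattened into the order reported by `git rev-list`.

```python
import io
import subprocess
import tarfile
from pathlib import Path
from typing import List


def list_commits(repo: Path) -> List[str]:
    """Return the commit hashes reachable from HEAD, oldest first."""
    out = subprocess.run(
        ["git", "-C", str(repo), "rev-list", "--reverse", "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def export_all_versions(repo: Path, dest: Path) -> List[Path]:
    """Unpack a full snapshot of every commit into its own directory."""
    exported = []
    for number, commit in enumerate(list_commits(repo), start=1):
        target = dest / f"v{number:03d}"   # hypothetical naming scheme
        target.mkdir(parents=True)
        # `git archive` writes a tar stream of the tree at this commit
        tar_bytes = subprocess.run(
            ["git", "-C", str(repo), "archive", "--format=tar", commit],
            check=True, capture_output=True,
        ).stdout
        with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tar:
            tar.extractall(target)
        exported.append(target)
    return exported
```

Full snapshots per commit are wasteful for large data; a real plugin would presumably store deltas between the exported versions instead.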
Ed Summers also wrote a blog post with some additional information about the rationale for the breakout session.