RDF in version control

TL;DR: Write as N-Triples and then sort the lines.

You have an RDF graph? You want to commit it to git to keep a history of changes? You’re generating it from another data source and want to see how changes in your script affect the output?

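RDF as a data model is unordered: a graph is just a set of triples. So when you dump the graph to a file, be it in Turtle, JSON or XML, the order of the output is up to the serializer, and you don’t really know whether the changes git shows you are meaningful or just the same triples reshuffled.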

The solution:

  1. Serialize to the N-Triples format
  2. Sort the lines
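The two steps above can be sketched in Python. This is a minimal sketch, assuming you already have an N-Triples file from whatever tool you use (with rdflib, for instance, `g.serialize(destination="graph.nt", format="nt")`); the function names and the script name `sort_nt.py` are just for illustration. It only reorders lines, so it relies on each triple sitting on its own line, which is what N-Triples gives you. It also drops exact duplicate lines, since a graph holds each triple only once.

```python
import sys


def sort_ntriples(text: str) -> str:
    """Return the triples in `text` sorted, one per line.

    Assumes `text` is N-Triples: one triple per line. Blank lines and
    comment lines (starting with '#') are dropped, and exact duplicate
    lines are collapsed, since a graph holds each triple only once.
    """
    triples = {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if not triples:
        return ""
    # Plain string sorting is deterministic: the same set of lines always
    # comes out in the same order, whatever order the serializer used.
    return "\n".join(sorted(triples)) + "\n"


def main(paths: list[str]) -> None:
    # Rewrite each given .nt file in place with its lines sorted.
    for path in paths:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8") as f:
            f.write(sort_ntriples(text))


if __name__ == "__main__":
    main(sys.argv[1:])
```

You would run something like `python sort_nt.py graph.nt` right before `git add graph.nt`. One thing sorting can’t fix: if your data has blank nodes, their labels (`_:b0` and so on) may be regenerated on each export, and that still shows up as changed lines.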

N-Triples is just lines of <subject> <predicate> <object> . There are no namespace prefixes and no header. This makes it easy to sort, and the sorting makes it easy to diff.
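To see why sorting makes diffs easy, here is a small sketch that uses Python’s standard `difflib` as a stand-in for git’s diff. The function name `changed_triples` and the `ex.org` IRIs are made up for illustration. Once both dumps are sorted, the same triples in a different order produce no diff at all, and one edited triple shows up as exactly one removed and one added line. Since the subject comes first on each line, sorting also keeps all triples about the same subject next to each other.

```python
import difflib


def changed_triples(old_nt: str, new_nt: str) -> list[str]:
    """Return only the removed ('-') and added ('+') triples between two dumps.

    Both inputs are N-Triples text; each is sorted first, so triples that
    merely moved position do not show up as changes.
    """
    old_lines = sorted(line for line in old_nt.splitlines() if line.strip())
    new_lines = sorted(line for line in new_nt.splitlines() if line.strip())
    diff = difflib.unified_diff(old_lines, new_lines, "old.nt", "new.nt", lineterm="")
    # Drop the '---'/'+++' file headers, '@@' hunk markers and context lines.
    return [
        d for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    ]
```

In practice git does this comparison for you; the point is only that sorted files line up triple by triple.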

Yes, there’s a lot of ugly repetition, since every IRI is spelled out in full on every line, and this might be cumbersome if your data is large. You could maybe throw in some string replacements, e.g. to shorten long namespace IRIs, if size is an issue?
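One possible reading of the string-replacement idea, as a rough sketch: swap each long namespace for a short token and swap it back when you need the real data. The prefix map below is hypothetical, and the approach assumes no namespace is a prefix of another and that none of the short tokens (such as `<foaf:`) already occur in your data. Because it works line by line, the file stays one triple per line and still diffs cleanly.

```python
# Hypothetical prefix map: replace with the namespaces your data actually uses.
# Assumes no namespace is a prefix of another and that none of the short
# tokens (e.g. "<foaf:") already occur in the data.
PREFIXES = {
    "foaf:": "http://xmlns.com/foaf/0.1/",
    "ex:": "http://example.org/",
}


def abbreviate(text: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Shorten '<namespace...' to '<token...' everywhere in the text."""
    for token, namespace in prefixes.items():
        text = text.replace("<" + namespace, "<" + token)
    return text


def expand(text: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Undo abbreviate(): turn '<token...' back into the full IRI."""
    for token, namespace in prefixes.items():
        text = text.replace("<" + token, "<" + namespace)
    return text
```

The abbreviated file is no longer your real data (`<foaf:name>` is not the IRI you meant), so run `expand` before loading it into any RDF tool.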

(Photo credit: Robin Mathlener)