feat(generators): add --deterministic flag for reproducible output (pyoxigraph RDFC-1.0) #1
Open
Add a --deterministic flag to OWL, SHACL, JSON-LD, and JSON-LD Context generators that produces stable, reproducible output suitable for version-controlled artifacts.

When enabled, the flag activates:

1. **RDFC-1.0 blank-node canonicalization** via pyoxigraph (W3C Recommendation) for Turtle serialisation of OWL and SHACL graphs.
2. **Deterministic Collection ordering**: RDF Collections (owl:oneOf, sh:in, sh:ignoredProperties) are sorted so that enum members and property lists appear in a stable order. This intentionally changes the RDF graph (Collections encode order at the triple level) and is therefore opt-in.
3. **Deterministic JSON key ordering** for JSON-LD and JSON-LD Context output, with structure-aware sorting that preserves JSON-LD conventions (@context directives first, then prefixes, then terms).

The flag defaults to False to preserve backward compatibility.

Four tests are marked xfail(strict=True) to document that deterministic Collection sorting intentionally produces non-isomorphic output.

New dependency: pyoxigraph >= 0.4.0 (Rust-based, W3C RDFC-1.0).

Refs:
- W3C (2024) RDF Dataset Canonicalization (RDFC-1.0) https://www.w3.org/TR/rdf-canon/

Signed-off-by: jdsika <carlo.van-driesten@bmw.de>
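The structure-aware key ordering described for JSON-LD Context output (directives first, then prefixes, then terms) could look roughly like the sketch below. The names `order_context` and `_group` are hypothetical, and the prefix heuristic (a string value ending in `/` or `#`) is an assumption made for illustration; the code in this PR may classify keys differently.

```python
import json


def _group(key, value):
    """Rank a context entry: 0 = @-directive, 1 = prefix, 2 = term."""
    if key.startswith("@"):
        return 0
    # Assumption: a prefix maps to a plain IRI string ending in '/' or '#'.
    if isinstance(value, str) and value.endswith(("/", "#")):
        return 1
    return 2


def order_context(context):
    """Return a new dict with keys sorted by group, then alphabetically."""
    return dict(
        sorted(context.items(), key=lambda kv: (_group(kv[0], kv[1]), kv[0]))
    )


ctx = {
    "name": {"@id": "schema:name"},
    "schema": "http://schema.org/",
    "@vocab": "https://example.org/",
    "ex": "https://example.org/ns#",
    "age": {"@id": "ex:age", "@type": "xsd:integer"},
}
ordered = order_context(ctx)
print(json.dumps(ordered, indent=2))
```

Because Python dicts preserve insertion order, serialising `ordered` with `json.dumps` emits the keys in the grouped order without needing `sort_keys=True`, which would flatten the grouping back into plain alphabetical order.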
Review PR
This PR is for internal review of the `--deterministic` flag changes before submitting upstream to `linkml/linkml`.
Key changes from the closed upstream PR linkml#3295:
- Replaced the custom Weisfeiler-Lehman algorithm with pyoxigraph RDFC-1.0 (W3C standard, Rust implementation). This addresses the core concern raised by maintainers about rolling our own canonicalization.
- **Collection sorting gated behind `--deterministic`**: `owl:oneOf`, `sh:in`, and `sh:ignoredProperties` items are sorted only when the flag is set. This preserves existing behaviour by default.
- **`deterministic_json()`**: recursive deep-sort for JSON output, gated behind `--deterministic`.
- Covering axiom fix: abstract classes with a single child no longer get reversed hierarchy triples.
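As an illustration of what a recursive deep-sort for JSON output involves, here is a minimal sketch. The helper name `deep_sorted` is hypothetical and is not the PR's `deterministic_json()`; in particular, leaving list order untouched is an assumption made here, since list order can carry meaning in JSON-LD.

```python
import json


def deep_sorted(value):
    """Recursively rebuild dicts with sorted keys; keep list order as-is."""
    if isinstance(value, dict):
        return {key: deep_sorted(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [deep_sorted(item) for item in value]
    return value


# Two documents with the same content but different key insertion order.
doc_a = {"b": 1, "a": {"d": [{"z": 1, "y": 2}], "c": 3}}
doc_b = {"a": {"c": 3, "d": [{"y": 2, "z": 1}]}, "b": 1}

out_a = json.dumps(deep_sorted(doc_a), indent=2)
out_b = json.dumps(deep_sorted(doc_b), indent=2)
print(out_a == out_b)
```

For purely alphabetical ordering, `json.dumps(..., sort_keys=True)` gives the same result; an explicit traversal like this mainly pays off when some levels need a custom order.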
Test results
`rdf:first`/`rdf:rest` triples) while preserving OWL/SHACL semantics.
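To see why sorting a Collection changes the graph at the triple level, the sketch below encodes a list as `rdf:first`/`rdf:rest` triples using plain tuples and compares the authored and sorted orderings with a brute-force isomorphism check. It uses only the standard library; `collection_triples`, `isomorphic`, and the `ex:` items are hypothetical names for illustration and do not come from the PR or from pyoxigraph.

```python
from itertools import permutations

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def collection_triples(items):
    """Encode a list as an RDF Collection, one blank node per item."""
    triples = set()
    for i, item in enumerate(items):
        node = f"_:b{i}"
        nxt = f"_:b{i + 1}" if i + 1 < len(items) else NIL
        triples.add((node, FIRST, item))
        triples.add((node, REST, nxt))
    return triples


def _bnodes(graph):
    """Collect blank-node labels appearing as subject or object."""
    return sorted(
        {t for s, _, o in graph for t in (s, o) if t.startswith("_:")}
    )


def isomorphic(g1, g2):
    """Brute-force blank-node isomorphism; only viable for tiny graphs."""
    n1, n2 = _bnodes(g1), _bnodes(g2)
    if len(n1) != len(n2):
        return False
    for perm in permutations(n2):
        mapping = dict(zip(n1, perm))
        relabeled = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in g1}
        if relabeled == g2:
            return True
    return False


members = ["ex:red", "ex:green", "ex:blue"]
as_authored = collection_triples(members)
as_sorted = collection_triples(sorted(members))
print(isomorphic(as_authored, as_sorted))
```

The two graphs contain the same members, but the head node's `rdf:first` value differs, so no relabelling of blank nodes makes them equal. That is the sense in which a sorted Collection is a different, non-isomorphic graph.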
New dependency
See `.playground/pr-3295-description.md` for the full upstream PR description draft.