[FEATURE] Extensible tag classification model discovery through Entry Points#463
[FEATURE] Extensible tag classification model discovery through Entry Points#463Roel Bollens (RoelBollens-TomTom) wants to merge 9 commits intodevfrom
Conversation
b8ca2fe to
c00022c
Compare
| filters = [] | ||
|
|
||
| if tags: | ||
| filters.append(lambda key: all(tag in key.tags for tag in tags)) |
There was a problem hiding this comment.
all() here means --tag foo --tag bar requires BOTH tags (AND). Is that the intended semantics, or should repeated --tag flags use OR?
There was a problem hiding this comment.
This was intentional, but I guess this should still be formalized so I'll leave this open for discussion.
from the coding session minutes:
--tag semantics are AND
The mockups assume--tag overture --tag system:featuremeans "must have both tags" (AND). This contradicts the design doc's stated OR semantics. The group implicitly operated with AND throughout the exercise and no one objected. This should be formalized.
There was a problem hiding this comment.
Ah, great. I'm good with that; it didn't track with how I'd been thinking about it earlier is all.
There was a problem hiding this comment.
I'll still leave this open. Maybe after refactoring the other CLI commands opinions might change.
There was a problem hiding this comment.
What we discussed this morning in the coding session: AND makes the most sense for CLI list, but there are other use cases where OR fits better.
I did some very cursory LLM-driven research about common CLI paradigms that support both AND and OR in filtering. The best example I could come up with were kubectl label selectors with -l and docker filters, but none of them fit our exact use case.
I thought about it a bit and came up with some priorities:
- CLI should support both AND and OR.
- Should not use any special shell characters, especially characters like
!that can get expanded even within double quotes. - The most common use cases should be simple.
- Should not require a lot of typing.
The best idea I can come up with is to allow --tag to express an "OR of ANDs" by supporting multiple --tags values.
--tagcan be repeated as many times as you want.- Each
--tagsupports as plus-sign-separated list. (Alternative: comma.) - Within a
--tagthe plus-separated values form an AND expression. - The multiple
--tagarguments are unioned together as a big OR expression. - This allows you to express an "OR of ANDs" which is probably powerful enough for most CLI use cases.
- I would also suggest allowing
*wildcards to further strengthen the CLI power.
Examples
Get all models matching single tag, foo.
--tag foo
Get all models that are both foo AND bar.
--tag foo+bar
Get all models that are in an Overture theme AND also bar.
--tag overture:theme=*+bar
Get all models that are either foo OR bar.
--tag foo --tags bar
Get all models that are either Overture transportation features or Overture places features.
--tag feature+overture:theme=transportation --tag feature+overture:theme=places
There was a problem hiding this comment.
--tag foo+bar I'm less a fan of, because it introduces a query language in the value. I'd rather keep --tag as a simple repeatable flag. But I like adding the wildcard for namespace:* and ns:predicate=*.
A git log style --all-match is something I considered to switch --tag values from OR logic to AND. My concern with that is that some commands may want AND by default, and that still doesn't give a clean way to support both AND and OR on the same command, if that's at all desired.
There was a problem hiding this comment.
Alternative ideas:
--tagto include,--filterto exclude-i/--includeto include,-e/--excludeto exclude- existing is
--tagsand--exclude-tags
| } | ||
| TAG = r"[a-z0-9][a-z0-9_-]*" | ||
| NAMESPACE_TAG = r"[a-z0-9]+:[a-z0-9]+(?:=[a-z0-9_.-]+)?" | ||
| TAG_RE = re.compile(rf"^(?:{TAG}|{NAMESPACE_TAG})$") |
There was a problem hiding this comment.
TAG allows hyphens and underscores ([a-z0-9_-]), but NAMESPACE_TAG's key segment doesn't — acme:my-ext would fail validation even though my-ext is a valid plain tag. Intentional?
There was a problem hiding this comment.
The tag definition/validation needs some more love. I added a regex to have some validation but very little thought went into it. Same where TAG doesn't allow for decimal points/periods were a value in a namespace tag would (3.14 would fail but acme:pi=3.14 will pass).
There was a problem hiding this comment.
That handling of numbers makes sense to me. Raw tags with decimals feels somewhat confusing, so while it's restrictive, I could justify it.
There was a problem hiding this comment.
Well a standalone number might be odd, but i support it could also be something else like overturemaps.org as a tag, which it currently doesn't allow.
There was a problem hiding this comment.
TAG allows hyphens and underscores (
[a-z0-9_-]), butNAMESPACE_TAG's key segment doesn't —acme:my-extwould fail validation even thoughmy-extis a valid plain tag. Intentional?
To Seth Fitzsimmons (@mojodna)'s point I would suggest the same pattern should be used for both.
There was a problem hiding this comment.
Discussed in the coding sesh and ...
- Namespace and predicate have the same pattern.
- Which is: all lowercase, start alphanumeric, then can be alphanumeric or
-or_or.. - Value can include uppercase and need not start alphanumeric.
- Tag matching/combining/filtering/jiggering is case sensitive.
a2461e3 to
61bb58f
Compare
b602345 to
1501cc9
Compare
Victor Schappert (vcschapp)
left a comment
There was a problem hiding this comment.
Let some comments, but I'm generally aligned and would merge once Roel Bollens (@RoelBollens-TomTom) and Seth Fitzsimmons (@mojodna) are jointly aligned on merging.
Left some thoughts on the AND/OR issue in the CLI, probably above there somewhere. 👆
| logger = logging.getLogger(__name__) | ||
|
|
||
|
|
||
| RESERVED_TAGS: dict[str, set[str]] = { |
There was a problem hiding this comment.
Are these various CONSTANTS meant to be part of the public interface or are they internal use only?
If internal use only, can we give them the appropriate underscore prefix?
|
|
||
| from overture.schema.system.feature import Feature | ||
|
|
||
| logger = logging.getLogger(__name__) |
There was a problem hiding this comment.
This seems private, should i get the appropriate underscore prefix?
| RESERVED_TAGS: dict[str, set[str]] = { | ||
| "overture": {"overture-schema-core"}, | ||
| "feature": {"overture-schema-system"}, | ||
| } |
There was a problem hiding this comment.
I thought feature_provider was intentionally emitting feature rather than system:feature to be less clunky, per the PR description.
I understood from the PR description that you can reserve both plain tags and namespaces.
Personally I like this approach.
| } | ||
| TAG = r"[a-z0-9][a-z0-9_-]*" | ||
| NAMESPACE_TAG = r"[a-z0-9]+:[a-z0-9]+(?:=[a-z0-9_.-]+)?" | ||
| TAG_RE = re.compile(rf"^(?:{TAG}|{NAMESPACE_TAG})$") |
There was a problem hiding this comment.
TAG allows hyphens and underscores (
[a-z0-9_-]), butNAMESPACE_TAG's key segment doesn't —acme:my-extwould fail validation even thoughmy-extis a valid plain tag. Intentional?
To Seth Fitzsimmons (@mojodna)'s point I would suggest the same pattern should be used for both.
| ModelKeyFilter: TypeAlias = Callable[[ModelKey], bool] | ||
|
|
||
|
|
||
| def generate_tags( |
There was a problem hiding this comment.
_private?
| TagProvider: TypeAlias = Callable[[type[BaseModel], ModelKey, set[str]], set[str]] | ||
|
|
||
| ModelDict: TypeAlias = dict[ModelKey, type[BaseModel]] | ||
|
|
||
| TagProviderDict: TypeAlias = dict[TagProviderKey, TagProvider] | ||
|
|
||
| ModelKeyFilter: TypeAlias = Callable[[ModelKey], bool] |
There was a problem hiding this comment.
Are any of these intended to be _Private, e.g. maybe TagProviderDict?
| key = replace( | ||
| key, | ||
| tags=frozenset(generate_tags(model_class, key, tag_providers)), | ||
| ) |
There was a problem hiding this comment.
Basically switching the return type to tuple[str] from set[str]?
I have no opinion on this.
You could go the other way and return collections.abc.Iterable[str] - that way anything that can produce a sequence of strings via iteration would suffice.
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Co-authored-by: Seth Fitzsimmons <sethfitz@amazon.com> Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Co-authored-by: Seth Fitzsimmons <sethfitz@amazon.com> Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
… filtering logic - Removes overture tag provider (was deferred) - Simplified tags - Reserved tags instead of reserved namespaces - Fixes small issue introduced in earlier commit Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
… CLI commands Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
1501cc9 to
e5ad4c5
Compare
🗺️ Schema reference docs preview is live!
Note ♻️ This preview updates automatically with each push to this PR. |
e5ad4c5 to
3daf759
Compare
3daf759 to
f19416c
Compare
f19416c to
c8aa891
Compare
Signed-off-by: Roel <75250264+RoelBollens-TomTom@users.noreply.github.com>
c8aa891 to
e9eabf3
Compare
There was a problem hiding this comment.
Pull request overview
This PR replaces the legacy hardcoded namespace/theme/type-based model discovery with an entry-point-driven discovery system in overture.schema.system that classifies models via extensible, provider-generated tags. It also updates CLIs/docs/tests and package entry-point registrations to align with the new tag-based filtering and grouping model.
Changes:
- Introduces a new
overture.schema.system.discoverymodule that discovers models via entry points and attaches tags from pluggable tag providers (with reserved tag/namespace enforcement). - Updates schema CLI and codegen CLI to filter/group models by
--tag/--exclude-tagrather than--theme/--namespace-style options. - Simplifies
overture.modelsentry-point keys across theme packages (e.g.,building,segment) and adds core/system tag providers viaoverture.tag_providers.
Reviewed changes
Copilot reviewed 36 out of 36 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/overture-schema/src/overture/schema/init.py | Switches discovery import to the new system discovery module. |
| packages/overture-schema-transportation-theme/pyproject.toml | Updates model entry-point keys to simple type names. |
| packages/overture-schema-system/tests/test_tags.py | Adds tests for tag validation and tag parsing helpers. |
| packages/overture-schema-system/tests/test_tag_providers.py | Adds tests for reserved tag/namespace filtering and feature_provider. |
| packages/overture-schema-system/src/overture/schema/system/typing_util.py | Adds collect_types() helper used by tag providers. |
| packages/overture-schema-system/src/overture/schema/system/discovery/types.py | Defines discovery-related type aliases (TagProvider, ModelDict, etc.). |
| packages/overture-schema-system/src/overture/schema/system/discovery/tag_providers.py | Adds the built-in system feature_provider tag provider. |
| packages/overture-schema-system/src/overture/schema/system/discovery/tag.py | Defines tag format regex + helpers (is_valid_tag, get_namespace, get_values_for_key). |
| packages/overture-schema-system/src/overture/schema/system/discovery/models.py | Introduces ModelKey and TagProviderKey dataclasses. |
| packages/overture-schema-system/src/overture/schema/system/discovery/discovery.py | Implements entry-point discovery + tag generation + filtering APIs. |
| packages/overture-schema-system/src/overture/schema/system/discovery/init.py | Exposes the new discovery API surface for downstream consumers. |
| packages/overture-schema-system/pyproject.toml | Registers the built-in feature tag provider entry point. |
| packages/overture-schema-places-theme/pyproject.toml | Updates model entry-point keys to simple type names. |
| packages/overture-schema-divisions-theme/pyproject.toml | Updates model entry-point keys to simple type names. |
| packages/overture-schema-core/tests/test_approved_models.py | Adds a test asserting “feature” models also have the “overture” authority tag. |
| packages/overture-schema-core/src/overture/schema/core/tag_providers.py | Adds core tag providers for authority (overture) and theme (overture:theme=...). |
| packages/overture-schema-core/src/overture/schema/core/discovery.py | Removes the old discovery implementation (now replaced by system discovery). |
| packages/overture-schema-core/pyproject.toml | Registers core tag providers under overture.tag_providers. |
| packages/overture-schema-codegen/tests/test_integration_real_models.py | Updates imports to use system discovery. |
| packages/overture-schema-codegen/tests/test_cli.py | Updates codegen CLI tests to use --tag filtering. |
| packages/overture-schema-codegen/tests/conftest.py | Updates imports to use system discovery. |
| packages/overture-schema-codegen/tests/codegen_test_support.py | Updates discovery usage and introduces helpers for theme extraction via tags. |
| packages/overture-schema-codegen/src/overture/schema/codegen/cli.py | Replaces --theme filtering with --tag/--exclude-tag filtering. |
| packages/overture-schema-codegen/README.md | Updates usage docs from --theme to --tag. |
| packages/overture-schema-cli/tests/test_resolve_types.py | Updates resolve-types tests to use tag-based filtering and mocks discovery. |
| packages/overture-schema-cli/tests/test_error_formatting.py | Updates validate invocation to use --tag. |
| packages/overture-schema-cli/tests/test_cli_functions.py | Updates resolve_types usage to pass tags instead of themes. |
| packages/overture-schema-cli/tests/test_cli_commands.py | Updates CLI tests for list-types/json-schema/validate to use tags. |
| packages/overture-schema-cli/src/overture/schema/cli/types.py | Removes discovery-specific type aliases that no longer belong to the CLI package. |
| packages/overture-schema-cli/src/overture/schema/cli/commands.py | Implements tag-based filtering/grouping in list-types and updates validate/json-schema to use tags. |
| packages/overture-schema-cli/src/overture/schema/cli/init.py | Removes re-export of ModelDict from CLI types. |
| packages/overture-schema-buildings-theme/pyproject.toml | Updates model entry-point keys to simple type names. |
| packages/overture-schema-base-theme/pyproject.toml | Updates model entry-point keys to simple type names. |
| packages/overture-schema-annex/pyproject.toml | Updates model entry-point key to simple type name. |
| packages/overture-schema-addresses-theme/pyproject.toml | Updates model entry-point key to simple type name. |
| README.pydantic.md | Updates docs/examples to reference system discovery and new entry-point key style (but needs a fix to example module paths). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| building = "overture.schema.buildings.building.models:Building" | ||
| building_part = "overture.schema.buildings.building_part.models:BuildingPart" |
| class TestPlainTagRegex(unittest.TestCase): | ||
| def test_valid_plain_tags(self) -> None: | ||
| valid_tags = [ | ||
| "v", | ||
| "valid", | ||
| "valid1", | ||
| "valid_tag", | ||
| "valid-tag", | ||
| "0valid", | ||
| "42", | ||
| ] | ||
| for tag in valid_tags: | ||
| self.assertTrue( | ||
| re.fullmatch(PLAIN_TAG, tag), f"PLAIN_TAG should match: {tag}" | ||
| ) | ||
| self.assertTrue(TAG.fullmatch(tag), f"TAG should match: {tag}") | ||
| self.assertTrue( | ||
| is_valid_tag(tag), f"is_valid_tag should return True for: {tag}" | ||
| ) | ||
|
|
||
| def test_invalid_plain_tags(self) -> None: | ||
| invalid_tags = [ | ||
| "", | ||
| "_invalid", | ||
| "-invalid", | ||
| "Invalid", | ||
| "invalid!", | ||
| "invalid ", | ||
| "in.valid", | ||
| "3.14", | ||
| ] | ||
| for tag in invalid_tags: | ||
| self.assertFalse( | ||
| re.fullmatch(PLAIN_TAG, tag), f"PLAIN_TAG should not match: {tag}" | ||
| ) | ||
| self.assertFalse(TAG.fullmatch(tag), f"TAG should not match: {tag}") | ||
| self.assertFalse( | ||
| is_valid_tag(tag), f"is_valid_tag should return False for: {tag}" | ||
| ) | ||
|
|
||
|
|
||
| class TestNamespaceTagRegex(unittest.TestCase): | ||
| def test_valid_namespace_tags(self) -> None: | ||
| valid_tags = [ |
| tags.update(filtered_tags) | ||
| except Exception as e: | ||
| log.warning( | ||
| f"Error in tag provider {provider.__name__} for model {key.name}: {e}" |
Extensible tag classification model discovery through Entry Points
This replaces the hardcoded model classification system with tag-based classification model discovery through Entry Points. This is based on #440 by Seth and several schema (ad-hoc) coding sessions where Seth, Vic, Dana, Tristan and Roel participated in.
Model discovery moved into
system, eliminating assumptions about Overture in the process. The hardcodednamespaceconcept ("overture","annex") and theModelKindclassifier is replaced with tags -- string labels derived by tag providers. Tags become the filtering, grouping, and classification mechanism for model discovery, driven by introspection and package metadata rather than central coordination.systemprovides generic tag-based grouping without understanding what any particular tag means. Any package can register tag providers that classify models without special support in the discovery layer.Purpose
Tags serve three roles:
--tag system:feature,--tag draft)These roles overlap -- a tag like
overture:theme=buildingsserves both filtering and taxonomy. The design accommodates this overlap through structured tags that encode both ownership and dimension.Tag Format
Tags are strings following the pattern
[prefix:]key[=value]:overture,draft,featuresystem:extension--:separates ownershipoverture:theme=buildings:signals ownership and enables prefix reservation (see Privileged Packages and Tag Reservation).=signals a dimension with a value (groupable via--group-by). One level of each -- no nested colons or multiple=signs.Minimal launch set
feature(was:system:feature)overture:theme=<theme>buildings,transportation)overture(was:overture:official)Reserved tags
Tags can be reserved either as simple tags or by namespaces. These are the tags and namespaces that are currently reserved:
featureoverture-schema-systemovertureoverture-schema-coreoverture:*overture-schema-coresystem:*overture-schema-systemExtensions
Additional extensions and accompanied tags will be introduced in a future PR. Extensions allows to augment existing types with new fields (columns).
CLI
The
list-typescommand has been updated to support filtering and grouping by tags. Currently, it no longer displays the description or fully qualified class name. Thejson-schemaandvalidatecommands from the overture-schema cli andgeneratecommand from the overture-codegen cli have been updated to be able to filter on tags instead of filtering by theme and type. Further changes can be introduced in a future update.Examples
Deviations