fix(ml): populate subscriptionId and resourceGroup for CLI-created datastores#9753

Open
fmabroukmsft wants to merge 1 commit into Azure:main from fmabroukmsft:fix/ml-datastore-subscription-resourcegroup

Conversation

@fmabroukmsft
Member

Problem

When creating Azure storage datastores via az ml datastore create, the resulting datastore is missing subscriptionId and resourceGroup in its properties. This causes downstream operations like sharing data assets to registries to fail with a 400 error.

UI-created datastores correctly have these fields populated. Only CLI-created datastores are affected.

Before (CLI-created datastore)

{
  "subscriptionId": null,
  "resourceGroup": null,
  "datastoreType": "AzureBlob",
  "accountName": "mystorageaccount",
  "containerName": "mycontainer"
}

After (with this fix)

{
  "subscriptionId": "8f338f6e-...",
  "resourceGroup": "my-resource-group",
  "datastoreType": "AzureBlob",
  "accountName": "mystorageaccount",
  "containerName": "mycontainer"
}

Root Cause

The SDK entity classes (AzureBlobDatastore, AzureFileDatastore, AzureDataLakeGen2Datastore) do not include subscription_id or resource_group in their _to_rest_object() serialization, even though the underlying REST models (RestAzureBlobDatastore etc.) accept these optional fields and the service stores them correctly when provided.

Fix

This PR adds a helper _create_or_update_with_arm_scope() in the CLI extension's datastore command handler that:

  1. Builds the REST object via the SDK's _to_rest_object()
  2. Injects the workspace's subscriptionId and resourceGroup into the REST body when the entity doesn't provide them
  3. Calls the service directly
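
The backfill in step 2 can be sketched in isolation. This is a minimal illustration, not the PR's code: `backfill_arm_scope` is a hypothetical name, and a `SimpleNamespace` stands in for the REST properties object that `_to_rest_object()` would return, so the snippet runs without azure-ai-ml installed.

```python
from types import SimpleNamespace


def backfill_arm_scope(props, subscription_id, resource_group):
    """Fill in ARM scope on a REST properties object only where it is missing."""
    if props is None:
        return props
    # Values already carried by the entity win; workspace values are a fallback.
    if not getattr(props, "subscription_id", None):
        props.subscription_id = subscription_id
    if not getattr(props, "resource_group", None):
        props.resource_group = resource_group
    return props


# Stand-in for the REST properties of a CLI-created blob datastore (assumption).
props = SimpleNamespace(
    account_name="mystorageaccount",
    container_name="mycontainer",
    subscription_id=None,
    resource_group=None,
)
backfill_arm_scope(props, "00000000-0000-0000-0000-000000000000", "my-resource-group")
```

Because the helper only writes when a field is empty, a datastore that already names a different subscription or resource group is left unchanged.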

This is a targeted CLI-extension-level fix. A more comprehensive fix should be made in the azure-ai-ml SDK to add these fields to the YAML schema, entity constructors, and _to_rest_object() methods.

Affected Commands

  • az ml datastore create
  • az ml datastore update

Testing

  • ✅ Manually tested: created a blob datastore via CLI, verified subscriptionId and resourceGroup are now populated in the ARM API response
  • ✅ Verified existing datastores are unaffected
  • ✅ Verified the fix applies to all three Azure storage types (Blob, File, ADLS Gen2)

Related

  • ICM 716428613 — customer-reported issue with malformed CLI-created datastores

…tastores

When creating Azure storage datastores via 'az ml datastore create', the
CLI was not populating subscriptionId and resourceGroup in the REST
request body. This caused the created datastore to lack ARM scope,
breaking downstream operations such as sharing data assets to a registry
(400 error).

The REST client model supports these fields (they are optional on the
AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeGen2Datastore
REST models), and the service stores and returns them correctly when
provided. The issue was that the SDK entity's _to_rest_object() method
never passed them.

This fix intercepts the REST object after _to_rest_object() builds it,
and injects the workspace's subscription and resource group when the
entity does not carry its own values. This ensures CLI-created
datastores have the same ARM scope as UI-created ones.

Affected commands: az ml datastore create, az ml datastore update
Related ICM: 716428613

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 1, 2026 22:01
@azure-client-tools-bot-prd

Validation for Breaking Change Starting...

Thanks for your contribution!

@azure-client-tools-bot-prd

Hi @fmabroukmsft,
Please write the description of changes which can be perceived by customers into HISTORY.rst.
If you want to release a new extension version, please update the version in setup.py as well.

@yonzhan
Collaborator

yonzhan commented Apr 1, 2026

Thank you for your contribution! We will review the pull request and get back to you soon.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

The git hooks are available for azure-cli and azure-cli-extensions repos. They could help you run required checks before creating the PR.

Please sync the latest code with latest dev branch (for azure-cli) or main branch (for azure-cli-extensions).
After that please run the following commands to enable git hooks:

pip install azdev --upgrade
azdev setup -c <your azure-cli repo path> -r <your azure-cli-extensions repo path>

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Hi @fmabroukmsft

Release Suggestions

Module: machinelearningservices

  • Please log updates into src/machinelearningservices/HISTORY.rst
  • Update VERSION to 2.42.1 in src/machinelearningservices/setup.py

Notes

Contributor

Copilot AI left a comment

Pull request overview

This PR addresses an Azure ML CLI parity issue where datastores created/updated via az ml datastore create/update are missing ARM-scope fields (subscriptionId, resourceGroup), causing downstream operations (e.g., sharing to registries) to fail.

Changes:

  • Added _create_or_update_with_arm_scope() to serialize a datastore to a REST object, backfill subscriptionId/resourceGroup for Azure storage datastores, and call the service.
  • Routed ml_datastore_create and ml_datastore_update through the new helper.

Comment on lines 13 to +21
from azure.ai.ml.entities import Datastore
from azure.ai.ml.entities._datastore.azure_storage import AzureBlobDatastore, AzureDataLakeGen2Datastore, AzureFileDatastore
from azure.ai.ml.entities._load_functions import load_datastore

from .raise_error import log_and_raise_error
from .utils import _dump_entity_with_warnings, get_ml_client, modify_sys_path_for_rslex_mount


_AZURE_STORAGE_DATASTORE_TYPES = (AzureBlobDatastore, AzureDataLakeGen2Datastore, AzureFileDatastore)

Copilot AI Apr 1, 2026

This adds a dependency on azure.ai.ml.entities._datastore.azure_storage which is a private SDK module (leading underscore). That makes the CLI extension fragile across azure-ai-ml upgrades. Consider detecting Azure storage datastores via public surface (e.g., the datastore entity type string like azure_blob / azure_file / azure_data_lake_gen2) instead of importing private classes.
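
One way to follow this suggestion is to match on the datastore's type string rather than on imported classes. The sketch below is an assumption-laden illustration: `is_azure_storage_datastore` is a hypothetical helper, the three type names are taken from the comment above, and the exact form the entity exposes (snake case or camel case) is not confirmed here, so the comparison normalizes both.

```python
from types import SimpleNamespace

# Type names as listed in the review comment; exact values are an assumption.
AZURE_STORAGE_TYPE_NAMES = frozenset(
    {"azure_blob", "azure_file", "azure_data_lake_gen2"}
)


def _normalize(type_name):
    """Drop underscores and casing so 'azure_blob' and 'AzureBlob' compare equal."""
    return str(type_name).replace("_", "").lower()


_NORMALIZED_NAMES = frozenset(_normalize(name) for name in AZURE_STORAGE_TYPE_NAMES)


def is_azure_storage_datastore(datastore):
    """Return True when the entity's public type string names an Azure storage type."""
    type_name = getattr(datastore, "type", None)
    if type_name is None:
        return False
    return _normalize(type_name) in _NORMALIZED_NAMES


# Stand-ins for datastore entities; a real entity would come from load_datastore().
blob_like = SimpleNamespace(type="azure_blob")
other = SimpleNamespace(type="one_lake")
```

This removes the import from the private `_datastore.azure_storage` module, at the cost of keeping the list of type names in sync by hand.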

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +58
    """Create or update a datastore, backfilling subscription and resource group.

    The SDK's ``_to_rest_object`` does not populate ``subscriptionId`` and
    ``resourceGroup`` in the request body for Azure storage datastores. When
    these fields are missing the created datastore lacks ARM scope, which
    breaks downstream operations such as sharing data assets to a registry.

    This helper builds the REST object, injects the workspace's subscription
    and resource group when the datastore entity does not carry them, and
    then calls the service directly.
    """
    ds_request = datastore._to_rest_object()  # pylint: disable=protected-access

    if isinstance(datastore, _AZURE_STORAGE_DATASTORE_TYPES):
        subscription_id = ml_client._operation_scope.subscription_id  # pylint: disable=protected-access
        resource_group = ml_client._operation_scope._resource_group_name  # pylint: disable=protected-access

        props = ds_request.properties
        if props is not None:
            if not getattr(props, 'subscription_id', None):
                props.subscription_id = subscription_id
            if not getattr(props, 'resource_group', None):
                props.resource_group = resource_group

    datastore_resource = ml_client.datastores._operation.create_or_update(  # pylint: disable=protected-access
        name=datastore.name,
        resource_group_name=ml_client._operation_scope._resource_group_name,  # pylint: disable=protected-access
        workspace_name=ml_client.datastores._workspace_name,  # pylint: disable=protected-access
        body=ds_request,
        skip_validation=True,
    )
    return Datastore._from_rest_object(datastore_resource)  # pylint: disable=protected-access



Copilot AI Apr 1, 2026

This helper relies on private MLClient internals (_operation_scope._resource_group_name, datastores._workspace_name, and datastores._operation). Since ml_datastore_create/update already receive resource_group_name and workspace_name, it would be more robust to pass those into this helper (or use the public ml_client.datastores.create_or_update path) to avoid breaking when SDK internals change.
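
The first option in this comment, passing the scope values the command handlers already hold, might look like the sketch below. Everything here is hypothetical: `RecordingOperation` stands in for the service operation so the snippet runs on its own, and `create_or_update_with_explicit_scope` is an illustrative name, not a function from the PR or the SDK.

```python
class RecordingOperation:
    """Hypothetical stand-in for the service operation; records each call."""

    def __init__(self):
        self.calls = []

    def create_or_update(self, **kwargs):
        self.calls.append(kwargs)
        return kwargs["body"]


def create_or_update_with_explicit_scope(
    operation, name, body, resource_group_name, workspace_name
):
    """Send the request using scope values supplied by the caller."""
    # No reads from ml_client._operation_scope or datastores._workspace_name here.
    return operation.create_or_update(
        name=name,
        resource_group_name=resource_group_name,
        workspace_name=workspace_name,
        body=body,
    )


operation = RecordingOperation()
create_or_update_with_explicit_scope(
    operation,
    name="my_blob_store",
    body={"properties": {}},
    resource_group_name="my-resource-group",
    workspace_name="my-workspace",
)
```

Reaching the operation object itself would still need a private attribute in the real client, so this narrows the dependency on internals rather than removing it.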

Suggested change
-    """Create or update a datastore, backfilling subscription and resource group.
-    The SDK's ``_to_rest_object`` does not populate ``subscriptionId`` and
-    ``resourceGroup`` in the request body for Azure storage datastores. When
-    these fields are missing the created datastore lacks ARM scope, which
-    breaks downstream operations such as sharing data assets to a registry.
-    This helper builds the REST object, injects the workspace's subscription
-    and resource group when the datastore entity does not carry them, and
-    then calls the service directly.
-    """
-    ds_request = datastore._to_rest_object()  # pylint: disable=protected-access
-    if isinstance(datastore, _AZURE_STORAGE_DATASTORE_TYPES):
-        subscription_id = ml_client._operation_scope.subscription_id  # pylint: disable=protected-access
-        resource_group = ml_client._operation_scope._resource_group_name  # pylint: disable=protected-access
-        props = ds_request.properties
-        if props is not None:
-            if not getattr(props, 'subscription_id', None):
-                props.subscription_id = subscription_id
-            if not getattr(props, 'resource_group', None):
-                props.resource_group = resource_group
-    datastore_resource = ml_client.datastores._operation.create_or_update(  # pylint: disable=protected-access
-        name=datastore.name,
-        resource_group_name=ml_client._operation_scope._resource_group_name,  # pylint: disable=protected-access
-        workspace_name=ml_client.datastores._workspace_name,  # pylint: disable=protected-access
-        body=ds_request,
-        skip_validation=True,
-    )
-    return Datastore._from_rest_object(datastore_resource)  # pylint: disable=protected-access
+    """Create or update a datastore using the public datastores client.
+    This helper avoids relying on private MLClient internals by delegating to
+    ``ml_client.datastores.create_or_update``.
+    """
+    return ml_client.datastores.create_or_update(datastore)

        resource_group_name=ml_client._operation_scope._resource_group_name,  # pylint: disable=protected-access
        workspace_name=ml_client.datastores._workspace_name,  # pylint: disable=protected-access
        body=ds_request,
        skip_validation=True,

Copilot AI Apr 1, 2026

skip_validation=True is a behavior change from the previous implementation (ml_client.datastores.create_or_update(datastore) did not force this flag). If skip_validation disables service-side validation, this could allow invalid datastore definitions or produce less actionable errors. Consider preserving the prior default (omit the flag / set it explicitly to the previous default) or plumb it as an optional parameter so existing behavior is unchanged unless needed.
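
Plumbing the flag as an optional parameter could be done along these lines. This is a sketch under assumptions: `build_request_kwargs` is a hypothetical helper, and it only assembles the keyword arguments that would be passed to the service call, leaving `skip_validation` out entirely unless the caller sets it so that the service call's own default applies.

```python
def build_request_kwargs(
    name, body, resource_group_name, workspace_name, skip_validation=None
):
    """Assemble keyword arguments, forwarding skip_validation only when set."""
    kwargs = {
        "name": name,
        "resource_group_name": resource_group_name,
        "workspace_name": workspace_name,
        "body": body,
    }
    # None means "caller expressed no preference": keep the callee's default.
    if skip_validation is not None:
        kwargs["skip_validation"] = skip_validation
    return kwargs


default_kwargs = build_request_kwargs(
    "my_blob_store", {"properties": {}}, "my-resource-group", "my-workspace"
)
opt_in_kwargs = build_request_kwargs(
    "my_blob_store",
    {"properties": {}},
    "my-resource-group",
    "my-workspace",
    skip_validation=True,
)
```

Using `None` as the sentinel keeps existing callers on the previous behavior while still allowing an explicit `True` or `False`.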

Suggested change
-        skip_validation=True,

Comment on lines 116 to +119

     try:
         datastore = load_datastore(file, params_override=params_override)
-        return ml_client.datastores.create_or_update(datastore)._to_dict()  # pylint: disable=protected-access
+        return _create_or_update_with_arm_scope(ml_client, datastore)._to_dict()  # pylint: disable=protected-access

Copilot AI Apr 1, 2026

This change affects the HTTP request body for az ml datastore create/update (injecting subscriptionId/resourceGroup). There are existing datastore scenario tests with recordings, but none assert the new fields. Please update/add a scenario test assertion that the created datastore response includes subscription_id and resource_group (or the expected keys in CLI output), and refresh the relevant recordings if playback matching depends on request bodies.
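
The assertion requested here could be factored into a small check that a scenario test calls on the command's output. The sketch is illustrative only: `missing_arm_scope_keys` is a hypothetical helper, and the key names `subscription_id` and `resource_group` are assumed from the comment above rather than confirmed against real CLI output.

```python
def missing_arm_scope_keys(
    datastore_output, keys=("subscription_id", "resource_group")
):
    """Return the ARM scope keys that are absent or empty in an output dict."""
    return [key for key in keys if not datastore_output.get(key)]


# Sample dicts shaped like the Before/After bodies in the PR description,
# using the assumed snake_case key names.
before = {"subscription_id": None, "resource_group": None, "account_name": "mystorageaccount"}
after = {
    "subscription_id": "00000000-0000-0000-0000-000000000000",
    "resource_group": "my-resource-group",
    "account_name": "mystorageaccount",
}
```

In a scenario test, `assert missing_arm_scope_keys(output) == []` would then fail with the list of missing keys, which reads better in a failure message than a bare boolean.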

Comment on lines 129 to +132

     try:
         datastore = Datastore._load(parameters)  # pylint: disable=protected-access
-        return ml_client.datastores.create_or_update(datastore)._to_dict()  # pylint: disable=protected-access
+        return _create_or_update_with_arm_scope(ml_client, datastore)._to_dict()  # pylint: disable=protected-access

Copilot AI Apr 1, 2026

Same as create: this alters the request body for az ml datastore update by injecting ARM scope fields. Please ensure scenario coverage includes an update case that verifies the updated datastore has subscription_id and resource_group populated, and refresh recordings if needed.

@fmabroukmsft
Member Author

Companion SDK fix submitted: Azure/azure-sdk-for-python#46067

The SDK fix adds subscription_id and resource_group natively to the entity classes, YAML schema, and round-trip serialization. Once the SDK fix ships, the workaround in this CLI PR can be simplified to just call ml_client.datastores.create_or_update() directly (the current behavior) — but the CLI-level backfill will still serve as defense-in-depth for users on older SDK versions.


4 participants