
feat: migrate POST /datasets/untag endpoint #268

Open
Jayant-kernel wants to merge 3 commits into openml:main from Jayant-kernel:issue-20-datasets-untag

Conversation

@Jayant-kernel

Description

This PR migrates the legacy POST /data/untag endpoint to FastAPI as POST /datasets/untag.

Implemented behavior includes:

  • Added /datasets/untag router endpoint with authenticated access.
  • Added legacy-aligned untag error handling:
      • 475 when the tag is not attached to the dataset.
      • 476 when the tag exists but is owned by another user (admin override supported).
  • Added database helpers to fetch/delete dataset tag records.
  • Added endpoint tests for success, auth failures, ownership restrictions, missing tags, and validation errors.
  • Added migration parity test comparing /data/untag (PHP) with /datasets/untag (Python).
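The ownership/error decision described above can be sketched as a small pure function. This is a simplification for illustration only: the real endpoint is an async FastAPI handler, and the row shape and names here are assumptions, not the PR's actual code.

```python
# Legacy-aligned untag error codes, per the PR description.
TAG_NOT_FOUND = 475  # the tag is not attached to the dataset
TAG_NOT_OWNED = 476  # the tag exists but is owned by another user

def resolve_untag(tag_row, user_id, user_is_admin=False):
    """Decide the outcome of an untag request for one (dataset, tag) pair.

    `tag_row` is None when the tag is not attached; otherwise a dict with
    an "uploader" field (a stand-in for the real DB row). Returns a legacy
    error code, or None when the delete may proceed.
    """
    if tag_row is None:
        return TAG_NOT_FOUND
    if tag_row["uploader"] != user_id and not user_is_admin:
        return TAG_NOT_OWNED  # admins may untag on behalf of other users
    return None
```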

Fixes: #20
Related: #6

Checklist

Please check all that apply. You can mark items as N/A if they don't apply to your change.

  • I have performed a self-review of my own code
  • I have commented my code in hard-to-understand areas, and provided or updated docstrings as needed
  • I have made corresponding changes to the documentation pages (/docs)
  • I have added tests that cover the changes
  • Tests pass locally.

nb. If tests pass locally but fail on CI, please try to investigate the cause. If you are unable to resolve the issue, please share your findings.


coderabbitai bot commented Mar 11, 2026

Warning

Rate limit exceeded

@Jayant-kernel has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 19 minutes and 40 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8c7ceb42-5e2a-4fd7-a9f0-9f04db5e1085

📥 Commits

Reviewing files that changed from the base of the PR and between fb9d1ce and a81a21a.

📒 Files selected for processing (2)
  • src/routers/openml/datasets.py
  • tests/routers/openml/dataset_tag_test.py

Walkthrough

Adds dataset untagging end-to-end: two new async DB helpers to list and delete dataset tags, a new POST /datasets/untag API endpoint with authentication and ownership checks that uses TagNotFoundError and TagNotOwnedError, and tests covering unauthorized access, successful untag, ownership denial, non‑existent tag error, invalid-tag validation, and migration parity checks.

Possibly related PRs

  • openml/server-api PR 246 — Implements an "untag" flow for a different resource and shares the same TagNotFoundError/TagNotOwnedError error types, indicating a code-level similarity in untag handling.
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 7.14%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4)
  • Title check — The pull request title accurately describes the main change: migrating the POST /datasets/untag endpoint.
  • Description check — The description clearly explains the implemented behavior, including endpoints, error handling, database helpers, and tests added.
  • Linked Issues check — The PR implements the untag endpoint migration with error handling (475, 476 codes), authentication, ownership checks, and comprehensive tests matching the issue requirements.
  • Out of Scope Changes check — All changes are directly related to implementing the untag endpoint migration. No out-of-scope modifications detected.


@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • In database.datasets.get_tag_for, consider selecting only the required columns (e.g. uploader) instead of SELECT * to avoid unnecessary coupling to the full row shape and reduce data transfer.
  • The untag_dataset flow performs a read (get_tag_for) followed by a separate delete; if legacy parity allows, you could collapse this into a single conditional DELETE (e.g. constrained by uploader) and branch based on the affected row count to avoid a race between the two operations.
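The single-statement variant suggested here can be sketched with SQLite as a stand-in. The schema, table, and function names are illustrative, not the project's real code.

```python
import sqlite3

# In-memory stand-in for the experiment database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT, uploader INTEGER)")
conn.execute("INSERT INTO dataset_tag VALUES (1, 'study_14', 42)")

def untag(dataset_id, tag, uploader):
    """Delete in one statement, constrained by uploader, and branch on the
    affected row count instead of doing a separate read first."""
    cur = conn.execute(
        "DELETE FROM dataset_tag WHERE id = ? AND tag = ? AND uploader = ?",
        (dataset_id, tag, uploader),
    )
    return cur.rowcount  # 1 -> removed; 0 -> missing, or owned by someone else
```

Note the trade-off: a zero row count no longer distinguishes the 475 (not attached) and 476 (not owned) cases, so a follow-up read would still be needed to pick the exact legacy error code.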

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/database/datasets.py (1)

69-98: Make untag matching semantics explicit instead of collation-dependent.

tag_dataset already treats tag equality case-insensitively before insert, but these helpers do exact tag = :tag matches. That means /datasets/untag now depends on the database collation to decide whether Study_14 and study_14 are the same tag. Please normalize tags consistently across create/lookup/delete, or make the SQL predicate case-insensitive in both helpers.
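A minimal sketch of the case-insensitive predicate being asked for, using SQLite as a stand-in (the real helpers go through SQLAlchemy's `text()`, and the schema here is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT)")
conn.execute("INSERT INTO dataset_tag VALUES (1, 'Study_14')")

def get_tag_for(dataset_id, tag):
    """Look the tag up case-insensitively so the result no longer
    depends on the column's collation."""
    return conn.execute(
        "SELECT tag FROM dataset_tag WHERE id = ? AND LOWER(tag) = LOWER(?)",
        (dataset_id, tag),
    ).fetchone()
```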

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database/datasets.py` around lines 69 - 98, The tag match is currently
collation-dependent because get_tag_for and untag use exact `tag = :tag`; update
both helpers to use consistent case-insensitive matching (or normalize the tag
parameter) so create/lookup/delete behave the same as tag_dataset. For a minimal
change, change the WHERE predicates in get_tag_for and untag to compare
LOWER(tag) = LOWER(:tag) (or use an explicit case-insensitive COLLATE) and/or
call .lower() on the tag_ parameter before passing it to the SQL parameters;
ensure the same approach is applied in both get_tag_for and untag so matching
semantics are identical to tag_dataset.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/routers/openml/dataset_tag_test.py`:
- Around line 105-141: The setup POST to "/datasets/tag" in test_dataset_untag
and test_dataset_untag_rejects_other_user must be asserted so failures or
pre-existing tags don't mask test intent: after calling py_api.post(...
"/datasets/tag" ...) assert the response.status_code is HTTPStatus.OK and/or
response.json() indicates success, then assert the tag appears via
get_tags_for(id_=dataset_id, connection=expdb_test) (or add that call in the
second test too). Also add a deterministic admin-bypass case (new test or extend
test_dataset_untag_rejects_other_user) that tags with ApiKey.SOME_USER then
attempts to untag with an admin API key (use ApiKey.ADMIN or the project admin
constant) and assert the admin untag returns HTTPStatus.OK and removes the tag;
keep cleanup assertions as needed. Ensure references to test names
(test_dataset_untag, test_dataset_untag_rejects_other_user), helper
get_tags_for, and the "/datasets/tag" and "/datasets/untag" endpoints are used
to locate changes.

In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 222-226: The PHP /data/tag POST is currently fire-and-forget
(php_api.post) so if it fails the Python test's precondition is invalid; change
the call to capture the response (resp = php_api.post(...)) and assert that
resp.status_code indicates success (e.g., HTTPStatus.OK) or, matching the
handling in the tag migration test, detect the same transient error conditions
and call pytest.skip for those cases; update the block that uses
original.status_code and php_api.post and reference dataset_id, api_key, and tag
when constructing the check so the test either fails loudly on a real failure or
skips on known transient PHP errors.
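Both inline comments reduce to the same pattern: capture the setup call's response and assert on it, so the test fails loudly when its precondition is not met. A sketch with a stand-in response object — in the real tests the argument would be the result of `py_api.post("/datasets/tag", ...)`, and the body shape used here is illustrative:

```python
from http import HTTPStatus

class FakeResponse:
    """Stand-in for the HTTP test client's response object."""
    def __init__(self, status_code, body):
        self.status_code = status_code
        self._body = body
    def json(self):
        return self._body

def assert_tag_setup(response):
    """Fail loudly if the setup POST to /datasets/tag did not succeed,
    instead of letting a silent failure invalidate the untag test."""
    assert response.status_code == HTTPStatus.OK, response.json()
    return response.json()

ok = assert_tag_setup(FakeResponse(HTTPStatus.OK, {"tagged": True}))
```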


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5cb43695-55c7-4a36-bd3c-0c46ce5f2c50

📥 Commits

Reviewing files that changed from the base of the PR and between 41f1973 and 3eebed2.

📒 Files selected for processing (4)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/migration/datasets_migration_test.py

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/database/datasets.py (1)

71-83: Select only needed columns in get_tags.

Line 75 currently pulls SELECT *, but the caller only needs tag and uploader. Narrowing this reduces coupling and payload size.

♻️ Proposed refactor
 async def get_tags(id_: int, connection: AsyncConnection) -> list[Row]:
     row = await connection.execute(
         text(
             """
-    SELECT *
+    SELECT `tag`, `uploader`
     FROM dataset_tag
     WHERE id = :dataset_id
     """,
         ),
         parameters={"dataset_id": id_},
     )
-    return list(row.all())
+    return row.all()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/database/datasets.py` around lines 71 - 83, In get_tags change the query
to select only the required columns (tag and uploader) instead of SELECT * to
reduce payload and coupling; update the SQL in the async function get_tags(id_:
int, connection: AsyncConnection) to SELECT tag, uploader FROM dataset_tag WHERE
id = :dataset_id (keeping parameter name and return behavior) so callers receive
only the needed fields and database load is minimized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 81-83: The current code only calls database.datasets.get(data_id,
expdb_db) and raises DatasetNotFoundError, which leaks existence; instead,
invoke the router's standard visibility/access guard when retrieving the dataset
(i.e., use the access-aware fetch rather than a raw existence check) before
performing untag checks—replace or augment database.datasets.get(data_id,
expdb_db) with the access-checked retrieval used elsewhere in the router (call
the same access/visibility helper used across this router with data_id and
expdb_db) and only raise DatasetNotFoundError if that access-aware fetch returns
no dataset.
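The non-leaking lookup being asked for can be sketched as follows. All names here (`fetch_visible_dataset`, `user_can_see`, the dict-based store) are illustrative stand-ins, not the router's real helpers:

```python
class DatasetNotFoundError(Exception):
    """Raised both for truly missing datasets and for ones the caller
    may not see, so the untag path does not leak existence."""

def user_can_see(user_id, dataset):
    # Stand-in visibility rule: public, or owned by the caller.
    return dataset["visibility"] == "public" or dataset["uploader"] == user_id

def fetch_visible_dataset(data_id, user_id, datasets):
    """Access-aware fetch: missing and invisible datasets are
    indistinguishable to the caller."""
    dataset = datasets.get(data_id)
    if dataset is None or not user_can_see(user_id, dataset):
        raise DatasetNotFoundError(data_id)
    return dataset
```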


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 43322e6b-e7b9-41f5-a497-18fff3ea1cad

📥 Commits

Reviewing files that changed from the base of the PR and between 3eebed2 and fb9d1ce.

📒 Files selected for processing (4)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/migration/datasets_migration_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/routers/openml/dataset_tag_test.py
  • tests/routers/openml/migration/datasets_migration_test.py


Development

Successfully merging this pull request may close these issues.

POST /data/untag
