feat: migrate POST /datasets/untag endpoint #268
Jayant-kernel wants to merge 3 commits into openml:main
Conversation
Walkthrough

Adds dataset untagging end-to-end: two new async DB helpers to list and delete dataset tags, a new POST /datasets/untag API endpoint with authentication and ownership checks that uses TagNotFoundError and TagNotOwnedError, and tests covering unauthorized access, successful untag, ownership denial, non-existent tag error, invalid-tag validation, and migration parity checks.
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Hey - I've left some high-level feedback:

- In `database.datasets.get_tag_for`, consider selecting only the required columns (e.g. `uploader`) instead of `SELECT *` to avoid unnecessary coupling to the full row shape and reduce data transfer.
- The `untag_dataset` flow performs a read (`get_tag_for`) followed by a separate delete; if legacy parity allows, you could collapse this into a single conditional DELETE (e.g. constrained by `uploader`) and branch on the affected row count to avoid a race between the two operations.
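The single-DELETE idea above can be sketched as follows. This is a minimal illustration with an in-memory SQLite table standing in for `dataset_tag`; the table layout, the `untag` helper, and the user ids are illustrative, not the project's actual schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT, uploader INTEGER)")
conn.execute("INSERT INTO dataset_tag VALUES (1, 'study_14', 42)")

def untag(dataset_id: int, tag: str, user_id: int) -> bool:
    # Ownership is folded into the DELETE predicate, so there is no
    # window between a read and a delete for another request to race.
    cur = conn.execute(
        "DELETE FROM dataset_tag WHERE id = ? AND tag = ? AND uploader = ?",
        (dataset_id, tag, user_id),
    )
    conn.commit()
    return cur.rowcount == 1  # 1 -> removed; 0 -> missing or not owned

assert untag(1, "study_14", 7) is False   # wrong uploader: nothing deleted
assert untag(1, "study_14", 42) is True   # owner: row deleted
assert untag(1, "study_14", 42) is False  # already removed
```

The caller branches on the affected row count instead of issuing a prior SELECT; distinguishing "tag missing" (475) from "not owned" (476) would still need a follow-up lookup in the failure branch, which is why the reviewer hedges with "if legacy parity allows".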
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/database/datasets.py (1)
69-98: Make untag matching semantics explicit instead of collation-dependent.
`tag_dataset` already treats tag equality case-insensitively before insert, but these helpers do exact `tag = :tag` matches. That means `/datasets/untag` now depends on the database collation to decide whether `Study_14` and `study_14` are the same tag. Please normalize tags consistently across create/lookup/delete, or make the SQL predicate case-insensitive in both helpers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/datasets.py` around lines 69 - 98, The tag match is currently collation-dependent because get_tag_for and untag use exact `tag = :tag`; update both helpers to use consistent case-insensitive matching (or normalize the tag parameter) so create/lookup/delete behave the same as tag_dataset. For a minimal change, change the WHERE predicates in get_tag_for and untag to compare LOWER(tag) = LOWER(:tag) (or use an explicit case-insensitive COLLATE) and/or call .lower() on the tag_ parameter before passing it to the SQL parameters; ensure the same approach is applied in both get_tag_for and untag so matching semantics are identical to tag_dataset.
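The `LOWER(tag) = LOWER(:tag)` predicate suggested above can be sketched like this, using an in-memory SQLite table as a stand-in (SQLite's `LOWER` is ASCII-only, which is enough for the illustration; the real helpers run against MySQL via SQLAlchemy):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_tag (id INTEGER, tag TEXT)")
conn.execute("INSERT INTO dataset_tag VALUES (1, 'Study_14')")

def get_tag_for(dataset_id: int, tag: str):
    # Lower-casing both sides makes the match independent of the
    # column's collation, so Study_14 and study_14 are the same tag.
    return conn.execute(
        "SELECT tag FROM dataset_tag WHERE id = ? AND LOWER(tag) = LOWER(?)",
        (dataset_id, tag),
    ).fetchone()

assert get_tag_for(1, "study_14") is not None
assert get_tag_for(1, "STUDY_14") is not None
assert get_tag_for(1, "other") is None
```

Applying the same predicate to the delete helper keeps lookup and removal semantics identical, as the comment asks.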
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/routers/openml/dataset_tag_test.py`:
- Around line 105-141: The setup POST to "/datasets/tag" in test_dataset_untag
and test_dataset_untag_rejects_other_user must be asserted so failures or
pre-existing tags don't mask test intent: after calling py_api.post(...
"/datasets/tag" ...) assert the response.status_code is HTTPStatus.OK and/or
response.json() indicates success, then assert the tag appears via
get_tags_for(id_=dataset_id, connection=expdb_test) (or add that call in the
second test too). Also add a deterministic admin-bypass case (new test or extend
test_dataset_untag_rejects_other_user) that tags with ApiKey.SOME_USER then
attempts to untag with an admin API key (use ApiKey.ADMIN or the project admin
constant) and assert the admin untag returns HTTPStatus.OK and removes the tag;
keep cleanup assertions as needed. Ensure references to test names
(test_dataset_untag, test_dataset_untag_rejects_other_user), helper
get_tags_for, and the "/datasets/tag" and "/datasets/untag" endpoints are used
to locate changes.
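The "assert on the setup POST" pattern described above looks roughly like this. The client here is a fake stand-in so the sketch is self-contained; the real tests use the project's `py_api` fixture, and the 475 code follows the PR description.

```python
from http import HTTPStatus
from types import SimpleNamespace

class FakeApi:
    """Stand-in for the project's test client (`py_api` in the real tests)."""
    def __init__(self) -> None:
        self.tags: set[tuple[int, str]] = set()

    def post(self, path: str, *, data_id: int, tag: str) -> SimpleNamespace:
        if path == "/datasets/tag":
            self.tags.add((data_id, tag))
            return SimpleNamespace(status_code=HTTPStatus.OK)
        if path == "/datasets/untag":
            if (data_id, tag) not in self.tags:
                return SimpleNamespace(status_code=475)  # tag not found
            self.tags.discard((data_id, tag))
            return SimpleNamespace(status_code=HTTPStatus.OK)
        return SimpleNamespace(status_code=HTTPStatus.NOT_FOUND)

api = FakeApi()

# Setup: tag the dataset, and assert the setup itself succeeded so a
# silently failing precondition cannot mask the test's intent.
setup = api.post("/datasets/tag", data_id=1, tag="study_14")
assert setup.status_code == HTTPStatus.OK, "precondition failed: tagging"

# Exercise: untag and verify the tag is actually gone.
response = api.post("/datasets/untag", data_id=1, tag="study_14")
assert response.status_code == HTTPStatus.OK
assert (1, "study_14") not in api.tags
```

The admin-bypass case the comment asks for follows the same shape: tag as one user, untag with an admin key, and assert both the OK status and the removed tag.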
In `@tests/routers/openml/migration/datasets_migration_test.py`:
- Around line 222-226: The PHP /data/tag POST is currently fire-and-forget
(php_api.post) so if it fails the Python test's precondition is invalid; change
the call to capture the response (resp = php_api.post(...)) and assert that
resp.status_code indicates success (e.g., HTTPStatus.OK) or, matching the
handling in the tag migration test, detect the same transient error conditions
and call pytest.skip for those cases; update the block that uses
original.status_code and php_api.post and reference dataset_id, api_key, and tag
when constructing the check so the test either fails loudly on a real failure or
skips on known transient PHP errors.
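A framework-free sketch of the "fail loudly or skip" pattern for the PHP setup call. `SkipTest` stands in for `pytest.skip`, the fake client stands in for `php_api`, and the transient-error check is illustrative, not the actual condition used in the tag migration test.

```python
from http import HTTPStatus
from types import SimpleNamespace

class SkipTest(Exception):
    """Stand-in for pytest.skip in this framework-free sketch."""

def ensure_tagged(php_api, dataset_id: int, api_key: str, tag: str) -> None:
    # Capture the response instead of fire-and-forget, so the test either
    # fails loudly on a real failure or skips on a known transient error.
    resp = php_api.post(f"/data/tag/{dataset_id}", params={"api_key": api_key, "tag": tag})
    if resp.status_code != HTTPStatus.OK:
        if "database connection" in resp.text.lower():  # illustrative transient check
            raise SkipTest("transient PHP error while tagging")
        raise AssertionError(f"precondition failed: HTTP {resp.status_code}")

# Demo with a fake PHP client that always succeeds.
fake_php = SimpleNamespace(
    post=lambda path, params: SimpleNamespace(status_code=HTTPStatus.OK, text="")
)
ensure_tagged(fake_php, dataset_id=1, api_key="key", tag="study_14")
```

With this in place, the migration test's precondition is either verifiably established or the test is skipped for the same transient conditions the tag migration test already tolerates.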
ℹ️ Review info — Configuration used: Repository UI · Review profile: CHILL · Plan: Pro · Run ID: 5cb43695-55c7-4a36-bd3c-0c46ce5f2c50

📒 Files selected for processing (4): src/database/datasets.py, src/routers/openml/datasets.py, tests/routers/openml/dataset_tag_test.py, tests/routers/openml/migration/datasets_migration_test.py
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/database/datasets.py (1)
71-83: Select only needed columns in `get_tags`.

Line 75 currently pulls `SELECT *`, but the caller only needs `tag` and `uploader`. Narrowing this reduces coupling and payload size.

♻️ Proposed refactor

```diff
 async def get_tags(id_: int, connection: AsyncConnection) -> list[Row]:
     row = await connection.execute(
         text(
             """
-            SELECT *
+            SELECT `tag`, `uploader`
             FROM dataset_tag
             WHERE id = :dataset_id
             """,
         ),
         parameters={"dataset_id": id_},
     )
-    return list(row.all())
+    return row.all()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/database/datasets.py` around lines 71 - 83, In get_tags change the query to select only the required columns (tag and uploader) instead of SELECT * to reduce payload and coupling; update the SQL in the async function get_tags(id_: int, connection: AsyncConnection) to SELECT tag, uploader FROM dataset_tag WHERE id = :dataset_id (keeping parameter name and return behavior) so callers receive only the needed fields and database load is minimized.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/routers/openml/datasets.py`:
- Around line 81-83: The current code only calls database.datasets.get(data_id,
expdb_db) and raises DatasetNotFoundError, which leaks existence; instead,
invoke the router's standard visibility/access guard when retrieving the dataset
(i.e., use the access-aware fetch rather than a raw existence check) before
performing untag checks—replace or augment database.datasets.get(data_id,
expdb_db) with the access-checked retrieval used elsewhere in the router (call
the same access/visibility helper used across this router with data_id and
expdb_db) and only raise DatasetNotFoundError if that access-aware fetch returns
no dataset.
ℹ️ Review info — Configuration used: Repository UI · Review profile: CHILL · Plan: Pro · Run ID: 43322e6b-e7b9-41f5-a497-18fff3ea1cad

📒 Files selected for processing (4): src/database/datasets.py, src/routers/openml/datasets.py, tests/routers/openml/dataset_tag_test.py, tests/routers/openml/migration/datasets_migration_test.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/routers/openml/dataset_tag_test.py
- tests/routers/openml/migration/datasets_migration_test.py
Description
This PR migrates the legacy POST /data/untag endpoint to FastAPI as POST /datasets/untag. Implemented behavior includes:

- `/datasets/untag` router endpoint with authenticated access.
- Error `475` when the tag is not attached to the dataset.
- Error `476` when the tag exists but is owned by another user (admin override supported).
- Migration parity tests comparing `/data/untag` (PHP) with `/datasets/untag` (Python).

Fixes: #20
Related: #6
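The behavior listed above can be condensed into a framework-free sketch. The function, the in-memory tag table, and the result type are illustrative stand-ins; the real endpoint lives in src/routers/openml/datasets.py and raises TagNotFoundError/TagNotOwnedError rather than returning codes directly.

```python
from dataclasses import dataclass

@dataclass
class UntagResult:
    status: int       # 200 on success; 475/476 on the documented errors
    detail: str = ""

# (dataset_id, tag) -> uploader user id; a stand-in for the dataset_tag table.
TAGS = {(1, "study_14"): 42}

def untag_dataset(dataset_id: int, tag: str, user_id: int, is_admin: bool = False) -> UntagResult:
    uploader = TAGS.get((dataset_id, tag))
    if uploader is None:
        return UntagResult(475, "Tag is not attached to the dataset.")
    if uploader != user_id and not is_admin:
        return UntagResult(476, "Tag is owned by another user.")
    del TAGS[(dataset_id, tag)]
    return UntagResult(200)

assert untag_dataset(1, "missing", 42).status == 475
assert untag_dataset(1, "study_14", 7).status == 476
assert untag_dataset(1, "study_14", 7, is_admin=True).status == 200  # admin override
```

The three asserts mirror the three documented outcomes: tag absent, tag owned by someone else, and the admin override succeeding where an ordinary user is denied.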
Checklist
Please check all that apply. You can mark items as N/A if they don't apply to your change.
N.B. If tests pass locally but fail on CI, please try to investigate the cause. If you are unable to resolve the issue, please share your findings.