Skip to content

fix: scan rules from disk + dedup flat-md scanners#200

Merged
tylergraydev merged 2 commits intotylergraydev:mainfrom
prefrontalsys:fix/scan-rules-and-dedup-flat-md-scanners
Apr 21, 2026
Merged

fix: scan rules from disk + dedup flat-md scanners#200
tylergraydev merged 2 commits intotylergraydev:mainfrom
prefrontalsys:fix/scan-rules-and-dedup-flat-md-scanners

Conversation

@prefrontalsys
Copy link
Copy Markdown
Contributor

Summary

Rules in ~/.claude/rules/ and {project}/.claude/rules/ never appeared in the UI because no scanner populated the rules table — it was only ever written by in-UI create_rule. Every other behavioural primitive (commands, skills, agents, hooks) has both a global and a project scanner wired into run_startup_scan. This PR closes that scope asymmetry: an existing .md rule on disk now shows up in the library like everything else, with source=auto-detected.

While there, the PR pays down mechanical duplication. Four flat-.md scanners — scan_global_commands, scan_global_agents, scan_project_commands, scan_project_agents — shared ~150 LOC of copy-pasted skeleton (read_dir.md filter → parse_frontmatter → upsert-by-unique-name). They're migrated to a small shared helper, walk_md_dir, which the two new rule scanners also use. Skills (directory-based with a skill_files child table), hooks (JSON in settings.json), and the vendor scanners (opencode/codex/copilot/cursor/gemini) are legitimately different shapes and are left untouched.

Two adjacent bugs also fall out of the scope:

  1. The rule frontmatter writer emitted paths: a, b, c as a comma-joined plain string, but the DB reader deserialised paths/tags via serde_json::from_str. A UI-created rule round-tripped through the new scanner would have its paths silently corrupted. The writer now emits JSON arrays; the scanner accepts both JSON and legacy comma-separated form so users' existing .md files on disk continue to parse. The writer also previously dropped tags entirely — now it emits them.
  2. delete_rule removed only the DB row. With a scanner in place, any rule with a surviving disk file would resurrect as auto-detected on next launch. delete_rule now also removes the backing .md file, factored behind a testable delete_rule_inner helper.

The previously dead-lettered is_symlink and symlink_target columns in the rules schema are now populated by the global/project scanners via fs::symlink_metadata + fs::read_link.

Files touched

File Change
src-tauri/src/utils/paths.rs Add rules_dir field + populate
src-tauri/src/services/scanner.rs Add SOURCE_AUTO_DETECTED + walk_md_dir; migrate four flat-.md scanners; add ParsedRule/parse_rule_file/parse_list_value/upsert_rule_from_file/scan_global_rules/scan_project_rules; wire into run_startup_scan and per-project loop
src-tauri/src/services/rule_writer.rs paths/tags emitted as JSON arrays
src-tauri/src/services/config_writer.rs Add rules_dir field to 14 test-only ClaudePathsInternal literals to keep existing tests compiling
src-tauri/src/commands/rules.rs delete_rule removes backing .md; factored behind delete_rule_inner

No schema migration. No new dependencies.

Test plan

  • cargo fmt --check passes.
  • cargo test --lib — 2010 pass, 0 fail. 18 new tests cover walk_md_dir (empty/non-md/error), parse_rule_file (full/no-frontmatter/invalid), parse_list_value (JSON/comma/empty), scan_global_rules (empty/insert/idempotent/backfill/symlink), scan_project_rules (assignment + dedup), rule_writer JSON emission for paths and tags, and delete_rule_inner with disk-present and disk-absent.
  • Existing generate_rule_markdown tests updated to assert the new JSON-array shape for paths and tags.
  • End-to-end: drop 10 existing .md files in ~/.claude/rules/ → relaunch Tauri dev → Rules page populates with description + paths + auto-detected badge; second launch no duplicates; UI-created rule round-trips through disk as JSON-array form; UI-delete removes disk file and survives relaunch. (I've tested this locally; maintainer may want to verify too.)

Not in scope (candidates for follow-up PRs)

  1. Refresh button rescans disk. Header.svelte:21-36 handleRefresh is a pure in-memory reload; commands/scanner.rs:7 scan_claude_directory only runs scan_claude_json + scan_plugins, not the global library scanners. Same limitation exists for commands/skills/agents/hooks today. A separate UX PR.
  2. Skill/command/agent writers use comma-separated tags in frontmatter too. Latent; no live bug because no scanner re-ingests them for DB writes. Separate PR if desired.
  3. Vendor scanner dedup. Five vendor scanners (opencode/codex/copilot/cursor/gemini) follow an almost-identical "load JSON config → INSERT mcps with distinct source literal" pattern. Not touched here; different enough from the flat-.md helper that a second small helper would be warranted — separate PR.
  4. settings.local.json hook scanning. No scanner today reads .local variants for any primitive. Separate PR.

Clippy note

cargo clippy -- -D warnings fails on this branch, but the failures are all pre-existing on main (119 errors: vec! in tests, unused variables in docker_hosts.rs, snake_case nits, PathBuf vs Path in existing helpers, etc.). None of the errors touch files/lines added by this PR. Happy to strip -D warnings from CI or fix them in a separate cleanup PR — let me know the preference.

@tylergraydev
Copy link
Copy Markdown
Owner

Thanks for this — clear writeup and well-scoped work. Did a review pass locally.

What's good

  • Root cause is right. Rules being the only primitive without a disk scanner was a real scope asymmetry, and wiring into run_startup_scan + the per-project loop is the correct fix.
  • walk_md_dir abstraction is clean. Migrating the four existing flat-.md scanners onto it alongside the new rule scanners pays down real duplication without overreaching (good call leaving skills/hooks/vendor scanners alone — they genuinely differ in shape).
  • Both latent bugs caught along the way are legitimate finds. The paths/tags round-trip (comma-string writer vs. serde_json::from_str reader) would've silently corrupted UI-created rules once the scanner started ingesting them. And delete_rule resurrecting via auto-detect on next launch would've been a confusing UX regression. Nice that parse_list_value accepts both forms so existing on-disk files don't break.
  • Test coverage is thorough. 18 tests including idempotency, source_path backfill, symlinks, project assignment + dedup, and both parse forms. delete_rule_inner taking home: &Path for testability is the right factoring.
  • Populating is_symlink/symlink_target — nice bonus, those columns were dead-lettered.

Minor things

  1. Branch is stale — diverged from main at 30c8e79 (v3.7.2) and carries ~9 extra commits that aren't in main (including container-templates work that's since landed). Could you rebase onto current main? Should be mostly clean since your five-file footprint doesn't overlap the container work.

  2. CI: 4/10 checks failing. You mention the clippy -D warnings failures are pre-existing on main — I believe you, but could you link to or name which checks are the pre-existing ones vs. anything new on your branch? Just to make it easy to verify before merge.

  3. delete_rule_inner swallows load errors silently:

    let rule: Option<Rule> = conn
        .prepare(&query)
        .ok()
        .and_then(|mut s| s.query_row([id], row_to_rule).ok());

    If the SELECT fails for any reason other than "row doesn't exist" (e.g. transient lock, corruption), we DB-delete but leave the disk file. The comment says "best-effort" so probably intentional — but worth considering whether a real DB error should surface instead of falling through to the delete. Not a blocker.

On the follow-ups

The four items in your "not in scope" section are all correctly scoped as separate PRs — especially the refresh-button one (same limitation exists for every primitive today, deserves its own UX-level discussion). Happy to take those as follow-ups.

On the clippy note

Prefer fixing the pre-existing clippy warnings in a separate cleanup PR rather than stripping -D warnings from CI. If you'd like to take that on, that'd be welcome; otherwise I can track it separately.

Once rebased, I think this is good to merge. Thanks again for the careful work on this.

Rules in ~/.claude/rules/ and {project}/.claude/rules/ never appeared in
the UI because no scanner populated the `rules` table — it was only ever
written by in-UI create_rule. Every other behavioural primitive has both
global and project scanners; this closes the scope asymmetry.

Changes:

* utils/paths.rs: add rules_dir to ClaudePathsInternal.
* services/scanner.rs:
  - Introduce SOURCE_AUTO_DETECTED constant and walk_md_dir helper.
  - Migrate scan_global_commands, scan_global_agents, scan_project_commands,
    and scan_project_agents to use the helper — behaviour unchanged.
  - Add ParsedRule, parse_rule_file, parse_list_value (accepts both JSON-
    array and legacy comma-separated forms), upsert_rule_from_file,
    scan_global_rules, scan_project_rules. Populate is_symlink and
    symlink_target (previously dead-lettered). Wire into run_startup_scan
    and the per-project loop.
* services/rule_writer.rs: emit paths and tags as JSON arrays so they
  round-trip through the DB reader (which already expects JSON). The
  previous comma-joined form was lost to serde_json::from_str. Tags were
  dropped entirely before; now they round-trip too.
* commands/rules.rs: delete_rule now also removes the backing .md file,
  so a UI delete is not resurrected by the new scanner as 'auto-detected'.
  Factor the logic behind delete_rule_inner(conn, id, home) for testing.

Not touched (different shapes, out of scope):
* scan_*_skills (directory-based with skill_files child table).
* scan_*_hooks (JSON inside settings.json).
* Vendor scanners (opencode/codex/copilot/cursor/gemini).

Tests:
* 18 new tests across the four files (walk_md_dir edge cases,
  parse_rule_file, parse_list_value dual-form, scan_global_rules insert/
  idempotent/backfill/symlink, scan_project_rules assignment,
  rule_writer JSON emission for paths and tags, delete_rule_inner with
  disk-present and disk-absent).
* Full suite: 2010 pass, 0 fail.
@prefrontalsys prefrontalsys force-pushed the fix/scan-rules-and-dedup-flat-md-scanners branch from 1f65439 to 310524d Compare April 21, 2026 15:35
…g them

Addresses review nit on tylergraydev#200. The previous
`.ok().and_then(|s| s.query_row(..).ok())` pattern silently conflated
three cases:
  1. row exists → Some(rule) ✓
  2. row absent (QueryReturnedNoRows) → None ✓
  3. real DB error (lock contention, corruption, IO) → None ✗
     → DB delete proceeds, disk file leaks

Switch to an explicit match on rusqlite::Error so only
QueryReturnedNoRows maps to None; every other error is returned before
the DELETE runs. Same treatment for rule_writer::delete_rule_file — its
existing file-exists guard already handles the legitimate "disk file
already gone" case, so any error it raises now is a real IO/permission
failure worth surfacing.

Adds test_delete_rule_missing_id_is_not_an_error to pin the
QueryReturnedNoRows arm. 2015/2015 lib tests pass.
@prefrontalsys
Copy link
Copy Markdown
Contributor Author

Rebased onto current main (v3.8.5, e32d638) and folded in the delete_rule_inner fix. Two commits now on the branch:

  • 310524d — the original PR, replayed cleanly onto main (zero conflicts — the three intervening upstream commits landed after I branched from v3.8.3, and none touched the five files this PR modifies).
  • 1c9f2a4 — addresses the review nit on delete_rule_inner error handling. Replaces the .ok().and_then(|s| query_row(..).ok()) pattern with an explicit match on rusqlite::Error: only QueryReturnedNoRows collapses to None, every other error (lock contention, corruption, IO) is surfaced before the DB DELETE runs, so we can't orphan a disk file by nulling out the row without knowing the filename. Same tightening on rule_writer::delete_rule_file — its existing if path.exists() guard already handles the legitimate "disk file already gone" case, so any error it raises now is a real IO/permission failure worth surfacing. Added test_delete_rule_missing_id_is_not_an_error to pin the QueryReturnedNoRows arm. Lib tests: 2015/2015 pass locally.

On the 4 failing build checks

These are not regressions from this branch — they're the standard "fork PR can't see repo secrets" pattern. The Rust compilation itself succeeds (6m19s on the macOS x86 job, Finished release profile [optimized]). The failure lands at the updater-signing step:

failed to decode secret key: incorrect updater private key password: Missing comment in secret key
##[error]Command "npm [... tauri build ... --bundles app,updater]" failed with exit code 1

GitHub Actions deliberately withholds TAURI_SIGNING_PRIVATE_KEY / TAURI_SIGNING_PRIVATE_KEY_PASSWORD from workflows triggered by pull_request events from a fork. Evidence that this is environmental, not code-level:

  • Same workflow, same build, passes on main. The push-triggered Build run at e32d638 (v3.8.5, the SHA we're now rebased on top of) passed all four platforms in ~17 minutes — run 24645213015. The only delta from that run to ours is the secret scope.
  • Identical failure pattern on prefrontalsys fork PRs that later merged cleanly. PR Fix: append .bak extension instead of replacing file extension #199 from this same fork (fix/backup-extension-handling, which you merged on Apr 14) shows all four build (…) jobs as FAILURE with the same signing error, while its Run Rust Tests jobs pass. The post-merge Build run on main at b3fa2bc then succeeded because the merge commit was push-triggered and could see the secrets.

The passing checks on this PR (Run Rust Tests across all three platforms, Clippy, Format, frontend-tests) exercise the actual code changes and all pass.

Clippy follow-up

Will open a separate cleanup PR for the pre-existing clippy warnings per your preference — that keeps this PR's diff focused on the scanner work.

@tylergraydev tylergraydev merged commit 0927b9d into tylergraydev:main Apr 21, 2026
6 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants