Add SEO infrastructure and AI discoverability#210

Merged
igerber merged 3 commits into main from seo
Mar 18, 2026

Conversation

@igerber
Owner

@igerber igerber commented Mar 18, 2026

Summary

  • Switch Sphinx theme from RTD to PyData Sphinx Theme (breadcrumbs, mobile, dark mode)
  • Add sphinxext-opengraph for OG meta tags, auto-generated social cards, and meta descriptions on every page
  • Add sphinx-sitemap for sitemap.xml generation
  • Add nbsphinx to integrate all 15 Jupyter tutorials into the Sphinx toctree as searchable HTML pages
  • Add .. meta:: directives with targeted descriptions/keywords to 7 high-value pages
  • Add docs/llms.txt and docs/llms-full.txt for AI crawler discoverability
  • Add CITATION.cff for GitHub "Cite this repository" and Google Scholar indexing
  • Add Schema.org JSON-LD structured data via docs/_templates/layout.html
  • Expand PyPI keywords (5 → 15), improve description, add classifiers
  • Add README badges (PyPI, Python versions, license, downloads, docs) and citation section
  • Update GitHub repo description, topics, and homepage URL
  • Migrate RTD version-aware warning banner to PyData's native announcement bar
  • Add Sphinx duplicate-object-description warning cleanup to TODO.md
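For readers unfamiliar with the extensions listed above, a docs/conf.py fragment along these lines would enable them. This is a sketch only, not the configuration from this PR: the placeholder base URL, the announcement text, and the option values are assumptions chosen for illustration.

```python
# Hypothetical docs/conf.py fragment; values are illustrative, not the PR's.
extensions = [
    "sphinxext.opengraph",  # OG meta tags and social cards
    "sphinx_sitemap",       # sitemap.xml generation
    "nbsphinx",             # render Jupyter notebooks as HTML pages
]

html_theme = "pydata_sphinx_theme"

# sphinx-sitemap builds absolute URLs from html_baseurl.
html_baseurl = "https://example.org/docs/"  # assumed placeholder URL

# sphinxext-opengraph settings.
ogp_site_url = html_baseurl
ogp_social_cards = {"enable": True}

# PyData Sphinx Theme announcement bar.
html_theme_options = {
    "announcement": "You are viewing the development docs.",  # assumed text
}
```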

Methodology references

  • N/A — no methodology changes

Validation

  • Tests added/updated: No test changes (docs/config only)
  • Full test suite passes: 1807 passed, 67 skipped, 59 deselected
  • Docs build verified locally: sphinx-build -b html docs docs/_build/html succeeds
  • Verified: meta descriptions, OG tags, JSON-LD, sitemap.xml, social cards (140 images), all 15 tutorial HTML pages, llms.txt served at root

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Switch to PyData Sphinx Theme, add sphinxext-opengraph (OG meta tags,
social cards, meta descriptions), sphinx-sitemap, and nbsphinx. Add
llms.txt/llms-full.txt for AI crawlers, CITATION.cff for academic
citation, Schema.org JSON-LD structured data, and meta directives on
high-value pages. Integrate all 15 Jupyter tutorials into the Sphinx
toctree. Expand PyPI keywords/classifiers and add README badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

Overall Assessment

⚠️ Needs changes

The PR does not modify estimator implementations, weighting, or variance code, but it does introduce new public AI-discoverability documentation that now diverges from the implemented methodology for ContinuousDiD and TROP. Those are unmitigated P1 issues.

Executive Summary

  • No estimator code changed; the blocking issues are in the new llms discoverability content.
  • ContinuousDiD is documented as supporting covariates and aggregate modes that the library does not implement.
  • TROP is documented as taking a generic time-varying treatment indicator, but the method requires an absorbing-state treatment path.
  • The new llms-full.txt also already drifts from actual result schemas, which indicates the file will be hard to keep correct without generation or validation.
  • The new SEO config hardcodes canonical URLs, and does so inconsistently; this is non-blocking but likely to emit mixed metadata across versioned docs builds.

Methodology

No estimator implementation changed in this PR; the affected methods are ContinuousDiD and TROP in the new discoverability docs.

  • Severity P1. docs/llms-full.txt documents ContinuousDiD.fit() with a covariates argument and aggregate values None/"simple"/"dose"/"event_study", but the implementation in diff_diff/continuous_did.py accepts no covariates parameter and only allows aggregate in {None, "dose", "eventstudy"}. The Methodology Registry explicitly says covariate support is deferred in docs/methodology/REGISTRY.md. Impact: the new AI-facing source materially overstates implemented methodology and will generate failing or methodologically incorrect user/LLM code. Concrete fix: make the ContinuousDiD section match the live signature/options exactly and state that covariate support is deferred.
  • Severity P1. docs/llms-full.txt describes TROP.fit(..., treatment=...) as accepting a treatment indicator that “can vary over time,” but both the implementation diff_diff/trop.py and the registry docs/methodology/REGISTRY.md require an absorbing-state treatment path and call out non-absorbing D as bias-inducing misuse. The same block also drifts from the actual constructor defaults by omitting the 10.0 entry from lambda_nn_grid shown in diff_diff/trop.py. Impact: this weakens a core identification/input assumption in a file specifically intended for AI consumption. Concrete fix: document treatment as an absorbing-state indicator and sync the defaults table to TROP.__init__.
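To make the absorbing-state requirement concrete: a treatment path is absorbing when a unit, once treated, stays treated in every later period. The check below is a minimal sketch of that idea in plain Python. It is not the validation in diff_diff/trop.py, and the function name and the (unit, period, treated) row layout are assumptions for illustration.

```python
from collections import defaultdict


def non_absorbing_units(rows):
    """Return sorted units whose treatment switches off after switching on.

    rows: iterable of (unit, period, treated) with treated in {0, 1}.
    Hypothetical helper, not part of the library.
    """
    by_unit = defaultdict(list)
    for unit, period, treated in rows:
        by_unit[unit].append((period, int(treated)))

    bad = []
    for unit, path in by_unit.items():
        path.sort()  # order each unit's observations by period
        seen_treated = False
        for _, d in path:
            if d == 1:
                seen_treated = True
            elif seen_treated:
                # A 0 after a 1 violates the absorbing-state assumption.
                bad.append(unit)
                break
    return sorted(bad)


panel = [
    ("a", 1, 0), ("a", 2, 1), ("a", 3, 1),  # absorbing
    ("b", 1, 0), ("b", 2, 1), ("b", 3, 0),  # switches off: not absorbing
    ("c", 1, 0), ("c", 2, 0), ("c", 3, 0),  # never treated
]
print(non_absorbing_units(panel))  # → ['b']
```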

Code Quality

  • No additional code-quality findings in the changed code/config beyond the documentation mismatches above.

Performance

  • No performance findings in the changed paths.

Maintainability

  • Severity P2. The new discoverability URL configuration is hardcoded and inconsistent: docs/conf.py forces stable URLs for both the sitemap and OpenGraph settings, while docs/_templates/layout.html hardcodes the unversioned docs root in JSON-LD. Impact: versioned docs builds can emit mixed canonical/discoverability metadata. Concrete fix: compute one per-build base URL from the RTD build context and reuse it for html_baseurl, ogp_site_url, and JSON-LD.
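One way to get a single per-build base URL is to read the READTHEDOCS_CANONICAL_URL environment variable that Read the Docs sets during builds and fall back to a fixed value locally. The sketch below assumes that approach; the helper name, the fallback URL, and the html_context key are hypothetical and not taken from the PR.

```python
import os


def resolve_base_url(environ, fallback):
    """Pick the RTD canonical URL if set, else the fallback; end with '/'."""
    url = environ.get("READTHEDOCS_CANONICAL_URL", "").strip() or fallback
    return url if url.endswith("/") else url + "/"


# Assumed fallback for local builds.
base_url = resolve_base_url(os.environ, "https://diff-diff.readthedocs.io/")

# Reuse the same value everywhere a site URL is needed.
html_baseurl = base_url
ogp_site_url = base_url
html_context = {"canonical_base_url": base_url}  # hypothetical key for templates
```

With this shape, a template can read the hypothetical canonical_base_url value instead of carrying its own hardcoded copy.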

Tech Debt

  • Severity P3. The duplicate-object-description warning cleanup added in TODO.md is appropriately tracked deferred work. Impact: none for merge gating. Concrete fix: no action required in this PR.

Security

  • No findings in the changed files.

Documentation/Tests

  • Severity P2. docs/llms-full.txt already misstates ContinuousDiDResults as following the generic staggered-results pattern (overall_se, group_effects, event_study naming), but the real dataclass in diff_diff/continuous_did_results.py exposes separate ATT/ACRT inference fields and no group_effects. Impact: the new public AI-facing schema will generate incorrect attribute access patterns. Concrete fix: give ContinuousDiD its own results block or generate/validate llms*.txt from source so signatures, enums, and result fields cannot drift silently.
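The "validate from source" option could be as small as comparing documented parameter names against the live signature with inspect. The sketch below shows the idea on a stand-in class; ExampleEstimator, its parameters, and the helper name are hypothetical and do not reflect the real ContinuousDiD.fit() signature.

```python
import inspect


def signature_drift(func, documented_params):
    """Compare documented parameter names against func's live signature."""
    live = [n for n in inspect.signature(func).parameters if n != "self"]
    documented = list(documented_params)
    return {
        "missing_from_docs": sorted(set(live) - set(documented)),
        "not_in_source": sorted(set(documented) - set(live)),
    }


class ExampleEstimator:
    """Stand-in estimator; parameter names are illustrative only."""

    def fit(self, data, outcome, dose, aggregate=None):
        return self


report = signature_drift(
    ExampleEstimator.fit,
    ["data", "outcome", "dose", "aggregate", "covariates"],
)
print(report)  # → {'missing_from_docs': [], 'not_in_source': ['covariates']}
```

A check like this could run in CI and fail when either list is non-empty, so that a documented-but-unimplemented argument is caught before publication.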

Path to Approval

  1. Fix the ContinuousDiD section in docs/llms-full.txt so it matches the implemented API and registry: remove covariates, use the exact aggregate names None, "dose", and "eventstudy", and correct the ContinuousDiDResults description.
  2. Fix the TROP section in docs/llms-full.txt so treatment is documented as an absorbing-state indicator, not generic time variation, and sync the documented defaults to TROP.__init__ including lambda_nn_grid.

Address AI review P1/P2 findings: remove fabricated covariates param
from ContinuousDiD.fit(), fix aggregate enum values to match source
(None/dose/eventstudy), document TROP treatment as absorbing-state
indicator, fix lambda_nn_grid default, give ContinuousDiDResults its
own results section with correct ATT/ACRT schema, and use RTD's
READTHEDOCS_CANONICAL_URL for version-aware html_baseurl/ogp_site_url.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Mar 18, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 888eb11f1fd1067f2d54ecfacb7562bcc4c45050


Overall Assessment

✅ Looks good

All prior P1 methodology findings from the previous review appear resolved. The remaining issues are non-blocking P2/P3 documentation and metadata consistency items.

Executive Summary

  • Re-review: the prior ContinuousDiD and TROP methodology blockers appear fixed. ContinuousDiD now matches the implemented fit() signature/options in docs/llms-full.txt:L461 and diff_diff/continuous_did.py:L153; TROP now documents the required absorbing-state treatment indicator consistently with docs/llms-full.txt:L594, diff_diff/trop.py:L1508, and docs/methodology/REGISTRY.md:L1243.
  • No estimator implementation, weighting logic, or inference code changed in this PR.
  • llms-full still has one schema drift: TROPResults.n_treated and n_control are documented as observation counts, but the actual dataclass stores unit counts and exposes n_treated_obs separately.
  • The new SEO metadata still has a split source of truth for site URLs: docs/conf.py uses READTHEDOCS_CANONICAL_URL, while the JSON-LD template hardcodes the root docs URL.
  • The duplicate-object-description warning cleanup is properly tracked in TODO.md and is non-blocking.
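The unit-count versus observation-count distinction in the TROPResults finding can be shown on a toy panel. This is only an illustration: the field names come from the review, but the counting rules used here (ever-treated units, treated unit-period rows) are assumptions rather than the library's definitions.

```python
# Toy panel of (unit, period, treated) rows.
panel = [
    ("a", 1, 0), ("a", 2, 1), ("a", 3, 1),
    ("b", 1, 0), ("b", 2, 0), ("b", 3, 1),
    ("c", 1, 0), ("c", 2, 0), ("c", 3, 0),
]

all_units = {unit for unit, _, _ in panel}
treated_units = {unit for unit, _, d in panel if d == 1}

n_treated = len(treated_units)                      # units ever treated
n_control = len(all_units - treated_units)          # units never treated
n_treated_obs = sum(1 for _, _, d in panel if d == 1)  # treated unit-periods

print(n_treated, n_control, n_treated_obs)  # → 2 1 3
```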

Methodology

No findings. The previous ContinuousDiD and TROP methodology issues appear resolved in docs/llms-full.txt:L461, docs/llms-full.txt:L594, diff_diff/continuous_did.py:L153, diff_diff/trop.py:L1508, and docs/methodology/REGISTRY.md:L1243.

Code Quality

No findings.

Performance

No findings.

Maintainability

  • Severity P2. Impact: versioned docs builds can emit mixed discoverability metadata because docs/conf.py:L69 and docs/conf.py:L103 derive canonical/OpenGraph URLs from READTHEDOCS_CANONICAL_URL, but docs/_templates/layout.html:L13 hardcodes "https://diff-diff.readthedocs.io". Concrete fix: pass the same canonical/base URL into the template and reuse it for JSON-LD url instead of maintaining a second hardcoded value.
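A sketch of the suggested fix: build the JSON-LD payload in docs/conf.py from the same base URL and hand it to the template through html_context, so the template no longer hardcodes a URL. The Schema.org type, the project name string, the helper name, and the html_context key are assumptions for illustration, not the contents of the PR's layout.html.

```python
import json
import os


def build_json_ld(name, base_url):
    """Serialize a minimal Schema.org record that points at base_url."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "SoftwareApplication",  # assumed type
            "name": name,
            "url": base_url,
        },
        indent=2,
    )


# Same source of truth as the canonical/OpenGraph settings.
base_url = os.environ.get(
    "READTHEDOCS_CANONICAL_URL", "https://diff-diff.readthedocs.io"
)

html_baseurl = base_url
ogp_site_url = base_url
# Hypothetical key; a template could emit it inside a JSON-LD script tag.
html_context = {"json_ld": build_json_ld("diff-diff", base_url)}
```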

Tech Debt

  • Severity P3. Impact: the duplicate-object-description warning cleanup is explicitly tracked in TODO.md:L66, so this is appropriately deferred rather than merge-blocking. Concrete fix: no action required in this PR.

Security

No findings.

Documentation/Tests

Relabel n_treated/n_control as unit counts and add n_treated_obs
to match trop_results.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit 3340cc7 into main Mar 18, 2026
10 checks passed
@igerber igerber deleted the seo branch March 18, 2026 21:21
@igerber igerber mentioned this pull request Mar 18, 2026