Add SEO infrastructure and AI discoverability by igerber · Pull Request #210 · igerber/diff-diff

igerber · 2026-03-18T19:19:56Z

Summary

Switch Sphinx theme from RTD to PyData Sphinx Theme (breadcrumbs, mobile, dark mode)
Add sphinxext-opengraph for OG meta tags, auto-generated social cards, and meta descriptions on every page
Add sphinx-sitemap for sitemap.xml generation
Add nbsphinx to integrate all 15 Jupyter tutorials into the Sphinx toctree as searchable HTML pages
Add .. meta:: directives with targeted descriptions/keywords to 7 high-value pages
Add docs/llms.txt and docs/llms-full.txt for AI crawler discoverability
Add CITATION.cff for GitHub "Cite this repository" and Google Scholar indexing
Add Schema.org JSON-LD structured data via docs/_templates/layout.html
Expand PyPI keywords (5 → 15), improve description, add classifiers
Add README badges (PyPI, Python versions, license, downloads, docs) and citation section
Update GitHub repo description, topics, and homepage URL
Migrate RTD version-aware warning banner to PyData's native announcement bar
Add Sphinx duplicate-object-description warning cleanup to TODO.md
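
As a rough illustration of the configuration changes listed above, the Sphinx side might look like the following minimal sketch. It is not the actual docs/conf.py from this PR; the `/en/stable/` path in the URL is an assumption made for the example.

```python
# Hypothetical excerpt of a docs/conf.py (illustrative values only).
extensions = [
    "sphinxext.opengraph",  # OG meta tags and social cards
    "sphinx_sitemap",       # sitemap.xml generation
    "nbsphinx",             # render Jupyter notebooks as HTML pages
]

html_theme = "pydata_sphinx_theme"

# sphinx-sitemap reads html_baseurl; sphinxext-opengraph reads ogp_site_url.
# The "/en/stable/" suffix is an assumption for this sketch.
html_baseurl = "https://diff-diff.readthedocs.io/en/stable/"
ogp_site_url = html_baseurl
```

Pointing both settings at one variable keeps the sitemap and OpenGraph URLs from diverging.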

Methodology references

N/A — no methodology changes

Validation

Tests added/updated: No test changes (docs/config only)
Full test suite passes: 1807 passed, 67 skipped, 59 deselected
Docs build verified locally: sphinx-build -b html docs docs/_build/html succeeds
Verified: meta descriptions, OG tags, JSON-LD, sitemap.xml, social cards (140 images), all 15 tutorial HTML pages, llms.txt served at root
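
The manual verification above could be scripted. The sketch below is a hypothetical checker (not part of this PR) that scans one built HTML page for a meta description, an `og:title` tag, and a JSON-LD script block, using only the standard library.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> name/property values and detect a JSON-LD <script>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.has_json_ld = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key:
                self.meta[key] = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self.has_json_ld = True


def check_page(html):
    """Return which discoverability markers are present in an HTML string."""
    parser = MetaCollector()
    parser.feed(html)
    return {
        "description": "description" in parser.meta,
        "og:title": "og:title" in parser.meta,
        "json_ld": parser.has_json_ld,
    }
```

Running `check_page` over every file under docs/_build/html would turn the one-off check into something a CI job could repeat.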

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Switch to PyData Sphinx Theme, add sphinxext-opengraph (OG meta tags, social cards, meta descriptions), sphinx-sitemap, and nbsphinx. Add llms.txt/llms-full.txt for AI crawlers, CITATION.cff for academic citation, Schema.org JSON-LD structured data, and meta directives on high-value pages. Integrate all 15 Jupyter tutorials into the Sphinx toctree. Expand PyPI keywords/classifiers and add README badges.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-18T19:27:40Z

Overall Assessment

⚠️ Needs changes

The PR does not modify estimator implementations, weighting, or variance code, but it does introduce new public AI-discoverability documentation that now diverges from the implemented methodology for ContinuousDiD and TROP. Those are unmitigated P1 issues.

Executive Summary

No estimator code changed; the blocking issues are in the new llms discoverability content.
ContinuousDiD is documented as supporting covariates and aggregate modes that the library does not implement.
TROP is documented as taking a generic time-varying treatment indicator, but the method requires an absorbing-state treatment path.
The new llms-full.txt also already drifts from actual result schemas, which indicates the file will be hard to keep correct without generation or validation.
The new SEO config hardcodes canonical URLs, and does so inconsistently across files; this is non-blocking but likely to emit mixed metadata across versioned docs builds.

Methodology

No estimator implementation changed in this PR; the affected methods are ContinuousDiD and TROP in the new discoverability docs.

Severity P1. docs/llms-full.txt documents ContinuousDiD.fit() with a covariates argument and aggregate values None/"simple"/"dose"/"event_study", but the implementation in diff_diff/continuous_did.py accepts no covariates parameter and only allows aggregate in {None, "dose", "eventstudy"}. The Methodology Registry explicitly says covariate support is deferred in docs/methodology/REGISTRY.md. Impact: the new AI-facing source materially overstates implemented methodology and will generate failing or methodologically incorrect user/LLM code. Concrete fix: make the ContinuousDiD section match the live signature/options exactly and state that covariate support is deferred.
Severity P1. docs/llms-full.txt describes TROP.fit(..., treatment=...) as accepting a treatment indicator that “can vary over time,” but both the implementation diff_diff/trop.py and the registry docs/methodology/REGISTRY.md require an absorbing-state treatment path and call out non-absorbing D as bias-inducing misuse. The same block also drifts from the actual constructor defaults by omitting the 10.0 entry from lambda_nn_grid shown in diff_diff/trop.py. Impact: this weakens a core identification/input assumption in a file specifically intended for AI consumption. Concrete fix: document treatment as an absorbing-state indicator and sync the defaults table to TROP.__init__.
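
To make the absorbing-state requirement concrete: a treatment path is absorbing if, once a unit switches to treated, it never switches back. The helper below is a hypothetical illustration of that property on plain Python lists; it is not the validation that diff_diff/trop.py performs.

```python
def is_absorbing(treatment_path):
    """Return True if a 0/1 path ordered by time never goes from 1 back to 0."""
    seen_treated = False
    for d in treatment_path:
        if d == 1:
            seen_treated = True
        elif seen_treated:
            # A 0 after a 1 means treatment switched off: not absorbing.
            return False
    return True


def non_absorbing_units(panel):
    """panel maps unit -> list of 0/1 values ordered by time (assumed layout)."""
    return [unit for unit, path in panel.items() if not is_absorbing(path)]
```

For example, `[0, 0, 1, 1]` is absorbing, while `[0, 1, 0, 1]` is the kind of time-varying indicator the original wording would have invited.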

Code Quality

No additional code-quality findings in the changed code/config beyond the documentation mismatches above.

Performance

No performance findings in the changed paths.

Maintainability

Severity P2. The new discoverability URL configuration is hardcoded and inconsistent: two settings in docs/conf.py force stable URLs for sitemap and OpenGraph, while docs/_templates/layout.html hardcodes the unversioned docs root in JSON-LD. Impact: versioned docs builds can emit mixed canonical/discoverability metadata. Concrete fix: compute one per-build base URL from the RTD build context and reuse it for html_baseurl, ogp_site_url, and JSON-LD.
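
One way the concrete fix could look, as a minimal sketch: read Read the Docs' `READTHEDOCS_CANONICAL_URL` environment variable once and reuse the result everywhere. The fallback URL, the `resolve_base_url` helper, and the `site_base_url` context key are assumptions introduced for illustration.

```python
import os

# Hypothetical fallback for local builds where the RTD variable is not set.
DEFAULT_BASE_URL = "https://diff-diff.readthedocs.io/"


def resolve_base_url(environ=os.environ):
    """Return the per-build base URL with a trailing slash."""
    url = environ.get("READTHEDOCS_CANONICAL_URL") or DEFAULT_BASE_URL
    return url if url.endswith("/") else url + "/"


base_url = resolve_base_url()
html_baseurl = base_url                      # used by sphinx-sitemap
ogp_site_url = base_url                      # used by sphinxext-opengraph
html_context = {"site_base_url": base_url}   # hypothetical key for the template
```

With the value exposed through `html_context`, the JSON-LD template can reference it instead of carrying its own hardcoded URL.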

Tech Debt

Severity P3. The duplicate-object-description warning cleanup added in TODO.md is appropriately tracked deferred work. Impact: none for merge gating. Concrete fix: no action required in this PR.

Security

No findings in the changed files.

Documentation/Tests

Severity P2. docs/llms-full.txt already misstates ContinuousDiDResults as following the generic staggered-results pattern (overall_se, group_effects, event_study naming), but the real dataclass in diff_diff/continuous_did_results.py exposes separate ATT/ACRT inference fields and no group_effects. Impact: the new public AI-facing schema will generate incorrect attribute access patterns. Concrete fix: give ContinuousDiD its own results block or generate/validate llms*.txt from source so signatures, enums, and result fields cannot drift silently.
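
A validation step along these lines could compare the parameter names documented in llms*.txt against the live signature. The sketch below uses a stand-in class, not the real ContinuousDiD, and the helper name is hypothetical.

```python
import inspect


def documented_params_missing(func, documented):
    """Return documented parameter names that the function does not accept."""
    actual = [p for p in inspect.signature(func).parameters if p != "self"]
    return sorted(set(documented) - set(actual))


class ExampleEstimator:
    """Stand-in for an estimator; the signature is illustrative only."""

    def fit(self, data, outcome, aggregate=None):
        return self
```

Called as `documented_params_missing(ExampleEstimator.fit, ["data", "outcome", "covariates", "aggregate"])`, the helper flags `covariates` as documented but not implemented, which is the class of drift reported above.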

Path to Approval

Fix the ContinuousDiD section in docs/llms-full.txt so it matches the implemented API and registry: remove covariates, use the exact aggregate names None, "dose", and "eventstudy", and correct the ContinuousDiDResults description.
Fix the TROP section in docs/llms-full.txt so treatment is documented as an absorbing-state indicator, not generic time variation, and sync the documented defaults to TROP.__init__ including lambda_nn_grid.

Address AI review P1/P2 findings: remove fabricated covariates param from ContinuousDiD.fit(), fix aggregate enum values to match source (None/dose/eventstudy), document TROP treatment as absorbing-state indicator, fix lambda_nn_grid default, give ContinuousDiDResults its own results section with correct ATT/ACRT schema, and use RTD's READTHEDOCS_CANONICAL_URL for version-aware html_baseurl/ogp_site_url.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-03-18T20:19:59Z

/ai-review

github-actions · 2026-03-18T20:26:25Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 888eb11f1fd1067f2d54ecfacb7562bcc4c45050

Overall Assessment

✅ Looks good

All prior P1 methodology findings from the previous review appear resolved. The remaining issues are non-blocking P2/P3 documentation and metadata consistency items.

Executive Summary

Re-review: the prior ContinuousDiD and TROP methodology blockers appear fixed. ContinuousDiD now matches the implemented fit() signature/options in docs/llms-full.txt:L461 and diff_diff/continuous_did.py:L153; TROP now documents the required absorbing-state treatment indicator consistently with docs/llms-full.txt:L594, diff_diff/trop.py:L1508, and docs/methodology/REGISTRY.md:L1243.
No estimator implementation, weighting logic, or inference code changed in this PR.
llms-full still has one schema drift: TROPResults.n_treated and n_control are documented as observation counts, but the actual dataclass stores unit counts and exposes n_treated_obs separately.
The new SEO metadata still has a split source of truth for site URLs: docs/conf.py uses READTHEDOCS_CANONICAL_URL, while the JSON-LD template hardcodes the root docs URL.
The duplicate-object-description warning cleanup is properly tracked in TODO.md and is non-blocking.

Methodology

No findings. The previous ContinuousDiD and TROP methodology issues appear resolved in docs/llms-full.txt:L461, docs/llms-full.txt:L594, diff_diff/continuous_did.py:L153, diff_diff/trop.py:L1508, and docs/methodology/REGISTRY.md:L1243.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity P2. Impact: versioned docs builds can emit mixed discoverability metadata because docs/conf.py:L69 and docs/conf.py:L103 derive canonical/OpenGraph URLs from READTHEDOCS_CANONICAL_URL, but docs/_templates/layout.html:L13 hardcodes "https://diff-diff.readthedocs.io". Concrete fix: pass the same canonical/base URL into the template and reuse it for JSON-LD url instead of maintaining a second hardcoded value.
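
As a sketch of that fix, the JSON-LD payload could be built in Python from the same base URL and handed to the template, rather than written by hand in layout.html. The function name, the `SoftwareApplication` type, and the field set here are assumptions for illustration, not the PR's actual structured data.

```python
import json


def build_json_ld(base_url, name, description):
    """Serialize a minimal Schema.org record whose url matches the build URL."""
    data = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",  # assumed type for this sketch
        "name": name,
        "description": description,
        "url": base_url,
    }
    return json.dumps(data, indent=2)
```

The resulting string can be placed in Sphinx's `html_context` so the template only has to emit it inside a `<script type="application/ld+json">` tag.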

Tech Debt

Severity P3. Impact: the duplicate-object-description warning cleanup is explicitly tracked in TODO.md:L66, so this is appropriately deferred rather than merge-blocking. Concrete fix: no action required in this PR.

Security

No findings.

Documentation/Tests

Severity P2. Impact: the new AI-facing llms-full schema still drifts from the actual TROP results object. docs/llms-full.txt:L936 and docs/llms-full.txt:L937 describe n_treated/n_control as observation counts, but diff_diff/trop_results.py:L88, diff_diff/trop_results.py:L90, and diff_diff/trop_results.py:L92 define treated/control unit counts plus a separate n_treated_obs. Concrete fix: relabel n_treated and n_control as unit counts and add n_treated_obs if observation-level counts should be exposed in llms-full.
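
The distinction matters because the two counts can differ substantially in a panel. The sketch below shows one plausible way to compute each from long-format rows; the row layout and the exact counting rules are assumptions, not taken from diff_diff/trop_results.py.

```python
def count_treated(rows):
    """rows: iterable of (unit, period, treated) tuples with treated in {0, 1}."""
    all_units = set()
    treated_units = set()
    n_treated_obs = 0
    for unit, _period, treated in rows:
        all_units.add(unit)
        if treated:
            treated_units.add(unit)
            n_treated_obs += 1  # one treated unit-period observation
    return {
        "n_treated": len(treated_units),              # units ever treated
        "n_control": len(all_units - treated_units),  # units never treated
        "n_treated_obs": n_treated_obs,
    }
```

A unit treated in two periods contributes 1 to `n_treated` but 2 to `n_treated_obs`, so labelling the former as an observation count would mislead generated code.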

Relabel n_treated/n_control as unit counts and add n_treated_obs to match trop_results.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Fix TROPResults schema: unit counts not observation counts

6677b73

igerber merged commit 3340cc7 into main Mar 18, 2026
10 checks passed

igerber deleted the seo branch March 18, 2026 21:21

igerber mentioned this pull request Mar 18, 2026

Release v2.7.2 #211

Merged

igerber merged 3 commits into main from seo
