Add excerpt/word count to posts#126

Open
fennifith wants to merge 5 commits into main from post-word-count

@fennifith fennifith commented Mar 22, 2026

Fixes #99

Summary by CodeRabbit

  • New Features

    • Posts now track and persist word count
    • Markdown excerpts auto-populate post descriptions when missing
    • Added "coderpad" as a supported license option
  • Tests

    • Added tests validating markdown excerpt extraction behavior
  • Chores

    • Added markdown/AST processing dependencies
    • Database schema extended with a word count column
  • Validation

    • Metadata schema tightened with stricter date/url formats and optional description

coderabbitai bot commented Mar 22, 2026

Warning

Rate limit exceeded

@fennifith has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 10 minutes and 9 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 68ec0b21-9b92-49ba-8d7b-be7692f10628

📥 Commits

Reviewing files that changed from the base of the PR and between eeef5b7 and 478f753.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • apps/worker/package.json
  • apps/worker/src/utils/extractMarkdownExcerpt.ts
📝 Walkthrough

Walkthrough

Adds markdown excerpt extraction and word-count calculation to post sync: new extractor util and tests, markdown/AST deps, processor changes to derive description and wordCount, DB migration and schema update adding post_data.word_count, and stricter post metadata schema fields.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Markdown excerpt util: apps/worker/src/utils/extractMarkdownExcerpt.ts, apps/worker/src/utils/extractMarkdownExcerpt.test.ts | New extractMarkdownExcerpt exported function that parses markdown with unified/remark, walks the AST, and builds a plaintext excerpt with optional truncation and ellipsis. Includes 3 unit tests (format stripping, list handling, truncation). |
| Worker dependencies: apps/worker/package.json | Added markdown/AST processing packages: unified, remark-parse, remark-gfm, hast-util-to-string, unist-util-is. |
| Post sync processor: apps/worker/src/tasks/sync-post/processor.ts, apps/worker/src/tasks/sync-post/processor.test.ts | Processor now parses markdown (gray-matter) to get content, fills missing description from excerpt, computes wordCount (whitespace-split), threads wordCount through localeData and persists it. Test updated to expect wordCount: 10. |
| Post metadata types: apps/worker/src/tasks/sync-post/types.ts | Tighter schema: published and edited use date-time format, originalLink uses url format, description became optional (no default), and license union adds coderpad. |
| Database schema & migration: packages/db/src/schema/posts.ts, packages/db/drizzle/20260322121256_illegal_rick_jones/migration.sql, packages/db/drizzle/.../snapshot.json | Added word_count (integer NOT NULL DEFAULT 0) to post_data via new Drizzle migration; schema export updated and snapshot added. |
| Workspace catalog: pnpm-workspace.yaml | Minor reordering of catalog entries (typebox entry moved). |
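
The excerpt util summary mentions "optional truncation and ellipsis" without showing the rule. A minimal sketch of that final step, assuming truncation backs up to a word boundary; `truncateExcerpt` and `maxLength` are hypothetical names, and the PR's actual cut-off logic is not visible in this thread:

```typescript
// Hypothetical helper, not the PR's implementation: illustrates optional
// truncation with an ellipsis on text that is already plain (no markdown).
function truncateExcerpt(text: string, maxLength?: number): string {
  // Collapse whitespace runs so the length check sees normalized text.
  const normalized = text.replace(/\s+/g, " ").trim();
  if (maxLength === undefined || normalized.length <= maxLength) {
    return normalized;
  }
  const cut = normalized.slice(0, maxLength);
  // If the limit falls exactly between two words, keep the cut as-is;
  // otherwise back up to the last space so no word is split in half.
  const endsOnBoundary = normalized[maxLength] === " ";
  const lastSpace = cut.lastIndexOf(" ");
  const head = endsOnBoundary || lastSpace <= 0 ? cut : cut.slice(0, lastSpace);
  // The ellipsis is appended after the cut, so the result may be one
  // character longer than maxLength.
  return head.trimEnd() + "…";
}
```

With no `maxLength` the text is only whitespace-normalized, which matches the "optional" wording in the summary; whether the real function counts characters or words is an open assumption here.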

Sequence Diagram(s)

mermaid
sequenceDiagram
participant GitHub as GitHub (repo/content)
participant Worker as Worker Processor
participant Extractor as extractMarkdownExcerpt
participant DB as Database (post_data)
GitHub->>Worker: Provide post file + frontmatter
Worker->>Extractor: Parse markdown, extract excerpt & compute wordCount
Extractor-->>Worker: Excerpt, wordCount
Worker->>DB: Insert/Update post_data (includes description, rawMarkdown, wordCount)
DB-->>Worker: Ack

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • update drizzle?? #111: Modifies packages/db/src/schema/posts.ts / DB schema; likely related to the word_count column changes.
  • initial e2e tests #113: Changes worker dependencies (gray-matter/sharp resolution); related to added markdown/AST packages in apps/worker/package.json.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding word count and excerpt functionality to posts. |
| Linked Issues check | ✅ Passed | All objectives from issue #99 are met: word count column added to database and computed during sync, excerpt extraction implemented via markdown parsing, and logic reused/uplifted as requested. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the stated objective. Dependency additions, schema updates, processor changes, and new utility functions are all necessary to implement excerpt/word count functionality. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
apps/worker/src/tasks/sync-post/processor.test.ts (1)

61-73: Add one integration case without frontmatter description.

This fixture still exercises only the authored-description path. A body-only markdown case would lock in the generated excerpt fallback and also cover the new simple-body word-count path.

Also applies to: 99-115
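
The two branches this nitpick asks the tests to cover can be sketched in isolation. This is a simplified illustration, not the processor's code: `PostFrontmatter`, `ExcerptFn`, and `resolveDescription` are hypothetical names, and it assumes only an absent description (not an empty string) counts as missing.

```typescript
// Hypothetical shape; the real metadata schema lives in sync-post/types.ts.
interface PostFrontmatter {
  title: string;
  description?: string;
}

// Stand-in for extractMarkdownExcerpt, injected so the sketch has no dependencies.
type ExcerptFn = (body: string) => string;

function resolveDescription(
  meta: PostFrontmatter,
  body: string,
  excerpt: ExcerptFn,
): string {
  // Authored description wins; the generated excerpt is only a fallback.
  return meta.description ?? excerpt(body);
}

// Toy excerpt function for the example: everything before the first ". ".
const firstSentence: ExcerptFn = (body) => body.split(". ")[0];

// Authored-description path (what the existing fixture exercises).
const authored = resolveDescription(
  { title: "A", description: "Authored" },
  "Body text. More.",
  firstSentence,
);

// Body-only path (the case the review asks to add).
const generated = resolveDescription({ title: "A" }, "Body text. More.", firstSentence);
```

A body-only fixture in the integration test would pin down the second path end to end, including whatever excerpt length the real extractor uses.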

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/worker/src/tasks/sync-post/processor.test.ts` around lines 61 - 73, Add
a new integration test in processor.test.ts that uses a markdown fixture missing
the frontmatter "description" field (only frontmatter title/tags/date or even no
frontmatter) and a body-only markdown content; in the new test assert that the
processor falls back to generating an excerpt from the body (excerpt equals
expected first N words/sentence) and that the word-count path uses the
simple-body fallback (check whatever function output/field indicates wordCount
or excerpt source). Locate nearby tests that use the `data` fixture and mirror
their setup/assertions (same test suite) so the new case covers the
authored-description-missing branch and the simple-body word-count/excerpt
fallback.
packages/db/drizzle/20260322121256_illegal_rick_jones/snapshot.json (1)

394-405: Enforce non-negative word_count at the database layer.

Line 394 defines word_count as integer with default 0, but without a check constraint negative values can still be written.

Suggested constraint
+ALTER TABLE "post_data"
+ADD CONSTRAINT "post_data_word_count_non_negative_chk"
+CHECK ("word_count" >= 0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/db/drizzle/20260322121256_illegal_rick_jones/snapshot.json` around
lines 394 - 405, The snapshot shows the post_data column word_count is integer
with default 0 but no constraint preventing negatives; add a database-level
CHECK constraint (e.g., ck_post_data_word_count_nonnegative or
post_data_word_count_nonnegative) that enforces word_count >= 0 and include it
in the schema/migration that defines/updates the post_data table (or in the
Drizzle table definition for post_data) so the constraint is applied at the DB
level while keeping the default 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/worker/src/tasks/sync-post/processor.ts`:
- Around line 101-106: Replace the current word-count calculation that splits
raw markdown (content.split(/\s+/).length) with a count derived from extracted
plain text: use the parsed plain-text field (parsed.text or whichever parsed
property holds plaintext) falling back to a markdown-stripped version of
content, call .trim() and then if the resulting string is empty set wordCount =
0, otherwise set wordCount = plain.split(/\s+/).length; update the assignment of
wordCount (the variable in the processor function) accordingly so empty-body
fixtures in processor.test.ts produce 0.
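
A small sketch of why the review asks for the trim and empty check, assuming the input has already been reduced to plain text; `naiveWordCount` and `countWords` are illustrative names, not identifiers from the PR:

```typescript
// Count as described in the review comment: splitting an empty string still
// yields one (empty) element, and leading/trailing whitespace adds more.
function naiveWordCount(content: string): number {
  return content.split(/\s+/).length;
}

// Count as suggested by the review: trim first, then special-case empty input.
function countWords(plainText: string): number {
  const plain = plainText.trim();
  return plain === "" ? 0 : plain.split(/\s+/).length;
}
```

For an empty body the naive version reports 1 and the guarded version reports 0, which is the behavior the review wants the empty-body fixtures to assert.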

In `@apps/worker/src/utils/extractMarkdownExcerpt.ts`:
- Around line 7-20: The code is using hast types/utilities while
unifiedChain.parse returns mdast; update the imports and type checks to mdast
equivalents: import types from "mdast" (use mdast Content/Nodes types) and
import toString from "mdast-util-to-string" instead of hast versions, and ensure
unifiedChain is typed for mdast; in the type guard function isTextOrCode (and
any use of is(...)), replace hast node names with mdast names (e.g., "listitem"
-> "listItem") and adjust the union of node kinds to mdast node types like
"text", "code", "link", "listItem", "inlineCode" so the list-item branch becomes
reachable and type-safe.
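
To illustrate the node-name point, here is a dependency-free sketch that walks an mdast-shaped tree. The `MdNode` types are local stand-ins (real code would import node types from "mdast"), and `isLeaf`, `collectText`, and `toPlainText` are hypothetical helpers rather than the PR's functions:

```typescript
// Minimal local stand-ins for mdast nodes; names follow mdast ("listItem",
// "inlineCode"), not hast.
interface LeafNode {
  type: "text" | "inlineCode" | "code";
  value: string;
}
interface ParentNode {
  type: "root" | "paragraph" | "list" | "listItem" | "link" | "emphasis" | "strong";
  children: MdNode[];
}
type MdNode = LeafNode | ParentNode;

// Type guard: narrows to nodes that carry literal text.
function isLeaf(node: MdNode): node is LeafNode {
  return node.type === "text" || node.type === "inlineCode" || node.type === "code";
}

function collectText(node: MdNode, out: string[] = []): string[] {
  if (isLeaf(node)) {
    out.push(node.value);
    return out;
  }
  for (const child of node.children) collectText(child, out);
  // mdast spells this "listItem"; a comparison against "listitem" never
  // matches, which is how a list-item branch becomes unreachable.
  if (node.type === "listItem" || node.type === "paragraph") out.push(" ");
  return out;
}

function toPlainText(root: MdNode): string {
  return collectText(root).join("").replace(/\s+/g, " ").trim();
}
```

Because `type` is a union of string literals, the compiler rejects a misspelled name such as "listitem" in the comparison, which is the type-safety benefit the comment points to.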

In `@packages/db/drizzle/20260322121256_illegal_rick_jones/snapshot.json`:
- Around line 381-405: The snapshot for the "post_data" table is missing a
dedicated "excerpt" column: add a new not-null text column named "excerpt"
(similar to "description") to the post_data schema alongside "word_count" and
"description", update whatever migration or schema definition that defines the
post_data table (e.g., the createTable or table schema that contains post_data)
to include excerpt, run the migration/regeneration step to rebuild Drizzle
artifacts and regenerate snapshot.json so the new "excerpt" column appears in
the snapshot.

In `@packages/db/src/schema/posts.ts`:
- Around line 32-34: The posts schema lacks an excerpt column so generated
excerpts are never persisted: add a new column (e.g., excerpt:
text("excerpt").notNull().default("") or nullable per product choice) alongside
description and wordCount in packages/db/src/schema/posts.ts, add a
corresponding DB migration to create the excerpt column, and update all
write/read paths that populate post_data (including the worker that currently
only backfills description) so it writes the excerpt regardless of whether an
authored description exists and reads it when returning posts.

In `@pnpm-workspace.yaml`:
- Around line 6-15: Reformat pnpm-workspace.yaml with the repository’s Prettier
configuration so it matches CI expectations: open pnpm-workspace.yaml and run
the project’s formatter (the repo’s Prettier/format script) to rewrite the file
using the repo config, then stage and commit the updated file so CI no longer
reports formatting errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05f27e3f-789b-4937-a3d3-018d843f4134

📥 Commits

Reviewing files that changed from the base of the PR and between 66ba7e1 and 6fd8ff3.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (10)
  • apps/worker/package.json
  • apps/worker/src/tasks/sync-post/processor.test.ts
  • apps/worker/src/tasks/sync-post/processor.ts
  • apps/worker/src/tasks/sync-post/types.ts
  • apps/worker/src/utils/extractMarkdownExcerpt.test.ts
  • apps/worker/src/utils/extractMarkdownExcerpt.ts
  • packages/db/drizzle/20260322121256_illegal_rick_jones/migration.sql
  • packages/db/drizzle/20260322121256_illegal_rick_jones/snapshot.json
  • packages/db/src/schema/posts.ts
  • pnpm-workspace.yaml

@fennifith fennifith enabled auto-merge March 22, 2026 13:05
Development

Successfully merging this pull request may close these issues.

Create & populate word count and excerpt columns on the posts table