
Overview

This post is about a system I made so people can share comments on my “blog”/personal site. At its core it’s just a way to post and view comments, with a few basic features on top. Everything was built so that I can sustain it on my own for a long time.

What I mean by small

Small doesn’t mean “no security” or “no moderation.” Small means keeping everything within my ability to maintain it.

Here are the development constraints I set for myself:

  1. One thread per blog post, created from my blog’s RSS feed.
  2. Replies are regular comments with a parentCommentId and a depth.
  3. Markdown is allowed, raw HTML is not.
  4. Mutations require Origin allowlist + CSRF token + auth.

If you’re building something similar, this is the difference between “I can maintain this” and “I can’t touch this without opening ten tabs.”

Architecture: flows before tables

I’m going to describe the system the way it behaves, because the database schema makes more sense once you can picture the request paths.

1) Viewer loads a post

  • The blog frontend asks the comments service to “resolve” the post slug into a threadId.
  • Then it lists comments for that thread.
  • If the viewer is signed in, the list includes whether they personally liked each comment.
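
As a sketch, that read path looks roughly like this. The endpoint paths, the siteKey value, and the response fields are illustrative assumptions; only the resolve-then-list flow comes from the description above.

// Sketch of the read path. URL shapes and field names are assumptions;
// the flow (resolve the slug, then list the thread) is the real one.
async function loadComments(slug: string) {
  const base = 'https://comments.example.com'; // hypothetical service origin

  // 1. Resolve the post slug into a threadId.
  const { threadId } = await fetch(
    `${base}/v1/threads/resolve?siteKey=blog&resourceType=post&resourceId=${encodeURIComponent(slug)}`
  ).then((r) => r.json());

  // 2. List comments; include credentials so a signed-in viewer
  //    gets their own like state back with each comment.
  return fetch(`${base}/v1/threads/${threadId}/comments`, {
    credentials: 'include',
  }).then((r) => r.json());
}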

2) Viewer signs in

  • The frontend sends you to /auth/github/start.
  • That sets up OAuth PKCE state in the database.
  • GitHub redirects back to /auth/github/callback.
  • The service upserts a User row, creates a Session row, and sets a session cookie.

3) Viewer writes anything

  1. Check Origin allowlist (and require Origin in production).
  2. Check CSRF token (cookie + header).
  3. Check auth session.
  4. Apply rate limits.
  5. Do the write.
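
Stitched into a route handler, that ordering looks roughly like the sketch below. The helper names (getSessionUser, rateLimit, createComment) are assumptions; mutationAllowed is shown later in this post.

import { NextRequest, NextResponse } from 'next/server';

// Sketch of the write path ordering; helpers are assumed names.
export async function POST(request: NextRequest) {
  const gate = await mutationAllowed(request); // 1 + 2: Origin allowlist + CSRF
  if (!gate.ok) return NextResponse.json({ error: gate.code }, { status: 403 });

  const user = await getSessionUser(request); // 3: auth session
  if (!user) return NextResponse.json({ error: 'UNAUTHORIZED' }, { status: 401 });

  if (!rateLimit(user.id)) { // 4: rate limits
    return NextResponse.json({ error: 'RATE_LIMITED' }, { status: 429 });
  }

  const comment = await createComment(user, await request.json()); // 5: the write
  return NextResponse.json(comment, { status: 201 });
}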

Data model

Tables:

  • User — GitHub identity + moderation flags (isAdmin, isBanned, reasons)
  • Session — server-side session with expiresAt, revokedAt, lastUsedAt
  • Thread — unique (siteKey, resourceType, resourceId) so one post maps cleanly
  • Comment — parent pointer, depth, markdown body, HTML body, edit/delete timestamps
  • CommentReaction — unique (commentId, userId, reaction) (reaction currently only like)
  • OAuthState — PKCE state (short lived), stores codeVerifier + returnTo
  • PrebannedUser — “don’t ever let this login/id in” list
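
To make the shapes concrete, here are the two central tables written out as TypeScript types. This is illustration, not the real schema; any field not mentioned above is an assumption.

// Illustrative shapes only; fields not named in the list above are guesses.
type Thread = {
  id: string;
  siteKey: string;
  resourceType: 'post'; // currently the only type
  resourceId: string;   // the slug; (siteKey, resourceType, resourceId) is unique
};

type Comment = {
  id: string;
  threadId: string;
  parentCommentId: string | null; // replies are regular comments
  depth: number;
  bodyMarkdown: string;
  bodyHtml: string;       // pre-rendered, sanitized HTML
  editedAt: Date | null;
  deletedAt: Date | null; // soft delete, see next section
  deletedBy: string | null;
};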

Soft delete fields

Instead of instantly hard-deleting, a comment has:

  • deletedAt
  • deletedBy

I then have a cron job that permanently deletes soft-deleted comments after 72 hours.
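
A minimal sketch of that purge job, assuming a Prisma-style client called db (the post doesn’t name the ORM, so that part is a guess):

// Sketch of the purge job; `db` is an assumed Prisma-style client.
export async function purgeDeletedComments() {
  const cutoff = new Date(Date.now() - 72 * 60 * 60 * 1000);
  await db.comment.deleteMany({
    // `lt` never matches NULL, so live comments are untouched.
    where: { deletedAt: { lt: cutoff } },
  });
}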

Thread resolution (RSS-gated, on purpose)

The resolve endpoint takes a tuple:

  • siteKey (basically “which site are we talking about?”)
  • resourceType (currently only post)
  • resourceId (the slug)

and returns a threadId. The upsert is normal. The guardrail is not.

The guardrail: only allow real posts

Instead of letting any resourceId create a thread, the service fetches the site’s RSS feed, extracts the slugs of real posts, and only permits those. That prevents:

  • someone creating threads for random slugs forever
  • your database becoming a graveyard of junk rows
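
A sketch of the slug extraction, assuming the feed is ordinary RSS with absolute URLs in <link> elements. The regex is just for illustration; a real implementation should use an XML parser.

// Sketch only: parse the feed, keep the last path segment of each item URL.
export async function allowedSlugs(feedUrl: string): Promise<Set<string>> {
  const xml = await fetch(feedUrl).then((r) => r.text());
  const slugs = new Set<string>();
  for (const [, link] of xml.matchAll(/<link>([^<]+)<\/link>/g)) {
    try {
      const slug = new URL(link).pathname.split('/').filter(Boolean).pop();
      if (slug) slugs.add(slug);
    } catch {
      // Skip malformed links rather than failing the whole resolve.
    }
  }
  return slugs;
}

Resolve then refuses to upsert a thread unless the incoming resourceId is in that set.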

Why PKCE

PKCE lets you avoid storing a client secret in the browser while still getting a strong proof that the callback belongs to the flow you started. So it looks like:

  • /auth/github/start generates:
    • state (random)
    • codeVerifier
    • codeChallenge
  • it stores { state, codeVerifier, returnTo, expiresAt } in OAuthState
  • it redirects to GitHub with { state, codeChallenge }
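
The generation step is small. Here’s a sketch using Node’s crypto; the function name is mine:

import { createHash, randomBytes } from 'node:crypto';

// Sketch of the PKCE setup in /auth/github/start.
export function createPkcePair() {
  const state = randomBytes(32).toString('base64url');
  const codeVerifier = randomBytes(32).toString('base64url');
  // S256: the challenge is the SHA-256 of the verifier, base64url-encoded.
  const codeChallenge = createHash('sha256').update(codeVerifier).digest('base64url');
  return { state, codeVerifier, codeChallenge };
}

The verifier stays server-side in OAuthState; only state and codeChallenge ever go to GitHub.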

The returnTo problem

Any OAuth flow has a “where do we go after login?” question. So the returnTo URL is validated against an allowlist of known origins:

  • the blog origins
  • plus the service origin itself
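
The check itself is a plain origin comparison. A sketch, with placeholder allowlist entries:

// Sketch; the actual allowlist entries are placeholders.
const ALLOWED_RETURN_ORIGINS = new Set([
  'https://blog.example.com',     // the blog
  'https://comments.example.com', // the service itself
]);

export function safeReturnTo(returnTo: string | null, fallback: string): string {
  if (!returnTo) return fallback;
  try {
    const url = new URL(returnTo);
    return ALLOWED_RETURN_ORIGINS.has(url.origin) ? url.toString() : fallback;
  } catch {
    return fallback; // relative or malformed: refuse rather than guess
  }
}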

1) Sessions

Sessions here are not JWTs. They’re server-side rows in Postgres. The cookie is basically: lh_comments_session=<uuid>

Each authenticated request does the following (with a lastUsedAt update in the background):

  1. read cookie
  2. find session row
  3. verify not revoked
  4. verify not expired
  5. return user
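
As code, roughly (again assuming a Prisma-style db client):

import type { NextRequest } from 'next/server';

// Sketch of the session lookup; `db` is an assumed Prisma-style client.
export async function getSessionUser(request: NextRequest) {
  const id = request.cookies.get('lh_comments_session')?.value; // 1. read cookie
  if (!id) return null;

  const session = await db.session.findUnique({
    where: { id },
    include: { user: true }, // 2. find session row
  });
  if (!session) return null;
  if (session.revokedAt) return null;              // 3. not revoked
  if (session.expiresAt < new Date()) return null; // 4. not expired

  // Background lastUsedAt update; deliberately not awaited.
  void db.session.update({ where: { id }, data: { lastUsedAt: new Date() } });

  return session.user; // 5. return user
}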

Why server-side sessions?

  • easy revocation (log out, revoke)
  • easy ban enforcement (don’t trust old tokens)
  • less complexity around token invalidation

JWTs are fine. I just didn’t want to carry that complexity for a blog comments service.

2) Mutation gating

Even with CORS configured, the service still enforces Origin checks on writes. In production-like mode, it requires an Origin header. If something non-browser hits your endpoints, you still want the server to be the authority.

import type { NextRequest } from 'next/server';

// Pseudocode shaped like the real route guard
export async function mutationAllowed(request: NextRequest) {
  const origin = request.headers.get('origin');
  if (env.NODE_ENV === 'production') {
    // Production: an Origin header is mandatory and must be allowlisted.
    if (!origin) return { ok: false, code: 'MUTATION_ORIGIN_REQUIRED' };
    if (!isAllowedOrigin(origin)) return { ok: false, code: 'MUTATION_ORIGIN_NOT_ALLOWED' };
  } else {
    // Dev: Origin is optional, but if present it still has to match.
    if (origin && !isAllowedOrigin(origin)) {
      return { ok: false, code: 'MUTATION_ORIGIN_NOT_ALLOWED' };
    }
  }
  // CSRF check happens here too (next section)
  return { ok: true };
}

3) CSRF: the mechanism

This service uses a CSRF cookie plus a request header:

  • cookie: csrf_token=<random>
  • header: X-CSRF-Token: <same random>

The server checks:

  1. cookie token exists
  2. header token exists
  3. same length (fails early)
  4. constant-time equality (prevents timing leaks)

If any of that fails, the mutation is blocked.

import crypto from 'node:crypto';
import { cookies } from 'next/headers';
import type { NextRequest } from 'next/server';

const CSRF_COOKIE = 'csrf_token';

export async function verifyCsrf(request: NextRequest) {
  const cookieToken = (await cookies()).get(CSRF_COOKIE)?.value;
  const headerToken = request.headers.get('x-csrf-token');
  if (!cookieToken || !headerToken) return false;
  // Length check first: timingSafeEqual throws on unequal-length buffers.
  if (cookieToken.length !== headerToken.length) return false;
  // Constant-time compare to avoid leaking info via timing
  return crypto.timingSafeEqual(
    Buffer.from(cookieToken),
    Buffer.from(headerToken)
  );
}

How the client gets the token

The /v1/me endpoint returns:

  • user (or null)
  • csrfToken

So the client pattern is:

  1. call /v1/me on load
  2. keep csrfToken in memory
  3. attach it to every write in X-CSRF-Token
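
In client code that pattern looks something like this. Only /v1/me comes from the text above; the other endpoint path and the field names are assumptions.

// Sketch of the client side; BASE and /v1/comments are hypothetical.
const BASE = 'https://comments.example.com';
let csrfToken: string | null = null;

export async function bootstrap() {
  // 1. Call /v1/me on load; credentials so the session cookie is sent.
  const me = await fetch(`${BASE}/v1/me`, { credentials: 'include' })
    .then((r) => r.json());
  csrfToken = me.csrfToken; // 2. keep it in memory
  return me.user; // user or null
}

export async function postComment(threadId: string, bodyMarkdown: string) {
  if (!csrfToken) await bootstrap();
  // 3. attach the token to every write
  return fetch(`${BASE}/v1/comments`, {
    method: 'POST',
    credentials: 'include',
    headers: { 'Content-Type': 'application/json', 'X-CSRF-Token': csrfToken! },
    body: JSON.stringify({ threadId, bodyMarkdown }),
  });
}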

Why this fails in real life

The failures are usually mundane:

  • Your frontend forgot to include the header on one request.
  • You’re testing in a new origin and forgot to add it to the allowlist.
  • Cookies aren’t being set because you’re mixing http and https.
  • You tried to mutate without calling /v1/me first.

GET /comments list latency (p50)

[Chart: GET /comments list latency, p50]

Most of the improvement here is “stop doing extra work.” Stuff like:

  • returning bodyHtml instead of re-rendering
  • grouping likes in one query
  • keeping response shapes consistent so clients don’t do follow-up calls
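
The like grouping is the classic N+1 fix: one grouped query for the whole page of comments instead of a count per comment. A sketch, using raw SQL through an assumed Prisma-style client; the table and column names are guesses:

// Sketch: one grouped query per page instead of a count per comment.
// `db.$queryRaw` and the table/column names are assumptions.
async function likeCountsFor(commentIds: string[]): Promise<Map<string, number>> {
  const rows = await db.$queryRaw<{ commentId: string; likes: number }[]>`
    SELECT "commentId", COUNT(*)::int AS likes
    FROM "CommentReaction"
    WHERE "reaction" = 'like' AND "commentId" = ANY(${commentIds})
    GROUP BY "commentId"
  `;
  return new Map(rows.map((r) => [r.commentId, r.likes]));
}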

GET /comments list latency (p95)

[Chart: GET /comments list latency, p95]

The p95 tells on you.

It’s usually:

  • cold starts
  • a slow DB connection setup
  • a “this one thread has a ton of comments and you forgot a limit” moment

POST /comment create latency (median-ish)

[Chart: POST /comment create latency]

Creating comments costs a little more because it includes markdown render + sanitize, plus checks. But honestly, I’m okay with that. If writes are a bit heavier, it usually means reads can stay cheap, and on a blog that trade is basically always correct.

Error rate (all endpoints)

[Chart: error rate across all endpoints]

The bump on Day 4 is the kind of thing I’d expect from either:

  • RSS fetch failing (thread resolve refuses to create threads)
  • an origin allowlist mismatch after a domain change
  • an accidentally missing credentials: 'include' on one client fetch

Problems I actually hit (mostly browser-shaped)

This is the part where I admit the “hard” problems weren’t SQL.

Cookies + multiple origins

You end up asking questions like:

  • Is the service on HTTPS?
  • Is the blog on HTTPS?
  • What does the browser think the site boundary is?
  • Are you sending credentials on fetch?

If something breaks, it often looks like “random 401s.” But it’s really “cookie didn’t get sent.”

CSRF token ordering

If the frontend tries to post before fetching /v1/me (and therefore before receiving the CSRF cookie/token), your write fails. Which is correct. But it feels like a bug until you remember you’re protecting yourself from cross-site writes.

Origin allowlist drift

This one is extremely normal:

  • you add a new domain or preview deploy
  • you forget to update the allowlist
  • suddenly writes fail

Rate limits in multi-instance environments

The current rate limiter uses an in-memory map.

That’s fine for a single instance.

If you ever run multiple instances, you want a shared store (Redis, etc.).
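
For reference, the single-instance version really is only a few lines. A sketch of a fixed-window limiter over a Map; the limit and window numbers are placeholders:

// Sketch of a fixed-window, in-memory limiter; numbers are placeholders.
const hits = new Map<string, { count: number; resetAt: number }>();

export function rateLimit(key: string, limit = 10, windowMs = 60_000): boolean {
  const now = Date.now();
  const entry = hits.get(key);
  if (!entry || entry.resetAt <= now) {
    hits.set(key, { count: 1, resetAt: now + windowMs }); // new window
    return true;
  }
  if (entry.count >= limit) return false; // over budget in this window
  entry.count += 1;
  return true;
}

The map lives in process memory, which is exactly why it stops being accurate the moment a second instance exists.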

Closing

Everything here was kept intentionally small:

  • OAuth for identity
  • server sessions for control
  • Origin allowlist for request boundaries
  • CSRF for cookie-based write protection
  • markdown + sanitize for UGC safety
  • moderation for reality

It’s basically me trying to build something I won’t hate maintaining. And if you’ve ever maintained a “simple” system that wasn’t actually simple, you know exactly what I mean.