Building a Small Simple Comment System

10 min read

PostgreSQL, Next.js, TypeScript, Vercel, React

Overview

I built this comments service for my own site. The scope is small on purpose. A reader signs in, opens a thread, writes a comment, replies, and leaves a like. I also need moderation, session control, and enough security work that the system stays safe to run over time.

That phrase, small, needs a clear meaning. Small does not mean weak. Small means I can still understand the whole service, audit the request path, and fix problems without hunting across six systems.

What small means here

I set a few rules early and kept them fixed. Each blog post gets one thread, created from the site RSS feed. Replies are plain comments with parentCommentId and depth. Markdown is allowed. Raw HTML is blocked. Every write goes through Origin checks, CSRF validation, and auth.

Those limits keep the system maintainable. They also remove a lot of hidden scope. When you build a service like this, clear boundaries matter more than feature count.

Architecture, start with the request flow

The schema matters, though the request flow explains the system better. When a reader opens a post, the frontend asks the service to map the post slug to a thread. After that, the client loads the comments for that thread. If the reader is signed in, the response also includes whether that reader liked each comment.

Login follows a standard GitHub OAuth PKCE flow. The service stores the temporary OAuth state, receives the callback, upserts the user, creates a session row, and sets a session cookie. Every write follows the same gate. Check the Origin. Check the CSRF token. Check the session. Apply rate limits. Then run the write.

That order is worth keeping. Cheap rejection first. Stateful work later.

Data model

The tables are simple and each one has a narrow job. User stores GitHub identity and moderation flags. Session stores server-side session state such as expiresAt, revokedAt, and lastUsedAt. Thread maps a (siteKey, resourceType, resourceId) tuple to a single thread. Comment stores the parent pointer, depth, markdown body, rendered HTML, and edit or delete timestamps. CommentReaction stores likes with a unique (commentId, userId, reaction) key. OAuthState stores short-lived PKCE state with codeVerifier and returnTo. PrebannedUser blocks identities before they log in.

That shape is enough for the current feature set. More tables would not make the service safer or easier to run.
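As a sketch, the comment and reaction rows might look like this in TypeScript. Field names follow the article; the exact column types and the makeReply helper are assumptions for illustration:

```typescript
// Illustrative row shapes; names mirror the article, types are assumed.
interface Comment {
  id: string;
  threadId: string;
  parentCommentId: string | null; // null for top-level comments
  depth: number;                  // 0 for top-level, parent.depth + 1 for replies
  bodyMarkdown: string;           // what the author wrote
  bodyHtml: string;               // sanitized render, served to readers
  editedAt: Date | null;
  deletedAt: Date | null;         // soft delete marker
  deletedBy: string | null;
}

interface CommentReaction {
  commentId: string;
  userId: string;
  reaction: 'like'; // unique on (commentId, userId, reaction)
}

// A reply inherits its parent's thread and sits one level deeper.
function makeReply(
  parent: Comment,
  body: string
): Pick<Comment, 'threadId' | 'parentCommentId' | 'depth' | 'bodyMarkdown'> {
  return {
    threadId: parent.threadId,
    parentCommentId: parent.id,
    depth: parent.depth + 1,
    bodyMarkdown: body,
  };
}
```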

Soft delete and cleanup

Comments are soft deleted first. Each row stores deletedAt and deletedBy. A cron job hard deletes items older than 72 hours.

That policy gives moderators room to react without forcing instant data loss. It also keeps the active tables clean over time.
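The cutoff logic is small enough to sketch directly. Assuming the cron job scans soft-deleted rows, the eligibility check is just arithmetic on deletedAt:

```typescript
// Hard-delete policy sketch: soft-deleted rows become eligible for
// removal 72 hours after deletedAt. The helper name is illustrative.
const RETENTION_HOURS = 72;

function eligibleForHardDelete(deletedAt: Date | null, now: Date): boolean {
  if (!deletedAt) return false; // never hard-delete a live comment
  const ageMs = now.getTime() - deletedAt.getTime();
  return ageMs >= RETENTION_HOURS * 60 * 60 * 1000;
}
```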

Thread resolution stays RSS-gated on purpose

The resolve endpoint takes a siteKey, resourceType, and resourceId, then returns a threadId. The upsert logic is ordinary. The guardrail is the important part.

The service fetches the site RSS feed, extracts valid slugs, and only allows thread creation for posts that exist. This blocks junk thread creation for random slugs and keeps the database from filling with unused rows.
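A simplified version of that guardrail looks like this. The regex-based feed parsing is a sketch, not the real implementation, and assumes post URLs end in the slug:

```typescript
// Sketch of the RSS guardrail: pull <link> values out of the feed and
// only allow thread creation for slugs that actually exist.
// A real implementation would use an XML parser, not a regex.
function slugsFromFeed(feedXml: string): Set<string> {
  const slugs = new Set<string>();
  for (const match of feedXml.matchAll(/<link>([^<]+)<\/link>/g)) {
    try {
      const path = new URL(match[1].trim()).pathname;
      const slug = path.split('/').filter(Boolean).pop();
      if (slug) slugs.add(slug);
    } catch {
      // ignore entries that are not absolute URLs
    }
  }
  return slugs;
}

function canCreateThread(slug: string, feedXml: string): boolean {
  return slugsFromFeed(feedXml).has(slug);
}
```

A slug that is not in the feed simply never gets a thread row, so junk resolution requests cost one Set lookup instead of a database write.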

Why PKCE

PKCE fits this service well. The client does not need a secret in the browser, though the callback still has proof that the flow matches the request that started earlier. The start route generates state, codeVerifier, and codeChallenge, stores the verifier and return path in OAuthState, then redirects to GitHub with the state and challenge.

This is the right trade for a public web login flow. The browser stays simple and the service keeps the sensitive part of the exchange.
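The verifier and challenge generation is short enough to show. This is a minimal sketch of the RFC 7636 S256 method using Node's crypto module; the function names are mine:

```typescript
import { createHash, randomBytes } from 'node:crypto';

// PKCE sketch: a random code_verifier and its S256 code_challenge,
// both base64url-encoded without padding, per RFC 7636.
function makeVerifier(): string {
  // 32 random bytes -> 43 base64url chars, inside the 43..128 range.
  return randomBytes(32).toString('base64url');
}

function makeChallenge(verifier: string): string {
  // challenge = BASE64URL(SHA256(ASCII(verifier)))
  return createHash('sha256').update(verifier).digest('base64url');
}
```

The start route stores the verifier server-side and sends only the challenge to GitHub; the callback proves possession by sending the verifier back.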

The return path problem

Every OAuth flow needs a safe place to return after login. I validate returnTo against an allowlist of known blog origins plus the service origin itself. Without that check, a login flow turns into an open redirect risk.
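The check itself is a few lines. A minimal sketch, with hypothetical origins standing in for the real allowlist:

```typescript
// Open-redirect guard sketch: returnTo must parse as an absolute URL
// and its origin must be on a fixed allowlist. Origins are examples.
const ALLOWED_ORIGINS = new Set([
  'https://example.com',          // the blog (hypothetical)
  'https://comments.example.com', // the service itself (hypothetical)
]);

function safeReturnTo(returnTo: string, fallback: string): string {
  try {
    const url = new URL(returnTo);
    if (ALLOWED_ORIGINS.has(url.origin)) return url.toString();
  } catch {
    // not an absolute URL; fall through to the fallback
  }
  return fallback;
}
```

Note that comparing url.origin, not a string prefix, is what closes the hole: `https://example.com.evil.net` fails the origin match even though it starts with an allowed value.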

Sessions in Postgres

Sessions are stored in Postgres. The cookie is only a pointer, lh_comments_session=<uuid>. On each authenticated request the service reads the cookie, loads the session row, checks revocation, checks expiry, and returns the user. lastUsedAt updates in the background.

I chose this over JWTs because revocation is simple, ban enforcement is simple, and session invalidation does not need extra token rules. For a blog comments service, that simplicity is worth more than token portability.
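The validity check that makes revocation simple is a sketch like this; field names follow the article, everything else is illustrative:

```typescript
// Session check sketch: the cookie is only a pointer, so validity
// lives entirely in the row. Field names mirror the article.
interface SessionRow {
  userId: string;
  expiresAt: Date;
  revokedAt: Date | null;
  lastUsedAt: Date;
}

function sessionValid(session: SessionRow | null, now: Date): boolean {
  if (!session) return false;          // unknown cookie value
  if (session.revokedAt) return false; // logged out, or banned server-side
  return session.expiresAt.getTime() > now.getTime();
}
```

Banning a user or killing a session is one UPDATE on the row; no token ever needs to be blacklisted or waited out.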

Mutation gating

Every write checks Origin even when CORS is configured. In production mode the service requires an Origin header and rejects anything outside the allowlist. That keeps the server, not the browser, in charge of write boundaries.

```typescript
// Pseudocode shaped like the real route guard
import type { NextRequest } from 'next/server';

export async function mutationAllowed(request: NextRequest) {
  const origin = request.headers.get('origin');
  if (env.NODE_ENV === 'production') {
    // Production: an Origin header is required and must be on the allowlist.
    if (!origin) return { ok: false, code: 'MUTATION_ORIGIN_REQUIRED' };
    if (!isAllowedOrigin(origin)) return { ok: false, code: 'MUTATION_ORIGIN_NOT_ALLOWED' };
  } else {
    // Development: tolerate a missing Origin, but still reject unknown ones.
    if (origin && !isAllowedOrigin(origin)) {
      return { ok: false, code: 'MUTATION_ORIGIN_NOT_ALLOWED' };
    }
  }
  // CSRF check happens here too (next section)
  return { ok: true };
}
```

The CSRF mechanism

The service uses a CSRF cookie and a request header. The cookie stores csrf_token=<random>. The client sends the same value in X-CSRF-Token. The server checks presence, equal length, and then uses constant-time equality.

That last detail matters. If the comparison fails, failure timing should not expose token shape.

```typescript
import { timingSafeEqual } from 'node:crypto';
import { cookies } from 'next/headers';
import type { NextRequest } from 'next/server';

const CSRF_COOKIE = 'csrf_token';

export async function verifyCsrf(request: NextRequest) {
  const cookieToken = (await cookies()).get(CSRF_COOKIE)?.value;
  const headerToken = request.headers.get('x-csrf-token');
  if (!cookieToken || !headerToken) return false;
  // timingSafeEqual throws on length mismatch, so reject unequal lengths first.
  if (cookieToken.length !== headerToken.length) return false;
  // Constant-time compare to avoid leaking info via timing
  return timingSafeEqual(
    Buffer.from(cookieToken),
    Buffer.from(headerToken)
  );
}
```

How the client gets the token

The /v1/me endpoint returns the current user, if one exists, plus csrfToken. The client calls that endpoint on load, keeps the token in memory, and attaches it to every write request. This keeps the write path explicit and avoids hidden state in random components.
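On the client, attaching the token is a one-line header builder. A minimal sketch, assuming the token from /v1/me is already in memory:

```typescript
// Client write-path sketch: the CSRF token from /v1/me lives in memory
// and goes out on every mutation; credentials carry the session cookie.
function writeHeaders(csrfToken: string): Record<string, string> {
  return {
    'Content-Type': 'application/json',
    'X-CSRF-Token': csrfToken, // must equal the csrf_token cookie value
  };
}

// Usage with fetch (not executed here; endpoint path is illustrative):
// fetch('/v1/comments', {
//   method: 'POST',
//   credentials: 'include', // send the session cookie cross-origin
//   headers: writeHeaders(csrfToken),
//   body: JSON.stringify({ threadId, bodyMarkdown }),
// });
```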

Why this fails in real life

Most failures are ordinary browser problems. A new origin is missing from the allowlist. One request forgets the CSRF header. Cookies stop being sent because http and https got mixed. A write runs before the client has loaded /v1/me.

That is why I like a narrow flow. Once the request path is fixed, debugging is dull and direct.

Read latency, p50

Most read wins came from doing less work. The service returns bodyHtml instead of rendering on each client load. Likes are grouped in one query. Response shapes stay consistent, so the frontend does not need follow-up fetches for ordinary list views.
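The "likes grouped in one query" shape reduces to a single pass over the reaction rows. A sketch of the in-memory grouping, with assumed row and helper names:

```typescript
// Grouping sketch: fetch all reactions for a thread once, then count
// likes per comment in memory instead of one query per comment.
interface ReactionRow { commentId: string; userId: string; }

function likeCounts(rows: ReactionRow[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const row of rows) {
    counts.set(row.commentId, (counts.get(row.commentId) ?? 0) + 1);
  }
  return counts;
}
```

The list endpoint then decorates each comment from the Map, so the response shape stays flat and the frontend never issues follow-up fetches.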

Read latency, p95

The tail tells the truth. Spikes usually point to cold starts, slow database setup, or an unexpectedly large thread without a proper limit. Those are the numbers worth watching, because readers notice tail latency first.

Write latency

Writes cost more because they include markdown render, sanitize work, and a full set of checks. That is fine. Reads should stay cheap on a blog. A slightly heavier write path is a good trade if the common path stays fast.

Error rate

The bump on day 4 matches the type of issue I would expect from an RSS fetch failure, an allowlist mismatch after a domain change, or a client request that forgot credentials: 'include'.

Problems I hit in practice

The hard parts were not SQL. Most issues came from browser state and cross-origin rules.

Cookies across multiple origins

The main questions were simple. Is the blog on HTTPS? Is the service on HTTPS? Is the browser sending credentials? Most random 401s are not random. A cookie failed to cross the boundary you expected.

CSRF token ordering

If the frontend posts before it fetches /v1/me, the request fails. That is the right behavior, though the failure feels surprising until you remember the write path is supposed to stay strict.

Allowlist drift

This is routine. Add a new preview domain, forget the allowlist update, then watch writes fail. The fix is small. The lesson is that origin policy needs a single source of truth.

Rate limits in multi-instance setups

The current rate limiter is an in-memory map. That is fine on one instance. A shared store such as Redis becomes the next step once the service runs across multiple instances.
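The in-memory version fits in a dozen lines. A minimal fixed-window sketch; the window size, limit, and keying are illustrative, not the service's actual numbers:

```typescript
// In-memory rate limiter sketch: one fixed window count per key.
// Fine on a single instance; a shared store such as Redis would
// replace the Map once multiple instances share traffic.
const WINDOW_MS = 60_000;     // illustrative window
const MAX_PER_WINDOW = 10;    // illustrative limit

const hits = new Map<string, { windowStart: number; count: number }>();

function rateLimited(key: string, now: number): boolean {
  const entry = hits.get(key);
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    hits.set(key, { windowStart: now, count: 1 }); // new window
    return false;
  }
  entry.count += 1;
  return entry.count > MAX_PER_WINDOW;
}
```

The failure mode to remember is exactly the one named above: two instances each keep their own Map, so the effective limit silently doubles.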

Closing

This project stays small by keeping identity simple, sessions server-side, writes guarded, markdown sanitized, and moderation built in from the start. That is the whole goal. I want a comments service that does the job, stays understandable, and does not turn routine maintenance into a weekend project.