CraftWeb

SiteSpector

From a URL to a 25-section SEO+UX audit PDF in 3 minutes

SEO+UX audit SaaS FastAPI Next.js 16 Postgres
Qdrant Gemini WeasyPrint

/ Overview

A multi-tenant SEO+UX audit SaaS: FastAPI orchestration on top of Screaming Frog, Lighthouse, Senuto and a Qdrant-powered RAG over Gemini embeddings. It renders a 25-section white-label PDF in three minutes.


/ Project Details

Category: AI SaaS · own product · agency tool

Engagement: In-house product — CraftWeb Labs

Role: Full-stack · AI · infrastructure

Timeline: 2026

/ Problem

Site audits in 2026 are still a copy-paste graveyard. Screaming Frog locally, Lighthouse separately, rankings from Senuto into a Google Doc, screenshots into Slides, 60-page PDF three days later. Half the SaaS competitors solve crawling and ignore strategy. The other half generate beautiful reports nobody acts on.

/ TL;DR

2–3 min

URL → downloadable PDF, vs half a day of manual audit.

25+ sections

Technical, CWV, keywords, backlinks, AI strategy — each its own Jinja2 partial + matplotlib charts.

6 personas

Built-in personas, AI context tuned per role per audit.

30 cr/audit

Stripe metering across 4 plans (50–2000 cr/mo).

/ Vision

One audit, one button, one PDF. Strategy is the product. Multi-tenant by default. The execution plan ships to the client — not the raw crawl.

/ Principles

One audit, one PDF

Paste a URL, wait 3 minutes, download a white-labelled report.

Strategy is the product

Raw data is the price of entry. What ships is an execution plan — 9 areas analysed, prioritised tasks, code snippets per recommendation.

Multi-tenant by default

Workspaces > projects > audits, three-level hierarchy enforced by Postgres RLS from day one. Stripe credits cover the whole team.

/ Engineering challenges

Lighthouse OOM on a 2-core box. Gemini 429s on the third concurrent audit. RAG race with vector-dimension drift. Jinja2 templates with 25+ ways to fail silently. Six production incidents, six named commits.

/ 01 — 06

01 — Lighthouse parallel OOM on 2-core VPS

Desktop + mobile in parallel on CPX21 → load avg 31, both timed out. Fix: sequential execution with 180 s timeouts, then VPS upgrade CPX21 → CPX42 (commit c34fc72).
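A minimal sketch of the sequential-with-timeout pattern, not the code from commit c34fc72. The helper names (`run_with_timeout`, `run_sequentially`, `lighthouse_commands`), the output paths and the exact Lighthouse flags are assumptions for illustration. The point is that the second run starts only after the first has finished or been killed.

```python
import asyncio
import sys
from typing import Optional

RUN_TIMEOUT_S = 180.0  # per-run timeout named in the fix above


def lighthouse_commands(url: str) -> dict[str, list[str]]:
    """Assumed CLI invocations; mobile is Lighthouse's default form factor."""
    base = ["lighthouse", url, "--quiet", "--output=json", "--chrome-flags=--headless"]
    return {
        "mobile": base + ["--output-path=mobile.json"],
        "desktop": base + ["--preset=desktop", "--output-path=desktop.json"],
    }


async def run_with_timeout(cmd: list[str], timeout_s: float) -> Optional[int]:
    """Run one command; return its exit code, or None if it timed out."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        return await asyncio.wait_for(proc.wait(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # do not leave a stuck process holding the CPU
        await proc.wait()  # reap the killed process
        return None


async def run_sequentially(
    commands: dict[str, list[str]], timeout_s: float = RUN_TIMEOUT_S
) -> dict[str, Optional[int]]:
    """Run the commands one after another, never concurrently."""
    results: dict[str, Optional[int]] = {}
    for name, cmd in commands.items():
        results[name] = await run_with_timeout(cmd, timeout_s)
    return results
```

Usage would look like `asyncio.run(run_sequentially(lighthouse_commands("https://example.com")))`. A `None` value marks a run that hit the timeout.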

02 — Gemini Embedding 429 ResourceExhausted

Batches of 100 chunks × 3072-dim vectors blew through the tokens-per-minute (TPM) quota. Fix: drop the batch size to 10–20, add exponential backoff with jitter, and share one global asyncio.Semaphore across audits.
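The three parts of that fix can be sketched together. This is an illustration under assumptions, not the production code: `RateLimited` stands in for the provider's 429 / ResourceExhausted error, `embed_batch` is a hypothetical callable wrapping the real SDK call, and the default batch size, attempt count and delays are made up.

```python
import asyncio
import random
from typing import Awaitable, Callable, Sequence


class RateLimited(Exception):
    """Stand-in for the provider's 429 / ResourceExhausted error."""


async def embed_all(
    chunks: Sequence[str],
    embed_batch: Callable[[Sequence[str]], Awaitable[list]],
    sem: asyncio.Semaphore,
    batch_size: int = 16,      # inside the 10-20 range from the fix
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> list:
    """Embed chunks in small batches, retrying rate-limit errors with backoff."""
    vectors: list = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        for attempt in range(max_attempts):
            try:
                # The semaphore is shared by every audit, so it caps the
                # number of in-flight embedding calls process-wide.
                async with sem:
                    vectors.extend(await embed_batch(batch))
                break
            except RateLimited:
                if attempt == max_attempts - 1:
                    raise
                # Full jitter: sleep a random time up to base * 2^attempt.
                # The slot is already released here, so other audits proceed.
                await asyncio.sleep(random.uniform(0, base_delay * 2 ** attempt))
    return vectors
```

One design choice worth noting: the backoff sleep happens outside the `async with` block, so a throttled audit does not hold a slot while it waits.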

03 — RAG race + vector-dim drift in Qdrant

The SSE generator shared a session with the request handler, and an embedding-model swap silently changed the vector dimension from 768 to 3072. Fix: a dedicated AsyncSessionLocal() for the generator, plus self_heal_rag, which validates the dimension on every write.
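The dimension check can be illustrated without the real vector database. The sketch below uses a hypothetical in-memory store (`InMemoryVectorStore`, `Collection`); it is not the Qdrant client API and not the actual self_heal_rag code. It only shows the idea: compare the incoming vector's length with the collection's dimension on every write, and recreate the collection on a mismatch.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """In-memory stand-in for one vector collection (hypothetical)."""
    dim: int
    points: dict = field(default_factory=dict)


class InMemoryVectorStore:
    """Toy store that validates vector dimension on every write."""

    def __init__(self) -> None:
        self.collections: dict = {}

    def upsert(self, name: str, point_id: str, vector: list) -> bool:
        """Write one vector; return True if the collection was recreated."""
        existing = self.collections.get(name)
        recreated = False
        if existing is None or existing.dim != len(vector):
            # A dimension mismatch means the embedding model changed.
            # Start a fresh collection instead of mixing incompatible vectors.
            recreated = existing is not None
            self.collections[name] = Collection(dim=len(vector))
        self.collections[name].points[point_id] = list(vector)
        return recreated
```

Recreating drops the old points, so in this sketch the caller would need to re-embed and re-index the audit after a `True` return.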

04 — Jinja2 PDF templates failing silently

A 29-file refactor broke template resolution, and Senuto returns top_keywords as either a dict or a list. Fix: _safe_extract() + as_list() + | safe boundaries in pdf/generator.py.

05 — Workspace hierarchy + RLS

Audits were scoped to individual users, but agencies wanted shared client projects + RBAC. Fix: three-level hierarchy in Postgres with RLS policies. One require_project_membership() dependency in FastAPI.

06 — Next.js landing OOM in 512 MB container

Static rendering of large MDX pages OOM-killed itself. Fix: NODE_OPTIONS=--max-old-space-size=512, multi-stage Dockerfile, separated landing/dashboard into independent services.

/ Stack

FastAPI orchestrator on top of Screaming Frog + Lighthouse + Senuto + Gemini + Qdrant. One Hetzner CPX42 VPS. One docker compose up -d.

/ Tech

Next.js 16 + React 19

Server Components for landing, client components for dashboard. One codebase, three route groups.

FastAPI + Pydantic + async SQLAlchemy

Native async for parallel audit fan-out; Pydantic v2 validates AI boundary + Stripe webhooks.

PostgreSQL 16 + pgvector + RLS

Multi-tenant enforced at DB. Vector column lives next to data it indexes.

Qdrant (per-audit collection)

One collection per audit means dimension drift never poisons another audit.

Screaming Frog CLI + Lighthouse

Pro crawler in the Docker image + two Lighthouse runs (desktop + mobile), which run in parallel since the CPX42 upgrade.

Senuto API

Polish-market visibility, backlinks, keyword positions, AI Overviews — what Ahrefs doesn't have.

Google Gemini + embeddings

Analysis + embeddings from same vendor — one quota, one SDK to mock.

WeasyPrint + Jinja2 (25+ templates)

HTML/CSS → A4 with running headers. SVG charts embed directly. matplotlib for chart rendering.

/ Engineering decisions

Five ADRs that shaped the platform. Container queries instead of media queries. SSE not WebSockets. Workspace > Project > Audit hierarchy with RLS.

/ 5 ADRs

ADR-037 — Container queries

Sidebar panels respond to container width, not viewport — dashboard works in every split-screen.

ADR-038 — SSE for AI chat + RAG indexing

Not WebSockets. FastAPI's EventSourceResponse on the server + the browser's native EventSource on the client, which reconnects automatically for free.
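What makes the reconnect "free" is the wire format: each event is a few `field: value` lines ended by a blank line, and the browser resends the last `id` it saw when it reconnects. A small formatter shows the framing. The function name `format_sse` is hypothetical, and this is the generic event-stream format, not the project's response code.

```python
from typing import Optional


def format_sse(
    data: str,
    event: Optional[str] = None,
    event_id: Optional[str] = None,
    retry_ms: Optional[int] = None,
) -> str:
    """Serialise one Server-Sent Event frame as text."""
    lines = []
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")   # reconnection delay hint
    if event_id is not None:
        lines.append(f"id: {event_id}")      # echoed back as Last-Event-ID
    if event is not None:
        lines.append(f"event: {event}")      # named event type
    # Multi-line payloads become one "data:" line per line of text.
    for part in data.splitlines() or [""]:
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

On reconnect the browser sends the last received id in the `Last-Event-ID` request header, so a server that tracks ids can resume a chat or indexing stream where it stopped.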

ADR-039 — Batch embedding 10–20 chunks per call

The empirical sweet spot against Gemini's tokens-per-minute (TPM) limit.

ADR-040 — Vector dim validation + auto-recreate

Model swaps stopped being scary.

ADR-041 — Workspace > Project > Audit + RLS

Privilege-escalation surface collapsed into one policy file.

/ Results

Full audit in 2–3 minutes. Six personas, three report types, 25+ sections per PDF. One VPS, one Caddy config, one docker compose.

/ Numbers

2–3 min audit

Screaming Frog + Lighthouse (desktop + mobile) + Senuto + Gemini + PDF.

25+ sections

Each with own Jinja2 template + matplotlib charts.

6 personas / 3 report types

Built-in personas / report types — same audit, six AI tones, three audience tiers.

30 cr/audit

Stripe metering across 4 plans (50–2000 cr/mo).
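As a quick worked example of what the metering implies: only the smallest and largest monthly allowances are stated (50 and 2000 credits), and the sketch assumes credits are spent on audits alone, which may not hold if other features also consume credits.

```python
CREDITS_PER_AUDIT = 30  # from the figure above


def audits_per_month(monthly_credits: int) -> int:
    """Whole audits a monthly credit allowance covers (audits only, assumed)."""
    return monthly_credits // CREDITS_PER_AUDIT


print(audits_per_month(50))    # 1 audit, 20 credits left over
print(audits_per_month(2000))  # 66 audits, 20 credits left over
```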

SEO+UX audit
FastAPI
Next.js 16
Postgres + RLS
Qdrant
Gemini
WeasyPrint
Screaming Frog
Lighthouse
Stripe