SiteSpector
From a URL to a 25-section SEO+UX audit PDF in 3 minutes
/ Overview
A multi-tenant SEO+UX audit SaaS — FastAPI orchestration on top of Screaming Frog, Lighthouse, Senuto and a Qdrant-powered RAG over Gemini embeddings. Renders a 25-section white-label PDF in three minutes.
/ Project Details
Category: AI SaaS · own product · agency tool
Engagement: In-house product — CraftWeb Labs
Role: Full-stack · AI · infrastructure
Timeline: 2026
Live: sitespector.app
/ Problem
Site audits in 2026 are still a copy-paste graveyard. Screaming Frog locally, Lighthouse separately, rankings from Senuto into a Google Doc, screenshots into Slides, 60-page PDF three days later. Half the SaaS competitors solve crawling and ignore strategy. The other half generate beautiful reports nobody acts on.
/ TL;DR
2–3 min
URL → downloadable PDF, vs half a day of manual audit.
25+ sections
Technical, CWV, keywords, backlinks, AI strategy — each its own Jinja2 partial + matplotlib charts.
6 personas
Built-in personas, AI context tuned per role per audit.
30 cr/audit
Stripe metering across 4 plans (50–2000 cr/mo).



/ Vision
One audit, one button, one PDF. Strategy is the product. Multi-tenant by default. The execution plan ships to the client — not the raw crawl.
/ Principles
One audit, one PDF
Paste a URL, wait 3 minutes, download a white-labelled report.
Strategy is the product
Raw data is the price of entry. What ships is an execution plan — 9 areas analysed, prioritised tasks, code snippets per recommendation.
Multi-tenant by default
Workspaces > projects > audits: a three-level hierarchy enforced by Postgres row-level security (RLS). Stripe credits cover the whole team.
/ Engineering challenges
Lighthouse OOM on a 2-core box. Gemini 429s on the third concurrent audit. RAG race with vector-dimension drift. Jinja2 templates with 25+ ways to fail silently. Six production incidents, six named commits.
/ 01 — 06
01 — Lighthouse parallel OOM on 2-core VPS
Running desktop and mobile Lighthouse in parallel on a CPX21 pushed the load average to 31, and both runs timed out. Fix: sequential execution with 180 s timeouts, then a VPS upgrade from CPX21 to CPX42 (commit c34fc72).
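A minimal sketch of the sequential fix, assuming Lighthouse is invoked as the `lighthouse` CLI through `asyncio.create_subprocess_exec` (Python 3.10+). The helper names `lighthouse_cmd`, `run_with_timeout` and `run_sequentially` are hypothetical; only the 180 s budget comes from the incident above. Each run gets its own timeout and is killed when it overruns, so a stuck run cannot keep competing for CPU with the next one.

```python
from __future__ import annotations

import asyncio

LIGHTHOUSE_TIMEOUT_S = 180  # per-run budget quoted in the incident write-up


def lighthouse_cmd(url: str, form_factor: str, out_path: str) -> list[str]:
    """Build a Lighthouse CLI invocation (hypothetical helper).

    Mobile is Lighthouse's default form factor; desktop uses --preset=desktop.
    """
    cmd = [
        "lighthouse", url,
        "--output=json", f"--output-path={out_path}",
        "--quiet", "--chrome-flags=--headless",
    ]
    if form_factor == "desktop":
        cmd.append("--preset=desktop")
    return cmd


async def run_with_timeout(cmd: list[str], timeout_s: float) -> int:
    """Run one subprocess and return its exit code; kill it past timeout_s."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        return await asyncio.wait_for(proc.wait(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()        # stop the overrunning run so it frees the CPU
        await proc.wait()  # reap the killed process
        raise


async def run_sequentially(
    cmds: list[list[str]], timeout_s: float = LIGHTHOUSE_TIMEOUT_S
) -> list[int | None]:
    """Run commands strictly one after another; None marks a timed-out run."""
    results: list[int | None] = []
    for cmd in cmds:  # deliberately no asyncio.gather here
        try:
            results.append(await run_with_timeout(cmd, timeout_s))
        except asyncio.TimeoutError:
            results.append(None)
    return results
```

Usage would look like `asyncio.run(run_sequentially([lighthouse_cmd(url, "mobile", "m.json"), lighthouse_cmd(url, "desktop", "d.json")]))`. Returning `None` for a timed-out run, instead of raising, lets the audit continue with whichever form factor did finish.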
02 — Gemini Embedding 429 ResourceExhausted
Batches of 100 chunks per call (3072-dim vectors) exceeded Gemini's tokens-per-minute (TPM) quota. Fix: batches cut to 10–20 chunks, exponential backoff with jitter, and a global asyncio.Semaphore shared across audits.
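The three mitigations can be combined as in the sketch below (Python 3.10+). It rests on several assumptions. `embed_batch` stands in for whatever async call wraps the Gemini embedding endpoint. `RateLimited` is a hypothetical stand-in for the SDK's 429 / ResourceExhausted exception, whose real class depends on the SDK version. The concurrency limit of 2 and the batch size of 16 are illustrative. The semaphore is module-level so that every audit in the process shares it. The backoff uses "full jitter": each retry sleeps for a random time up to an exponentially growing cap.

```python
from __future__ import annotations

import asyncio
import functools
import random
from typing import Awaitable, Callable, Sequence

Vector = list[float]
EmbedFn = Callable[[Sequence[str]], Awaitable[list[Vector]]]


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's 429 / ResourceExhausted error."""


# One semaphore per process, shared by all audits (the limit of 2 is illustrative).
EMBED_SEMAPHORE = asyncio.Semaphore(2)


async def call_with_backoff(
    fn: Callable[[], Awaitable[list[Vector]]],
    *,
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> list[Vector]:
    for attempt in range(max_retries + 1):
        try:
            async with EMBED_SEMAPHORE:  # cap in-flight calls across audits
                return await fn()
        except RateLimited:
            if attempt == max_retries:
                raise
            # Full jitter: sleep a random time in [0, min(cap, base * 2^attempt)].
            # The sleep happens outside the semaphore, so the slot is freed meanwhile.
            await asyncio.sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


async def embed_all(
    chunks: Sequence[str],
    embed_batch: EmbedFn,
    *,
    batch_size: int = 16,  # inside the 10-20 range from the write-up
    base_delay: float = 1.0,
) -> list[Vector]:
    vectors: list[Vector] = []
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        call = functools.partial(embed_batch, batch)  # re-invocable on each retry
        vectors.extend(await call_with_backoff(call, base_delay=base_delay))
    return vectors
```

Jitter matters here because several audits hitting the same quota would otherwise retry in lockstep and collide again on every attempt.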
03 — RAG race + vector-dim drift in Qdrant
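The SSE generator shared a database session with the request handler, and an embedding-model swap silently changed the vector size from 768 to 3072 dimensions. Fix: a dedicated AsyncSessionLocal() session for the generator, plus self_heal_rag, which validates the dimension on every write.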
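The dimension check can be sketched without Qdrant itself. In the sketch below, `VectorStore` is a hypothetical minimal interface, not the qdrant-client API, and `InMemoryStore` is a toy implementation used only for illustration. `ensure_collection_dim` mirrors the idea behind `self_heal_rag` without claiming to be its code. Before each write, it compares the incoming vector size with the collection's configured size. On a mismatch, it drops and recreates that audit's collection. The `audit_<id>` naming scheme is also an assumption. The dedicated-session half of the fix is not shown.

```python
from __future__ import annotations

from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical minimal interface; a real version would wrap qdrant-client."""

    def collection_dim(self, name: str) -> int | None: ...
    def recreate(self, name: str, dim: int) -> None: ...
    def upsert(self, name: str, vectors: list[list[float]]) -> None: ...


class InMemoryStore:
    """Toy store used only to exercise the check."""

    def __init__(self) -> None:
        self.collections: dict[str, tuple[int, list[list[float]]]] = {}

    def collection_dim(self, name: str) -> int | None:
        entry = self.collections.get(name)
        return entry[0] if entry else None

    def recreate(self, name: str, dim: int) -> None:
        self.collections[name] = (dim, [])  # drops any stale vectors

    def upsert(self, name: str, vectors: list[list[float]]) -> None:
        dim, stored = self.collections[name]
        if any(len(v) != dim for v in vectors):
            raise ValueError("vector dimension mismatch")
        stored.extend(vectors)


def audit_collection(audit_id: str) -> str:
    return f"audit_{audit_id}"  # hypothetical per-audit naming scheme


def ensure_collection_dim(store: VectorStore, name: str, dim: int) -> bool:
    """Return True if the collection had to be (re)created for this dimension."""
    if store.collection_dim(name) == dim:
        return False
    store.recreate(name, dim)  # vectors from the previous model are discarded
    return True


def write_vectors(store: VectorStore, audit_id: str, vectors: list[list[float]]) -> None:
    if not vectors:
        return
    name = audit_collection(audit_id)
    ensure_collection_dim(store, name, len(vectors[0]))
    store.upsert(name, vectors)
```

Because each audit owns its collection, a recreate triggered by one audit's model swap leaves every other audit's vectors untouched.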
04 — Jinja2 PDF templates failing silently
A 29-file refactor broke template resolution, and Senuto returns top_keywords as either a dict or a list. Fix: _safe_extract() and as_list() helpers plus explicit | safe boundaries in pdf/generator.py.
05 — Workspace hierarchy + RLS
Audits were scoped to individual users, but agencies wanted shared client projects with role-based access control (RBAC). Fix: a three-level hierarchy in Postgres with RLS policies, plus a single require_project_membership() dependency in FastAPI.
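Only the dependency's name is given above. Below is a hypothetical, framework-free sketch of the kind of check such a dependency could perform. The role names, their ordering, and `check_project_membership` itself are assumptions. The sketch only illustrates an application-level gate that would sit alongside the RLS policies, not replace them.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical roles; a higher value implies every lower permission."""
    VIEWER = 1
    EDITOR = 2
    ADMIN = 3
    OWNER = 4


@dataclass(frozen=True)
class Membership:
    user_id: str
    project_id: str
    role: Role


class Forbidden(Exception):
    """Would map to an HTTP 403 response in the web layer."""


def check_project_membership(
    memberships: list[Membership],
    user_id: str,
    project_id: str,
    minimum: Role = Role.VIEWER,
) -> Membership:
    """Return the caller's membership, or raise Forbidden (hypothetical sketch)."""
    for m in memberships:
        if m.user_id == user_id and m.project_id == project_id:
            if m.role >= minimum:
                return m
            break  # member, but the role is too low
    raise Forbidden(f"user {user_id} lacks {minimum.name} on project {project_id}")
```

In FastAPI, a check like this would typically be wrapped in a function used via `Depends(...)`, with memberships loaded from the database for the authenticated user. Every project-scoped route would then share one gate instead of repeating ad-hoc checks.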
06 — Next.js landing OOM in 512 MB container
Statically rendering large MDX pages got the build OOM-killed. Fix: NODE_OPTIONS=--max-old-space-size=512, a multi-stage Dockerfile, and landing and dashboard split into independent services.

/ Stack
FastAPI orchestrator on top of Screaming Frog + Lighthouse + Senuto + Gemini + Qdrant.
One Hetzner CPX42 VPS. One docker compose up -d.
/ Tech
Next.js 16 + React 19
Server Components for landing, client components for dashboard. One codebase, three route groups.
FastAPI + Pydantic + async SQLAlchemy
Native async for the parallel audit fan-out; Pydantic v2 validates data crossing the AI boundary and Stripe webhook payloads.
PostgreSQL 16 + pgvector + RLS
Multi-tenancy enforced at the database layer. The vector column lives next to the data it indexes.
Qdrant (per-audit collection)
One collection per audit means dimension drift never poisons another audit.
Screaming Frog CLI + Lighthouse
Pro crawler baked into the Docker image, plus two Lighthouse runs (desktop + mobile), back in parallel since the CPX42 upgrade.
Senuto API
Polish-market visibility, backlinks, keyword positions, AI Overviews — what Ahrefs doesn't have.
Google Gemini + embeddings
Analysis and embeddings from the same vendor: one quota, one SDK to mock.
WeasyPrint + Jinja2 (25+ templates)
HTML/CSS → A4 with running headers. SVG charts embed directly. matplotlib for chart rendering.
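The parallel audit fan-out credited to async FastAPI can be sketched with plain asyncio. Independent data sources are awaited concurrently with `asyncio.gather`, and `return_exceptions=True` keeps one failing source from sinking the whole audit. The `fan_out` helper and the source names in the usage are hypothetical placeholders, not the platform's real integrations.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

Source = Callable[[str], Awaitable[Any]]


async def fan_out(
    url: str, sources: dict[str, Source]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Run every source concurrently; split successful results from failures."""
    names = list(sources)
    outcomes = await asyncio.gather(
        *(sources[name](url) for name in names), return_exceptions=True
    )
    results: dict[str, Any] = {}
    errors: dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = repr(outcome)  # e.g. render a "data unavailable" section
        else:
            results[name] = outcome
    return results, errors
```

Concurrency pays off for I/O-bound calls such as third-party APIs. CPU-heavy steps are the exception, as incident 01 shows for two Lighthouse runs fanned out on too small a box.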
/ Engineering decisions
Five ADRs that shaped the platform. Container queries instead of media queries. SSE, not WebSockets. Workspace > Project > Audit hierarchy with RLS.
/ 5 ADRs
ADR-037 — Container queries
Sidebar panels respond to container width, not viewport — dashboard works in every split-screen.
ADR-038 — SSE for AI chat + RAG indexing
Not WebSockets. EventSourceResponse on the FastAPI side plus the browser's native EventSource, which reconnects for free.
ADR-039 — Batch embedding 10–20 chunks per call
Empirically the sweet spot against Gemini's tokens-per-minute (TPM) quota.
ADR-040 — Vector dim validation + auto-recreate
Every write checks the vector dimension and recreates the audit's collection on a mismatch, so model swaps stopped being scary.
ADR-041 — Workspace > Project > Audit + RLS
Privilege-escalation surface collapsed into one policy file.
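ADR-038 relies on the Server-Sent Events wire format, which is simple enough to show directly. Below is a minimal sketch of a frame encoder that follows the WHATWG EventSource format:

- `event:`, `id:` and `retry:` are optional fields.
- A multi-line payload becomes one `data:` line per line.
- A blank line ends the event.

After a dropped connection, the browser's EventSource reconnects on its own and sends the last `id` it saw in the `Last-Event-ID` header. That is what makes reconnection free. `sse_frame` and `stream_tokens` are hypothetical helpers. A response class such as EventSourceResponse performs this framing itself.

```python
from __future__ import annotations

from collections.abc import Iterator


def sse_frame(
    data: str,
    *,
    event: str | None = None,
    event_id: str | None = None,
    retry_ms: int | None = None,
) -> str:
    """Encode one Server-Sent Events message (hypothetical helper)."""
    lines: list[str] = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")  # echoed back as Last-Event-ID on reconnect
    if retry_ms is not None:
        lines.append(f"retry: {retry_ms}")  # reconnection delay hint for EventSource
    normalised = data.replace("\r\n", "\n").replace("\r", "\n")
    for part in normalised.split("\n"):  # each payload line needs its own data: field
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


def stream_tokens(tokens: list[str]) -> Iterator[str]:
    """Yield one frame per AI chat token, then a terminal 'done' event."""
    for i, token in enumerate(tokens):
        yield sse_frame(token, event="token", event_id=str(i))
    yield sse_frame("", event="done")
```

The format is one-way, server to client, and that fits chat streaming and indexing progress. With WebSockets, the reconnect and resume logic would have to be written by hand.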
/ Results
Full audit in 2–3 minutes. Six personas, three report types, 25+ sections per PDF.
One VPS, one Caddy config, one docker compose.
/ Numbers
2–3 min audit
Screaming Frog + Lighthouse (desktop + mobile) + Senuto + Gemini + PDF.
25+ sections
Each with own Jinja2 template + matplotlib charts.
6 personas / 3 report types
Built-in personas / report types — same audit, six AI tones, three audience tiers.
30 cr/audit
Stripe metering across 4 plans (50–2000 cr/mo).
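At 30 credits per audit, integer division turns the plan range into a monthly audit volume. The two middle plans are not listed, so only the endpoints are computed. `debit_for_audit` is a hypothetical sketch of per-audit metering that refuses to start an audit the balance cannot cover. It is not the Stripe integration itself.

```python
from __future__ import annotations

AUDIT_COST_CR = 30  # credits per audit, from the figures above


def audits_per_month(monthly_credits: int, cost: int = AUDIT_COST_CR) -> int:
    """Whole audits a monthly credit allowance covers."""
    return monthly_credits // cost


class InsufficientCredits(Exception):
    pass


def debit_for_audit(balance: int, cost: int = AUDIT_COST_CR) -> int:
    """Return the new balance, or raise if the workspace cannot afford an audit."""
    if balance < cost:
        raise InsufficientCredits(f"need {cost} cr, have {balance} cr")
    return balance - cost


# Plan endpoints: 50 cr/mo covers 1 audit (20 cr left over); 2000 cr/mo covers 66.
```

A production version would need the check and the debit to happen atomically, for example in a single conditional UPDATE. Otherwise two concurrent audits could both spend the last 30 credits.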