Documentation

Crawlspace docs

A local-first, self-hostable SEO audit tool. Crawl a site, run focused inspection modules, and get a prioritized list of issues with page-level evidence. This page covers what it does and how to drive it from the UI, public JSON API, or remote MCP.

Overview

Crawlspace walks a site within configurable depth and page caps, runs inspection modules on each page, and streams findings back while the crawl is still running. Results are grouped by module and severity so the output stays actionable instead of turning into dashboard noise.
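The grouping described above can be sketched in a few lines of Python. This is an illustration only, not Crawlspace's implementation: the finding fields (`module`, `severity`, `url`) and the severity labels in `SEVERITY_ORDER` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical severity labels, most urgent first. The real labels may differ.
SEVERITY_ORDER = ["critical", "warning", "info"]


def _rank(severity):
    """Sort known severities by priority; unknown labels go last."""
    if severity in SEVERITY_ORDER:
        return SEVERITY_ORDER.index(severity)
    return len(SEVERITY_ORDER)


def group_findings(findings):
    """Group finding dicts by module, then by severity within each module."""
    grouped = defaultdict(lambda: defaultdict(list))
    for finding in findings:
        grouped[finding["module"]][finding["severity"]].append(finding)
    # Plain dicts: modules in alphabetical order, severities in priority order.
    return {
        module: {sev: by_sev[sev] for sev in sorted(by_sev, key=_rank)}
        for module, by_sev in sorted(grouped.items())
    }
```

Grouping by module first keeps related fixes together; ordering by severity inside each module puts the most urgent items at the top of each group.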

Audits run on your machine by default, and the same audit engine also powers the remote JSON API and MCP endpoint.

Core features

  • Live crawl stream

    Findings arrive over SSE while the crawl is still in flight.

  • Five inspection modules

    Content, technical, meta/links/images, and performance checks.

  • Sitemap coverage

    Optional sitemap discovery with orphan and coverage summary.

  • Raw vs rendered diff

    Compare HTML SEO fields against the rendered DOM when browser mode is on.

  • Screenshots

    Optional Playwright capture per page for visual review.

  • Markdown export

    Structured remediation report, readable by humans and LLMs.

  • Link classification

    Broken links are typed; low-value redirect noise is de-emphasized.

  • Grouped findings

    Issues cluster by module and severity so fixes stay ordered.

Typical workflow

  1. Enter a URL on the home screen, set depth and page caps, and pick which modules to run.
  2. Optionally enable sitemap discovery and screenshot capture (browser mode).
  3. Watch findings arrive live. The log panel records every step the crawler takes.
  4. Triage results by module, mark findings as excluded if they are not real issues, then export a Markdown report.

JSON API

The public HTTP API exposes a single JSON endpoint for one-shot audits. It returns the final audit result shape and is suited to scripts and integrations that do not need progressive updates.

GET /api/audit-json (one-shot JSON)
Non-streaming: the request runs the whole audit and returns the final result as a single JSON response.

Query parameters

Param      Default   Notes
url        -         Required. Must be http(s).
depth      3         Capped at 5.
maxPages   100       Capped at 500.
modules    all       Comma-separated module ids.
sitemap    true      Pass false to skip.

Example: /api/audit-json?url=https://example.com&depth=2
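A minimal Python sketch of building and calling this endpoint, using only the standard library. The origin `http://localhost:3000` is an assumption for a self-hosted instance (substitute your own), and the helper names `build_audit_url` and `fetch_audit` are made up for this example. The response shape is not documented here, so the sketch just returns the parsed JSON.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

# Assumption: a self-hosted instance on localhost. Replace with your own origin.
BASE = "http://localhost:3000"


def build_audit_url(target, depth=3, max_pages=100, modules=None,
                    sitemap=True, base=BASE):
    """Build an /api/audit-json URL from the documented query parameters."""
    if urlsplit(target).scheme not in ("http", "https"):
        raise ValueError("url must be http(s)")
    params = {"url": target, "depth": depth, "maxPages": max_pages}
    if modules:
        # Comma-separated module ids; omitted means all modules.
        params["modules"] = ",".join(modules)
    if not sitemap:
        # Sitemap discovery defaults to true, so only send the opt-out.
        params["sitemap"] = "false"
    return f"{base}/api/audit-json?{urlencode(params)}"


def fetch_audit(audit_url, timeout=300):
    """Run the one-shot audit and return the parsed JSON result."""
    with urlopen(audit_url, timeout=timeout) as response:
        return json.load(response)
```

Because the endpoint is non-streaming, the call returns only when the audit has finished, so a generous timeout is a reasonable default for larger crawls.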

MCP server

Crawlspace speaks the Model Context Protocol over remote Streamable HTTP, so external clients can run audits as a tool call. The supported endpoint is https://seo-crawler.naregt.dev/mcp. The tool is crawlspace.run_audit.

Client config

{
  "mcpServers": {
    "crawlspace": {
      "type": "http",
      "url": "https://seo-crawler.naregt.dev/mcp"
    }
  }
}

The MCP path is independent of the UI and the JSON API, but it reuses the same audit engine and the same HTML-only contract (no browser mode, no screenshots).
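For clients that speak MCP directly, a tool invocation is a JSON-RPC 2.0 `tools/call` request. The sketch below only builds that message. It assumes the tool's arguments mirror the JSON API's query parameters (`url`, `depth`, `maxPages`), which is not confirmed here, and `build_tool_call` is a made-up helper name. A real session also needs the MCP `initialize` handshake first, which an MCP client library normally handles.

```python
import json


def build_tool_call(target, request_id=1, **arguments):
    """Build a JSON-RPC 2.0 tools/call message for crawlspace.run_audit.

    Argument names are an assumption: they mirror the JSON API parameters.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "crawlspace.run_audit",
            "arguments": {"url": target, **arguments},
        },
    }


if __name__ == "__main__":
    message = build_tool_call("https://example.com", depth=2, maxPages=50)
    print(json.dumps(message, indent=2))
```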

Current constraints / Limitations

  • v1: MCP runs one bounded synchronous audit per request.
  • v1: The public JSON API and MCP are capped at 5 concurrent audits in total, with 1 active audit per client/IP.
  • v1: Depth is capped at 5 and page count at 500 on the server side. The JSON endpoint uses the same caps.
  • v1: The MCP tool and the JSON endpoint are HTML-only. Browser mode and screenshots are only available in the interactive UI.
  • Note: Heuristic rules (title length and similar) are reported conservatively. Severity labels represent confidence, not absolute correctness.
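A script can respect these limits on the client side. The sketch below is one possible approach, not part of Crawlspace. It clamps requested values to the documented caps and serializes audits so a single client never has more than one in flight. The helper names are made up for the example, and how the server responds to an over-limit request is not covered here.

```python
import threading

# Documented server-side caps.
MAX_DEPTH = 5
MAX_PAGES = 500

# One slot per process, matching the limit of 1 active audit per client/IP.
_audit_slot = threading.Lock()


def clamp_audit_params(depth, max_pages):
    """Clamp requested depth and page count to the documented caps."""
    return min(depth, MAX_DEPTH), min(max_pages, MAX_PAGES)


def run_one_at_a_time(run_audit, *args, **kwargs):
    """Call run_audit while holding the slot, so audits never overlap."""
    with _audit_slot:
        return run_audit(*args, **kwargs)
```

Here `run_audit` is any callable that performs a single audit request, for example a function that calls the JSON endpoint.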

Free · local-first SEO audits · self-hostable · remote API/MCP available
