Raw HTML is noise for an LLM. Denoize fixes that.
One API call. Point us at a page, a PDF, a doc — get back clean Markdown, structured metadata, and chunks already sized for your RAG pipeline.
no credit card · 1,000 req/mo free
The HTML goes in. The noise goes away.
<!DOCTYPE html>
<html>
<head>
  <script src="gtag.js"></script>
  <script>dataLayer.push({...})</script>
  <meta property="og:title" content="...">
  ... 180 more lines ...
</head>
<body>
  <nav class="site-nav">...</nav>
  <div class="cookie-banner">We value your privacy...</div>
  <aside class="sidebar-ads">...</aside>
  <article>
    <h1>RAG for Large Language Models: A Survey</h1>
    <p>Large Language Models showcase impressive...</p>
  </article>
  <footer>... share buttons, comments widget ...</footer>
  <script>/* 30 more trackers */</script>
</body>
</html>
{
"url": "https://arxiv.org/abs/2312.10997",
"kind": "html",
"contentType": "text/html; charset=utf-8",
"rendered": false,
"metadata": {
"title": "RAG for Large Language Models: A Survey",
"description": "A survey of retrieval-augmented generation...",
"author": "Gao, Xiong, Gao, et al.",
"siteName": "arXiv",
"image": null,
"publishedAt": "2023-12-18",
"language": "en",
"canonical": "https://arxiv.org/abs/2312.10997"
},
"markdown": "# RAG for Large Language Models: A Survey\n\nLarge Language Models showcase impressive capabilities but encounter challenges like hallucination...",
"chunks": [
{ "index": 0, "text": "# RAG for Large Language Models...", "charCount": 2043, "estimatedTokens": 509 },
{ "index": 1, "text": "Retrieval-Augmented Generation...", "charCount": 1954, "estimatedTokens": 487 },
… 8 more
],
"stats": { "chars": 1847, "estimatedTokens": 412, "chunkCount": 10 },
"cached": false
}

Three steps. That's the whole thing.
1. Any http(s) URL. No schema, no scraping config.
2. Plain fetch first, headless Chromium fallback for JS-heavy pages.
3. Boilerplate stripped, metadata normalized, chunks sized for your embedding model.
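Put together, that's one request. A minimal sketch in TypeScript; the endpoint URL, header names, and request body field are placeholders rather than the documented API, but the response fields used below match the sample response shown above:

// Sketch only: endpoint and auth header are assumptions, not documented names.
const res = await fetch("https://api.denoize.dev/v1/extract", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.DENOIZE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://arxiv.org/abs/2312.10997" }),
});
const doc = await res.json();

console.log(doc.metadata.title);   // "RAG for Large Language Models: A Survey"
console.log(doc.chunks.length);    // 10 chunks, already sized for embedding
console.log(doc.markdown.slice(0, 80));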
Pay for what you use. No subscription.
Credits never expire. Each API call uses 1 credit.
Free · forever
- 25 free credits at signup
- HTML + PDF + text
- Headless browser fallback
- REST API access
Starter · 100 credits · $0.05 / request
- Credits never expire
- Everything in Free
- MCP server access
- Cache hits free (24h)
- Email support
Standard · 500 credits · $0.03 / request
- Credits never expire
- Everything in Starter
- Priority rendering
Obvious questions.
How is this different from a scraper I could build?
We do the unglamorous parts: Readability-class content extraction, paragraph-aware Markdown conversion, automatic browser fallback only when needed, and a Redis cache shared across customers. Fifteen minutes of work, folded into one fetch() call.
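For comparison, here is roughly what the first two of those parts look like if you build them yourself with @mozilla/readability and turndown. This is a sketch of the DIY route, not Denoize's internals; the browser fallback, shared cache, and chunking are still on you:

import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

// DIY content extraction + Markdown conversion for a single page.
const url = "https://arxiv.org/abs/2312.10997";
const html = await (await fetch(url)).text();
const dom = new JSDOM(html, { url });
const article = new Readability(dom.window.document).parse();
const markdown = new TurndownService().turndown(article?.content ?? "");
console.log(markdown.slice(0, 200));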
Does it handle JavaScript-rendered pages?
Yes. We try a plain fetch first (fast, cheap) and spin up a headless Chromium only when the Markdown comes back thin or the page looks like an empty SPA shell. You don't pick — we do.
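You can see which path a request took after the fact: the rendered flag in the sample response above is false for a plain fetch and true when the browser stepped in (the flag's meaning is inferred from this answer). Reusing the doc object from the earlier sketch:

// `rendered` comes straight from the response shown at the top of the page.
if (doc.rendered) {
  console.log(`Chromium fallback was used for ${doc.url}`);
} else {
  console.log(`Plain fetch was enough for ${doc.url}`);
}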
What about PDFs?
Parsed with unpdf (pdf.js under the hood). Per-page Markdown plus document metadata: author, title, language, page count. OCR for scanned PDFs is on the roadmap.
What's the latency?
500–2000 ms for a fresh plain fetch. 2–5 s when Chromium is involved. Paid plans get cached responses (under 50 ms, zero credits) for any URL extracted in the past 24 hours.
What chunk size should I use?
Default is 512 tokens with a 50-token overlap, a good match for OpenAI and Cohere embeddings. Override chunk.size and chunk.overlap per request if your retriever expects something different.
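A sketch of that per-request override. The parameter names chunk.size and chunk.overlap come from this answer; nesting them under a chunk object (and the endpoint URL) is an assumption for illustration:

// Same call as the earlier sketch, but asking for smaller chunks with less overlap.
const res = await fetch("https://api.denoize.dev/v1/extract", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.DENOIZE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://arxiv.org/abs/2312.10997",
    chunk: { size: 256, overlap: 32 },  // tune to your embedding model
  }),
});
const { chunks } = await res.json();
// Each chunk carries text, charCount and estimatedTokens (see the sample
// response above), so the array can go straight into an embedding batch.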
Can I plug this into Claude Desktop?
Yes — MCP is included in Starter and Standard. Drop one block into your Claude Desktop / Cursor / Windsurf config with your API key as a Bearer token — no local install, no npm package to maintain. Free accounts can still use the REST API; MCP unlocks the moment you buy any credit pack.
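One plausible shape for that config block, written in the mcpServers format Claude Desktop, Cursor, and Windsurf read. The server URL is a placeholder and the exact keys for a remote server vary by client, so treat this as a template rather than the documented config:

{
  "mcpServers": {
    "denoize": {
      "url": "https://mcp.denoize.dev",
      "headers": {
        "Authorization": "Bearer YOUR_DENOIZE_API_KEY"
      }
    }
  }
}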