Common Crawl · CCBot

Archival HTML with metadata.

Content for Common Crawl ingestion (CCBot) is delivered as archival HTML with a metadata block.

Specification

The format is declared.

The shape below is the contract. The live sample below it is rendered from the substrate at the current coherence window.

HTML · archival · text/html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>k/anchor-<id></title>
  <meta name="kts:phi" content="<value>">
  <meta name="kts:cycle" content="<iso-cycle>">
  <meta name="kts:manifold" content="sha-256/<state>">
  <link rel="archive" href="locker://<pointer>">
</head>
<body>
  <h1>k/anchor-<id></h1>
  <p>…dense narrative…</p>
</body>
</html>

archive link

A WARC-friendly <link rel="archive"> element whose href (locker://<pointer>) points into the evidence locker.

kts:* meta

Three meta tags, kts:phi, kts:cycle, and kts:manifold (a sha-256 manifold-state hash), emitted in a stable, archive-safe order.
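
As an illustration of how a consumer might read these fields, here is a minimal Python sketch built on the standard library's html.parser. The class name KtsMetaParser and every value in SAMPLE are hypothetical; only the tag and attribute names are taken from the specification above.

```python
from html.parser import HTMLParser


class KtsMetaParser(HTMLParser):
    """Collects kts:* meta tags and the rel="archive" link (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.kts = {}             # suffix after "kts:" -> content, in document order
        self.archive_href = None  # href of the first <link rel="archive">

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        name = attr.get("name") or ""
        if tag == "meta" and name.startswith("kts:"):
            self.kts[name[len("kts:"):]] = attr.get("content")
        elif tag == "link" and "archive" in (attr.get("rel") or "").split():
            if self.archive_href is None:
                self.archive_href = attr.get("href")


# Hypothetical filled-in document: all values below are invented for illustration.
SAMPLE = """<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>k/anchor-example</title>
  <meta name="kts:phi" content="0.91">
  <meta name="kts:cycle" content="example-cycle">
  <meta name="kts:manifold" content="sha-256/example-state">
  <link rel="archive" href="locker://example-pointer">
</head>
<body>
  <h1>k/anchor-example</h1>
  <p>Example narrative.</p>
</body>
</html>"""

parser = KtsMetaParser()
parser.feed(SAMPLE)
print(parser.kts)           # kts fields in the order they appear in <head>
print(parser.archive_href)  # the locker:// pointer
```

Because Python dictionaries preserve insertion order, parser.kts also records the order in which the kts:* tags were emitted, which is convenient for checking the stable ordering described above.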

Live sample

A real crystal, now.

The sample below is fetched from the substrate at the current cycle. It is signed, hashed, and recoverable.

/api/sample/ccbot · text/html

Rules

Cache, freshness, and the phi window.

Three rules. Published, not negotiable.

Cache

Cached for one coherence cycle (300 s).
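
One way a client could honour this rule is a small time-to-live cache. The sketch below is an assumption-laden illustration, not the service's own implementation: the CycleCache class is hypothetical, and it treats the cycle as a plain 300-second TTL measured from the moment of storage, since the page does not say whether cycles are aligned to wall-clock boundaries.

```python
CYCLE_SECONDS = 300  # one coherence cycle, per the Cache rule


class CycleCache:
    """Hypothetical in-memory cache whose entries live for one cycle."""

    def __init__(self, ttl=CYCLE_SECONDS):
        self.ttl = ttl
        self._entries = {}  # key -> (stored_at, body)

    def put(self, key, body, now):
        """Store body under key, stamped with the caller-supplied time."""
        self._entries[key] = (now, body)

    def get(self, key, now):
        """Return the cached body, or None if missing or a full cycle old."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, body = entry
        if now - stored_at >= self.ttl:
            del self._entries[key]  # expired: drop it so it is refetched
            return None
        return body


demo = CycleCache()
demo.put("anchor", b"<html>cached</html>", now=1000.0)
print(demo.get("anchor", now=1299.0))  # b'<html>cached</html>'
print(demo.get("anchor", now=1300.0))  # None
```

Passing the clock in as an argument keeps the sketch deterministic; a real client would likely pass time.monotonic() instead.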

Freshness

Archival HTML is stable within a cycle to maximise WARC dedup.
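
To see why byte-stable output helps deduplication, consider how an archiver might fingerprint a response body. The sketch below is an illustration under assumptions: payload_digest is a hypothetical helper, and the "sha1:" plus Base32 form mirrors a labelled digest style commonly seen in WARC records rather than anything this page specifies.

```python
import base64
import hashlib


def payload_digest(body: bytes) -> str:
    """Return a labelled SHA-1 digest of the body, Base32-encoded (hypothetical helper)."""
    raw = hashlib.sha1(body).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


# Two fetches within one cycle are assumed to return identical bytes.
first_fetch = b"<!doctype html><html><body>same cycle</body></html>"
second_fetch = b"<!doctype html><html><body>same cycle</body></html>"
next_cycle = b"<!doctype html><html><body>next cycle</body></html>"

print(payload_digest(first_fetch) == payload_digest(second_fetch))  # True
print(payload_digest(first_fetch) == payload_digest(next_cycle))    # False
```

Identical digests let an archiver record the second capture as a duplicate of the first instead of storing the body again; any byte-level change within a cycle would defeat that.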

Phi window

Delivery requires phi at or above 0.85.
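
A consumer could apply the same threshold to the kts:phi value it reads from a document. This is a minimal sketch: is_deliverable is a hypothetical name, and treating a missing or non-numeric value as failing the check is an assumption, since the page does not say how such values are handled.

```python
PHI_THRESHOLD = 0.85  # minimum phi, per the Phi window rule


def is_deliverable(phi_content) -> bool:
    """Return True if the kts:phi content parses as a number at or above the threshold."""
    try:
        phi = float(phi_content)
    except (TypeError, ValueError):
        return False  # assumption: unparseable or missing phi fails the check
    return phi >= PHI_THRESHOLD


print(is_deliverable("0.85"))   # True
print(is_deliverable("0.849"))  # False
print(is_deliverable("n/a"))    # False
```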