Common Crawl · CCBot
Archival HTML with metadata.
Content for Common Crawl ingestion (CCBot) is delivered as archival HTML with a kts:* metadata block in the head.
Specification
The format is declared.
The shape below is the contract. The live sample below it is rendered from the substrate at the current coherence window.
HTML · archival (text/html)
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>k/anchor-<id></title>
<meta name="kts:phi" content="<value>">
<meta name="kts:cycle" content="<iso-cycle>">
<meta name="kts:manifold" content="sha-256/<state>">
<link rel="archive" href="locker://<pointer>">
</head>
<body>
<h1>k/anchor-<id></h1>
<p>…dense narrative…</p>
</body>
</html>
archive link
A WARC-friendly link rel="archive" pointing into the evidence locker.
kts:* meta
Three meta tags carrying phi, cycle, and the manifold-state hash (prefixed sha-256/), emitted in a stable, archive-safe order.
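A consumer of this format might read the head fields as sketched below. This is a minimal illustration using only Python's standard library, not a reference implementation. The names `AnchorMetaParser`, `parse_anchor`, and `SAMPLE` are hypothetical, and the values inside `SAMPLE` are placeholders standing in for the `<value>`, `<iso-cycle>`, `<state>`, and `<pointer>` slots of the contract. They are not real output.

```python
from html.parser import HTMLParser

# Order of the kts:* tags as declared in the contract above.
DECLARED_ORDER = ["kts:phi", "kts:cycle", "kts:manifold"]

# Placeholder document: every value here is made up for illustration.
SAMPLE = """<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>k/anchor-EXAMPLE</title>
<meta name="kts:phi" content="0.90">
<meta name="kts:cycle" content="EXAMPLE-CYCLE">
<meta name="kts:manifold" content="sha-256/EXAMPLE">
<link rel="archive" href="locker://EXAMPLE">
</head>
<body><h1>k/anchor-EXAMPLE</h1><p>placeholder</p></body>
</html>"""


class AnchorMetaParser(HTMLParser):
    """Collects kts:* meta tags in document order and the rel="archive" link."""

    def __init__(self):
        super().__init__()
        self.kts = []             # (name, content) pairs, order preserved
        self.archive_href = None  # href of <link rel="archive">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").startswith("kts:"):
            self.kts.append((attrs["name"], attrs.get("content")))
        elif tag == "link" and attrs.get("rel") == "archive":
            self.archive_href = attrs.get("href")


def parse_anchor(html_text):
    """Return (list of kts pairs, archive href) for one archival HTML document."""
    parser = AnchorMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.kts, parser.archive_href


kts_pairs, archive_href = parse_anchor(SAMPLE)
names_in_order = [name for name, _ in kts_pairs]
order_matches_contract = names_in_order == DECLARED_ORDER
```

Keeping the pairs in a list rather than a dict preserves document order, so the ordering claim can be checked directly against the contract.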
Live sample
A real crystal, now.
The sample below is fetched from the substrate at the current cycle. It is signed, hashed, and recoverable.
/api/sample/ccbot · text/html
Rules
Cache, freshness, and the phi window.
Three rules. Published, not negotiable.
Cache
Cached for one coherence cycle (300 s).
Freshness
Archival HTML is stable within a cycle to maximise WARC dedup.
Phi window
Delivery requires phi at or above 0.85.
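The cache and phi rules above can be expressed as two small checks. This is a sketch under stated assumptions: the 300 s cycle and the 0.85 threshold come from the rules, while the function names and the choice to treat the cache lifetime as a simple age limit measured from the moment of caching (expiring at exactly 300 s) are assumptions made here, not something the rules spell out.

```python
CYCLE_SECONDS = 300    # Cache rule: one coherence cycle
PHI_THRESHOLD = 0.85   # Phi window rule: minimum phi for delivery


def is_cache_fresh(cached_at, now):
    """True while a cached copy is younger than one cycle.

    Assumption: age is measured from the time of caching, and a copy
    that is exactly CYCLE_SECONDS old is treated as expired.
    """
    age = now - cached_at
    return 0 <= age < CYCLE_SECONDS


def may_deliver(phi):
    """True when phi is at or above the threshold.

    Accepts a float or the string taken from the kts:phi meta tag.
    """
    return float(phi) >= PHI_THRESHOLD
```

For example, `may_deliver("0.85")` passes because the rule is inclusive, and a copy cached at t = 1000 s is reused at t = 1299 s but refetched at t = 1300 s under the age-limit assumption.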