How the engine works, end to end

Hunter-X reverse-engineers repeatable on-chain playbooks — crews coordinating wallets so a token pumps — and turns the survivors into per-crew, out-of-sample-validated signals. Nothing about the pattern shape is hardcoded: the vocabulary is discovered by compression and steered by whether it predicts price. Here is every stage.

events/day: ~38 M edges
WL colors: 401,949,139
productions:
kept rules:
run time: < 8 min/day

00 · The core idea

Two principles drive every choice.

Compression = pattern discovery

A shape that recurs is worth naming (it compresses the data). We find recurring structure with no template — fan-ins, chains, co-buys, round-trips, deployer chains all fall out. But compression alone finds the most frequent structure, which on Solana is bots and arbitrage…

Information bottleneck = aim at alpha

…so we keep only structure that carries information about the outcome (forward return). The outcome is a score, never a template. Bots recur constantly and predict nothing → dropped. A funding→buy shape that precedes moves → kept.
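One way to make "carries information about the outcome" concrete is the mutual information between a shape's presence and a binarized forward return. The sketch below is illustrative only: the information measure the engine actually uses is not stated here, and the counts are invented for the example.

```python
from math import log2

def mutual_information(counts):
    """Mutual information in bits from joint counts {(present, pumped): n}."""
    total = sum(counts.values())
    px, py = {}, {}
    for (x, y), n in counts.items():
        px[x] = px.get(x, 0) + n
        py[y] = py.get(y, 0) + n
    mi = 0.0
    for (x, y), n in counts.items():
        if n == 0:
            continue  # empty cells contribute nothing
        pxy = n / total
        mi += pxy * log2(pxy / ((px[x] / total) * (py[y] / total)))
    return mi

# Hypothetical counts: (shape present?, token pumped?) -> number of tokens.
bot_shape = {(True, True): 40, (True, False): 40,
             (False, True): 10, (False, False): 10}
fund_buy_shape = {(True, True): 30, (True, False): 10,
                  (False, True): 20, (False, False): 40}

bot_mi = mutual_information(bot_shape)            # frequent, but independent of the outcome
fund_buy_mi = mutual_information(fund_buy_shape)  # rarer, but shifts the odds of a pump
```

In this toy table the bot shape is four times more frequent than absent, yet its presence says nothing about the outcome, so a compression-only criterion would keep it and an information criterion would drop it.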

01 · Ingest → one typed value-flow graph

Every program's rows become instructions in a single graph — no token-centering, no 'episode'.

Nodes are wallets and mints. Edges are the four instruction types of the language — each swap edge carries which DEX it happened on, so buy@pumpswap and buy@meteora_damm are genuinely different. A day is ~38 M edges.

transfer— SOL moved wallet→wallet (funding)

buy@dex— wallet bought a token on a DEX

sell@dex— wallet sold a token on a DEX

create@dex— deployer launched the token

In the example graph, a deployer launches a token, a boss funds three wallets, they buy on PumpSwap, and someone sells. This small structure (funding plus coordinated buys) is exactly the kind of playbook we hunt, but we never tell the engine to look for it.
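A minimal sketch of such a typed graph, assuming a flat edge list and in-memory adjacency maps. The `Edge` class, its field names, and the wallet and mint identifiers are hypothetical; only the four edge kinds and the `kind@dex` labeling come from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Edge:
    kind: str                    # "transfer", "buy", "sell", or "create"
    src: str                     # acting wallet
    dst: str                     # receiving wallet (transfer) or mint (buy/sell/create)
    slot: int                    # slot at which the instruction landed
    venue: Optional[str] = None  # DEX name; None for plain SOL transfers

    @property
    def label(self) -> str:
        # Swap edges carry their DEX, so buy@pumpswap != buy@meteora_damm.
        return self.kind if self.venue is None else f"{self.kind}@{self.venue}"

def build_graph(edges):
    """Index edges by source and by destination node."""
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for e in edges:
        out_edges[e.src].append(e)
        in_edges[e.dst].append(e)
    return out_edges, in_edges

# The example from the text: deployer launches, boss funds three wallets,
# they buy on PumpSwap, someone sells.
example = [
    Edge("create", "deployer", "TOKEN", slot=100, venue="pumpswap"),
    Edge("transfer", "boss", "w1", slot=101),
    Edge("transfer", "boss", "w2", slot=101),
    Edge("transfer", "boss", "w3", slot=102),
    Edge("buy", "w1", "TOKEN", slot=110, venue="pumpswap"),
    Edge("buy", "w2", "TOKEN", slot=110, venue="pumpswap"),
    Edge("buy", "w3", "TOKEN", slot=111, venue="pumpswap"),
    Edge("sell", "w9", "TOKEN", slot=200, venue="pumpswap"),
]
out_edges, in_edges = build_graph(example)
```

Wallets and mints share one node namespace here, and nothing is grouped by token, which matches the "no token-centering" point above.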

02 · The key trick — mine by refining, not enumerating

Enumerating every recurring subgraph is exponential. Weisfeiler–Lehman color refinement replaces enumeration with a few passes that are each linear in the number of edges.

Instead of enumerating subgraphs (the exponential trap that forces people to hardcode a template), we give each node a color that fingerprints its neighborhood, refined outward one hop per round. Two nodes with the same color have the same local structure. Step through it:

Interactive: Weisfeiler–Lehman color refinement

Round 0 — every node is colored only by its type (wallet vs token). No structure yet.

⬤ The two gold-ringed buyers now share a color → the engine has discovered the shape fan-in(3)→buy as one production, with two occurrences.

Same color = same k-hop neighborhood signature ⇒ a candidate recurring structure. Each refinement round touches every edge once, so computing and counting colors is O(rounds × edges), linear in the graph size, which is how we mine 100 M-edge graphs without ever enumerating subgraphs.
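The refinement loop can be sketched in a few lines. This is a generic Weisfeiler–Lehman pass on a directed, edge-labeled graph, not the engine's code; the function name `wl_refine`, the `crew` helper, and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def wl_refine(node_types, edges, rounds):
    """Return {node: color_id} after `rounds` rounds of WL refinement.

    node_types: {node: "wallet" | "token"}; edges: [(src, label, dst)].
    """
    nbrs = defaultdict(list)
    for src, label, dst in edges:
        nbrs[src].append(("out", label, dst))
        nbrs[dst].append(("in", label, src))

    palette = {}  # signature -> compact integer color
    # Round 0: a node's color is just its type.
    color = {n: palette.setdefault(("type", t), len(palette))
             for n, t in node_types.items()}
    for _ in range(rounds):
        new_color = {}
        for n in node_types:
            # Signature = own color + sorted multiset of (direction, edge label, neighbor color).
            sig = tuple(sorted((d, l, color[m]) for d, l, m in nbrs[n]))
            new_color[n] = palette.setdefault((color[n], sig), len(palette))
        color = new_color
    return color

def crew(boss, wallets, token):
    """Toy playbook: boss funds wallets, wallets buy the token."""
    return ([(boss, "transfer", w) for w in wallets]
            + [(w, "buy@pumpswap", token) for w in wallets])

edges = crew("bossA", ["a1", "a2", "a3"], "T1") + crew("bossB", ["b1", "b2", "b3"], "T2")
edges.append(("x", "buy@pumpswap", "T3"))  # a lone, unfunded buyer
node_types = {n: ("token" if n.startswith("T") else "wallet")
              for e in edges for n in (e[0], e[2])}

colors = wl_refine(node_types, edges, rounds=2)
counts = Counter(colors.values())
```

After two rounds `T1` and `T2` share a color (each is bought by three funded wallets) while `T3` does not, so the funded fan-in shows up as one color with two occurrences. Each round walks every edge twice, plus a sort of each node's neighbor list, so the cost stays close to linear when neighborhoods are small.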

03 · Keep it bounded — hub-cut + capped neighborhoods

So popular tokens don't blow up the search.

A token bought by 10,000 wallets would give one node a giant neighborhood. We cap each node's expansion and cut known hubs (exchanges, routers, from the labeled-wallet list) plus anything with runaway degree. Neighborhoods stay small, so refinement and reconstruction stay cheap — and a crew's real structure (a handful of funded wallets) is preserved.
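A minimal sketch of both guards, assuming edges are `(src, label, dst)` tuples. The function names, the `max_degree` and `cap` parameters, and the first-come capping policy are assumptions; the real thresholds and the way the cap chooses which edges to keep are not stated here.

```python
from collections import Counter

def hub_cut(edges, labeled_hubs, max_degree):
    """Drop every edge touching a labeled hub or a node with runaway degree."""
    degree = Counter()
    for src, _label, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    hubs = set(labeled_hubs) | {n for n, d in degree.items() if d > max_degree}
    kept = [e for e in edges if e[0] not in hubs and e[2] not in hubs]
    return kept, hubs

def cap_neighbors(edges, cap):
    """Keep at most `cap` edges per node, first come first served.

    Returns the kept edges and the number dropped, so the cap is never silent.
    """
    used = Counter()
    kept, dropped = [], 0
    for src, label, dst in edges:
        if used[src] < cap and used[dst] < cap:
            used[src] += 1
            used[dst] += 1
            kept.append((src, label, dst))
        else:
            dropped += 1
    return kept, dropped

edges = [
    ("boss", "transfer", "w1"), ("boss", "transfer", "w2"),
    ("w1", "buy@pumpswap", "T"), ("w2", "buy@pumpswap", "T"),
    ("router", "transfer", "w1"),               # a labeled hub
]
edges += [(f"retail{i}", "buy@pumpswap", "BIG") for i in range(50)]  # a popular token

cut_edges, hubs = hub_cut(edges, labeled_hubs={"router"}, max_degree=10)
capped_edges, dropped = cap_neighbors(edges, cap=5)
```

In this toy graph the crew's four edges survive the hub-cut untouched, while the router and the 50-buyer token are removed.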

04 · Frequent colors → motif → dedup

Turn the winning colors into readable, non-redundant productions.

① frequency floor

Keep only colors seen ≥ τ times (dropped counts are logged — no silent caps).

② reconstruct + canonicalize

Unfold a color into its actual subgraph, compute an exact canonical key, and merge colors that reconstruct to the same shape.

③ per-token dedup

Emit one occurrence per (shape, token). This kills pseudo-replication — a pumped token can't be counted 50× through its neighbors.
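The three steps above can be sketched as follows. The brute-force canonical key is only practical for motifs of a handful of nodes and stands in for whatever exact canonicalization the engine uses; keeping the earliest occurrence per (shape, token) is likewise an assumption, as are all names and sample values.

```python
from itertools import permutations

def frequent_colors(color_counts, tau):
    """Step 1: keep colors seen at least tau times; report how many were dropped."""
    kept = {c: n for c, n in color_counts.items() if n >= tau}
    return kept, len(color_counts) - len(kept)

def canonical_key(node_types, edges):
    """Step 2: exact canonical key = minimum encoding over all node relabelings.

    node_types: {node: type}; edges: [(src, label, dst)]. Factorial cost,
    so only suitable for very small motifs.
    """
    ids = list(node_types)
    best = None
    for perm in permutations(range(len(ids))):
        rename = dict(zip(ids, perm))
        types = tuple(t for _, t in sorted((rename[n], node_types[n]) for n in ids))
        es = tuple(sorted((rename[s], l, rename[d]) for s, l, d in edges))
        if best is None or (types, es) < best:
            best = (types, es)
    return best

def dedup_per_token(occurrences):
    """Step 3: one occurrence per (shape, token); here the earliest slot wins."""
    first = {}
    for shape, token, slot in occurrences:
        key = (shape, token)
        if key not in first or slot < first[key]:
            first[key] = slot
    return sorted((s, t, slot) for (s, t), slot in first.items())

kept, n_dropped = frequent_colors({"c1": 12, "c2": 3, "c3": 7}, tau=5)

# Two reconstructions of the same shape with different node ids and edge order.
key_a = canonical_key(
    {"boss": "wallet", "w1": "wallet", "w2": "wallet", "T": "token"},
    [("boss", "transfer", "w1"), ("boss", "transfer", "w2"),
     ("w1", "buy@pumpswap", "T"), ("w2", "buy@pumpswap", "T")])
key_b = canonical_key(
    {"m": "token", "p": "wallet", "q": "wallet", "r": "wallet"},
    [("q", "buy@pumpswap", "m"), ("r", "transfer", "q"),
     ("p", "buy@pumpswap", "m"), ("r", "transfer", "p")])
key_c = canonical_key({"w": "wallet", "T": "token"}, [("w", "buy@pumpswap", "T")])

unique = dedup_per_token([
    ("fan_in", "T1", 120), ("fan_in", "T1", 110),
    ("fan_in", "T2", 300), ("chain", "T1", 90),
])
```

`key_a == key_b` is what lets two different colors that unfold to the same subgraph be merged into one production.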

05 · Score causally — and leakage solves itself

A pattern's outcome is the token's move measured AFTER the pattern forms.

Each occurrence is stamped with the slot at which it formed (s); its forward return is measured over [s, s+horizon], so only the future counts. A fund→buy that forms before a pump scores positive. A sell that happens after the pump is stamped late: it has no move ahead of it and scores ~0, so it can never sneak into a predictive rule. The exit/dump playbook is still discoverable, just correctly labeled as descriptive.
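A minimal sketch of the causal window, assuming a per-token price series sorted by slot. Taking the first price at or after s as the entry and the last price at or before s+horizon as the exit is an assumption; the function name and sample prices are hypothetical.

```python
from bisect import bisect_left, bisect_right

def forward_return(price_slots, prices, s, horizon):
    """Return over [s, s + horizon], using only prices observed inside the window."""
    i = bisect_left(price_slots, s)                  # first price at or after s
    j = bisect_right(price_slots, s + horizon) - 1   # last price at or before s + horizon
    if i >= len(price_slots) or j < i:
        return None                                  # no price inside the window
    return prices[j] / prices[i] - 1.0

# Toy token: flat, then doubles between slot 200 and slot 300, then flat.
price_slots = [100, 200, 300, 400, 500]
prices = [1.0, 1.0, 2.0, 2.0, 2.0]

fund_buy_score = forward_return(price_slots, prices, s=100, horizon=300)  # forms before the pump
late_sell_score = forward_return(price_slots, prices, s=400, horizon=300) # forms after the pump
```

Because nothing before s enters the calculation, the late sell gets a score of zero without any special-casing of sells.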

06 · Ground per crew — specific, not average

'This crew doing this shape ⇒ pump 77% of the time,' not a population average.

The grounding lattice

Each shape is tested at General (anyone) and at every resolved Crew(c) (wallets sharing a dominant funder). A shape can be worthless in the population yet strong for one crew; that crew-conditional effect is the CATE (conditional average treatment effect) we want.
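A minimal sketch of the two levels of the lattice, assuming a crew is named by its dominant funder and that "dominant" means supplying more than half of a wallet's inbound SOL. That threshold, the function names, and the sample data are all assumptions.

```python
from collections import Counter, defaultdict

def resolve_crews(transfers, min_share=0.5):
    """Map wallet -> dominant funder. transfers: [(funder, wallet, amount)]."""
    inbound = defaultdict(Counter)
    for funder, wallet, amount in transfers:
        inbound[wallet][funder] += amount
    crew_of = {}
    for wallet, by_funder in inbound.items():
        funder, amount = by_funder.most_common(1)[0]
        if amount / sum(by_funder.values()) > min_share:
            crew_of[wallet] = funder
    return crew_of

def hit_rates(occurrences, crew_of, threshold):
    """Hit rate of one shape at General and at each Crew(c).

    occurrences: [(actor_wallet, forward_return)].
    """
    hits, totals = Counter(), Counter()
    for actor, ret in occurrences:
        groups = ["General"]
        if actor in crew_of:
            groups.append(("Crew", crew_of[actor]))
        for g in groups:
            totals[g] += 1
            hits[g] += ret >= threshold
    return {g: hits[g] / totals[g] for g in totals}

transfers = [
    ("boss", "w1", 5.0), ("boss", "w2", 5.0), ("other", "w2", 1.0),
    ("x", "w3", 1.0), ("y", "w3", 1.0),   # no dominant funder, so no crew
]
crew_of = resolve_crews(transfers)

occurrences = [("w1", 0.40), ("w2", 0.50), ("w3", -0.10),
               ("w4", 0.00), ("w5", -0.20), ("w6", 0.05)]
rates = hit_rates(occurrences, crew_of, threshold=0.30)
```

Here the shape hits a third of the time across everyone but every time for the one resolved crew, which is the gap the per-crew grounding is meant to surface. With two occurrences that crew rate is far too thin to trust, which is why the filters that follow matter.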

Then it has to survive

• empirical-Bayes shrinkage toward the global mean (small samples don't get to be heroes)
• walk-forward OOS — must still work on the held-out later slots
• two-stage FDR — screen by base-rate lift, then Benjamini–Hochberg only over survivors
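The three filters above can be sketched as follows. The shrinkage is written as a beta-binomial posterior mean with an assumed prior strength, the split is a single cut in slot order, and the Benjamini–Hochberg step is the standard step-up procedure; none of the parameter values or p-values are from a real run.

```python
def shrink_rate(hits, n, prior_rate, prior_strength):
    """Pull a crew's hit rate toward the global rate; small n moves the most."""
    return (hits + prior_rate * prior_strength) / (n + prior_strength)

def walk_forward_split(occurrences, split_slot):
    """Fit on slots before split_slot, validate on the later ones.

    occurrences: [(slot, forward_return)].
    """
    train = [o for o in occurrences if o[0] < split_slot]
    test = [o for o in occurrences if o[0] >= split_slot]
    return train, test

def benjamini_hochberg(pvalues, q):
    """Indices of hypotheses rejected at false-discovery rate q (step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= q * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    return sorted(order[:cutoff])

# A 3-for-3 crew barely moves off a 10% base rate; a 400-for-517 crew keeps most of its edge.
tiny = shrink_rate(hits=3, n=3, prior_rate=0.10, prior_strength=20)
large = shrink_rate(hits=400, n=517, prior_rate=0.10, prior_strength=20)

train, test = walk_forward_split([(100, 0.4), (900, 0.1), (1500, 0.35)], split_slot=1000)

# Stage two of the FDR: BH runs only over candidates that passed the lift screen.
survivor_pvalues = [0.001, 0.008, 0.039, 0.041, 0.20]
rejected = benjamini_hochberg(survivor_pvalues, q=0.05)
```

In the toy numbers only the first two survivors clear BH at q = 0.05, even though four of the five have raw p-values under 0.05.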

07 · The output — a deployable rule

What a survivor looks like (from a real run).

rule fan_in_5_buy_meteora_crew177595 {
  when {
    transfer(?w0 -> ?w1)           // boss funds a wallet
    transfer(?w0 -> ?w2)           //   …five of them
    buy@meteora_damm(?w1 -> ?t)    // the funded wallets buy the token
    buy@meteora_damm(?w2 -> ?t)    //   …on Meteora
    ground actor in Crew(177595)   // ← specific crew, not the population
  }
  expect forward_return(?t, horizon = 5000 slots) >= +30%
  action buy(?t) at detection ; exit at +30% or horizon
}
// support 517 tokens · hit 77% · OOS +8% · FDR q≈0

Every kept rule is rendered as a program in this form (in a Monaco editor on the rule page), together with its motif graph, its resolved crew wallets, and clickable supporting episodes you can replay step by step.
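The rule's `action` line (buy at detection, exit at +30% or at the horizon) can be simulated on a price path as below. This is a sketch under assumptions: entry at the first observed price at or after detection, exit at the last observed price inside the window when the target is never reached, and no fees or slippage. The function name and sample paths are hypothetical.

```python
def simulate_trade(path, entry_slot, target=0.30, horizon=5000):
    """Walk a price path and apply 'exit at target or horizon'.

    path: [(slot, price)] sorted by slot.
    Returns (exit_slot, realized_return, reason) or None if no price is in the window.
    """
    entry_price = None
    last = None
    for slot, price in path:
        if slot < entry_slot:
            continue                      # before detection: ignored
        if slot > entry_slot + horizon:
            break                         # past the horizon: stop looking
        if entry_price is None:
            entry_price = price           # buy at the first price after detection
        ret = price / entry_price - 1.0
        last = (slot, ret)
        if ret >= target:
            return slot, ret, "target"
    if last is None:
        return None
    return last[0], last[1], "horizon"

pump = [(0, 1.0), (1000, 1.1), (2000, 1.35), (3000, 2.0)]
slow = [(0, 1.0), (4000, 1.05), (6000, 3.0)]   # the big move arrives after the horizon

pump_exit = simulate_trade(pump, entry_slot=0)
slow_exit = simulate_trade(slow, entry_slot=0)
```

The second path shows why the horizon matters: the later move to 3.0 falls outside the 5000-slot window and is never credited to the rule.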

08 · Why it fits the compute budget

Near-linear front-end → a tiny expensive back-end. A full day in under 8 minutes.

raw events (swaps + creations + transfers): ~38 M edges / day
WL colors considered (linear pass): 401,949,139
productions (frequent + canonical-deduped):
occurrence-episodes (one per shape×token): 2,544,672
rules kept (per-crew, OOS+, FDR-significant):

The costly steps (exact canonicalization, grammar, per-crew scoring, FDR) run on the ~64 survivors, not the 38 M edges. Building the graph, hub-cut, and WL are all linear passes, parallelized across the box's cores.

09 · The architecture

Where each piece runs. ClickHouse is only a results sink — all analytics happen on the box.

Sources

cronos_dev.swaps_wallet_nats (buys+sells, 7 DEXes) · cronos_dev.token_creation (deployer) — slot-indexed reads. SOL transfers — local parquet on HunterX.

ClickHouse + HunterX disk

▼

xp-converter

Normalizes every row to one UnifiedEvent {actor, counterparty, instrument, venue=dex, class} and publishes to a single topic. Dumb + parallel.
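A minimal sketch of that normalization, assuming each source row arrives as a dict. The five `UnifiedEvent` fields are the ones listed above (`class` is renamed `event_class` because it is a Python keyword); every input column name, and the choice of what goes into `counterparty` and `venue` for each source, is a hypothetical stand-in for the real schemas.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnifiedEvent:
    actor: str         # wallet that acted
    counterparty: str  # receiving wallet, or the mint for swaps and creations (assumed)
    instrument: str    # mint address, or "SOL" for plain transfers (assumed)
    venue: str         # DEX name; empty for plain transfers (assumed)
    event_class: str   # "transfer", "buy", "sell", or "create"

def from_swap(row):
    # Assumed columns: wallet, mint, dex, side ("buy" or "sell").
    return UnifiedEvent(row["wallet"], row["mint"], row["mint"], row["dex"], row["side"])

def from_creation(row):
    # Assumed columns: deployer, mint, dex.
    return UnifiedEvent(row["deployer"], row["mint"], row["mint"], row["dex"], "create")

def from_transfer(row):
    # Assumed columns: src, dst.
    return UnifiedEvent(row["src"], row["dst"], "SOL", "", "transfer")

events = [
    from_creation({"deployer": "dep", "mint": "T", "dex": "pumpswap"}),
    from_transfer({"src": "boss", "dst": "w1"}),
    from_swap({"wallet": "w1", "mint": "T", "dex": "pumpswap", "side": "buy"}),
]
```

Each converter is a pure function of one row, which is what lets this stage stay stateless and run in parallel before anything is joined into a graph.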

HunterX → Redpanda

▼

xp-mine — the funnel

Drains the topic + folds in the local transfers → ONE typed value-flow graph → hub-cut → WL refinement → frequent colors → reconstruct motif → per-token dedup → causal return.

HunterX (32c/34G)

▼

xp-score / backtest / detect

Grounds each shape per crew (CATE), empirical-Bayes shrinkage, walk-forward out-of-sample, two-stage FDR; simulates equity; compiles a detection feed.

HunterX

▼

ClickHouse hunterx.* + this UI

Every run is versioned by run_id and queried live by the visualizer (rules, episodes, grammar, backtest).

ClickHouse → Next.js :3939

Template-free · multi-DEX · per-crew · causal · out-of-sample. Explore it in Rules and Episodes.