How the engine works, end to end
Hunter-X reverse-engineers repeatable on-chain playbooks — crews coordinating wallets so a token pumps — and turns the survivors into per-crew, out-of-sample-validated signals. Nothing about the pattern shape is hardcoded: the vocabulary is discovered by compression and steered by whether it predicts price. Here is every stage.
Two principles drive every choice.
A shape that recurs is worth naming (it compresses the data). We find recurring structure with no template — fan-ins, chains, co-buys, round-trips, deployer chains all fall out. But compression alone finds the most frequent structure, which on Solana is bots and arbitrage…
…so we keep only structure that carries information about the outcome (forward return). The outcome is a score, never a template. Bots recur constantly and predict nothing → dropped. A funding→buy shape that precedes moves → kept.
Every program's rows become instructions in a single graph — no token-centering, no 'episode'.
Nodes are wallets and mints. Edges are the four instruction types of the language — each swap edge carries which DEX it happened on, so buy@pumpswap and buy@meteora_damm are genuinely different. A day is ~38 M edges.
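A minimal sketch of how such typed edges might be represented, assuming plain Python tuples. The `Edge` type, the `make_edge` helper, the addresses, and the four instruction names used here (`deploy`, `transfer`, `buy`, `sell`) are illustrative guesses based on the example, not the engine's actual schema.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    src: str    # wallet address
    dst: str    # wallet or mint address
    label: str  # instruction type, with the DEX folded in for swaps
    slot: int   # slot at which the instruction landed


def make_edge(src: str, dst: str, kind: str, slot: int,
              dex: Optional[str] = None) -> Edge:
    """Fold the DEX into the label so buy@pumpswap != buy@meteora_damm."""
    label = f"{kind}@{dex}" if dex is not None else kind
    return Edge(src, dst, label, slot)


# Hypothetical rows: a deployer launches a token, a boss funds three
# wallets, they buy on PumpSwap, and someone sells.
edges = [
    make_edge("deployer", "TOKEN", "deploy", 100),
    make_edge("boss", "w1", "transfer", 110),
    make_edge("boss", "w2", "transfer", 111),
    make_edge("boss", "w3", "transfer", 112),
    make_edge("w1", "TOKEN", "buy", 120, dex="pumpswap"),
    make_edge("w2", "TOKEN", "buy", 121, dex="pumpswap"),
    make_edge("w3", "TOKEN", "buy", 122, dex="pumpswap"),
    make_edge("w9", "TOKEN", "sell", 300, dex="pumpswap"),
]

label_counts = Counter(e.label for e in edges)
```

Because the DEX is part of the label rather than a side attribute, any later step that compares labels treats the same swap on two venues as two different instructions.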
In the example, a deployer launches a token, a boss funds three wallets, they buy on PumpSwap, and someone sells. This little structure (funding plus coordinated buys) is exactly the kind of playbook we hunt, but we never tell the engine to look for it.
Finding every recurring subgraph is exponential. Weisfeiler–Lehman color refinement makes it linear.
Instead of enumerating subgraphs (the exponential trap that forces people to hardcode a template), we give each node a color that fingerprints its neighborhood, refined outward one hop per round. After k rounds, two nodes with the same color have matching local structure out to k hops, as far as the refinement can tell.
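The refinement can be sketched in a few lines, assuming edges arrive as `(src, dst, label)` tuples. This is a generic edge-labeled Weisfeiler–Lehman pass, not the engine's implementation; the function name `wl_colors`, the SHA-1 fingerprint, and the 12-character truncation are choices made for illustration.

```python
import hashlib
from collections import defaultdict


def wl_colors(edges, rounds=2):
    """Return {node: color} after `rounds` of WL refinement.

    edges: iterable of (src, dst, label) tuples in a directed graph.
    """
    out_nbrs = defaultdict(list)  # node -> [(label, neighbor)]
    in_nbrs = defaultdict(list)
    nodes = set()
    for src, dst, label in edges:
        out_nbrs[src].append((label, dst))
        in_nbrs[dst].append((label, src))
        nodes.update((src, dst))

    colors = {n: "0" for n in nodes}  # every node starts with the same color
    for _ in range(rounds):
        new_colors = {}
        for n in nodes:
            # Signature = own color + sorted multiset of (edge label, neighbor color),
            # kept separate for outgoing and incoming edges.
            outs = sorted((lbl, colors[m]) for lbl, m in out_nbrs[n])
            ins = sorted((lbl, colors[m]) for lbl, m in in_nbrs[n])
            sig = repr((colors[n], outs, ins))
            new_colors[n] = hashlib.sha1(sig.encode()).hexdigest()[:12]
        colors = new_colors
    return colors


# Two crews with the same shape, plus one unfunded buyer.
edges = [
    ("bossA", "a1", "transfer"), ("bossA", "a2", "transfer"),
    ("a1", "T1", "buy@pumpswap"), ("a2", "T1", "buy@pumpswap"),
    ("bossB", "b1", "transfer"), ("bossB", "b2", "transfer"),
    ("b1", "T2", "buy@pumpswap"), ("b2", "T2", "buy@pumpswap"),
    ("loner", "T3", "buy@pumpswap"),
]
colors = wl_colors(edges, rounds=2)
```

Here `bossA` and `bossB` end up with the same color even though they never touch the same wallets or tokens, while `loner` gets a different color from the funded buyers. Equal colors are a cheap filter rather than a proof of isomorphism, which is why an exact check on the few survivors is still useful.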
So popular tokens don't blow up the search.
A token bought by 10,000 wallets would give one node a giant neighborhood. We cap each node's expansion and cut known hubs (exchanges, routers, from the labeled-wallet list) plus anything with runaway degree. Neighborhoods stay small, so refinement and reconstruction stay cheap — and a crew's real structure (a handful of funded wallets) is preserved.
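One way the hub cut and the expansion cap could fit together, sketched under assumptions: the function name `build_capped_adjacency`, the thresholds `max_degree` and `cap`, and the "keep the first `cap` edges seen" policy are all hypothetical stand-ins for whatever the engine actually uses.

```python
from collections import Counter, defaultdict


def build_capped_adjacency(edges, known_hubs, max_degree, cap):
    """Drop hub nodes, then keep at most `cap` neighbors per remaining node.

    edges: list of (src, dst, label) tuples.
    Returns (adjacency dict, set of nodes treated as hubs).
    """
    degree = Counter()
    for src, dst, _ in edges:
        degree[src] += 1
        degree[dst] += 1

    # Hubs = labeled wallets (exchanges, routers) plus runaway-degree nodes.
    hubs = set(known_hubs) | {n for n, d in degree.items() if d > max_degree}

    adj = defaultdict(list)
    for src, dst, label in edges:
        if src in hubs or dst in hubs:
            continue  # cutting a hub removes every edge that touches it
        if len(adj[src]) < cap:
            adj[src].append((label, dst))
        if len(adj[dst]) < cap:
            adj[dst].append((label, src))
    return dict(adj), hubs


edges = [("R", "w1", "transfer")]                          # R: a labeled router
edges += [(f"b{i}", "T", "buy@pumpswap") for i in range(50)]  # a popular token
edges += [("boss", w, "transfer") for w in ("w1", "w2", "w3")]

adj_cut, hubs_cut = build_capped_adjacency(edges, {"R"}, max_degree=20, cap=8)
adj_cap, hubs_cap = build_capped_adjacency(edges, {"R"}, max_degree=100, cap=8)
```

With `max_degree=20` the popular token is cut outright. With `max_degree=100` it survives but its neighborhood is capped at 8. In both cases the boss and its three funded wallets come through untouched.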
Turn the winning colors into readable, non-redundant productions.
A pattern's outcome is the token's move measured AFTER the pattern forms.
Each occurrence is stamped with the slot at which it formed (s); its forward return is measured over [s, s+horizon], which is only the future. A fund→buy that forms before a pump scores positive. A sell that happens after the pump forms late, has no move ahead of it, and scores ~0, so it can never sneak into a predictive rule. The exit/dump playbook is still discoverable, just correctly labeled as descriptive.
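The forward-only window can be sketched as follows, assuming a per-token price series sorted by slot. The function name `forward_return`, the choice of the first observation at or after s as the entry price, and the last observation inside the window as the exit price are assumptions for illustration; the prices are made up.

```python
from bisect import bisect_left, bisect_right


def forward_return(slots, prices, s, horizon):
    """Return over [s, s + horizon], or None if the window has < 2 observations.

    slots: ascending slot numbers; prices: the price observed at each slot.
    """
    i = bisect_left(slots, s)                  # first observation at or after s
    j = bisect_right(slots, s + horizon) - 1   # last observation at or before s + horizon
    if i >= len(slots) or j <= i:
        return None
    return prices[j] / prices[i] - 1.0


# Hypothetical token: flat, then doubles between slot 200 and slot 300.
slots = [100, 200, 300, 400, 6000]
prices = [1.0, 1.0, 2.0, 2.0, 2.0]

early = forward_return(slots, prices, s=100, horizon=5000)   # forms before the pump
late = forward_return(slots, prices, s=300, horizon=5000)    # forms after the pump
none = forward_return(slots, prices, s=7000, horizon=5000)   # no data ahead of it
```

Nothing before slot s ever enters the calculation, so the pattern that forms at slot 300 gets no credit for the move that already happened.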
'This crew doing this shape ⇒ pump 77% of the time,' not a population average.
Each shape is tested at General (anyone) and at every resolved Crew(c) (wallets sharing a dominant funder). A shape can be worthless in the population yet strong for one crew; that per-crew effect is the CATE (conditional average treatment effect) we want.
- empirical-Bayes shrinkage toward the global mean (small samples don't get to be heroes)
- walk-forward OOS: must still work on the held-out later slots
- two-stage FDR: screen by base-rate lift, then Benjamini–Hochberg only over survivors
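Two of these checks, shrinkage and the two-stage FDR, can be sketched together. This is an illustration under stated assumptions, not the engine's code: the beta-binomial style shrinkage with `prior_strength=20`, the `min_lift` threshold, the candidate dict layout, and the p-values fed in are all hypothetical. The Benjamini–Hochberg step itself is the standard procedure.

```python
def shrink_hit_rate(hits, n, global_rate, prior_strength=20.0):
    """Pull a raw hit rate toward the global rate; small n is pulled hardest."""
    return (hits + prior_strength * global_rate) / (n + prior_strength)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices rejected at FDR level alpha (standard BH step-up)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    return sorted(order[:cutoff])


def two_stage(candidates, global_rate, min_lift=2.0, alpha=0.05):
    """Stage 1: screen by shrunk lift over base rate. Stage 2: BH over survivors."""
    survivors = [
        c for c in candidates
        if shrink_hit_rate(c["hits"], c["n"], global_rate) >= min_lift * global_rate
    ]
    kept = benjamini_hochberg([c["pvalue"] for c in survivors], alpha)
    return [survivors[i] for i in kept]


# Hypothetical candidates against a 20% base rate.
candidates = [
    {"name": "tiny", "hits": 3, "n": 3, "pvalue": 0.001},        # 3/3 but n = 3
    {"name": "big", "hits": 398, "n": 517, "pvalue": 1e-6},      # ~77% on 517
    {"name": "noisy", "hits": 60, "n": 100, "pvalue": 0.2},      # lift, weak p-value
]
kept = two_stage(candidates, global_rate=0.2)
```

The three-for-three candidate has a perfect raw hit rate but shrinks to about 30% and fails the screen. The noisy one passes the screen but not the BH step. Running BH only over screened survivors keeps the number of tests small, so the correction is not diluted by thousands of candidates that had no lift to begin with.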
What a survivor looks like (from a real run).
rule fan_in_5_buy_meteora_crew177595 {
  when {
    transfer(?w0 -> ?w1) // boss funds a wallet
    transfer(?w0 -> ?w2) // …five of them
    buy@meteora_damm(?w1 -> ?t) // the funded wallets buy the token
    buy@meteora_damm(?w2 -> ?t) // …on Meteora
    ground actor in Crew(177595) // ← specific crew, not the population
  }
  expect forward_return(?t, horizon = 5000 slots) >= +30%
  action buy(?t) at detection ; exit at +30% or horizon
}
// support 517 tokens · hit 77% · OOS +8% · FDR q≈0
Every kept rule renders as this exact program (Monaco on the rule page), plus its motif graph, its resolved crew wallets, and clickable supporting episodes you can replay step by step.
Near-linear front-end → a tiny expensive back-end. A full day in under 8 minutes.
The costly steps (exact canonicalization, grammar, per-crew scoring, FDR) run on the ~64 survivors, not the 38 M edges. Building the graph, hub-cut, and WL are all linear passes, parallelized across the box's cores.
Where each piece runs. ClickHouse is only a results sink — all analytics happen on the box.