Documentation

Everything you need to run, configure, and extend PolySwarm.

Installation

Requirements

  • Python 3.11+
  • An API key from Anthropic, OpenAI, or a local Ollama instance

Install

git clone https://github.com/defidaddydavid/polyswarm.git
cd polyswarm
pip install -r requirements.txt
cp .env.example .env

Configuration

All configuration is via the .env file or environment variables.

Variable            Default                  Description
LLM_PROVIDER        anthropic                LLM provider: anthropic, openai, or ollama
ANTHROPIC_API_KEY   (none)                   Anthropic API key (if using anthropic provider)
OPENAI_API_KEY      (none)                   OpenAI API key (if using openai provider)
OLLAMA_BASE_URL     http://localhost:11434   Ollama server URL
MODEL_FAST          auto                     Fast model for agent estimates (e.g. claude-sonnet-4-20250514, gpt-4o-mini)
MODEL_DEEP          auto                     Powerful model for synthesis and aggregation narratives
DEBATE_ROUNDS       2                        Number of debate rounds per forecast
SWARM_SIZE          12                       Number of agents in the swarm

First Forecast

python main.py forecast "Will BTC close above $100k before June 2026?" --odds 0.42

The --odds flag is optional. If provided, PolySwarm calculates the edge between the swarm's probability and the market odds.
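
At its simplest, the edge is the raw difference between the swarm's probability and the market's implied probability. A minimal sketch (illustrative only; the pipeline also computes per-method edges, as described under Forecast Mode):

```python
def edge(swarm_probability: float, market_odds: float) -> float:
    """Positive edge: the swarm thinks YES is more likely than the market does."""
    return swarm_probability - market_odds

# Swarm lands at 0.51 against market odds of 0.42:
print(f"edge: {edge(0.51, 0.42):+.2f}")   # edge: +0.09
```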

Forecast Mode

Forecast mode handles binary questions — anything with a YES/NO resolution.

How it works

  1. Context injection — 23 live data sources are fetched in parallel
  2. Round 1 — each agent independently forms a probability estimate + reasoning
  3. Round 2+ — agents see each other's estimates and can update or defend
  4. Analysis Pipeline — 26 mathematical methods across five families:
       • Classical aggregation — Bayesian, extremized, Surprisingly Popular (SP), LogOP, Cooke's, meta-probability, neutral pivot, Monte Carlo, bootstrap, coherence
       • Advanced analysis — Dempster-Shafer, copula, MCMC, KDE, conformal, optimal transport, stacking
       • Game theory — herding, cascades, Nash, scoring rules, agreement
       • Information theory — mutual information (MI), transfer entropy, redundancy
       • Meta-analysis — Shapley attribution, HMM regime detection, calibration curves
  5. Edge calculation — if market odds are provided, the edge is computed for all methods

Scenario Mode

Scenario mode answers: "If X happens, what follows?"

python main.py scenario "Fed announces emergency rate cut" --context "BTC at $87k"

Output includes

  • Each agent's immediate reaction and specific actions
  • Sentiment shift per agent (-1.0 to +1.0)
  • Estimated price impact percentage
  • Aggregated crowd sentiment and consensus
  • Generated crowd narrative
  • 3-4 second-order effects
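
As a rough illustration of the aggregation step, crowd sentiment can be computed as a confidence-weighted mean of the per-agent shifts. The field names, weighting, and consensus definition below are assumptions for illustration, not the actual scenario-mode internals:

```python
# Hypothetical per-agent reactions; field names are illustrative.
reactions = [
    {"agent": "macro_hawk", "sentiment_shift": -0.6, "confidence": 0.8},
    {"agent": "perma_bull", "sentiment_shift": +0.2, "confidence": 0.5},
    {"agent": "quant_arb",  "sentiment_shift": -0.3, "confidence": 0.7},
]

# Confidence-weighted crowd sentiment, still in [-1.0, +1.0].
total_conf = sum(r["confidence"] for r in reactions)
crowd_sentiment = sum(
    r["sentiment_shift"] * r["confidence"] for r in reactions
) / total_conf

# Simple consensus: fraction of agents on the majority side.
negative = sum(1 for r in reactions if r["sentiment_shift"] < 0)
consensus = max(negative, len(reactions) - negative) / len(reactions)
# crowd_sentiment ≈ -0.295 (bearish lean), consensus ≈ 0.67
```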

Agents

PolySwarm ships with 12 agents. Each has:

  • Persona — their identity and expertise
  • Information focus — what data they prioritize
  • Bias profile — documented blind spots
  • Base confidence — how confident they typically are
  • Memory — in-memory list of observations accumulated during a single session

Note on memory persistence: Agent memory is currently in-memory only. It accumulates observations across multiple forecast/scenario calls within the same Python process (e.g., when running the API server), but resets when the process restarts. Redis-backed persistent memory is on the roadmap.

Agents are defined in agents/personas.py. See Custom Personas to add your own.

Calibration

Every forecast is stored in a SQLite database. When a market resolves, call resolve to compute Brier scores.

# Resolve a forecast
python main.py resolve "Will BTC hit $100k?" --outcome 1.0

# View calibration leaderboard
python main.py calibration

Brier score: mean((forecast - outcome)²). Lower is better. Perfect = 0.0; always guessing 0.5 = 0.25.
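
The Brier score is simple enough to state in a few lines (a standalone illustration, not the SQLite-backed implementation):

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0.0/1.0 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# An agent said 0.8 and 0.3; the markets resolved YES (1.0) and NO (0.0):
score = brier([0.8, 0.3], [1.0, 0.0])
# (0.04 + 0.09) / 2 = 0.065, better than the 0.25 of always saying 0.5
```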

Agents with better calibration automatically receive higher weight in future forecasts via get_calibration_weights(). Calibration weights feed into Cooke's Classical Model for performance-based expert weighting.

Context Engine

Before any forecast or scenario, PolySwarm fetches live data from 23 sources via a modular plugin registry (ThreadPoolExecutor, 12 workers, 15s timeout per source).

# See exactly what data agents receive:
python main.py context

# With a specific question (adds Polymarket/Manifold market search):
python main.py context "Will BTC hit $100k?"

The context command outputs formatted text sections for each data category: prices, funding rates, options data, DeFi metrics, sentiment, and news headlines. This is the exact data string injected into each agent's prompt.

Bayesian Updating

Module: core/bayesian.py

Instead of simple averaging, PolySwarm treats each agent's estimate as evidence and updates a prior belief using Bayes' theorem. The key innovation is information-theoretic weighting: agents whose estimates are more surprising (higher KL divergence from the current posterior) carry more weight in the update.

How it works

  1. Start with a prior (market odds if available, otherwise 0.5)
  2. For each agent, compute the KL divergence between their estimate and the current posterior
  3. Weight the Bayesian update by confidence × (1 + KL divergence)
  4. Apply the likelihood ratio update: P(H|E) = P(E|H) × P(H) / P(E)
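
The steps above can be sketched as follows. This is a simplified log-odds blend standing in for the full likelihood-ratio update in core/bayesian.py: the weight formula matches step 3, but the blending rule and the cap at 1.0 are illustrative choices, not the module's exact math:

```python
import math

def _logit(p: float) -> float:
    return math.log(p / (1 - p))

def _kl_bernoulli(p: float, q: float) -> float:
    """KL divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    q = min(max(q, 1e-6), 1 - 1e-6)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_weighted_posterior(estimates, prior=0.5):
    """Fold each agent's estimate into the posterior, one at a time.

    Weight = confidence * (1 + KL divergence from the current posterior),
    capped at 1.0 and applied as a blend in log-odds space.
    """
    posterior = prior
    for est in estimates:
        kl = _kl_bernoulli(est["probability"], posterior)
        w = min(est["confidence"] * (1 + kl), 1.0)
        blended = (1 - w) * _logit(posterior) + w * _logit(est["probability"])
        posterior = 1 / (1 + math.exp(-blended))
    return posterior
```

Surprising agents (large KL from the running posterior) get amplified, which is the information-theoretic weighting described above.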

Output fields

  • bayesian_probability — the posterior probability after all agent updates
  • bayesian_shift — difference between Bayesian posterior and simple mean
  • entropy — Shannon entropy of the posterior (bits). Lower = more certain
  • information_gain — entropy reduction from prior to posterior (bits)

from core.bayesian import bayesian_aggregate
result = bayesian_aggregate(estimates, prior=0.42)
# result["bayesian_probability"] → 0.487
# result["information_gain"] → 0.031 bits

Extremized Aggregation (IARPA)

Module: core/extremize.py

Based on research from the IARPA ACE forecasting tournament (Satopää et al. 2014, Baron et al. 2014, Tetlock 2015). Simple averages of forecaster probabilities are systematically under-confident. Extremizing pushes the aggregate away from 50% toward 0 or 1, correcting this bias.

How it works

  1. Transform agent probabilities to log-odds space: logit(p) = log(p / (1-p))
  2. Compute confidence-weighted mean in log-odds space
  3. Apply extremizing factor d > 1: logit(p_ext) = d × mean(logit(p_i))
  4. Transform back to probability space via inverse logit
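
A compact sketch of steps 1-4 with a fixed d; the real module estimates d from forecaster diversity rather than taking it as an argument:

```python
import math

def extremize_sketch(probs, confidences, d=1.5):
    """Confidence-weighted mean in log-odds space, pushed outward by d."""
    logit = lambda p: math.log(p / (1 - p))
    total = sum(confidences)
    mean_logodds = sum(c * logit(p) for p, c in zip(probs, confidences)) / total
    return 1 / (1 + math.exp(-d * mean_logodds))    # inverse logit

probs, confs = [0.55, 0.60, 0.52], [0.7, 0.6, 0.5]
plain = extremize_sketch(probs, confs, d=1.0)   # plain weighted log-odds mean
pushed = extremize_sketch(probs, confs, d=1.8)  # further from 0.5 than plain
```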

Extremizing factor estimation

The factor d is automatically estimated from forecaster diversity (coefficient of variation of log-odds). Higher diversity (more independent agents) warrants stronger extremizing (d ≈ 2.5), while correlated agents get mild adjustment (d ≈ 1.0-1.2).

from core.extremize import extremize
result = extremize(estimates)
# result["extremized_probability"] → 0.513
# result["extremizing_factor"] → 1.82
# result["shift"] → +0.026

Logarithmic Opinion Pool

Module: core/opinion_pool.py

Unlike linear opinion pools (simple weighted averages), logarithmic pools combine probabilities multiplicatively in log space. This satisfies the "external Bayesianity" property: if all agents are Bayesian with different priors but shared likelihood, LogOP recovers the correct posterior.

Formula

p_logop = Π(p_i^w_i) / [Π(p_i^w_i) + Π((1-p_i)^w_i)]

Property: Extreme beliefs have strong influence. One agent at 0.01 pulls the aggregate much harder than in a linear pool.
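
A quick numeric check of that property, using equal weights w_i = 1/n (illustrative; the module's weighting may differ):

```python
import math

def linear_pool(probs):
    return sum(probs) / len(probs)

def log_pool(probs):
    """Equal-weight logarithmic opinion pool (w_i = 1/n)."""
    n = len(probs)
    yes = math.prod(p ** (1 / n) for p in probs)
    no = math.prod((1 - p) ** (1 / n) for p in probs)
    return yes / (yes + no)

probs = [0.01, 0.60, 0.60, 0.60]   # one extreme skeptic, three mild bulls
# linear_pool(probs) → 0.4525; log_pool(probs) → roughly 0.30
```

The single agent at 0.01 drags the logarithmic pool far below the linear average, exactly the behavior the property describes.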

from core.opinion_pool import logarithmic_opinion_pool
result = logarithmic_opinion_pool(estimates)
# result["logop_probability"] → 0.478
# result["logop_shift"] → -0.009

Cooke's Classical Model

Module: core/opinion_pool.py

From expert elicitation theory (Cooke, 1991). Weights forecasters by both calibration AND informativeness. Agents below a calibration threshold are pruned entirely.

Weight formula

weight_i = calibration_score_i × informativeness_i
  • Calibration — from historical Brier scores (if available) or confidence consistency
  • Informativeness — Shannon information content. More extreme + confident = more informative
  • Threshold — agents below alpha (default 0.05) are disqualified

from core.opinion_pool import cooke_classical_weights
result = cooke_classical_weights(estimates, calibration_scores)
# result["cooke_probability"] → 0.491
# result["n_qualified"] → 10

Monte Carlo Simulation

Module: core/statistics.py

Runs 5,000 simulations where each agent's estimate is treated as a beta distribution parameterized by their probability and confidence. Higher confidence = tighter distribution around their estimate.
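
One plausible way to parameterize the beta from a probability and a confidence (an assumption for illustration; core/statistics.py may use a different mapping):

```python
import random

def sample_agent(prob, confidence, rng):
    """One draw from a Beta centred on `prob`; confidence sets the spread."""
    kappa = 2 + confidence * 48          # pseudo-count: 2 (vague) to 50 (sharp)
    return rng.betavariate(prob * kappa, (1 - prob) * kappa)

rng = random.Random(0)                   # seeded for reproducibility
draws = [sample_agent(0.6, 0.9, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)           # lands close to 0.6
```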

Output fields

  • percentiles — p5, p10, p25, p50, p75, p90, p95
  • thresholds — P(>25%), P(>50%), P(>75%)
  • mean, std, skew — distribution moments

from core.statistics import monte_carlo_scenarios
mc = monte_carlo_scenarios(estimates, n_simulations=5000)
# mc["percentiles"]["p50"] → 0.472
# mc["thresholds"]["P(>50%)"] → 0.43

Bootstrap Confidence Intervals

Module: core/statistics.py

Resamples the agent estimates 1,000 times with replacement to produce a 95% confidence interval for the swarm's probability estimate.

Limitation: With only 12 agents, bootstrap CIs have limited statistical power. The interval reflects resampling noise as much as genuine uncertainty. Interpret widths above 15% as "high disagreement" rather than precise uncertainty bounds.

from core.statistics import bootstrap_confidence_interval
ci = bootstrap_confidence_interval(estimates)
# ci["ci_lower"] → 0.381, ci["ci_upper"] → 0.573
# ci["ci_width"] → 0.192

Game Theory

Module: core/game_theory.py

Three game-theoretic analyses run on every forecast to detect pathological agent behavior.

Herding Detection

Uses the Herfindahl-Hirschman Index (HHI) adapted for probability space. Buckets agent estimates into quintiles and measures concentration.

Limitation: With 12 agents across 5 quintiles, the expected count is 2.4 per bucket — the HHI signal can be noisy. Herding scores >0.5 are reliable; 0.3-0.5 should be interpreted cautiously.

  • herding_score — 0 to 1 (>0.3 = herding detected)
  • herd_direction — bullish/bearish/neutral when herding is detected
  • contrarians — agents >1.5 std devs from mean (potential signal)
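
A sketch of the quintile HHI, with an illustrative normalization so a uniform spread scores 0 and unanimity scores 1 (the actual scoring and thresholds in core/game_theory.py may scale differently):

```python
def herding_hhi(probs):
    """Quintile-bucket HHI, rescaled: uniform spread → 0, unanimity → 1."""
    counts = [0] * 5
    for p in probs:
        counts[min(int(p * 5), 4)] += 1   # bucket 0-4; p=1.0 folds into bucket 4
    shares = [c / len(probs) for c in counts]
    hhi = sum(s * s for s in shares)      # raw HHI lies in [1/5, 1]
    return (hhi - 0.2) / 0.8

herding_hhi([0.62, 0.65, 0.68, 0.71, 0.63, 0.66])   # all six in one quintile → 1.0
herding_hhi([0.1, 0.3, 0.5, 0.7, 0.9])              # perfectly spread → 0.0
```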

Information Cascades

Compares Round 1 and Round 2 estimates to detect whether agents converged genuinely or just followed the herd.

  • convergence_rate — fraction of agents that moved toward consensus
  • cascade_detected — true when convergence > 70%
  • flipped_agents — agents who changed sides (crossed 50%)
  • biggest_movers — top 3 agents by absolute probability shift
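
The convergence rate can be sketched as the fraction of agents whose Round 2 estimate moved toward the Round 1 mean (using the mean as an illustrative stand-in for "consensus"):

```python
def convergence_rate(round1, round2):
    """Fraction of agents whose Round 2 estimate moved toward the Round 1 mean."""
    consensus = sum(round1) / len(round1)
    moved = sum(
        1 for r1, r2 in zip(round1, round2)
        if abs(r2 - consensus) < abs(r1 - consensus)
    )
    return moved / len(round1)

r1 = [0.30, 0.45, 0.60, 0.75]
r2 = [0.40, 0.48, 0.55, 0.65]                        # every agent drifts inward
cascade_detected = convergence_rate(r1, r2) > 0.70   # True here
```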

Nash Equilibrium

Checks whether the consensus is stable: would any agent benefit from deviating? An agent is flagged as a potential deviator if their estimate differs from the consensus by more than 0.15 (15 percentage points) AND their confidence exceeds 0.70.

  • stable — true if no agent has incentive to deviate
  • stability_score — 0 to 1 (fraction of non-deviating agents)
  • potential_deviators — list of agents with high-confidence divergent views

Unstable equilibria signal low-confidence forecasts where the consensus may not hold.
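
The deviator check described above, sketched with hypothetical agent data:

```python
def nash_stability(estimates, consensus, dev_threshold=0.15, conf_threshold=0.70):
    """Flag agents who are both far from consensus and highly confident."""
    deviators = [
        e["agent"] for e in estimates
        if abs(e["probability"] - consensus) > dev_threshold
        and e["confidence"] > conf_threshold
    ]
    return {
        "stable": not deviators,
        "stability_score": 1 - len(deviators) / len(estimates),
        "potential_deviators": deviators,
    }

estimates = [
    {"agent": "quant", "probability": 0.48, "confidence": 0.60},
    {"agent": "macro", "probability": 0.75, "confidence": 0.90},
]
result = nash_stability(estimates, consensus=0.50)
# "macro" is 25 points away at 90% confidence → a potential deviator
```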

CLI Reference

forecast

python main.py forecast QUESTION [--odds FLOAT] [--rounds INT] [--size INT]

Argument   Required  Description
QUESTION   Yes       The binary question to forecast
--odds     No        Current market odds (0.0-1.0) for edge calculation
--rounds   No        Number of debate rounds (default: 2)
--size     No        Number of agents (default: 12)

scenario

python main.py scenario DESCRIPTION [--context TEXT]

Argument     Required  Description
DESCRIPTION  Yes       The scenario to simulate
--context    No        Additional context for the scenario

context

python main.py context [QUESTION]

Shows all live data sources the agents see. Outputs formatted text sections for each data category. Optionally pass a question for Polymarket/Manifold market-specific search.

resolve

python main.py resolve QUESTION --outcome FLOAT

Resolve a forecast. Outcome: 1.0 = YES, 0.0 = NO. Updates Brier scores.

calibration

python main.py calibration

Display Brier scores for all agents and the swarm aggregate.

serve

python main.py serve [--host TEXT] [--port INT]

Start the FastAPI server. Default: 0.0.0.0:8000. Interactive docs at /docs.

Note: Authentication is off by default. Set the POLYSWARM_API_KEY environment variable to require an X-API-Key header on protected endpoints; without it, do not expose the server to untrusted networks without a reverse proxy.

REST API Reference

Start the server with python main.py serve. Interactive Swagger docs at /docs.

POST /forecast

{
  "question": "Will BTC hit $150k in 2026?",
  "market_odds": 0.25,
  "rounds": 2
}

Returns: swarm probability (7 aggregation methods), Bayesian posterior, extremized estimate, MC percentiles, confidence interval, consensus score, individual agent estimates, herding analysis, Nash stability, edge vs market.

POST /scenario

{
  "scenario": "Binance announces insolvency",
  "context": "BTC at $87k, market euphoric"
}

Returns: per-agent reactions, sentiment shifts, price impact, crowd narrative, secondary effects.

POST /resolve

{
  "question": "Will BTC hit $150k in 2026?",
  "outcome": 1.0
}

GET /calibration

Returns swarm and per-agent Brier scores.

GET /agents

Returns list of all agents with their persona definitions.

Multi-LLM Setup

Anthropic (default)

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
MODEL_FAST=claude-sonnet-4-20250514

OpenAI

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
MODEL_FAST=gpt-4o-mini

Ollama (local, free)

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
MODEL_FAST=llama3.1:8b

Any OpenAI-compatible (Groq, Together, etc.)

LLM_PROVIDER=openai
OPENAI_API_KEY=gsk_...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
MODEL_FAST=llama-3.1-70b-versatile

Docker

# One command
ANTHROPIC_API_KEY=your_key docker compose up

# Or with OpenAI
LLM_PROVIDER=openai OPENAI_API_KEY=your_key docker compose up

API available at http://localhost:8000. Docs at /docs.

Custom Personas

Edit agents/personas.py to add your own agent:

{
    "agent_id": "meme_trader",
    "persona": "Meme Coin Degen",
    "description": "Trades exclusively based on meme momentum...",
    "information_focus": "Twitter mentions, TikTok trends, Telegram pump groups",
    "bias_profile": "Extremely short-term. Will ape into anything trending.",
    "base_confidence": 0.45,
}

Add it to PERSONA_DEFINITIONS and it will be included in the next swarm run.

Data Sources

PolySwarm uses a modular plugin architecture for data sources. All sources are fetched in parallel via ThreadPoolExecutor (12 workers, 15s timeout). Sources auto-discover API keys from environment variables.

Registered Sources

Source              Module                           API Key               Data
binance_spot        data/sources/market.py           (none)                BTC, ETH, SOL spot prices + 24h change
binance_movers      data/sources/market.py           (none)                Top 5 gainers/losers in last 24h
coingecko_market    data/sources/market.py           COINGECKO_API_KEY*    Global market overview, BTC dominance
funding_rates       data/sources/derivatives_src.py  (none)                Funding rates for 6 major perpetuals
open_interest       data/sources/derivatives_src.py  (none)                BTC/ETH open interest
long_short_ratio    data/sources/derivatives_src.py  (none)                Global long/short ratio
top_traders         data/sources/derivatives_src.py  (none)                Top trader position ratios
liquidations        data/sources/derivatives_src.py  (none)                Recent forced liquidations
btc_options         data/sources/derivatives_src.py  (none)                Deribit BTC options OI, put/call ratio
btc_vol             data/sources/derivatives_src.py  (none)                Deribit BTC historical volatility
btc_mempool         data/sources/onchain_src.py      (none)                BTC mempool size, pending tx count
btc_fees            data/sources/onchain_src.py      (none)                BTC recommended fee levels
btc_hashrate        data/sources/onchain_src.py      (none)                BTC network hashrate
eth_gas             data/sources/onchain_src.py      ETHERSCAN_API_KEY*    ETH gas prices (safe/propose/fast)
defi_tvl            data/sources/onchain_src.py      (none)                Total DeFi TVL
top_protocols       data/sources/onchain_src.py      (none)                Top 10 DeFi protocols by TVL
stablecoin_supply   data/sources/onchain_src.py      (none)                Total stablecoin supply
fear_greed          data/sources/sentiment.py        (none)                Fear & Greed Index (7-day history)
reddit_crypto       data/sources/sentiment.py        (none)                r/cryptocurrency hot posts
coingecko_trending  data/sources/sentiment.py        (none)                Trending coins on CoinGecko
cryptopanic         data/sources/sentiment.py        CRYPTOPANIC_API_KEY*  Latest crypto headlines
polymarket          data/sources/prediction.py       (none)                Trending markets + question-specific search
manifold            data/sources/prediction.py       (none)                Trending markets + question-specific search

* Optional — source works without key but may be rate-limited. Set in .env.

Adding a Custom Data Source

Create a new file in data/sources/ and it will be auto-discovered:

# data/sources/my_custom.py
from data.registry import DataSource, register_source

@register_source
class MyCustomSource(DataSource):
    name = "my_custom"
    category = "sentiment"
    description = "My custom data feed"
    requires_key = "MY_API_KEY"  # optional, set None if no key needed
    priority = 50                # 0-100, higher = shown first
    timeout = 10                 # request timeout in seconds

    def fetch(self) -> str:
        resp = self.http.get("https://api.example.com/data")
        data = resp.json()
        return f"My Custom Data: {data['value']}"

    # Optional: implement search() for question-specific queries
    def search(self, question: str) -> str:
        resp = self.http.get(f"https://api.example.com/search?q={question}")
        return f"Related: {resp.json()}"

That's it. No other files need modification. The registry auto-discovers the source on import.

Source Filtering

Whitelist specific sources via environment variable:

POLYSWARM_SOURCES=binance_spot,funding_rates,fear_greed,btc_options

When set, only the listed sources will be fetched. When unset, all available sources run.

CLI Source Status

python main.py sources

Shows all registered sources with their status (ready, needs key, disabled) and priority.

Known Limitations

  • Bootstrap CI at n=12 — Bootstrapping over 12 agent estimates has limited statistical power. CIs reflect resampling noise as much as genuine uncertainty.
  • HHI herding at n=12 — The Herfindahl-Hirschman Index with 5 quintiles and 12 agents gives coarse resolution. Herding scores between 0.3-0.5 should be interpreted cautiously.
  • Agent memory is in-memory — Memory accumulates within a session but resets on process restart. Redis persistence is planned.
  • Optional API authentication — Set POLYSWARM_API_KEY env var to require X-API-Key header on protected endpoints. Without it, the API is open.
  • Current-state data only — Data sources provide snapshots, not historical trends (no 7d/30d deltas).
  • BTC/ETH/SOL only — Spot price data covers three assets. Add more by creating a new source in data/sources/.
  • SP meta-predictions are estimated — Without explicit second-order predictions from agents, the Surprisingly Popular algorithm uses a confidence-based heuristic.
  • No streaming — Multi-round debates can take 30-60+ seconds. No real-time progress output in the API (CLI shows per-agent status).