Documentation
Everything you need to run, configure, and extend PolySwarm.
Installation
Requirements
- Python 3.11+
- An API key from Anthropic, OpenAI, or a local Ollama instance
Install
git clone https://github.com/defidaddydavid/polyswarm.git
cd polyswarm
pip install -r requirements.txt
cp .env.example .env
Configuration
All configuration is done via the .env file or environment variables.
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | anthropic | LLM provider: anthropic, openai, or ollama |
| ANTHROPIC_API_KEY | — | Anthropic API key |
| OPENAI_API_KEY | — | OpenAI API key (if using openai provider) |
| OLLAMA_BASE_URL | http://localhost:11434 | Ollama server URL |
| MODEL_FAST | auto | Fast model for agent estimates (e.g. claude-sonnet-4-20250514, gpt-4o-mini) |
| MODEL_DEEP | auto | Powerful model for synthesis and aggregation narratives |
| DEBATE_ROUNDS | 2 | Number of debate rounds per forecast |
| SWARM_SIZE | 12 | Number of agents in the swarm |
First Forecast
python main.py forecast "Will BTC close above $100k before June 2026?" --odds 0.42
The --odds flag is optional. If provided, PolySwarm calculates the edge between the swarm's probability and the market odds.
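As a sketch (a hypothetical helper, not the actual implementation), the edge is the difference between the swarm's probability and the market's implied probability:

```python
def edge(swarm_probability: float, market_odds: float) -> float:
    """Positive edge: the swarm rates YES higher than the market does."""
    return swarm_probability - market_odds

# Swarm says 53%, market prices YES at 42 cents:
print(round(edge(0.53, 0.42), 2))  # 0.11
```

A positive edge suggests YES is underpriced by the market; a negative edge suggests it is overpriced.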
Forecast Mode
Forecast mode handles binary questions — anything with a YES/NO resolution.
How it works
- Context injection — 23 live data sources are fetched in parallel
- Round 1 — each agent independently forms a probability estimate + reasoning
- Round 2+ — agents see each other's estimates and can update or defend
- Analysis Pipeline — 26 mathematical methods across five groups:
  - Classical aggregation — Bayesian, extremized, surprisingly popular (SP), LogOP, Cooke's, meta-probability, neutral pivot, Monte Carlo (MC), bootstrap, coherence
  - Advanced analysis — Dempster-Shafer, copula, MCMC, KDE, conformal, optimal transport, stacking
  - Game theory — herding, cascades, Nash, scoring rules, agreement
  - Information theory — mutual information (MI), transfer entropy, redundancy
  - Meta-analysis — Shapley attribution, HMM regime detection, calibration curves
- Edge calculation — if market odds are provided, the edge is computed for all methods
Scenario Mode
Scenario mode answers: "If X happens, what follows?"
python main.py scenario "Fed announces emergency rate cut" --context "BTC at $87k"
Output includes
- Each agent's immediate reaction and specific actions
- Sentiment shift per agent (-1.0 to +1.0)
- Estimated price impact percentage
- Aggregated crowd sentiment and consensus
- Generated crowd narrative
- 3-4 second-order effects
Agents
PolySwarm ships with 12 agents. Each has:
- Persona — their identity and expertise
- Information focus — what data they prioritize
- Bias profile — documented blind spots
- Base confidence — how confident they typically are
- Memory — in-memory list of observations accumulated during a single session
Note on memory persistence: Agent memory is currently in-memory only. It accumulates observations across multiple forecast/scenario calls within the same Python process (e.g., when running the API server), but resets when the process restarts. Redis-backed persistent memory is on the roadmap.
Agents are defined in agents/personas.py. See Custom Personas to add your own.
Calibration
Every forecast is stored in a SQLite database. When a market resolves, call resolve to compute Brier scores.
# Resolve a forecast
python main.py resolve "Will BTC hit $100k?" --outcome 1.0
# View calibration leaderboard
python main.py calibration
Brier score: mean((forecast - outcome)²). Lower is better: a perfect forecaster scores 0.0, and always guessing 0.5 scores 0.25.
Agents with better calibration automatically receive higher weight in future forecasts via get_calibration_weights(). Calibration weights feed into Cooke's Classical Model for performance-based expert weighting.
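The scoring rule above translates directly to code (an illustrative sketch, not the repo's implementation):

```python
def brier_score(forecasts: list[float], outcomes: list[float]) -> float:
    """Mean squared error between probability forecasts and resolved outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# An agent forecast 0.8 on a market that resolved YES and 0.3 on one that resolved NO:
print(round(brier_score([0.8, 0.3], [1.0, 0.0]), 3))  # 0.065
```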
Context Engine
Before any forecast or scenario, PolySwarm fetches live data from 23 sources via a modular plugin registry (ThreadPoolExecutor, 12 workers, 15s timeout per source).
# See exactly what data agents receive (outputs formatted text sections):
python main.py context
# With a specific question (adds Polymarket/Manifold market search):
python main.py context "Will BTC hit $100k?"
The context command outputs formatted text sections for each data category: prices, funding rates, options data, DeFi metrics, sentiment, and news headlines. This is the exact data string injected into each agent's prompt.
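The fan-out can be pictured like this (a self-contained sketch; the skip-on-failure behavior and data shapes are assumptions, not the registry's actual code):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_all(sources: dict, workers: int = 12, timeout: int = 15) -> dict:
    """Fetch every registered source in parallel; a slow or failing
    source degrades to a placeholder instead of blocking the forecast."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fn): name for name, fn in sources.items()}
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = f"unavailable: {exc}"
    return results

print(fetch_all({"demo": lambda: "BTC $87,000 (+2.1%)"}))
```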
Bayesian Updating
Module: core/bayesian.py
Instead of simple averaging, PolySwarm treats each agent's estimate as evidence and updates a prior belief using Bayes' theorem. The key innovation is information-theoretic weighting: agents whose estimates are more surprising (higher KL divergence from the current posterior) carry more weight in the update.
How it works
- Start with a prior (market odds if available, otherwise 0.5)
- For each agent, compute the KL divergence between their estimate and the current posterior
- Weight the Bayesian update by confidence × (1 + KL divergence)
- Apply the likelihood ratio update: P(H|E) = P(E|H) × P(H) / P(E)
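The weighting step can be sketched as follows (binary KL divergence in nats; an illustration of the formula, not the code in core/bayesian.py):

```python
import math

def update_weight(p: float, posterior: float, confidence: float) -> float:
    """confidence x (1 + KL): estimates that diverge more from the
    current posterior carry more weight in the Bayesian update."""
    kl = p * math.log(p / posterior) + (1 - p) * math.log((1 - p) / (1 - posterior))
    return confidence * (1 + kl)

# An agent that agrees with the posterior contributes no surprise bonus:
print(update_weight(0.5, 0.5, 0.8))  # 0.8
```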
Output fields
- bayesian_probability — the posterior probability after all agent updates
- bayesian_shift — difference between Bayesian posterior and simple mean
- entropy — Shannon entropy of the posterior (bits). Lower = more certain
- information_gain — entropy reduction from prior to posterior (bits)
from core.bayesian import bayesian_aggregate
result = bayesian_aggregate(estimates, prior=0.42)
# result["bayesian_probability"] → 0.487
# result["information_gain"] → 0.031 bits
Extremized Aggregation (IARPA)
Module: core/extremize.py
Based on research from the IARPA ACE forecasting tournament (Satopää et al. 2014, Baron et al. 2014, Tetlock 2015). Simple averages of forecaster probabilities are systematically under-confident. Extremizing pushes the aggregate away from 50% toward 0 or 1, correcting this bias.
How it works
- Transform agent probabilities to log-odds space: logit(p) = log(p / (1-p))
- Compute confidence-weighted mean in log-odds space
- Apply extremizing factor d > 1: logit(p_ext) = d × mean(logit(p_i))
- Transform back to probability space via inverse logit
Extremizing factor estimation
The factor d is automatically estimated from forecaster diversity (coefficient of variation of log-odds). Higher diversity (more independent agents) warrants stronger extremizing (d ≈ 2.5), while correlated agents get mild adjustment (d ≈ 1.0-1.2).
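The transform can be sketched as below (unweighted for brevity; the real module weights by confidence):

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1 - p))

def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))

def extremize_sketch(probs: list[float], d: float = 1.5) -> float:
    """Scale the mean log-odds by d > 1, pushing the aggregate away from 0.5."""
    mean_log_odds = sum(logit(p) for p in probs) / len(probs)
    return inv_logit(d * mean_log_odds)

# Two agents at 60% and 70% -> extremized aggregate lands above their mean:
print(round(extremize_sketch([0.6, 0.7], d=1.8), 3))
```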
from core.extremize import extremize
result = extremize(estimates)
# result["extremized_probability"] → 0.513
# result["extremizing_factor"] → 1.82
# result["shift"] → +0.026
Surprisingly Popular Algorithm
Module: core/surprisingly_popular.py
Based on Prelec et al. (2017, Nature). The correct answer is often "more popular than people predict." If 60% of forecasters say YES but they predicted 75% would say YES, the surprisingly popular answer is actually NO.
Key insight
Agents may not fully incorporate their private information into first-order estimates, but they DO leak it through their predictions about what others will say. The SP algorithm exploits this meta-cognitive information.
How it works
- Collect actual probability estimates from all agents
- Estimate meta-predictions: what each agent thinks the crowd average will be
- SP score = actual mean - predicted mean
- Adjust the aggregate in the "surprisingly popular" direction
Note: Meta-predictions are currently estimated from agent confidence patterns rather than explicitly elicited.
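A toy version of the adjustment (the strength parameter and clamping here are assumptions, not the module's actual values):

```python
def sp_adjust(actual_mean: float, predicted_mean: float,
              aggregate: float, strength: float = 0.5) -> float:
    """Shift the aggregate toward the surprisingly popular side."""
    sp_score = actual_mean - predicted_mean
    return max(0.0, min(1.0, aggregate + strength * sp_score))

# 60% said YES but agents expected 75% to say YES -> YES is surprisingly
# unpopular, so the aggregate is pulled toward NO:
print(round(sp_adjust(0.60, 0.75, 0.60), 3))  # 0.525
```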
from core.surprisingly_popular import surprisingly_popular
result = surprisingly_popular(estimates)
# result["sp_adjusted_probability"] → 0.498
# result["sp_score"] → -0.023
# result["surprise_direction"] → "no"
Logarithmic Opinion Pool
Module: core/opinion_pool.py
Unlike linear opinion pools (simple weighted averages), logarithmic pools combine probabilities multiplicatively in log space. This satisfies the "external Bayesianity" property: if all agents are Bayesian with different priors but shared likelihood, LogOP recovers the correct posterior.
Formula
p_logop = Π(p_i^w_i) / [Π(p_i^w_i) + Π((1-p_i)^w_i)]
Property: Extreme beliefs have strong influence. One agent at 0.01 pulls the aggregate much harder than in a linear pool.
from core.opinion_pool import logarithmic_opinion_pool
result = logarithmic_opinion_pool(estimates)
# result["logop_probability"] → 0.478
# result["logop_shift"] → -0.009
Cooke's Classical Model
Module: core/opinion_pool.py
From expert elicitation theory (Cooke, 1991). Weights forecasters by both calibration AND informativeness. Agents below a calibration threshold are pruned entirely.
Weight formula
weight_i = calibration_score_i × informativeness_i
- Calibration — from historical Brier scores (if available) or confidence consistency
- Informativeness — Shannon information content. More extreme + confident = more informative
- Threshold — agents below alpha (default 0.05) are disqualified
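The pruning-and-normalizing step amounts to (a hypothetical helper sketching the formula; the real code lives in core/opinion_pool.py):

```python
def cooke_weights(calibration: list[float], informativeness: list[float],
                  alpha: float = 0.05) -> list[float]:
    """Zero out agents below the calibration cutoff, then renormalize."""
    raw = [c * i if c >= alpha else 0.0
           for c, i in zip(calibration, informativeness)]
    total = sum(raw)
    return [w / total for w in raw] if total else raw

# The second agent falls below alpha and is disqualified entirely:
print(cooke_weights([0.8, 0.02, 0.2], [1.0, 1.0, 1.0]))
```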
from core.opinion_pool import cooke_classical_weights
result = cooke_classical_weights(estimates, calibration_scores)
# result["cooke_probability"] → 0.491
# result["n_qualified"] → 10
Monte Carlo Simulation
Module: core/statistics.py
Runs 5,000 simulations where each agent's estimate is treated as a beta distribution parameterized by their probability and confidence. Higher confidence = tighter distribution around their estimate.
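One plausible parameterization looks like this (the pseudo-count mapping is an assumption, not necessarily what core/statistics.py uses):

```python
import random

def sample_estimate(p: float, confidence: float, rng: random.Random) -> float:
    """Draw a simulated estimate from a beta centered near p; higher
    confidence -> larger pseudo-count -> tighter spread around p."""
    strength = 2 + confidence * 50  # total pseudo-observations
    return rng.betavariate(p * strength, (1 - p) * strength)

rng = random.Random(42)
draws = [sample_estimate(0.6, confidence=0.9, rng=rng) for _ in range(5000)]
print(round(sum(draws) / len(draws), 2))  # ~0.60
```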
Output fields
- percentiles — p5, p10, p25, p50, p75, p90, p95
- thresholds — P(>25%), P(>50%), P(>75%)
- mean, std, skew — distribution moments
from core.statistics import monte_carlo_scenarios
mc = monte_carlo_scenarios(estimates, n_simulations=5000)
# mc["percentiles"]["p50"] → 0.472
# mc["thresholds"]["P(>50%)"] → 0.43
Bootstrap Confidence Intervals
Module: core/statistics.py
Resamples the agent estimates 1,000 times with replacement to produce a 95% confidence interval for the swarm's probability estimate.
Limitation: With only 12 agents, bootstrap CIs have limited statistical power. The interval reflects resampling noise as much as genuine uncertainty. Interpret widths above 15% as "high disagreement" rather than precise uncertainty bounds.
from core.statistics import bootstrap_confidence_interval
ci = bootstrap_confidence_interval(estimates)
# ci["ci_lower"] → 0.381, ci["ci_upper"] → 0.573
# ci["ci_width"] → 0.192
Game Theory
Module: core/game_theory.py
Three game-theoretic analyses run on every forecast to detect pathological agent behavior.
Herding Detection
Uses the Herfindahl-Hirschman Index (HHI) adapted for probability space. Buckets agent estimates into quintiles and measures concentration.
Limitation: With 12 agents across 5 quintiles, the expected count is 2.4 per bucket — the HHI signal can be noisy. Herding scores >0.5 are reliable; 0.3-0.5 should be interpreted cautiously.
- herding_score — 0 to 1 (>0.3 = herding detected)
- herd_direction — bullish/bearish/neutral when herding is detected
- contrarians — agents >1.5 std devs from mean (potential signal)
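The bucketing is mechanical (a sketch of HHI over quintiles; the thresholds come from the docs above):

```python
def herding_hhi(probs: list[float], buckets: int = 5) -> float:
    """HHI over probability quintiles: 1/buckets = evenly spread,
    1.0 = every agent in the same bucket."""
    counts = [0] * buckets
    for p in probs:
        counts[min(int(p * buckets), buckets - 1)] += 1
    return sum((c / len(probs)) ** 2 for c in counts)

print(round(herding_hhi([0.1, 0.3, 0.5, 0.7, 0.9]), 2))  # 0.2 -> perfectly spread
print(herding_hhi([0.82, 0.85, 0.88, 0.90]))             # 1.0 -> total herding
```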
Information Cascades
Compares Round 1 and Round 2 estimates to detect whether agents converged genuinely or just followed the herd.
- convergence_rate — fraction of agents that moved toward consensus
- cascade_detected — true when convergence > 70%
- flipped_agents — agents who changed sides (crossed 50%)
- biggest_movers — top 3 agents by absolute probability shift
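A minimal sketch of the convergence check (the 70% cascade threshold is from the docs; everything else here is illustrative):

```python
def convergence_rate(round1: list[float], round2: list[float]) -> float:
    """Fraction of agents whose round-2 estimate moved toward
    the round-1 consensus (the mean)."""
    consensus = sum(round1) / len(round1)
    moved = sum(abs(b - consensus) < abs(a - consensus)
                for a, b in zip(round1, round2))
    return moved / len(round1)

rate = convergence_rate([0.2, 0.8, 0.5], [0.4, 0.7, 0.5])
print(rate, "-> cascade" if rate > 0.7 else "-> genuine debate")
```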
Nash Equilibrium
Checks whether the consensus is stable — would any agent benefit from deviating? An agent is a potential deviator if their estimate is >15% from consensus AND their confidence is >70%.
- stable — true if no agent has incentive to deviate
- stability_score — 0 to 1 (fraction of non-deviating agents)
- potential_deviators — list of agents with high-confidence divergent views
Unstable equilibria signal low-confidence forecasts where the consensus may not hold.
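The deviator check follows directly from the thresholds above (0.15 and 0.70 come from the docs; the data shape is an assumption):

```python
def potential_deviators(estimates: dict[str, tuple[float, float]],
                        consensus: float) -> list[str]:
    """Agents far from consensus AND confident enough to hold the line.
    Each value is a (probability, confidence) pair."""
    return [agent for agent, (p, conf) in estimates.items()
            if abs(p - consensus) > 0.15 and conf > 0.70]

swarm = {"perma_bear": (0.20, 0.90), "index_fund": (0.48, 0.95)}
print(potential_deviators(swarm, consensus=0.50))  # ['perma_bear']
```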
CLI Reference
forecast
python main.py forecast QUESTION [--odds FLOAT] [--rounds INT] [--size INT]

| Argument | Required | Description |
|---|---|---|
| QUESTION | Yes | The binary question to forecast |
| --odds | No | Current market odds (0.0-1.0) for edge calculation |
| --rounds | No | Number of debate rounds (default: 2) |
| --size | No | Number of agents (default: 12) |
scenario
python main.py scenario DESCRIPTION [--context TEXT]

| Argument | Required | Description |
|---|---|---|
| DESCRIPTION | Yes | The scenario to simulate |
| --context | No | Additional context for the scenario |
context
python main.py context [QUESTION]

Shows all live data sources the agents see. Outputs formatted text sections for each data category. Optionally pass a question for Polymarket/Manifold market-specific search.
resolve
python main.py resolve QUESTION --outcome FLOAT

Resolve a forecast. Outcome: 1.0 = YES, 0.0 = NO. Updates Brier scores.
calibration
python main.py calibration

Display Brier scores for all agents and the swarm aggregate.
serve
python main.py serve [--host TEXT] [--port INT]

Start the FastAPI server. Default: 0.0.0.0:8000. Interactive docs at /docs.
Note: The API has no authentication by default. Set POLYSWARM_API_KEY to require an X-API-Key header, and do not expose the server to untrusted networks without a reverse proxy.
REST API Reference
Start the server with python main.py serve. Interactive Swagger docs at /docs.
POST /forecast
{
  "question": "Will BTC hit $150k in 2026?",
  "market_odds": 0.25,
  "rounds": 2
}

Returns: swarm probability (7 aggregation methods), Bayesian posterior, extremized estimate, MC percentiles, confidence interval, consensus score, individual agent estimates, herding analysis, Nash stability, edge vs market.
POST /scenario
{
  "scenario": "Binance announces insolvency",
  "context": "BTC at $87k, market euphoric"
}

Returns: per-agent reactions, sentiment shifts, price impact, crowd narrative, secondary effects.
POST /resolve
{
  "question": "Will BTC hit $150k in 2026?",
  "outcome": 1.0
}

GET /calibration
Returns swarm and per-agent Brier scores.
GET /agents
Returns list of all agents with their persona definitions.
Multi-LLM Setup
Anthropic (default)
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
MODEL_FAST=claude-sonnet-4-20250514

OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
MODEL_FAST=gpt-4o-mini

Ollama (local, free)
LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
MODEL_FAST=llama3.1:8b

Any OpenAI-compatible (Groq, Together, etc.)
LLM_PROVIDER=openai
OPENAI_API_KEY=gsk_...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
MODEL_FAST=llama-3.1-70b-versatile

Docker
# One command
ANTHROPIC_API_KEY=your_key docker compose up
# Or with OpenAI
LLM_PROVIDER=openai OPENAI_API_KEY=your_key docker compose up

API available at http://localhost:8000. Docs at /docs.
Custom Personas
Edit agents/personas.py to add your own agent:
{
  "agent_id": "meme_trader",
  "persona": "Meme Coin Degen",
  "description": "Trades exclusively based on meme momentum...",
  "information_focus": "Twitter mentions, TikTok trends, Telegram pump groups",
  "bias_profile": "Extremely short-term. Will ape into anything trending.",
  "base_confidence": 0.45,
}

Add it to PERSONA_DEFINITIONS and it will be included in the next swarm run.
Data Sources
PolySwarm uses a modular plugin architecture for data sources. All sources are fetched in parallel via ThreadPoolExecutor (12 workers, 15s timeout). Sources auto-discover API keys from environment variables.
Registered Sources
| Source | Module | API Key | Data |
|---|---|---|---|
| binance_spot | data/sources/market.py | — | BTC, ETH, SOL spot prices + 24h change |
| binance_movers | data/sources/market.py | — | Top 5 gainers/losers in last 24h |
| coingecko_market | data/sources/market.py | COINGECKO_API_KEY* | Global market overview, BTC dominance |
| funding_rates | data/sources/derivatives_src.py | — | Funding rates for 6 major perpetuals |
| open_interest | data/sources/derivatives_src.py | — | BTC/ETH open interest |
| long_short_ratio | data/sources/derivatives_src.py | — | Global long/short ratio |
| top_traders | data/sources/derivatives_src.py | — | Top trader position ratios |
| liquidations | data/sources/derivatives_src.py | — | Recent forced liquidations |
| btc_options | data/sources/derivatives_src.py | — | Deribit BTC options OI, put/call ratio |
| btc_vol | data/sources/derivatives_src.py | — | Deribit BTC historical volatility |
| btc_mempool | data/sources/onchain_src.py | — | BTC mempool size, pending tx count |
| btc_fees | data/sources/onchain_src.py | — | BTC recommended fee levels |
| btc_hashrate | data/sources/onchain_src.py | — | BTC network hashrate |
| eth_gas | data/sources/onchain_src.py | ETHERSCAN_API_KEY* | ETH gas prices (safe/propose/fast) |
| defi_tvl | data/sources/onchain_src.py | — | Total DeFi TVL |
| top_protocols | data/sources/onchain_src.py | — | Top 10 DeFi protocols by TVL |
| stablecoin_supply | data/sources/onchain_src.py | — | Total stablecoin supply |
| fear_greed | data/sources/sentiment.py | — | Fear & Greed Index (7-day history) |
| reddit_crypto | data/sources/sentiment.py | — | r/cryptocurrency hot posts |
| coingecko_trending | data/sources/sentiment.py | — | Trending coins on CoinGecko |
| cryptopanic | data/sources/sentiment.py | CRYPTOPANIC_API_KEY* | Latest crypto headlines |
| polymarket | data/sources/prediction.py | — | Trending markets + question-specific search |
| manifold | data/sources/prediction.py | — | Trending markets + question-specific search |
* Optional — source works without key but may be rate-limited. Set in .env.
Adding a Custom Data Source
Create a new file in data/sources/ and it will be auto-discovered:
# data/sources/my_custom.py
from data.registry import DataSource, register_source

@register_source
class MyCustomSource(DataSource):
    name = "my_custom"
    category = "sentiment"
    description = "My custom data feed"
    requires_key = "MY_API_KEY"  # optional, set None if no key needed
    priority = 50  # 0-100, higher = shown first
    timeout = 10  # request timeout in seconds

    def fetch(self) -> str:
        resp = self.http.get("https://api.example.com/data")
        data = resp.json()
        return f"My Custom Data: {data['value']}"

    # Optional: implement search() for question-specific queries
    def search(self, question: str) -> str:
        resp = self.http.get(f"https://api.example.com/search?q={question}")
        return f"Related: {resp.json()}"
That's it. No other files need modification. The registry auto-discovers the source on import.
Source Filtering
Whitelist specific sources via environment variable:
POLYSWARM_SOURCES=binance_spot,funding_rates,fear_greed,btc_options
When set, only the listed sources will be fetched. When unset, all available sources run.
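The whitelist logic amounts to the following (a sketch; the registry's actual parsing may differ):

```python
import os

def allowed_sources(registered: list[str]) -> list[str]:
    """Apply the POLYSWARM_SOURCES whitelist; unset or empty means all run."""
    raw = os.environ.get("POLYSWARM_SOURCES", "")
    wanted = {name.strip() for name in raw.split(",") if name.strip()}
    return [s for s in registered if not wanted or s in wanted]

os.environ["POLYSWARM_SOURCES"] = "binance_spot,fear_greed"
print(allowed_sources(["binance_spot", "fear_greed", "cryptopanic"]))
```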
CLI Source Status
python main.py sources
Shows all registered sources with their status (ready, needs key, disabled) and priority.
Known Limitations
- Bootstrap CI at n=12 — Bootstrapping over 12 agent estimates has limited statistical power. CIs reflect resampling noise as much as genuine uncertainty.
- HHI herding at n=12 — The Herfindahl-Hirschman Index with 5 quintiles and 12 agents gives coarse resolution. Herding scores between 0.3-0.5 should be interpreted cautiously.
- Agent memory is in-memory — Memory accumulates within a session but resets on process restart. Redis persistence is planned.
- Optional API authentication — Set the POLYSWARM_API_KEY env var to require an X-API-Key header on protected endpoints. Without it, the API is open.
- Current-state data only — Data sources provide snapshots, not historical trends (no 7d/30d deltas).
- BTC/ETH/SOL only — Spot price data covers three assets. Add more by creating a new source in data/sources/.
- SP meta-predictions are estimated — Without explicit second-order predictions from agents, the Surprisingly Popular algorithm uses a confidence-based heuristic.
- No streaming — Multi-round debates can take 30-60+ seconds. No real-time progress output in the API (CLI shows per-agent status).