Documentation

Everything you need to run, configure, and extend PolySwarm.

Installation

Requirements

  • Python 3.11+
  • An API key from Anthropic, OpenAI, or a local Ollama instance

Install

git clone https://github.com/defidaddydavid/polyswarm.git
cd polyswarm
pip install -r requirements.txt
cp .env.example .env

Configuration

All configuration is via the .env file or environment variables.

Variable            Default                  Description
LLM_PROVIDER        anthropic                LLM provider: anthropic, openai, or ollama
ANTHROPIC_API_KEY   (none)                   Anthropic API key (if using anthropic provider)
OPENAI_API_KEY      (none)                   OpenAI API key (if using openai provider)
OLLAMA_BASE_URL     http://localhost:11434   Ollama server URL
MODEL_FAST          auto                     Fast model for agent estimates (e.g. claude-sonnet-4-20250514, gpt-4o-mini)
MODEL_DEEP          auto                     Powerful model for synthesis and aggregation narratives
DEBATE_ROUNDS       2                        Number of debate rounds per forecast
SWARM_SIZE          12                       Number of agents in the swarm

First Forecast

python main.py forecast "Will BTC close above $100k before June 2026?" --odds 0.42

The --odds flag is optional. If provided, PolySwarm calculates the edge between the swarm's probability and the market odds.
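
At its simplest, the edge is the raw difference between the swarm's probability and the market's implied probability. A minimal sketch (illustrative only; the pipeline also computes per-method edges, as described under Forecast Mode):

```python
def edge(swarm_probability: float, market_odds: float) -> float:
    """Positive edge: the swarm thinks YES is more likely than the market does."""
    return swarm_probability - market_odds

# Swarm lands at 0.51 against market odds of 0.42:
print(f"edge: {edge(0.51, 0.42):+.2f}")   # edge: +0.09
```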

Forecast Mode

Forecast mode handles binary questions — anything with a YES/NO resolution.

How it works

  1. Context injection — 23 live data sources are fetched in parallel
  2. Round 1 — each agent independently forms a probability estimate + reasoning
  3. Round 2+ — agents see each other's estimates and can update or defend
  4. Analysis Pipeline — 26 mathematical methods across five families:
       • Classical aggregation — Bayesian, extremized, Surprisingly Popular (SP), LogOP, Cooke's, meta-probability, neutral pivot, Monte Carlo, bootstrap, coherence
       • Advanced analysis — Dempster-Shafer, copula, MCMC, KDE, conformal, optimal transport, stacking
       • Game theory — herding, cascades, Nash, scoring rules, agreement
       • Information theory — mutual information (MI), transfer entropy, redundancy
       • Meta-analysis — Shapley attribution, HMM regime detection, calibration curves
  5. Edge calculation — if market odds are provided, the edge is computed for all methods

Scenario Mode

Scenario mode answers: "If X happens, what follows?"

python main.py scenario "Fed announces emergency rate cut" --context "BTC at $87k"

Output includes

  • Each agent's immediate reaction and specific actions
  • Sentiment shift per agent (-1.0 to +1.0)
  • Estimated price impact percentage
  • Aggregated crowd sentiment and consensus
  • Generated crowd narrative
  • 3-4 second-order effects
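
As a rough illustration of the aggregation step, crowd sentiment can be computed as a confidence-weighted mean of the per-agent shifts. The field names, weighting, and consensus definition below are assumptions for illustration, not the actual scenario-mode internals:

```python
# Hypothetical per-agent reactions; field names are illustrative.
reactions = [
    {"agent": "macro_hawk", "sentiment_shift": -0.6, "confidence": 0.8},
    {"agent": "perma_bull", "sentiment_shift": +0.2, "confidence": 0.5},
    {"agent": "quant_arb",  "sentiment_shift": -0.3, "confidence": 0.7},
]

# Confidence-weighted crowd sentiment, still in [-1.0, +1.0].
total_conf = sum(r["confidence"] for r in reactions)
crowd_sentiment = sum(
    r["sentiment_shift"] * r["confidence"] for r in reactions
) / total_conf

# Simple consensus: fraction of agents on the majority side.
negative = sum(1 for r in reactions if r["sentiment_shift"] < 0)
consensus = max(negative, len(reactions) - negative) / len(reactions)
# crowd_sentiment ≈ -0.295 (bearish lean), consensus ≈ 0.67
```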

Agents

PolySwarm ships with 12 agents. Each has:

  • Persona — their identity and expertise
  • Information focus — what data they prioritize
  • Bias profile — documented blind spots
  • Base confidence — how confident they typically are
  • Memory — in-memory list of observations accumulated during a single session

Note on memory persistence: Agent memory is currently in-memory only. It accumulates observations across multiple forecast/scenario calls within the same Python process (e.g., when running the API server), but resets when the process restarts. Redis-backed persistent memory is on the roadmap.

Agents are defined in agents/personas.py. See Custom Personas to add your own.

Calibration

Every forecast is stored in a SQLite database. When a market resolves, call resolve to compute Brier scores.

# Resolve a forecast
python main.py resolve "Will BTC hit $100k?" --outcome 1.0

# View calibration leaderboard
python main.py calibration

Brier score: mean((forecast - outcome)²). Lower is better. Perfect = 0.0; always guessing 0.5 = 0.25.
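
The Brier score is simple enough to state in a few lines (a standalone illustration, not the SQLite-backed implementation):

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0.0/1.0 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# An agent said 0.8 and 0.3; the markets resolved YES (1.0) and NO (0.0):
score = brier([0.8, 0.3], [1.0, 0.0])
# (0.04 + 0.09) / 2 = 0.065, better than the 0.25 of always saying 0.5
```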

Agents with better calibration automatically receive higher weight in future forecasts via get_calibration_weights(). Calibration weights feed into Cooke's Classical Model for performance-based expert weighting.

Context Engine

Before any forecast or scenario, PolySwarm fetches live data from 23 sources via a modular plugin registry (ThreadPoolExecutor, 12 workers, 15s timeout per source).

# See exactly what data agents receive:
python main.py context

# With a specific question (adds Polymarket/Manifold market search):
python main.py context "Will BTC hit $100k?"

The context command outputs formatted text sections for each data category: prices, funding rates, options data, DeFi metrics, sentiment, and news headlines. This is the exact data string injected into each agent's prompt.

Bayesian Updating

Module: core/bayesian.py

Instead of simple averaging, PolySwarm treats each agent's estimate as evidence and updates a prior belief using Bayes' theorem. The key innovation is information-theoretic weighting: agents whose estimates are more surprising (higher KL divergence from the current posterior) carry more weight in the update.

How it works

  1. Start with a prior (market odds if available, otherwise 0.5)
  2. For each agent, compute the KL divergence between their estimate and the current posterior
  3. Weight the Bayesian update by confidence × (1 + KL divergence)
  4. Apply the likelihood ratio update: P(H|E) = P(E|H) × P(H) / P(E)
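
The steps above can be sketched as follows. This is a simplified log-odds blend standing in for the full likelihood-ratio update in core/bayesian.py: the weight formula matches step 3, but the blending rule and the cap at 1.0 are illustrative choices, not the module's exact math:

```python
import math

def _logit(p: float) -> float:
    return math.log(p / (1 - p))

def _kl_bernoulli(p: float, q: float) -> float:
    """KL divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, 1e-6), 1 - 1e-6)
    q = min(max(q, 1e-6), 1 - 1e-6)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_weighted_posterior(estimates, prior=0.5):
    """Fold each agent's estimate into the posterior, one at a time.

    Weight = confidence * (1 + KL divergence from the current posterior),
    capped at 1.0 and applied as a blend in log-odds space.
    """
    posterior = prior
    for est in estimates:
        kl = _kl_bernoulli(est["probability"], posterior)
        w = min(est["confidence"] * (1 + kl), 1.0)
        blended = (1 - w) * _logit(posterior) + w * _logit(est["probability"])
        posterior = 1 / (1 + math.exp(-blended))
    return posterior
```

Surprising agents (large KL from the running posterior) get amplified, which is the information-theoretic weighting described above.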

Output fields

  • bayesian_probability — the posterior probability after all agent updates
  • bayesian_shift — difference between Bayesian posterior and simple mean
  • entropy — Shannon entropy of the posterior (bits). Lower = more certain
  • information_gain — entropy reduction from prior to posterior (bits)

from core.bayesian import bayesian_aggregate
result = bayesian_aggregate(estimates, prior=0.42)
# result["bayesian_probability"] → 0.487
# result["information_gain"] → 0.031 bits

Extremized Aggregation (IARPA)

Module: core/extremize.py

Based on research from the IARPA ACE forecasting tournament (Satopää et al. 2014, Baron et al. 2014, Tetlock 2015). Simple averages of forecaster probabilities are systematically under-confident. Extremizing pushes the aggregate away from 50% toward 0 or 1, correcting this bias.

How it works

  1. Transform agent probabilities to log-odds space: logit(p) = log(p / (1-p))
  2. Compute confidence-weighted mean in log-odds space
  3. Apply extremizing factor d > 1: logit(p_ext) = d × mean(logit(p_i))
  4. Transform back to probability space via inverse logit
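
A compact sketch of steps 1-4 with a fixed d; the real module estimates d from forecaster diversity rather than taking it as an argument:

```python
import math

def extremize_sketch(probs, confidences, d=1.5):
    """Confidence-weighted mean in log-odds space, pushed outward by d."""
    logit = lambda p: math.log(p / (1 - p))
    total = sum(confidences)
    mean_logodds = sum(c * logit(p) for p, c in zip(probs, confidences)) / total
    return 1 / (1 + math.exp(-d * mean_logodds))    # inverse logit

probs, confs = [0.55, 0.60, 0.52], [0.7, 0.6, 0.5]
plain = extremize_sketch(probs, confs, d=1.0)   # plain weighted log-odds mean
pushed = extremize_sketch(probs, confs, d=1.8)  # further from 0.5 than plain
```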

Extremizing factor estimation

The factor d is automatically estimated from forecaster diversity (coefficient of variation of log-odds). Higher diversity (more independent agents) warrants stronger extremizing (d ≈ 2.5), while correlated agents get mild adjustment (d ≈ 1.0-1.2).

from core.extremize import extremize
result = extremize(estimates)
# result["extremized_probability"] → 0.513
# result["extremizing_factor"] → 1.82
# result["shift"] → +0.026

Logarithmic Opinion Pool

Module: core/opinion_pool.py

Unlike linear opinion pools (simple weighted averages), logarithmic pools combine probabilities multiplicatively in log space. This satisfies the "external Bayesianity" property: if all agents are Bayesian with different priors but shared likelihood, LogOP recovers the correct posterior.

Formula

p_logop = Π(p_i^w_i) / [Π(p_i^w_i) + Π((1-p_i)^w_i)]

Property: Extreme beliefs have strong influence. One agent at 0.01 pulls the aggregate much harder than in a linear pool.
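
A quick numeric check of that property, using equal weights w_i = 1/n (illustrative; the module's weighting may differ):

```python
import math

def linear_pool(probs):
    return sum(probs) / len(probs)

def log_pool(probs):
    """Equal-weight logarithmic opinion pool (w_i = 1/n)."""
    n = len(probs)
    yes = math.prod(p ** (1 / n) for p in probs)
    no = math.prod((1 - p) ** (1 / n) for p in probs)
    return yes / (yes + no)

probs = [0.01, 0.60, 0.60, 0.60]   # one extreme skeptic, three mild bulls
# linear_pool(probs) → 0.4525; log_pool(probs) → roughly 0.30
```

The single agent at 0.01 drags the logarithmic pool far below the linear average, exactly the behavior the property describes.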

from core.opinion_pool import logarithmic_opinion_pool
result = logarithmic_opinion_pool(estimates)
# result["logop_probability"] → 0.478
# result["logop_shift"] → -0.009

Cooke's Classical Model

Module: core/opinion_pool.py

From expert elicitation theory (Cooke, 1991). Weights forecasters by both calibration AND informativeness. Agents below a calibration threshold are pruned entirely.

Weight formula

weight_i = calibration_score_i × informativeness_i
  • Calibration — from historical Brier scores (if available) or confidence consistency
  • Informativeness — Shannon information content. More extreme + confident = more informative
  • Threshold — agents below alpha (default 0.05) are disqualified

from core.opinion_pool import cooke_classical_weights
result = cooke_classical_weights(estimates, calibration_scores)
# result["cooke_probability"] → 0.491
# result["n_qualified"] → 10

Monte Carlo Simulation

Module: core/statistics.py

Runs 5,000 simulations where each agent's estimate is treated as a beta distribution parameterized by their probability and confidence. Higher confidence = tighter distribution around their estimate.
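
One plausible way to parameterize the beta from a probability and a confidence (an assumption for illustration; core/statistics.py may use a different mapping):

```python
import random

def sample_agent(prob, confidence, rng):
    """One draw from a Beta centred on `prob`; confidence sets the spread."""
    kappa = 2 + confidence * 48          # pseudo-count: 2 (vague) to 50 (sharp)
    return rng.betavariate(prob * kappa, (1 - prob) * kappa)

rng = random.Random(0)                   # seeded for reproducibility
draws = [sample_agent(0.6, 0.9, rng) for _ in range(5000)]
mean = sum(draws) / len(draws)           # lands close to 0.6
```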

Output fields

  • percentiles — p5, p10, p25, p50, p75, p90, p95
  • thresholds — P(>25%), P(>50%), P(>75%)
  • mean, std, skew — distribution moments

from core.statistics import monte_carlo_scenarios
mc = monte_carlo_scenarios(estimates, n_simulations=5000)
# mc["percentiles"]["p50"] → 0.472
# mc["thresholds"]["P(>50%)"] → 0.43

Bootstrap Confidence Intervals

Module: core/statistics.py

Resamples the agent estimates 1,000 times with replacement to produce a 95% confidence interval for the swarm's probability estimate.

Limitation: With only 12 agents, bootstrap CIs have limited statistical power. The interval reflects resampling noise as much as genuine uncertainty. Interpret widths above 15% as "high disagreement" rather than precise uncertainty bounds.

from core.statistics import bootstrap_confidence_interval
ci = bootstrap_confidence_interval(estimates)
# ci["ci_lower"] → 0.381, ci["ci_upper"] → 0.573
# ci["ci_width"] → 0.192

Game Theory

Module: core/game_theory.py

Three game-theoretic analyses run on every forecast to detect pathological agent behavior.

Herding Detection

Uses the Herfindahl-Hirschman Index (HHI) adapted for probability space. Buckets agent estimates into quintiles and measures concentration.

Limitation: With 12 agents across 5 quintiles, the expected count is 2.4 per bucket — the HHI signal can be noisy. Herding scores >0.5 are reliable; 0.3-0.5 should be interpreted cautiously.

  • herding_score — 0 to 1 (>0.3 = herding detected)
  • herd_direction — bullish/bearish/neutral when herding is detected
  • contrarians — agents >1.5 std devs from mean (potential signal)
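
A sketch of the quintile HHI, with an illustrative normalization so a uniform spread scores 0 and unanimity scores 1 (the actual scoring and thresholds in core/game_theory.py may scale differently):

```python
def herding_hhi(probs):
    """Quintile-bucket HHI, rescaled: uniform spread → 0, unanimity → 1."""
    counts = [0] * 5
    for p in probs:
        counts[min(int(p * 5), 4)] += 1   # bucket 0-4; p=1.0 folds into bucket 4
    shares = [c / len(probs) for c in counts]
    hhi = sum(s * s for s in shares)      # raw HHI lies in [1/5, 1]
    return (hhi - 0.2) / 0.8

herding_hhi([0.62, 0.65, 0.68, 0.71, 0.63, 0.66])   # all six in one quintile → 1.0
herding_hhi([0.1, 0.3, 0.5, 0.7, 0.9])              # perfectly spread → 0.0
```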

Information Cascades

Compares Round 1 and Round 2 estimates to detect whether agents converged genuinely or just followed the herd.

  • convergence_rate — fraction of agents that moved toward consensus
  • cascade_detected — true when convergence > 70%
  • flipped_agents — agents who changed sides (crossed 50%)
  • biggest_movers — top 3 agents by absolute probability shift
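
The convergence rate can be sketched as the fraction of agents whose Round 2 estimate moved toward the Round 1 mean (using the mean as an illustrative stand-in for "consensus"):

```python
def convergence_rate(round1, round2):
    """Fraction of agents whose Round 2 estimate moved toward the Round 1 mean."""
    consensus = sum(round1) / len(round1)
    moved = sum(
        1 for r1, r2 in zip(round1, round2)
        if abs(r2 - consensus) < abs(r1 - consensus)
    )
    return moved / len(round1)

r1 = [0.30, 0.45, 0.60, 0.75]
r2 = [0.40, 0.48, 0.55, 0.65]                        # every agent drifts inward
cascade_detected = convergence_rate(r1, r2) > 0.70   # True here
```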

Nash Equilibrium

Checks whether the consensus is stable: would any agent benefit from deviating? An agent is flagged as a potential deviator if their estimate differs from the consensus by more than 0.15 (15 percentage points) AND their confidence exceeds 0.70.

  • stable — true if no agent has incentive to deviate
  • stability_score — 0 to 1 (fraction of non-deviating agents)
  • potential_deviators — list of agents with high-confidence divergent views

Unstable equilibria signal low-confidence forecasts where the consensus may not hold.
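
The deviator check described above, sketched with hypothetical agent data:

```python
def nash_stability(estimates, consensus, dev_threshold=0.15, conf_threshold=0.70):
    """Flag agents who are both far from consensus and highly confident."""
    deviators = [
        e["agent"] for e in estimates
        if abs(e["probability"] - consensus) > dev_threshold
        and e["confidence"] > conf_threshold
    ]
    return {
        "stable": not deviators,
        "stability_score": 1 - len(deviators) / len(estimates),
        "potential_deviators": deviators,
    }

estimates = [
    {"agent": "quant", "probability": 0.48, "confidence": 0.60},
    {"agent": "macro", "probability": 0.75, "confidence": 0.90},
]
result = nash_stability(estimates, consensus=0.50)
# "macro" is 25 points away at 90% confidence → a potential deviator
```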

CLI Reference

forecast

python main.py forecast QUESTION [--odds FLOAT] [--rounds INT] [--size INT]

Argument   Required  Description
QUESTION   Yes       The binary question to forecast
--odds     No        Current market odds (0.0-1.0) for edge calculation
--rounds   No        Number of debate rounds (default: 2)
--size     No        Number of agents (default: 12)

scenario

python main.py scenario DESCRIPTION [--context TEXT]

Argument     Required  Description
DESCRIPTION  Yes       The scenario to simulate
--context    No        Additional context for the scenario

context

python main.py context [QUESTION]

Shows all live data sources the agents see. Outputs formatted text sections for each data category. Optionally pass a question for Polymarket/Manifold market-specific search.

resolve

python main.py resolve QUESTION --outcome FLOAT

Resolve a forecast. Outcome: 1.0 = YES, 0.0 = NO. Updates Brier scores.

calibration

python main.py calibration

Display Brier scores for all agents and the swarm aggregate.

serve

python main.py serve [--host TEXT] [--port INT]

Start the FastAPI server. Default: 0.0.0.0:8000. Interactive docs at /docs.

Note: Authentication is off by default. Set the POLYSWARM_API_KEY environment variable to require an X-API-Key header on protected endpoints; without it, do not expose the server to untrusted networks without a reverse proxy.

REST API Reference

Start the server with python main.py serve. Interactive Swagger docs at /docs.

POST /forecast

{
  "question": "Will BTC hit $150k in 2026?",
  "market_odds": 0.25,
  "rounds": 2
}

Returns: swarm probability (7 aggregation methods), Bayesian posterior, extremized estimate, MC percentiles, confidence interval, consensus score, individual agent estimates, herding analysis, Nash stability, edge vs market.

POST /scenario

{
  "scenario": "Binance announces insolvency",
  "context": "BTC at $87k, market euphoric"
}

Returns: per-agent reactions, sentiment shifts, price impact, crowd narrative, secondary effects.

POST /resolve

{
  "question": "Will BTC hit $150k in 2026?",
  "outcome": 1.0
}

GET /calibration

Returns swarm and per-agent Brier scores.

GET /agents

Returns list of all agents with their persona definitions.

Multi-LLM Setup

Anthropic (default)

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
MODEL_FAST=claude-sonnet-4-20250514

OpenAI

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
MODEL_FAST=gpt-4o-mini

Ollama (local, free)

LLM_PROVIDER=ollama
OLLAMA_BASE_URL=http://localhost:11434
MODEL_FAST=llama3.1:8b

Any OpenAI-compatible (Groq, Together, etc.)

LLM_PROVIDER=openai
OPENAI_API_KEY=gsk_...
OPENAI_BASE_URL=https://api.groq.com/openai/v1
MODEL_FAST=llama-3.1-70b-versatile

Docker

# One command
ANTHROPIC_API_KEY=your_key docker compose up

# Or with OpenAI
LLM_PROVIDER=openai OPENAI_API_KEY=your_key docker compose up

API available at http://localhost:8000. Docs at /docs.

Custom Personas

Edit agents/personas.py to add your own agent:

{
    "agent_id": "meme_trader",
    "persona": "Meme Coin Degen",
    "description": "Trades exclusively based on meme momentum...",
    "information_focus": "Twitter mentions, TikTok trends, Telegram pump groups",
    "bias_profile": "Extremely short-term. Will ape into anything trending.",
    "base_confidence": 0.45,
}

Add it to PERSONA_DEFINITIONS and it will be included in the next swarm run.

Data Sources

PolySwarm uses a modular plugin architecture for data sources. All sources are fetched in parallel via ThreadPoolExecutor (12 workers, 15s timeout). Sources auto-discover API keys from environment variables.

Registered Sources

Source              Module                           API Key               Data
binance_spot        data/sources/market.py           (none)                BTC, ETH, SOL spot prices + 24h change
binance_movers      data/sources/market.py           (none)                Top 5 gainers/losers in last 24h
coingecko_market    data/sources/market.py           COINGECKO_API_KEY*    Global market overview, BTC dominance
funding_rates       data/sources/derivatives_src.py  (none)                Funding rates for 6 major perpetuals
open_interest       data/sources/derivatives_src.py  (none)                BTC/ETH open interest
long_short_ratio    data/sources/derivatives_src.py  (none)                Global long/short ratio
top_traders         data/sources/derivatives_src.py  (none)                Top trader position ratios
liquidations        data/sources/derivatives_src.py  (none)                Recent forced liquidations
btc_options         data/sources/derivatives_src.py  (none)                Deribit BTC options OI, put/call ratio
btc_vol             data/sources/derivatives_src.py  (none)                Deribit BTC historical volatility
btc_mempool         data/sources/onchain_src.py      (none)                BTC mempool size, pending tx count
btc_fees            data/sources/onchain_src.py      (none)                BTC recommended fee levels
btc_hashrate        data/sources/onchain_src.py      (none)                BTC network hashrate
eth_gas             data/sources/onchain_src.py      ETHERSCAN_API_KEY*    ETH gas prices (safe/propose/fast)
defi_tvl            data/sources/onchain_src.py      (none)                Total DeFi TVL
top_protocols       data/sources/onchain_src.py      (none)                Top 10 DeFi protocols by TVL
stablecoin_supply   data/sources/onchain_src.py      (none)                Total stablecoin supply
fear_greed          data/sources/sentiment.py        (none)                Fear & Greed Index (7-day history)
reddit_crypto       data/sources/sentiment.py        (none)                r/cryptocurrency hot posts
coingecko_trending  data/sources/sentiment.py        (none)                Trending coins on CoinGecko
cryptopanic         data/sources/sentiment.py        CRYPTOPANIC_API_KEY*  Latest crypto headlines
polymarket          data/sources/prediction.py       (none)                Trending markets + question-specific search
manifold            data/sources/prediction.py       (none)                Trending markets + question-specific search

* Optional — source works without key but may be rate-limited. Set in .env.

Adding a Custom Data Source

Create a new file in data/sources/ and it will be auto-discovered:

# data/sources/my_custom.py
from data.registry import DataSource, register_source

@register_source
class MyCustomSource(DataSource):
    name = "my_custom"
    category = "sentiment"
    description = "My custom data feed"
    requires_key = "MY_API_KEY"  # optional, set None if no key needed
    priority = 50                # 0-100, higher = shown first
    timeout = 10                 # request timeout in seconds

    def fetch(self) -> str:
        resp = self.http.get("https://api.example.com/data")
        data = resp.json()
        return f"My Custom Data: {data['value']}"

    # Optional: implement search() for question-specific queries
    def search(self, question: str) -> str:
        resp = self.http.get(f"https://api.example.com/search?q={question}")
        return f"Related: {resp.json()}"

That's it. No other files need modification. The registry auto-discovers the source on import.

Source Filtering

Whitelist specific sources via environment variable:

POLYSWARM_SOURCES=binance_spot,funding_rates,fear_greed,btc_options

When set, only the listed sources will be fetched. When unset, all available sources run.

CLI Source Status

python main.py sources

Shows all registered sources with their status (ready, needs key, disabled) and priority.

Known Limitations

  • Bootstrap CI at n=12 — Bootstrapping over 12 agent estimates has limited statistical power. CIs reflect resampling noise as much as genuine uncertainty.
  • HHI herding at n=12 — The Herfindahl-Hirschman Index with 5 quintiles and 12 agents gives coarse resolution. Herding scores between 0.3-0.5 should be interpreted cautiously.
  • Agent memory is in-memory — Memory accumulates within a session but resets on process restart. Redis persistence is planned.
  • Optional API authentication — Set POLYSWARM_API_KEY env var to require X-API-Key header on protected endpoints. Without it, the API is open.
  • Current-state data only — Data sources provide snapshots, not historical trends (no 7d/30d deltas).
  • BTC/ETH/SOL only — Spot price data covers three assets. Add more by creating a new source in data/sources/.
  • SP meta-predictions are estimated — Without explicit second-order predictions from agents, the Surprisingly Popular algorithm uses a confidence-based heuristic.
  • No streaming — Multi-round debates can take 30-60+ seconds. No real-time progress output in the API (CLI shows per-agent status).