Four nations, four LLMs, one strait

Peakstone Labs is a quant shop. We build factor models, risk parity portfolios, and AI-powered fundamental analysis. Serious stuff.

This is not that.

Last weekend the Strait of Hormuz crisis was all over the news — Iran blockading shipping, US strikes, oil past $110. We were watching the same four or five analysts on Twitter all confidently predicting completely different outcomes. So we thought: what if we gave each country its own AI brain and just… let them figure it out?

Forty-eight hours later we had a working prototype. Then we couldn't stop playing with it. Now it's open source.

What is it

A real-time geopolitical sandbox. Four countries — Iran, US, Israel, Gulf States — each controlled by an independent LLM agent. You adjust a chaos slider, pick some crisis modifiers, and hit start. The agents read the situation, reason about their constraints, and take actions. A market agent watches everything and updates the oil price. Then the next round starts, and everyone reacts to what just happened.

No script. No predetermined outcomes. The agents surprise us regularly.

[Screenshot: Hormuz Agent Sandbox]

How it works

Three things define a simulation:

1. Souls

Each country has a "soul" — a markdown file that defines who it is. YAML frontmatter for the numbers (aggressiveness, economic tolerance, red lines), markdown body for the strategic doctrine.

Each agent in a nutshell:

  • Iran: "Time is your friend." Asymmetric warfare, proxy networks, Strait blockade as leverage. New Supreme Leader still consolidating power. Can't win a conventional war, doesn't plan to fight one.
  • US: Stuck between three bad options: retreat (embarrassment), status quo (bleeding money), escalation (casualties before midterms). Gas above $5/gal is political poison.
  • Israel: One red line that matters: nuclear program. Multi-front exposure makes unilateral action risky. Needs US political cover for anything big.
  • Gulf States: Sitting on 3M bbl/day of spare capacity, the world's only meaningful oil buffer. Can pump to help or withhold for leverage. Trying not to get hit by either side.

Want China? Create china.md, write a soul, restart. The system picks it up automatically.
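To make the soul format concrete, here is a minimal sketch of how a file with YAML frontmatter and a markdown body could be split and loaded. It is not the project's actual loader: the names Soul, parse_soul, and load_souls are hypothetical, the numeric values in the example are placeholders, and the hand-rolled parser only understands flat key: value pairs and simple "- item" lists (a real loader would more likely use a YAML library).

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Soul:
    name: str
    params: dict = field(default_factory=dict)  # numbers from the frontmatter
    doctrine: str = ""                          # markdown body


def _coerce(value: str):
    """Turn '0.5' into a float; leave anything else as a plain string."""
    try:
        return float(value)
    except ValueError:
        return value.strip("\"'")


def parse_soul(name: str, text: str) -> Soul:
    """Split '---' delimited frontmatter from the markdown body."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return Soul(name=name, doctrine=text.strip())
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:  # no closing delimiter: treat the whole file as doctrine
        return Soul(name=name, doctrine=text.strip())

    params, list_key = {}, None
    for raw in lines[1:end]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- ") and list_key is not None:
            params[list_key].append(line[2:].strip())
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value == "":          # "red_lines:" opens a list
            params[key], list_key = [], key
        else:
            params[key], list_key = _coerce(value), None
    return Soul(name, params, "\n".join(lines[end + 1:]).strip())


def load_souls(directory: str) -> dict:
    """Load every *.md file in a folder, keyed by file stem (china.md -> 'china')."""
    return {p.stem: parse_soul(p.stem, p.read_text(encoding="utf-8"))
            for p in sorted(Path(directory).glob("*.md"))}


# Placeholder numbers; field names are assumptions, not the repo's schema.
EXAMPLE = """---
aggressiveness: 0.5
economic_tolerance: 0.5
red_lines:
  - nuclear program
---
Needs US political cover for anything big.
"""
soul = parse_soul("israel", EXAMPLE)
```

Under this sketch, dropping a new china.md into the folder is enough for load_souls to return it on the next start, which matches the "write a soul, restart" workflow described above.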

2. Tags

Eleven pluggable scenario modifiers. Toggle them on or off before each run. Each one injects extra context into the agents it affects.

A few of them: Nuclear Brinkmanship. Oil Weapon. Houthi Wildcard. Cyber Offensive. Diplomatic Breakthrough. Domestic Unrest. Mix and match, because the combinations get wild.
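One way to picture "injects extra context into the agents it affects" is a small lookup that filters active tags per agent. This is only a sketch: the Tag class, the injections_for helper, the choice of which agents each tag touches, and the placeholder injection strings are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    affects: frozenset   # agent ids this tag applies to (hypothetical mapping)
    injection: str       # extra context appended to those agents' prompts


def injections_for(agent: str, active_tags: list) -> list:
    """Return the context snippets that the active tags add for one agent."""
    return [t.injection for t in active_tags if agent in t.affects]


# Placeholder tags; the real tag files and their targets may differ.
oil_weapon = Tag("Oil Weapon", frozenset({"gulf_states", "us"}),
                 "<Oil Weapon context>")
houthi = Tag("Houthi Wildcard", frozenset({"iran", "gulf_states"}),
             "<Houthi Wildcard context>")

active = [oil_weapon, houthi]
gulf_context = injections_for("gulf_states", active)
israel_context = injections_for("israel", active)
```

Toggling a tag off simply means leaving it out of the active list, so agents it would have affected get no extra context from it.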

3. The chaos slider

One number from 0 to 1. Low chaos: agents seek diplomacy, act cautiously, find off-ramps. High chaos: bold moves, miscommunication, red lines get tested. Same scenario, different chaos — completely different world.
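As a rough sketch of how a single number could become a prompt instruction, the function below maps the slider to a sentence of guidance. The function name, the cutoffs at 0.34 and 0.67, and the existence of a middle band are assumptions; the low and high wording paraphrases the descriptions above.

```python
def chaos_instruction(chaos: float) -> str:
    """Map the 0..1 chaos slider to a behavioral instruction for the agents."""
    if not 0.0 <= chaos <= 1.0:
        raise ValueError("chaos must be between 0 and 1")
    if chaos < 0.34:   # hypothetical cutoff
        return ("Seek diplomacy, act cautiously, and look for off-ramps "
                f"(chaos={chaos:.2f}).")
    if chaos < 0.67:   # hypothetical middle band, not described in the post
        return ("Balance caution against opportunity; limited escalation "
                f"is acceptable (chaos={chaos:.2f}).")
    return ("Favor bold moves, expect miscommunication, and be willing to "
            f"test red lines (chaos={chaos:.2f}).")


low = chaos_instruction(0.1)
high = chaos_instruction(0.9)
```

Including the raw number in the text is a design choice: it lets the model distinguish 0.7 from 0.95 even though both fall in the same band.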

The fun part

We expected the system to produce plausible-sounding but predictable narratives. It didn't.

The one that surprised us

Chaos at 0.7. Nuclear Brinkmanship + Oil Weapon both active. Iran announced it was accelerating enrichment toward weapons-grade and signaled willingness to negotiate — simultaneously. Classic dual-track. The US agent read the move and responded with targeted strikes on enrichment facilities.

Here's where it got interesting: Israel held back. Its soul explicitly prioritizes preemptive action on nuclear threats. But the agent reasoned — on its own — that with the US already striking, unilateral Israeli action would look like "piling on" rather than self-defense, weakening the coalition narrative. It decided to free-ride on US strikes.

Nobody programmed that. The agent derived it from the constraint profile and the observed behavior of the other agents.

Meanwhile Gulf States announced they would withhold spare capacity until they got explicit security guarantees. Oil jumped $18 in one round. The market agent's take: "Saudi restraint is the clearest signal yet that Riyadh views this crisis as an opportunity to renegotiate the entire Gulf security architecture."

We've run dozens of simulations now. A few patterns:

  • Some scenarios are stable at low chaos but spiral into nuclear confrontation at 0.7. Others are surprisingly robust even when everyone's aggressive. The sensitivity itself is the insight.
  • Oil price creates a feedback loop nobody anticipated. Spike → US pressure to de-escalate → Iran reads the signal → tightens blockade → spike again.
  • Iran almost always develops a dual-track strategy (threaten + negotiate). We didn't program that — the soul says "time is your friend" and the agent figures out what that means in practice.

Under the hood

  • FastAPI + LiteLLM: Backend routes to Gemini, GPT-4o, Claude, or DeepSeek through one interface. Bring your own API key or use the free trial tier (DeepSeek V3).
  • Vue 3 + Tailwind: Terminal-styled frontend. Decisions stream in real time via SSE, so you watch each agent think, one by one.
  • Everything is markdown: Countries, scenario tags, and background briefings are all plain text files. No database, no admin panel. Edit and restart.
  • Scenario-agnostic engine: Fork it and replace Hormuz with the Taiwan Strait, the South China Sea, or the Suez Canal. The engine doesn't care. The souls and tags make it specific.
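For the SSE streaming mentioned above, the wire format is simple enough to sketch with the standard library alone. The generator below turns agent decisions into Server-Sent Events frames; the event names "decision" and "done" and the payload fields are assumptions, not the project's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_frames(decisions: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE frame per decision, then a final 'done' frame.

    Each frame is an 'event:' line and a 'data:' line followed by a blank
    line, which is what a browser EventSource expects.
    """
    for decision in decisions:
        # json.dumps emits a single line, so the payload fits one data: field
        yield f"event: decision\ndata: {json.dumps(decision)}\n\n"
    yield "event: done\ndata: {}\n\n"


frames = list(sse_frames([
    {"agent": "iran", "action": "<public action text>"},
    {"agent": "us", "action": "<public action text>"},
]))
```

In a FastAPI app, a generator like this could be wrapped in a StreamingResponse with media_type="text/event-stream", so each agent's decision reaches the frontend as soon as it is produced.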

Each round, every agent gets a layered prompt: strategic doctrine, profile parameters, chaos instruction, scenario background, active tag injections, and the last three rounds of history. The agent outputs strategic reasoning, constraints it considered, a public action, and a risk assessment. Then the market agent prices in the consequences.
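The layering described above can be sketched as a function that stacks the six pieces in order and keeps only the last three rounds of history. The section titles, the build_prompt signature, and the Decision field names are hypothetical; only the list of layers and outputs comes from the description.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """What an agent returns each round (field names are assumptions)."""
    strategic_reasoning: str
    constraints_considered: list
    public_action: str
    risk_assessment: str


def build_prompt(doctrine: str, profile: dict, chaos_instruction: str,
                 background: str, tag_injections: list, history: list,
                 window: int = 3) -> str:
    """Assemble the layered prompt; empty layers are skipped."""
    recent = history[-window:] if window > 0 else []
    layers = [
        ("Strategic doctrine", doctrine),
        ("Profile parameters", "\n".join(f"{k}: {v}" for k, v in profile.items())),
        ("Chaos instruction", chaos_instruction),
        ("Scenario background", background),
        ("Active tags", "\n".join(tag_injections)),
        ("Recent history", "\n".join(recent)),
    ]
    return "\n\n".join(f"{title}\n{body}" for title, body in layers if body)


prompt = build_prompt(
    doctrine="Time is your friend.",
    profile={"aggressiveness": 0.5},          # placeholder value
    chaos_instruction="<chaos instruction>",
    background="<scenario background>",
    tag_injections=["<Oil Weapon context>"],
    history=["round-1 summary", "round-2 summary",
             "round-3 summary", "round-4 summary"],
)
```

Capping history at a fixed window keeps prompt size bounded no matter how long a simulation runs, at the cost of agents forgetting older rounds.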

P_new = P_old * (1 + alpha * delta_escalation + noise)

Oil price feeds back into the next round. The loop is the game.
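The price rule translates almost directly into code. In the sketch below, the default alpha of 0.05, the Gaussian noise with a 1% standard deviation, and the scale of delta_escalation are all assumptions, since the formula itself does not pin them down.

```python
import random


def update_price(p_old: float, delta_escalation: float, alpha: float = 0.05,
                 noise_sd: float = 0.01, rng: random.Random = None) -> float:
    """P_new = P_old * (1 + alpha * delta_escalation + noise)."""
    if rng is None:
        rng = random.Random()
    noise = rng.gauss(0.0, noise_sd)  # assumed zero-mean Gaussian noise
    return p_old * (1 + alpha * delta_escalation + noise)


# Deterministic example: no noise, escalation up by 2 units from $100.
calm = update_price(100.0, 2.0, alpha=0.05, noise_sd=0.0)

# Feeding each round's price into the next, with a seeded generator.
rng = random.Random(42)
price = 100.0
for delta in [1.0, 2.0, -1.0]:   # hypothetical escalation changes per round
    price = update_price(price, delta, rng=rng)
```

Because the update is multiplicative, the same escalation change moves the price more in dollar terms when oil is already expensive, which is one way a loop like this can amplify itself.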

Try it / Fork it

It's MIT licensed. The whole thing runs locally with Docker or a simple pip install + npm install.

  • Free trial: 20 rounds on DeepSeek V3, no API key needed
  • BYOK: plug in your Gemini / OpenAI / Anthropic / DeepSeek key for unlimited rounds
  • Add a country: write a markdown file
  • Add a crisis variant: write another markdown file
  • Change the entire scenario: edit one JSON file

This was a weekend project that turned out more interesting than we expected. If you do something fun with it — a different conflict, a historical scenario, a classroom exercise — we'd love to hear about it.
