Four nations, four LLMs, one strait

April 4, 2026 · Side Project

Peakstone Labs is a quant shop. We build factor models, risk parity portfolios, and AI-powered fundamental analysis. Serious stuff.

This is not that.

Last weekend the Strait of Hormuz crisis was all over the news — Iran blockading shipping, US strikes, oil past $110. We were watching the same four or five analysts on Twitter all confidently predicting completely different outcomes. So we thought: what if we gave each country its own AI brain and just… let them figure it out?

Forty-eight hours later we had a working prototype. Then we couldn't stop playing with it. Now it's open source.

What is it

A real-time geopolitical sandbox. Four countries — Iran, US, Israel, Gulf States — each controlled by an independent LLM agent. You adjust a chaos slider, pick some crisis modifiers, and hit start. The agents read the situation, reason about their constraints, and take actions. A market agent watches everything and updates the oil price. Then the next round starts, and everyone reacts to what just happened.

No script. No predetermined outcomes. The agents surprise us regularly.

How it works

Three things define a simulation:

1. Souls

Each country has a "soul" — a markdown file that defines who it is. YAML frontmatter for the numbers (aggressiveness, economic tolerance, red lines), markdown body for the strategic doctrine.

Agent	In a nutshell
Iran	"Time is your friend." Asymmetric warfare, proxy networks, Strait blockade as leverage. New Supreme Leader still consolidating power. Can't win a conventional war, doesn't plan to fight one.
US	Stuck between three bad options: retreat (embarrassment), status quo (bleeding money), escalation (casualties before midterms). Gas above $5/gal is political poison.
Israel	One red line that matters: Iran's nuclear program. Multi-front exposure makes unilateral action risky. Needs US political cover for anything big.
Gulf States	Sitting on 3M bbl/day of spare capacity — the world's only meaningful oil buffer. Can pump to help or withhold for leverage. Trying not to get hit by either side.

Want China? Create china.md, write a soul, restart. The system picks it up automatically.
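To make the format concrete, here is a minimal sketch of what a soul file and its loader could look like. It is not the project's actual loader: the field names (`aggressiveness`, `economic_tolerance`, `red_lines`), their values, and the `load_soul` helper are assumptions for illustration, and the hand-rolled parser only handles flat `key: value` frontmatter where a real loader would more likely use a YAML library.

```python
from dataclasses import dataclass

# Hypothetical soul file: field names and values are illustrative assumptions.
EXAMPLE_SOUL = """---
name: Iran
aggressiveness: 0.6
economic_tolerance: 0.8
red_lines: placeholder red line
---
Time is your friend. Use the Strait blockade as leverage.
"""


@dataclass
class Soul:
    profile: dict[str, str]  # raw frontmatter values, kept as strings
    doctrine: str            # markdown body


def load_soul(text: str) -> Soul:
    """Split a soul document into flat frontmatter and a markdown body."""
    if not text.startswith("---"):
        raise ValueError("soul file must start with '---' frontmatter")
    # Layout is: '---', frontmatter, '---', body.
    _, frontmatter, body = text.split("---", 2)
    profile: dict[str, str] = {}
    for line in frontmatter.strip().splitlines():
        key, _, value = line.partition(":")
        profile[key.strip()] = value.strip()
    return Soul(profile=profile, doctrine=body.strip())


soul = load_soul(EXAMPLE_SOUL)
print(soul.profile["name"], soul.profile["aggressiveness"])
```

Keeping the numbers in frontmatter and the doctrine in the body means the same file serves both the engine (which reads the parameters) and the prompt (which quotes the doctrine).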

2. Tags

Eleven pluggable scenario modifiers. Toggle them on or off before each run. Each one injects extra context into the agents it affects.

The set includes Nuclear Brinkmanship, Oil Weapon, Houthi Wildcard, Cyber Offensive, Diplomatic Breakthrough, and Domestic Unrest. Mix and match: the combinations get wild.
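The idea that each tag "injects extra context into the agents it affects" can be sketched roughly as follows. The `Tag` structure, the `affects` mapping, and the injection texts are hypothetical; the post does not say which agents each tag targets or how the tag files are laid out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    name: str
    affects: tuple[str, ...]  # agents that receive the injection
    injection: str            # extra context appended to their prompt


# Hypothetical definitions; which agents each tag affects is an assumption.
TAGS = [
    Tag("Oil Weapon", ("Gulf States", "US"), "Oil supply may be used as leverage."),
    Tag("Houthi Wildcard", ("Iran", "Israel"), "Houthi forces may act unpredictably."),
]


def injections_for(agent: str, active: set[str], tags: list[Tag] = TAGS) -> list[str]:
    """Return the injection text of every active tag that affects this agent."""
    return [t.injection for t in tags if t.name in active and agent in t.affects]


us_context = injections_for("US", {"Oil Weapon", "Houthi Wildcard"})
print(us_context)
```

Because a tag only reaches the agents it names, two agents can see different pictures of the same crisis once several tags are switched on.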

3. The chaos slider

One number from 0 to 1. Low chaos: agents seek diplomacy, act cautiously, find off-ramps. High chaos: bold moves, miscommunication, red lines get tested. Same scenario, different chaos — completely different world.
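One plausible way to turn the slider into agent behavior is to map the number to an instruction that goes into every prompt. The sketch below assumes exactly that; the function name, the thresholds, and the middle band are our assumptions, and only the low and high descriptions come from the text above.

```python
def chaos_instruction(chaos: float) -> str:
    """Map a chaos level in [0, 1] to a behavioral instruction for the agents."""
    if not 0.0 <= chaos <= 1.0:
        raise ValueError("chaos must be between 0 and 1")
    if chaos < 0.33:  # assumed threshold
        return "Seek diplomacy, act cautiously, and look for off-ramps."
    if chaos < 0.66:  # assumed threshold and wording for the middle band
        return "Balance caution with pressure; limited escalation is acceptable."
    return "Favor bold moves; expect miscommunication and test red lines."


print(chaos_instruction(0.7))
```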

The fun part

We expected the system to produce plausible-sounding but predictable narratives. It didn't.

The one that surprised us

Chaos at 0.7. Nuclear Brinkmanship + Oil Weapon both active. Iran announced it was accelerating enrichment toward weapons-grade and signaled willingness to negotiate — simultaneously. Classic dual-track. The US agent read the move and responded with targeted strikes on enrichment facilities.

Here's where it got interesting: Israel held back. Its soul explicitly prioritizes preemptive action on nuclear threats. But the agent reasoned — on its own — that with the US already striking, unilateral Israeli action would look like "piling on" rather than self-defense, weakening the coalition narrative. It decided to free-ride on US strikes.

Nobody programmed that. The agent derived it from the constraint profile and the observed behavior of the other agents.

Meanwhile, the Gulf States announced they would withhold spare capacity until they got explicit security guarantees. Oil jumped $18 in one round. The market agent's take: "Saudi restraint is the clearest signal yet that Riyadh views this crisis as an opportunity to renegotiate the entire Gulf security architecture."

We've run dozens of simulations now. A few patterns:

Some scenarios are stable at low chaos but spiral into nuclear confrontation at 0.7. Others are surprisingly robust even when everyone's aggressive. The sensitivity itself is the insight.
Oil price creates a feedback loop nobody anticipated. Spike → US pressure to de-escalate → Iran reads the signal → tightens blockade → spike again.
Iran almost always develops a dual-track strategy (threaten + negotiate). We didn't program that — the soul says "time is your friend" and the agent figures out what that means in practice.

Under the hood

FastAPI + LiteLLM

Backend routes to Gemini, GPT-4o, Claude, or DeepSeek through one interface. Bring your own API key or use the free trial tier (DeepSeek V3).

Vue 3 + Tailwind

Terminal-styled frontend. Decisions stream in real time via SSE, so you watch each agent think, one by one.
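For readers unfamiliar with SSE, the wire format is just text frames separated by blank lines. Here is a stdlib-only sketch of how per-agent decisions could be framed; the event names (`decision`, `round_end`) and the payload fields are hypothetical, not taken from the project.

```python
import json
from typing import Iterable, Iterator


def sse_frame(event: str, payload: dict) -> str:
    """Format one Server-Sent Events frame: event name, data line, blank line."""
    # json.dumps emits a single line by default, so one 'data:' field is enough.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def stream_round(decisions: Iterable[dict]) -> Iterator[str]:
    """Yield one frame per agent decision, then a closing frame for the round."""
    for decision in decisions:
        yield sse_frame("decision", decision)
    yield sse_frame("round_end", {})


frames = list(stream_round([{"agent": "Iran", "action": "example action"}]))
print("".join(frames), end="")
```

In a FastAPI app, a generator like `stream_round` could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser would consume it with `EventSource`.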

Everything is markdown

Countries, scenario tags, background briefings — all plain text files. No database, no admin panel. Edit and restart.

Scenario-agnostic engine

Fork it and replace Hormuz with the Taiwan Strait, the South China Sea, or the Suez Canal. The engine doesn't care. The souls and tags make it specific.

Each round, every agent gets a layered prompt: strategic doctrine, profile parameters, chaos instruction, scenario background, active tag injections, and the last three rounds of history. The agent outputs strategic reasoning, constraints it considered, a public action, and a risk assessment. Then the market agent prices in the consequences.
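The layering described above can be sketched as a simple function that joins the six pieces in order. The section titles, the function signature, and the plain-text history format are assumptions; only the list of layers and the three-round window come from the description.

```python
def build_prompt(
    doctrine: str,
    profile: dict[str, float],
    chaos_instruction: str,
    background: str,
    tag_injections: list[str],
    history: list[str],
    history_window: int = 3,
) -> str:
    """Join the prompt layers in the order the post lists them."""
    profile_text = "\n".join(f"{k}: {v}" for k, v in profile.items())
    # Keep only the most recent rounds (three by default).
    recent = history[-history_window:] if history_window > 0 else []
    layers = [
        ("Strategic doctrine", doctrine),
        ("Profile parameters", profile_text),
        ("Chaos instruction", chaos_instruction),
        ("Scenario background", background),
        ("Active tags", "\n".join(tag_injections) or "(none)"),
        ("Recent history", "\n".join(recent) or "(first round)"),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in layers)


prompt = build_prompt(
    doctrine="Time is your friend.",
    profile={"aggressiveness": 0.6},  # hypothetical parameter
    chaos_instruction="Favor bold moves.",
    background="Shipping through the Strait is blockaded.",
    tag_injections=[],
    history=[f"Round {i}: summary" for i in range(1, 6)],
)
print(prompt)
```

Truncating history to a fixed window keeps the prompt size bounded no matter how long a simulation runs, at the cost of agents forgetting older rounds.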

P_new = P_old * (1 + alpha * delta_escalation + noise)

Oil price feeds back into the next round. The loop is the game.
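The price update above translates almost directly into code. In this sketch we assume that `delta_escalation` is a signed change in some escalation score, that `noise` is zero-mean Gaussian, and that `alpha` is a small sensitivity constant; the post does not define any of the three, so the default values here are placeholders.

```python
import random
from typing import Optional


def update_price(
    p_old: float,
    delta_escalation: float,
    alpha: float = 0.05,       # assumed sensitivity
    noise_sd: float = 0.01,    # assumed noise scale
    rng: Optional[random.Random] = None,
) -> float:
    """Apply P_new = P_old * (1 + alpha * delta_escalation + noise)."""
    rng = rng or random.Random()
    noise = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return p_old * (1 + alpha * delta_escalation + noise)


# With noise switched off the update is deterministic:
# 110 * (1 + 0.05 * 2) = 121.
price = update_price(110.0, delta_escalation=2.0, alpha=0.05, noise_sd=0.0)
print(round(price, 2))
```

Because the update is multiplicative, repeated escalation compounds: the same escalation step moves the price further in dollar terms when oil is already expensive.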

Try it / Fork it

It's MIT-licensed. The whole thing runs locally with Docker, or with a plain pip install plus npm install.

Free trial: 20 rounds on DeepSeek V3, no API key needed
BYOK: plug in your Gemini / OpenAI / Anthropic / DeepSeek key for unlimited rounds
Add a country: write a markdown file
Add a crisis variant: write another markdown file
Change the entire scenario: edit one JSON file

This was a weekend project that turned out more interesting than we expected. If you do something fun with it — a different conflict, a historical scenario, a classroom exercise — we'd love to hear about it.

Play now → GitHub →

