An AI analyst that argues with itself before producing a price target

March 27, 2026 · Deep Dive

Ask any LLM to analyze a stock. You'll get a report that reads beautifully and falls apart the moment you check a single number.

Stock Analyzer was built to fix that. Seven specialized modules, running in sequence, each one seeing the output of every module before it. A short-seller tears apart the bull case before anyone gets to set a price target. Python does the math so the LLM never has to. And the system remembers what it said three months ago.

7 analysis modules · ~$1 per company · 10 min full deep-dive · $0.03 weekly monitoring scan

I. The problem: why LLM stock analysis doesn't work

A single LLM call produces a single narrative. The model optimizes for coherence, not correctness. A story that holds together perfectly can still be completely wrong.

Failure mode	What actually happens
Hallucinated financials	Revenue and growth figures look plausible but are quietly wrong. No way to tell without checking each one manually.
Confirmation cascade	Business model → risks → valuation. Each section confirms the one before it. One bad assumption echoes through the entire report unchallenged.
Black-box DCF	Discount rates and terminal growth appear with zero provenance. The arithmetic can be off by 40%.
Decorative risk section	Risks read like they were copied from the company's own annual report. Technically accurate, functionally useless for investment decisions.

II. The solution: a 7-module adversarial pipeline

Seven modules, DAG-ordered by dependency. Each one has a single job, real financial data as input, and the accumulated output of every module before it.

Module	Role	Why it matters
Classifier	Industry tagging	Auto-loads sector-specific prompts (pharma pipeline risk, manufacturing capacity, consumer brand strength)
Business Model	Core economics	Revenue drivers, cost structure, moat — anchored to real financial statements via MCP
Risk Assessment	Red-flag detection	Risk matrix with valuation discount factors. Catches what the company's own filings won't tell you
Outlook	Forward scenarios	The only module with web search. All results fact-checked before flowing downstream
Bull/Bear Debate	Adversarial stress-test	Short-seller attacks → bull defends → judge rules
Valuation	LLM + Python hybrid	LLM sets assumptions, Python does the arithmetic. Multi-method cross-validation
Synthesis	Investment decision	Buy / hold / sell with target price, stop-loss, and explicit expiration conditions
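The dependency ordering can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the `DEPENDENCIES` map, the `run_pipeline` function, and the handler signature are all hypothetical, and the edges are simply chosen to reproduce the order in the table above. The standard library's `graphlib.TopologicalSorter` does the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each module lists the modules it depends on.
# The edges are an assumption chosen to match the table's order.
DEPENDENCIES = {
    "classifier": set(),
    "business_model": {"classifier"},
    "risk_assessment": {"business_model"},
    "outlook": {"risk_assessment"},
    "debate": {"outlook"},
    "valuation": {"debate"},
    "synthesis": {"valuation"},
}


def run_pipeline(handlers, financial_data):
    """Run modules in dependency order; each sees every earlier output."""
    context = {"financials": financial_data}
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        # Pass a copy so a module cannot mutate what later modules will read.
        context[name] = handlers[name](dict(context))
    return order, context
```

Each handler would wrap one LLM call in practice. Here any callable that takes the accumulated context and returns that module's output will do.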

Four design choices make this pipeline different from chaining prompts together.

The bear goes first

After three modules build up the bull case, the system tries to destroy it.

Step	Role	Mandate
1. Bear	Short-seller research director	Find the weakest assumptions. No balance, no fairness. Attack with data.
2. Bull	Long-side defense	Read the attack. Defend point by point with counter-evidence from the same data.
3. Judge	Neutral arbiter	Issue a verdict on each contested point. Explicit reasoning required.

Why bear first? By the time a business model and risk section have been written, confirmation bias has already set in. Each module hardens the bull case. The bear breaks it open before the narrative solidifies.
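The three-step sequence is easy to express in code. The sketch below assumes a hypothetical `ask(role, prompt)` callable that stands in for whatever LLM client is used. The prompt wording is illustrative, not the system's real prompts.

```python
from typing import Callable


def run_debate(ask: Callable[[str, str], str], bull_case: str) -> dict:
    """Bear attacks first, bull answers that attack, judge rules on both."""
    attack = ask("bear", f"Attack the weakest assumptions in:\n{bull_case}")
    defense = ask(
        "bull",
        f"Defend point by point.\nCase:\n{bull_case}\nAttack:\n{attack}",
    )
    verdict = ask(
        "judge",
        "Rule on each contested point and give your reasoning.\n"
        f"Attack:\n{attack}\nDefense:\n{defense}",
    )
    return {"attack": attack, "defense": defense, "verdict": verdict}
```

The ordering is the point: the bull never speaks until it has the attack in hand, and the judge sees both sides verbatim.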

The bear also hunts for echo chambers: cases where all three prior modules repeat the same unverified assumption as established fact. Three-way agreement might mean robustness. It might also mean one bad assumption echoed through the pipeline unchecked.
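One way to picture the echo chamber audit: treat each module's output as a set of claim identifiers and look for claims that every module asserts but nobody has verified. In the real system an LLM does this reading. The function below is a hypothetical simplification that assumes claims have already been extracted and normalized to short string ids.

```python
def find_echoes(module_claims: dict[str, set[str]], verified: set[str]) -> set[str]:
    """Return claims asserted by every module that remain unverified."""
    if not module_claims:
        return set()
    shared = set.intersection(*module_claims.values())
    return shared - verified


# Illustrative claim ids, loosely modeled on the shipping case below.
claims = {
    "business_model": {"hainan_windfall", "low_leverage"},
    "risk_assessment": {"hainan_windfall", "low_leverage"},
    "outlook": {"hainan_windfall", "low_leverage", "trade_zone_catalyst"},
}
echoes = find_echoes(claims, verified={"low_leverage"})
```

Here `echoes` contains only `hainan_windfall`: all three modules repeat it, and nothing backs it up.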

Case — chemical shipping company, March 2026

The first three modules built a classic contrarian narrative: rock-bottom asset leverage (36% debt ratio), counter-cyclical fleet expansion, dual catalysts from geopolitical disruption and a new free trade zone. It looked like a textbook bottom-fishing opportunity.

The bear tore it apart. It ran an "echo chamber" audit and found all three modules assumed the company's Hainan subsidiary would generate windfall profits from a new trade policy — but the subsidiary already had 6 vessels in operation during Q3 2025, and profits had collapsed 30% that quarter. The catalyst was already priced in and failing.

The bear's core attack: RMB 685M in construction-in-progress was about to convert to fixed assets when 5 new ships launched in 2026. The resulting depreciation surge, hitting a company whose gross margin had already dropped from 36% to 28%, would mathematically destroy earnings growth. The bear assigned a 90% probability of failure to the bull's +12% profit growth forecast.

The judge ruled: bear wins on earnings (the depreciation bomb is real), bull wins on solvency (no bankruptcy risk), bull wins on policy timing (the bear used Q3 2025 data to disprove a catalyst that only takes effect in Q1 2026, a logical error). Final verdict: a deadlock, but the target price was cut by more than 35% from the initial bull estimate.

LLM judges, Python calculates

Early versions let the LLM compute DCF directly. The math looked clean. It was sometimes off by 40%. Now, judgment and arithmetic are completely separated:

LLM  →  assumptions.json  →  Python  →  calculations.json  →  LLM  →  report
(judgment)                     (math)                          (narrative)
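To make the middle step concrete, here is a minimal sketch of a Python stage that reads assumptions as JSON and returns calculations as JSON. The field names (`free_cash_flows`, `discount_rate`, `terminal_growth`, `net_debt`, `shares_outstanding`) are assumptions made for this example, not the system's actual schema. The formula is a standard discounted cash flow with a Gordon-growth terminal value.

```python
import json


def dcf(assumptions: dict) -> dict:
    """Discount explicit cash flows plus a Gordon-growth terminal value."""
    r = assumptions["discount_rate"]
    g = assumptions["terminal_growth"]
    if r <= g:
        raise ValueError("discount rate must exceed terminal growth")
    flows = assumptions["free_cash_flows"]
    pv_flows = [cf / (1 + r) ** t for t, cf in enumerate(flows, start=1)]
    terminal = flows[-1] * (1 + g) / (r - g)
    pv_terminal = terminal / (1 + r) ** len(flows)
    equity = sum(pv_flows) + pv_terminal - assumptions["net_debt"]
    return {
        "pv_cash_flows": pv_flows,  # every intermediate value is kept
        "pv_terminal": pv_terminal,
        "equity_value": equity,
        "value_per_share": equity / assumptions["shares_outstanding"],
    }


def run_valuation(assumptions_json: str) -> str:
    """assumptions.json in, calculations.json out; no LLM touches the math."""
    return json.dumps(dcf(json.loads(assumptions_json)), indent=2)
```

The LLM only ever writes the inputs and reads the outputs, so a wrong number can be traced to a wrong assumption rather than to bad arithmetic.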

Sanity checks — bull target above bear? Margins physically possible? EPS consistent across methods?
Auto-retry — failed checks trigger assumption regeneration, up to 3 rounds
Method selection — the LLM picks which methods apply (PE, PB, EV/EBITDA, DCF, dividend yield); Python computes all of them
Full auditability — every intermediate value saved as JSON, traceable to inputs
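The checks and the retry loop above might look roughly like this. Everything here is a sketch under assumptions: the key names, the 1% EPS tolerance, and the `generate` and `compute` callables are hypothetical, and "up to 3 rounds" is read as three attempts in total.

```python
def sanity_errors(calc: dict, eps_tolerance: float = 0.01) -> list[str]:
    """Return the failed checks; an empty list means the numbers pass."""
    errors = []
    if calc["bull_target"] <= calc["bear_target"]:
        errors.append("bull target is not above bear target")
    if calc["gross_margin"] > 1.0:
        errors.append("gross margin above 100% is impossible")
    eps = list(calc["eps_by_method"].values())  # assumed non-empty
    if max(eps) - min(eps) > eps_tolerance * abs(max(eps)):
        errors.append("EPS differs across valuation methods")
    return errors


def valuate_with_retry(generate, compute, max_rounds: int = 3):
    """Regenerate assumptions until the checks pass or rounds run out."""
    feedback: list[str] = []
    for round_no in range(1, max_rounds + 1):
        assumptions = generate(feedback)  # LLM step, sees earlier failures
        calc = compute(assumptions)       # deterministic Python step
        feedback = sanity_errors(calc)
        if not feedback:
            return calc, round_no
    raise RuntimeError(f"sanity checks failed after {max_rounds} rounds: {feedback}")
```

Feeding the failed checks back into `generate` is a design choice in this sketch: it gives the model a reason to change its assumptions rather than rolling the dice again.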

Search only when it matters, then fact-check everything

Most AI analysis tools give the LLM unrestricted web access. Stock Analyzer does the opposite: six of seven modules have zero search capability. They work exclusively from structured financial data delivered via MCP.

Why restrict search? Giving an LLM search access during business model or risk analysis makes the output worse. The model pulls in secondhand commentary instead of doing the harder work of reasoning about primary financial data. Search is only useful for forward-looking information that doesn't exist in financial statements: policy moves, competitor changes, upcoming catalysts.

Only the Outlook module gets search. And everything it finds goes through a three-tier fact-check gate before reaching downstream modules:

Priority	Criteria	Action
Must-verify	Claim affects core investment thesis	Human review required before proceeding
Should-verify	Supporting argument with specific numbers	Flagged for review, analysis continues
Low priority	Background context	Passed through with lower confidence tag

The system extracts every factual claim containing a specific number, categorizes it, and generates fact_check.yaml for human review. Downstream modules see the verification status of each fact and are instructed to never build a core argument on unverified data.
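A rough sketch of that triage step follows. The `role` field (deciding whether a claim is core, supporting, or background) would come from an LLM in practice and is simply taken as input here. The layout of `fact_check.yaml` is an assumption, since the source does not show the real schema. Strings are emitted with `json.dumps`, which produces valid double-quoted YAML scalars.

```python
import json
import re

HAS_NUMBER = re.compile(r"\d")
PRIORITY_BY_ROLE = {"core": "must_verify", "supporting": "should_verify"}


def triage(claims: list[dict]) -> list[dict]:
    """Keep claims that contain a number and assign a verification tier."""
    rows = []
    for claim in claims:
        if not HAS_NUMBER.search(claim["text"]):
            continue  # only claims with a specific number are extracted
        priority = PRIORITY_BY_ROLE.get(claim["role"], "low_priority")
        rows.append({"claim": claim["text"], "priority": priority, "status": "unverified"})
    return rows


def to_yaml(rows: list[dict]) -> str:
    """Render the checklist in a hypothetical fact_check.yaml layout."""
    lines = ["facts:"]
    for row in rows:
        lines.append(f"  - claim: {json.dumps(row['claim'])}")
        lines.append(f"    priority: {row['priority']}")
        lines.append(f"    status: {row['status']}")
    return "\n".join(lines) + "\n"


def can_proceed(rows: list[dict]) -> bool:
    """Must-verify claims block the pipeline until a human confirms them."""
    return all(r["status"] == "verified" for r in rows if r["priority"] == "must_verify")
```

Should-verify and low-priority rows never block the run. They travel downstream with their status attached.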

Case — same shipping company

The Outlook module's web search pulled in 12 factual claims — from fleet capacity data attributed to the Ministry of Transport, to a specific port throughput figure ("Yangpu port cargo +43.2% in Q1 2026"). The system auto-classified 4 as must-verify (fleet expansion plans, industry capacity figures that directly affected the valuation), 5 as should-verify, and 3 as low priority. The resulting fact_check.yaml gave the analyst a focused checklist of exactly which numbers to confirm before trusting the report's forward-looking conclusions.

Persistent memory and self-monitoring

Most AI tools are stateless. Stock Analyzer maintains a persistent trace per company — and knows exactly what would change its mind.

After each analysis, the system generates a watchlist: 4-8 conditions, each one specific and testable.

Watchlist example	Trigger	Thesis impact
FY2025 earnings	Net profit YoY decline > 20%, or gross margin < 55%	Downgrade earnings forecast, reassess valuation
Titanium project milestone	Delay > 6 months, or trial yield below industry avg	Second growth curve collapses, cut long-term target

A weekly scan checks each item against live news. Triggered items fire an incremental update — every module re-runs with its prior output plus a delta summary. The question is always: does the previous conclusion still hold?
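The trigger check itself is plain comparison logic. The sketch below encodes the first watchlist row from the table above. The entry format, the metric names, and the `triggered` function are hypothetical, and it assumes the scan has already turned news into numeric observations, which is the hard part and is not shown.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

# One watchlist entry, modeled on the FY2025 earnings row above.
# Any single condition in "any_of" is enough to trigger the item.
WATCHLIST = [
    {
        "item": "FY2025 earnings",
        "any_of": [
            ("net_profit_yoy_decline_pct", ">", 20),
            ("gross_margin_pct", "<", 55),
        ],
        "impact": "Downgrade earnings forecast, reassess valuation",
    },
]


def triggered(watchlist: list[dict], observations: dict) -> list[str]:
    """Return the names of watchlist items whose conditions are met."""
    hits = []
    for entry in watchlist:
        for metric, op, threshold in entry["any_of"]:
            value = observations.get(metric)
            if value is not None and OPS[op](value, threshold):
                hits.append(entry["item"])
                break  # one satisfied condition is enough
    return hits
```

A missing metric never fires a trigger in this sketch. Whether silence should count as a warning is a separate design decision.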

Over months, a layered history builds up. Round 1 sets the baseline. Round 2 updates after earnings. Round 3 responds to a policy shift. Each round records what changed and why. The system's conviction evolves with the evidence instead of being rebuilt from scratch each time.

III. What comes out

One run, one company, ~$1, ~10 minutes. The output is a PDF research report with a complete audit trail:

Section	Content
Industry classification	Auto-detected sector with specialized analysis prompts loaded (pharma, manufacturing, consumer, etc.)
Business model	Revenue drivers, cost structure, competitive moat — every claim anchored to financial data
Risk assessment	Risk matrix with red-flag detection, valuation discount factors
Forward outlook	Bull / base / bear scenarios with search-grounded catalysts, fact-checked
Bull/Bear debate	Full transcript: bear attack → bull defense → judge verdict on each contested point
Valuation	Multi-method (DCF, PE, PB, EV/EBITDA), Python-computed, with sensitivity analysis
Investment decision	Buy / hold / sell, target price, stop-loss, position sizing by investor type, explicit expiration conditions

Every report ships with a _debug/ folder containing the full prompt sent to each model. Any conclusion can be traced back to its exact inputs. The system also generates a fact_check.yaml for human verification and a watchlist.json for automated monitoring.

One person. Seven modules. A research report that used to require an entire team — generated in 10 minutes for $1. And the system will tell you exactly when it stops believing its own conclusions.

