One sentence. One factor. Meet FactorLab.

You've been there. You read a research note and one sentence stops you: "Companies with high gross margins and low receivables-to-revenue ratios consistently outperform in the following month."

The logic checks out. High gross margins signal pricing power. Low receivables mean customers pay fast — cash flow is clean. If you could validate this across the full market, you might have a real alpha source.

So you open Jupyter. Pull up your data API. Start writing.

An hour later, you're still debugging the financial data alignment logic. Two hours in, you're fighting timestamp mismatches at rebalancing dates. Three hours later, your first IC curve appears — and it looks terrible. You can't tell if the factor doesn't work or if your code is wrong.

Four hours later, you give up. The idea joins the 87 other unvalidated hypotheses in your notes.

This is the cost we want to eliminate. Not just reduce — eliminate.

I. What FactorLab does

FactorLab is Peakstone Labs' AI-driven factor research engine. One sentence in, one research report out.

You describe an investment thesis in plain language. FactorLab translates it into quantitative language, writes the Python, runs a three-period backtest across four holding horizons, and — three rounds of AI iteration later — returns a complete research report. You can close the tab after you submit.

0 lines of code required · 3 AI iteration rounds · 3 test periods (IS / OOS1 / OOS2) · 5–20 minutes end-to-end

II. How it works

1. Intent parsing

The LLM translates your plain-language thesis into quantitative language: which financial ratios to compute, how to handle point-in-time financial data, how to normalize across sectors, what holding period to target. This step requires genuine domain knowledge — not just grammar. The model has to know that "high gross margin" means (revenue − COGS) / revenue, cross-sectionally ranked, with the appropriate reporting lag applied.
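
As a concrete illustration, a parsed intent for the opening thesis might come out looking like this. The field names and the 45-day reporting lag are assumptions for the sketch, not FactorLab's internal schema:

```python
# Hypothetical parsed-intent structure; field names and the 45-day
# reporting lag are illustrative assumptions, not FactorLab's schema.
parsed_intent = {
    "factors": [
        {
            "name": "gross_margin",
            "formula": "(revenue - cogs) / revenue",
            "direction": "high",                     # higher is better
            "normalization": "cross_sectional_rank",
            "reporting_lag_days": 45,                # point-in-time safety
        },
        {
            "name": "receivables_to_revenue",
            "formula": "accounts_receivable / revenue",
            "direction": "low",                      # lower is better
            "normalization": "cross_sectional_rank",
            "reporting_lag_days": 45,
        },
    ],
    "holding_periods_days": [5, 10, 21, 63],
    "rebalance": "monthly",  # "the following month" in the thesis
}
```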

2. Code generation

From the parsed intent, the AI generates executable Python: factor formula, calculation function, data alignment across financial report dates and trading days, cross-sectional normalization, and factor combination weights. This code runs the actual backtest — the LLM wrote it but doesn't touch the math.
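
The real generated code is FactorLab's own, but a pared-down factor function in this spirit might look like the sketch below, assuming a long-format financials table with ticker, public_date, revenue, and cogs columns:

```python
import pandas as pd

def gross_margin_factor(financials: pd.DataFrame,
                        trading_days: pd.DatetimeIndex) -> pd.DataFrame:
    """Sketch of generated factor code; column names are assumptions."""
    f = financials.copy()
    f["gross_margin"] = (f["revenue"] - f["cogs"]) / f["revenue"]
    f = f.sort_values("public_date")

    panels = []
    for day in trading_days:
        # Point-in-time alignment: on each trading day, use only the
        # latest figures that were already public -- never future data.
        snap = (f[f["public_date"] <= day]
                .groupby("ticker").last()[["gross_margin"]])
        # Cross-sectional normalization: percentile rank across tickers.
        snap["factor"] = snap["gross_margin"].rank(pct=True)
        snap["date"] = day
        panels.append(snap.reset_index())
    return pd.concat(panels, ignore_index=True)
```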

3. Three-period × multi-horizon backtesting

Looking only at in-sample IC is the most common trap in factor research. A factor can look excellent in-sample and fall apart the moment you move past it. FactorLab splits the data three ways:

IS (In-Sample) — the training window. This is where the factor gets its initial score.

OOS1 (Out-of-Sample 1) — first validation. Does the factor generalize beyond the period it was computed on?

OOS2 (Out-of-Sample 2) — the most recent data. Is the factor still working now?
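
A chronological split is simple to express in code. The sketch below assumes 60/20/20 proportions purely for illustration; the post doesn't state FactorLab's actual ratios:

```python
import pandas as pd

def split_three_periods(dates: pd.DatetimeIndex,
                        is_frac: float = 0.6, oos1_frac: float = 0.2):
    """Chronological IS / OOS1 / OOS2 split; the ratios are assumptions."""
    n = len(dates)
    i1, i2 = int(n * is_frac), int(n * (is_frac + oos1_frac))
    return dates[:i1], dates[i1:i2], dates[i2:]  # OOS2 = most recent data
```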

Each period runs across four holding horizons (5 / 10 / 21 / 63 days). Metrics per combination: IC, ICIR, quintile returns, long-short Sharpe, max drawdown, and turnover. The scoring formula down-weights in-sample performance to penalize overfitting:

Weighted ICIR = 0.5 × IS + 0.3 × OOS1 + 0.2 × OOS2

Factors that collapse out-of-sample get penalized automatically.
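
The IC and ICIR definitions here are the conventional ones; only the 0.5/0.3/0.2 weights come from FactorLab. A minimal sketch:

```python
import pandas as pd

def daily_ic(factor: pd.Series, fwd_returns: pd.Series) -> float:
    # IC: Spearman rank correlation between factor values and forward returns.
    return factor.corr(fwd_returns, method="spearman")

def icir(ic_series: pd.Series) -> float:
    # ICIR: mean IC divided by its standard deviation.
    return ic_series.mean() / ic_series.std()

def weighted_icir(is_: float, oos1: float, oos2: float) -> float:
    # FactorLab's scoring formula: in-sample is down-weighted
    # so factors that collapse out-of-sample score poorly.
    return 0.5 * is_ + 0.3 * oos1 + 0.2 * oos2
```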

[Image: FactorLab IS/OOS1/OOS2 metrics table]
A real FactorLab metrics table. The orange cells in OOS2 ICIR flag where out-of-sample performance decays — the kind of signal that gets buried when you only look at in-sample numbers.

4. AI self-evaluation and iteration

Most tools stop after the backtest and hand you a table. Figuring out what went wrong and what to try next is left entirely to you — and that's usually where the investigation dies.

After each backtest, FactorLab's AI reads its own results and writes a diagnosis. Specific, not generic:

"This round's factor reached IS ICIR of 0.18 with good quintile monotonicity. But OOS1 ICIR decayed to 0.06, and the top/bottom spread nearly disappeared in OOS2. Likely cause: the gross margin percentile ranking created thresholds too sensitive to sector-cycle shifts. Next round: replace the static percentile rank with a year-over-year rate of change — capturing margin improvement rather than absolute level."

[Image: FactorLab iteration rounds, showing factor variants with formulas and design rationale]
Two rounds from a live research run. Each entry shows the factor name, formula, design rationale, and best IS ICIR. Round 2 factors are already materially different from Round 1 — the AI rewrote both the signal and the combination logic after reading its own results.

Then it acts on that assessment. Three rounds by default: an initial attempt, a failure analysis, and a revised factor built from what that analysis actually found. The third round tends to be the most useful, because by then the model has seen two iterations of real data.
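
Structurally the loop is small; the intelligence lives in the two LLM calls. A sketch, with generate, backtest, and diagnose passed in as stand-ins rather than FactorLab's actual API:

```python
from typing import Callable, Optional

def research_loop(thesis: str,
                  generate: Callable[[str, Optional[str]], str],
                  backtest: Callable[[str], dict],
                  diagnose: Callable[[dict], str],
                  rounds: int = 3) -> dict:
    """Sketch of the self-evaluation loop; function names are assumptions."""
    diagnosis: Optional[str] = None
    best: Optional[dict] = None
    for _ in range(rounds):
        code = generate(thesis, diagnosis)  # round 1 has no prior diagnosis
        results = backtest(code)            # IS / OOS1 / OOS2 x 4 horizons
        diagnosis = diagnose(results)       # the AI reads its own results
        # "weighted_icir" is an assumed key for the score defined above.
        if best is None or results["weighted_icir"] > best["weighted_icir"]:
            best = results
    return best
```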

5. Full research report

The AI assembles everything into a structured report: SOTA factor name, weighted ICIR, long/short Sharpe, iteration count, and total runtime up top; then the full three-period × multi-horizon metrics, equity curves, candidate factor comparison, and every round's AI evaluation with the reasoning behind each change. You get the answer and the full trail of how it got there.
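
Thought of as a data structure, the report is roughly this shape. The field names below mirror the summary above and are illustrative, not FactorLab's output schema:

```python
from dataclasses import dataclass, field

@dataclass
class ResearchReport:
    """Illustrative report shape; fields mirror the text, not a real schema."""
    sota_factor_name: str
    weighted_icir: float
    long_short_sharpe: float
    iteration_count: int
    total_runtime_minutes: float
    # Full detail: per-period x per-horizon metrics, equity curves,
    # candidate factor comparison, and each round's AI evaluation.
    period_horizon_metrics: dict = field(default_factory=dict)
    equity_curves: dict = field(default_factory=dict)
    candidate_comparison: list = field(default_factory=list)
    round_evaluations: list = field(default_factory=list)
```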

III. Getting started

Two things we added after launch, both from direct user feedback:

Seed idea cards. The blank input box turns out to be its own obstacle. When you can type anything, it's harder to type something. The submission page now shows three randomly drawn hypothesis cards — things like "Low-PB broken-value stocks with a mean reversion alpha" or "8-quarter high-and-stable ROE as a quality signal." Click one to populate the input, edit it to fit your view, or ignore the cards entirely. There are 50+ in the pool, covering valuation, quality, growth, momentum, volatility, cash flow, and R&D factors. Refresh the batch if none of the three land.

[Image: FactorLab submission page with seed idea cards]
The submission page. Three randomly drawn seed cards sit below the input box — click one to populate it, or ignore them and type your own. The task history panel on the right shows completed, cancelled, and failed runs.

Entry via WeChat. FactorLab is currently accessible through the Peakstone Labs WeChat official account. Search for Peakstone Labs on WeChat, send any message to receive your entry link, enter your research idea, and step away. A notification arrives when the report is ready — or check your task history on your next visit. (Rate limit: 5 tasks per account per day. A direct web interface is on the roadmap.)

IV. Why now

The honest answer: two things became true at the same time.

First, LLM code capability crossed a threshold. Three years ago you could ask a model to write a factor formula. But reading its own backtest results, spotting where a factor actually breaks, and writing a fix that addresses the right thing — that's a much harder problem. Current models (Claude Opus 4.6+, GPT-4o-class) can actually do it. Earlier ones couldn't, and it showed.

Second, the data infrastructure matured. FactorLab runs on QuantDataHub, Peakstone Labs' in-house data layer: 28 MCP tools, 23 data categories, point-in-time financial data, four-tier caching. Without that layer, each backtest iteration would be fighting raw API rate limits. The iteration loop only works because the data layer is fast and stable.

A factor research report used to take a quant team a week. Now it takes minutes — and starts with a sentence you type.

[Image: Peakstone Labs WeChat QR code]
Peakstone Labs. Scan with WeChat to follow; send any message for the FactorLab link.