A lot of “AI quant” decks in 2026 are marketing decks with extra nouns. The claims sound technical. The evidence is thin. And the slide count goes up every time the Sharpe goes down.

The allocator’s job isn’t to decide whether large language models (LLMs) are “the future.” It’s to determine whether a manager has a repeatable process advantage — or just a convenient story that fits the current fundraising weather.

Here’s the uncomfortable truth from IC rooms: if an LLM is genuinely additive to alpha, the manager can explain where it sits in the pipeline, how it’s tested, and how it’s controlled without hiding behind “proprietary.” If they can’t, you’re looking at synthetic alpha: returns implied by narrative.

This is a cynical, practical playbook for spotting LLM-washing (inflating or misrepresenting LLM use) in quant pitches. It includes a mini-DDQ: 10 questions with what good answers sound like — and what should make you tighten terms or walk.

What “real” looks like

Before you read the deck, force a manager into one of three buckets:

  1. LLM for operations (research, coding copilots, document parsing, client reporting)
  2. LLM for data (turning messy text into structured features that feed classic models)
  3. LLM for decisions (LLM output directly influences risk, sizing, or trading)

Bucket #2 can be real and valuable. Bucket #1 can improve throughput but rarely creates an allocatable edge by itself. Bucket #3 is where the burden of proof gets heavy, fast.

If a manager can’t tell you which bucket(s) they’re in — clearly and quickly — assume the LLM is decorative until proven otherwise.

An LLM is a model trained to predict and generate text. In investment workflows it’s usually best at language tasks: extracting entities, classifying documents, summarizing, translating, and generating structured outputs from messy text. It is not, by default, a magic signal machine. Treat any “LLM predicts returns” claim with skepticism until it has been stress-tested.

Where LLMs actually fit in a quant stack

1) LLMs as a text-to-structure layer (the common “real” use case)

The manager ingests unstructured information (earnings call transcripts, regulatory filings, broker research they’re licensed to use, news feeds, internal notes). The LLM turns it into structured variables:

  • sentiment scores by entity (company, sector, executive)
  • topic and controversy flags
  • event tagging (guidance change, litigation, product recall)
  • supply chain or customer mentions mapped to tickers
  • consistency checks across sources

Then those features go into a familiar modeling world: regularized regression, gradient boosting, sequence models, portfolio construction with constraints, execution models.
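As a sketch of what that text-to-structure layer looks like in practice — with `call_llm` as a hypothetical stub for whatever model the manager actually runs, and the record fields invented for illustration — the key property is that the LLM emits only a constrained record, treated as untrusted input, and everything downstream is ordinary feature engineering:

```python
import json

ALLOWED_EVENTS = {"guidance_change", "litigation", "product_recall"}

def call_llm(prompt: str) -> str:
    # Stub standing in for a real model call; returns a canned JSON response.
    return '{"ticker": "ACME", "sentiment": -0.4, "events": ["guidance_change"]}'

def transcript_to_features(transcript: str) -> dict:
    """Turn one transcript into a constrained, numeric feature row."""
    raw = call_llm(f"Extract sentiment and events as JSON:\n{transcript}")
    rec = json.loads(raw)
    # LLM output is untrusted input: clamp numeric fields, whitelist categories.
    sentiment = max(-1.0, min(1.0, float(rec["sentiment"])))
    events = [e for e in rec.get("events", []) if e in ALLOWED_EVENTS]
    return {
        "ticker": str(rec["ticker"]),
        "sentiment": sentiment,
        # One binary flag per allowed event type, ready for a classic model.
        **{f"event_{e}": int(e in events) for e in sorted(ALLOWED_EVENTS)},
    }

row = transcript_to_features("...earnings call text...")
```

The design choice worth probing in diligence is exactly this boundary: the LLM never emits free text into the modeling layer, only fields that can be clamped, whitelisted, and versioned.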

If you’ve allocated to systematic NLP strategies pre-LLM, this is the evolution: cheaper feature generation and broader coverage — plus new failure modes.

2) LLMs for retrieval and analyst efficiency (helpful, not automatically “alpha”)

Some shops use retrieval-augmented generation (RAG): the model answers questions based on a controlled document set. This is great for:

  • speeding up research notes
  • drafting code scaffolding (with review)
  • summarizing filings and calls
  • triaging compliance and ops documents

But efficiency is not a moat unless it changes the investment decision in a measurable way. The right framing is “it reduces latency or increases coverage,” followed by a demonstration of how that translates into portfolio impact without blowing up turnover, costs, or crowding.

3) LLMs in the decision loop (rarely clean)

This is where decks can get grandiose: “agentic trading,” “autonomous portfolio manager,” “LLM-driven allocation.” Possible, but hard to do responsibly. If LLM output affects positions, you need clear controls: bounded actions, audit logs, fallback behavior, kill switches, and a validation regime that an independent risk function can defend.

If the pitch leans heavily into autonomy, your diligence should lean heavily into governance.

The classic tells of LLM-washing

Language tells

  • Verb fog: “leverage,” “harness,” “utilize” with no object. Leverage what, specifically?
  • Category errors: conflating “machine learning,” “AI,” and “LLMs” as synonyms.
  • Claims without a baseline: “improved forecasts” without “versus what” and “by how much,” measured out of sample.
  • Vendor-logo theatrics: a slide full of cloud/AI logos instead of an architecture diagram.

Economic tells

  • Compute doesn’t match the AUM. Training frontier models is not a side project. If a small shop claims they trained a large model from scratch, ask for compute spend, hardware, and training time.
  • Data rights are hand-waved. “We trained on proprietary data” can mean “we scraped the internet and hope Legal never asks.”

Process tells

  • No model inventory. If they run “dozens of models,” they should have a list and an owner for each.
  • No monitoring story. If the model drifts, what happens? “We retrain regularly” is not a clear plan.
  • No reproducibility. If they can’t recreate last quarter’s signal set from versioned data and code, you’re underwriting vibes.

What can go wrong

1) Hidden look-ahead bias via text pipelines

Text data is a minefield for point-in-time integrity. Vendors revise transcripts. Filings get amended. News gets updated. Even timestamps can be messy across feeds.

If an LLM is used to “interpret” text, you also have model versioning risk: the same prompt can produce different outputs across model updates. That can quietly contaminate backtests.

Allocator implication: if they can’t prove point-in-time data lineage, treat backtest statistics as marketing, not evidence.

2) Hallucinations and silent errors

LLMs can generate plausible-sounding nonsense. That’s not a scandal; it’s a known characteristic. In an investment process, the danger is silent error: outputs that look structured and confident, then flow into downstream models.

Allocator implication: you want hard checks (schemas, confidence thresholds, anomaly detection, human review on sampled outputs) and strong “do no harm” defaults.
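A minimal sketch of such a guard — field names and the 0.7 threshold are illustrative assumptions, not any particular manager’s implementation — shows what a “do no harm” default means: malformed, out-of-schema, or low-confidence LLM output collapses to no signal rather than flowing downstream.

```python
def guarded_sentiment(llm_output: dict) -> float:
    """Accept an LLM sentiment score only if it passes hard checks;
    otherwise fall back to a neutral 'do no harm' default."""
    NEUTRAL = 0.0
    try:
        score = float(llm_output["sentiment"])
        confidence = float(llm_output["confidence"])
    except (KeyError, TypeError, ValueError):
        return NEUTRAL        # malformed output never reaches the model
    if not (-1.0 <= score <= 1.0):
        return NEUTRAL        # out-of-schema values are treated as errors
    if confidence < 0.7:      # threshold is illustrative
        return NEUTRAL        # low-confidence calls default to no signal
    return score
```

In production you would also log and sample the rejected outputs for human review; silently discarding them hides the drift you want monitoring to catch.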

3) Crowding risk disguised as innovation

If the “edge” is generic (public transcripts + generic embedding model + basic sentiment), the alpha may be real in a backtest and disappointing in size. LLM tooling is increasingly commoditized. That doesn’t kill all opportunity, but it shrinks it.

Allocator implication: spend more time on capacity, turnover, and execution than on the AI story.

4) Model risk management and accountability

When a process becomes model-heavy, governance matters. Banks have lived this for years; the Federal Reserve’s SR 11-7 is still the cleanest plain-English articulation of model risk management principles: robust development, effective validation, and sound governance.

Allocator implication: you’re not asking a hedge fund to be a bank. You are asking them to act like an institution that expects to be audited.

5) Data privacy and vendor exposure

If a manager uses third-party LLM APIs, you need to know what data leaves the firm, under what terms, and how it’s logged. This isn’t theoretical. Regulators are explicitly looking at AI supervision and controls.

Allocator implication: treat vendor risk like prime brokerage risk. Concentrations and weak controls show up at the worst time.

Regulators are paying attention to the marketing, not your architecture diagram

Two points matter for allocators:

  1. Misstatements about AI are being treated like any other misleading marketing claim.
    The SEC charged two investment advisers in March 2024 for false and misleading statements about their use of AI, including inflated claims about capabilities and services.
  2. Exams are explicitly looking at AI representations and oversight.
    The SEC’s FY 2026 Examination Priorities flag AI as a focus area, including the accuracy of registrant representations and whether firms have adequate policies and procedures to supervise AI use.

Tie this back to your diligence: if a manager is waving “LLM-powered” around in a deck, they should be able to substantiate it. That phrase is doing a lot of work in real examinations, especially under the Marketing Rule framework and staff guidance around compliance expectations.

Outside the U.S., the posture is similar: focus on governance and investor protection rather than creating AI-specific rules for every use case. The FCA has published its approach to AI under existing regulatory expectations and continues to push safe experimentation (including sandbox-style initiatives). ESMA has issued guidance on AI use in investment services, emphasizing firms’ ongoing responsibilities even when AI tools are involved. IOSCO’s work frames the risk landscape—model risk, data issues, and governance—at a capital markets level.

And if you allocate in Europe, you can’t ignore the evolving compliance perimeter around AI more broadly. The EU’s AI Act framework and timeline have been a moving target, with public signaling on implementation schedules and potential adjustments.

The DDQ mini template: 10 questions managers can’t dodge

Use these in live meetings. Then ask for written answers in the DDQ. You’re looking for consistency.

1) “Where does the LLM sit in the investment process?”

Good answer: A clear boundary: “LLM creates structured features from filings; portfolio model is separate; LLM never places trades.”
Worry: “It’s embedded throughout the stack,” with no diagram or control points.

2) “What exactly is the model — vendor, version, and hosting?”

Good answer: Named models (or internal) with version control, hosting setup, and a rationale (latency, privacy, cost).
Worry: “We use the latest models,” implying they don’t control change risk.

3) “Show me the data lineage for one core dataset end-to-end.”

Good answer: A map: source → ingestion → storage → preprocessing → labeling → features → model inputs, with timestamps and versioning.
Worry: “The vendor handles point-in-time,” with no independent verification.

4) “How do you prevent look-ahead and contamination in text?”

Good answer: Explicit point-in-time snapshots, vendor revision handling, delayed availability assumptions, and immutable archives.
Worry: “We backtest on transcripts,” without addressing revisions and availability timing.

5) “What is your evaluation protocol beyond the backtest?”

Good answer: Walk-forward testing, out-of-sample periods, regime slicing, stress tests, and pre-defined acceptance metrics.
Worry: “The backtest is robust,” and nothing else.

6) “How do you measure whether the LLM adds value versus a simpler baseline?”

Good answer: A/B comparisons: classic NLP vs LLM features; incremental information ratio; impact on turnover and costs.
Worry: No baseline, or baselines that are strawmen.
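To make “incremental information ratio” concrete — with all numbers invented for illustration — the right comparison is the same book run two ways, and the IR computed on the *difference* in returns, not the headline IR of the LLM version:

```python
import statistics

def information_ratio(active_returns: list[float],
                      periods_per_year: int = 252) -> float:
    """Annualized mean / stdev of active (excess-over-benchmark) returns."""
    mu = statistics.mean(active_returns)
    sigma = statistics.stdev(active_returns)
    return (mu / sigma) * periods_per_year ** 0.5

# Illustrative daily active returns for the same book run two ways:
baseline = [0.0004, -0.0002, 0.0003, 0.0001, -0.0001]  # classic NLP features
with_llm = [0.0005, -0.0001, 0.0004, 0.0002, -0.0001]  # + LLM features, net of costs

# The relevant number is the IR of the increment, not of with_llm alone:
increment = [a - b for a, b in zip(with_llm, baseline)]
incremental_ir = information_ratio(increment)
```

A manager reporting only the headline IR of the LLM version is answering a different, easier question; the increment is where a strawman baseline gets exposed.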

7) “What breaks this system?”

Good answer: Named failure modes (model drift, feed outages, prompt injection, distribution shift) and the mitigation plan.
Worry: “The model adapts,” which is not a control.

8) “What monitoring runs in production, and who gets paged?”

Good answer: Drift metrics, data quality checks, schema validation, exception queues, and defined owners.
Worry: “We monitor performance,” with no operational detail.
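One concrete drift metric a good answer often implies is the population stability index (PSI) on feature distributions. A stdlib sketch, with the bin values invented for illustration (the 0.1 / 0.25 thresholds are a common industry rule of thumb, not a standard):

```python
import math

def population_stability_index(expected: list[float],
                               actual: list[float]) -> float:
    """PSI between two binned distributions (proportions summing to 1).
    Rule of thumb: < 0.1 stable, 0.1-0.25 watch, > 0.25 page someone."""
    eps = 1e-6  # guard against empty bins
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

# Binned distribution of an LLM sentiment feature: training window vs. this week.
train_bins = [0.2, 0.3, 0.3, 0.2]
live_bins  = [0.1, 0.2, 0.3, 0.4]

psi = population_stability_index(train_bins, live_bins)
```

The operational detail you’re probing for is not the formula but the wiring: which features get this check, on what schedule, at what threshold, and whose pager fires.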

9) “What’s your governance: approvals, validation, and audit trail?”

Good answer: Model inventory, independent validation (internal or external), change management, and auditable logs for outputs used in investment decisions.
Worry: “We’re a small team,” used as an excuse for missing controls.

10) “What can you substantiate if a regulator or LP asks tomorrow?”

Good answer: Documentation pack: policies, procedures, evidence of testing, and marketing claim substantiation. (If they’re an SEC-registered adviser, this is not optional in spirit.)
Worry: Marketing claims that outpace their ability to evidence. The SEC has already brought cases centered on misleading AI claims and inability to substantiate related assertions.

IC meeting tip: ask for one slide they probably don’t have: “Architecture + control points.” Real builders can produce it in a day. Storytellers can’t.

Allocator takeaways

  • Define the bucket. Ops, data, or decision loop. Different risks. Different evidence burden.
  • Insist on lineage. If point-in-time integrity and versioning aren’t real, the backtest is a brochure.
  • Ask for baselines. “LLM added value” only matters relative to simpler alternatives and net of costs.
  • Underwrite controls, not adjectives. Monitoring, drift management, human oversight, and kill switches.
  • Translate to portfolio terms. Capacity, turnover, hidden factor tilts, and drawdown behavior matter more than model names.
  • Treat “AI” marketing as examinable. Regulators are explicitly focused on AI representations and supervision.
  • Price the operational risk. Vendor exposure, data privacy, and reproducibility should affect sizing and terms.
  • Reward specificity. The best managers are usually less theatrical. They’re concrete.
