Methodology

How we calibrate — and what we deliberately don't claim.

A score is only useful if you trust the method behind it. This page documents how Trigvale actually works: what the rubric measures, how the evidence pipeline pulls real-world signals, what calibration target we hold ourselves to, and — critically — the kind of claim we refuse to make even though it would look good in marketing.

The rubric

Every idea is scored with a VRS — a 0–100 number across 10 weighted dimensions. The model scores each dimension against fixed 0/25/50/75/100 anchors that we publish in the project outline. The final number is then computed in code from the per-dimension inputs, so the math is stable across runs. The model never emits the final score directly.
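A minimal sketch of that computation in TypeScript. The weights below are illustrative placeholders, not Trigvale's actual values; only the structure (anchor inputs in, deterministic weighted score out) is the point.

```ts
// Minimal sketch of the VRS computation. Weights are illustrative
// placeholders that sum to 1; the real values are Trigvale's own.
type Anchor = 0 | 25 | 50 | 75 | 100;

// Dimensions in the rubric's priority order: earlier = heavier weight.
const WEIGHTS: Record<string, number> = {
  painIntensity: 0.16,
  buyerUrgency: 0.14,
  buyerReachability: 0.12,
  differentiation: 0.11,
  monetizationClarity: 0.10,
  founderFit: 0.10,
  competitiveLandscape: 0.09,
  evidenceQuality: 0.08,
  marketSize: 0.05,
  executionComplexity: 0.05, // already inverted: higher = simpler
};

// The model commits to one anchor per dimension; the final number is
// pure arithmetic, so it cannot drift between runs.
function computeVrs(anchors: Record<string, Anchor>): number {
  let score = 0;
  for (const [dim, weight] of Object.entries(WEIGHTS)) {
    score += weight * (anchors[dim] ?? 0);
  }
  return Math.round(score); // 0–100
}
```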

The kill / pivot / test / build verdict is also computed in code from a small set of explicit gates. "Build" requires a conjunction of conditions and is rare by construction. We'd rather under-issue Build than over-issue it.
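The shape of those gates matters more than the exact numbers. A sketch with made-up thresholds (this page does not publish the real gate conditions): Build needs every condition at once, the other verdicts are simple cutoffs.

```ts
// Sketch of the verdict computation with hypothetical thresholds.
// "Build" is a conjunction, so it stays rare by construction.
type Verdict = "kill" | "pivot" | "test" | "build";

interface Gates {
  vrs: number;               // 0–100 weighted score
  floorsOk: boolean;         // no per-dimension critical floor breached
  observedEvidence: boolean; // at least one observed, non-inferred signal
}

function verdictOf(g: Gates): Verdict {
  if (g.vrs >= 75 && g.floorsOk && g.observedEvidence) return "build";
  if (g.vrs >= 55 && g.floorsOk) return "test";
  if (g.vrs >= 35) return "pivot";
  return "kill";
}
```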

The dimensions and how each one is graded are deliberately public. Hiding them would gain us almost nothing — a frontier model can replicate a rubric from a one-shot prompt — and lose the trust signal that comes from publishing what we measure. The moats are the live evidence pipeline, the founder skill graph, and being the validation step inside your build agent — not the rubric itself.

The dimensions are listed in priority order — the first contributes the most to the final VRS, the last contributes the least. Each dimension is scored on a 0 / 25 / 50 / 75 / 100 ladder: the model picks the anchor description that best matches the evidence, so the score is one of those five steps, not a freeform number. Below: every step on every ladder, in plain English.

Pain intensity

01 / 10

How severe and frequent the pain is. Listed first because everything else is downstream of real pain.

- 0: No real pain — nice-to-have at best.
- 25: Mild annoyance, infrequent.
- 50: Noticeable pain, workarounds exist.
- 75: Acute pain, users actively seek solutions.
- 100: Top-3 daily pain — users already pay for bad workarounds.

Buyer urgency

02 / 10

How quickly the buyer will spend money. Urgency is what turns interest into revenue.

- 0: No urgency — budget would never open.
- 25: Someday-maybe; no deadline pressure.
- 50: This-quarter priority if someone champions it.
- 75: Active evaluation; budget allocated.
- 100: Buyer is shopping right now with a hard deadline.

Buyer reachability

03 / 10

How cheaply a solo founder can reach the first 100 buyers. Calibrated against your declared distribution graph.

- 0: Buyers are invisible or locked behind gatekeepers.
- 25: Diffuse — needs a big marketing engine to reach.
- 50: Reachable via 2–3 channels with effort.
- 75: Concentrated in known communities or lists.
- 100: Founder already has direct access to this audience.

Differentiation / wedge

04 / 10

How sharp and defensible the wedge is. Without one, every other dimension is rented.

- 0: No wedge — undifferentiated.
- 25: Weak wedge; easy to copy.
- 50: Clear angle but not defensible long-term.
- 75: Strong wedge tied to a specific segment or insight.
- 100: Durable wedge grounded in an unfair founder advantage.

Monetization clarity

05 / 10

Whether comparable buyers already pay adjacent tools at the target price.

- 0: No evidence anyone will pay.
- 25: Plausible willingness-to-pay, but unproven.
- 50: Adjacent tools show WTP; price unclear.
- 75: Clear price comp; >$25/mo seems plausible.
- 100: Buyers already pay competitors at the target price.

Founder fit

06 / 10

Whether this founder has the skills, audience, and energy to win here vs. a generic builder. Verified GitHub signal outranks self-claim.

- 0: Founder has no relevant edge.
- 25: Transferable skills only.
- 50: Some edge but no audience or distribution.
- 75: Strong edge — domain + skills aligned.
- 100: Unfair advantage: audience, domain, history.

Competitive landscape

07 / 10

How crowded the space is — inverted, so greenfield scores high.

- 0: Dominated by incumbents with strong lock-in.
- 25: Crowded; differentiation hard.
- 50: Several players, room for a sharp wedge.
- 75: Thin competition; clear positioning gaps.
- 100: Greenfield or only weak / generic alternatives.

Evidence quality

08 / 10

How much of the case is observed evidence vs. founder claims and AI inference.

- 0: Pure speculation; no external signals.
- 25: Anecdotal; one or two conversations.
- 50: Some observed signals (forums, reviews, search).
- 75: Multiple independent observed signals.
- 100: Direct pre-orders or paid pilots.

Market size (proxy)

09 / 10

Whether the reachable buyer pool can support the business at the intended price. Lighter on purpose — small-but-reachable beats big-but-crowded for solo founders.

- 0: Market too small to matter.
- 25: Niche that caps at < $100k ARR.
- 50: $100k – $1M ARR ceiling.
- 75: Clear $1M – $10M ARR ceiling.
- 100: $10M+ ceiling with a reachable wedge.

Execution complexity (inverse)

10 / 10

How big the technical, security, and operational surface is — not how much code it takes. Agentic stacks (Claude Code, Codex, Lovable, Cursor) absorb the line-of-code burden, so the real bottleneck is the DevSecOps surface a solo founder must own (auth, multi-tenant isolation, secrets, compliance, on-call). Higher score = smaller, safer surface that an agent plus a security-aware founder can ship without leaking data.

- 0: Novel infra, real-time guarantees at scale, regulated multi-tenant data, hardware, or research-grade ML. Agentic coding doesn't shortcut the hard parts.
- 25: Heavy compliance or large security surface (PII at scale, payments, custom protocols, strict multi-tenant isolation). Agents produce code fast — every line needs deep DevSecOps review.
- 50: Standard SaaS surface — auth, billing, multi-tenancy, basic observability. Agentic stack accelerates the build; founder still owns security and ops.
- 75: Well-trodden patterns; small security surface (single-tenant, third-party auth, mostly read-only). Agentic v1 in days with a light security review.
- 100: Single-user tool, thin wrapper over an existing API, or static site. Founder + agent ship in a weekend with minimal security and ops surface.

Each dimension also has a critical floor — a per-dimension minimum below which the verdict cannot reach Build, no matter how high the overall VRS is. As above, the model only commits to one of the five anchors per dimension; everything downstream is deterministic code.
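A sketch of how such a floor could be enforced. The floor values here are hypothetical; only the rule itself (one breached floor blocks Build, whatever the VRS) comes from the text above.

```ts
// Hypothetical critical floors. A single breach blocks "build"
// regardless of the overall VRS.
type Anchor = 0 | 25 | 50 | 75 | 100;

const CRITICAL_FLOORS: Record<string, Anchor> = {
  painIntensity: 50,   // hypothetical: no Build without real pain
  evidenceQuality: 50, // hypothetical: no Build on pure speculation
  founderFit: 25,      // hypothetical
};

function buildAllowed(anchors: Record<string, Anchor>): boolean {
  return Object.entries(CRITICAL_FLOORS).every(
    ([dim, floor]) => (anchors[dim] ?? 0) >= floor,
  );
}
```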

Live evidence sources

For every idea, Trigvale enriches the brief with real-world signals pulled on a schedule. These are kept separate from AI-generated or user-claimed evidence — sources are tagged in the brief and rendered in distinct sections so you can see what is observed versus inferred.
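As an illustration of that separation, each evidence item could carry a provenance tag and be grouped before rendering. The field names below are assumptions, not Trigvale's actual schema.

```ts
// Sketch: provenance-tagged evidence, grouped so observed signals
// never mix with claimed or inferred ones in the rendered brief.
type Provenance = "observed" | "founder-claimed" | "ai-inferred";

interface EvidenceItem {
  source: string;         // e.g. "reddit", "github", "hn"
  signal: string;         // human-readable summary
  provenance: Provenance;
  fetchedAt: string;      // ISO timestamp of the pull
}

function bySection(items: EvidenceItem[]): Record<Provenance, EvidenceItem[]> {
  const sections: Record<Provenance, EvidenceItem[]> = {
    observed: [],
    "founder-claimed": [],
    "ai-inferred": [],
  };
  for (const item of items) sections[item.provenance].push(item);
  return sections;
}
```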

[Figure] Trigvale funnel: raw ideas narrowing through evidence gates into one clean verdict.
| Source | Signal | Cadence | Status |
| --- | --- | --- | --- |
| Reddit | Post cadence in r/all for the wedge phrase, last 90 days | On evaluate + every 6h refresh | Live |
| GitHub | Repository count + top-repo stars for the wedge phrase | On evaluate + every 6h refresh | Live |
| Google Trends | Search-volume curve for the wedge phrase | Weekly, 5-year window | Roadmap |
| Hacker News | Story search via Algolia HN, last 180 days | On evaluate + every 6h refresh | Live |
| Stack Overflow | Question search by intitle, last 180 days | On evaluate + every 6h refresh | Live |
| dev.to | Articles tagged near the wedge phrase, last 90 days | On evaluate + every 6h refresh | Live |
| Product Hunt | Launches + upvotes in the same archetype | Daily, 12-month window | Roadmap |
| Competitor pricing pages | Diffs over 90 days for named competitors | Weekly capture | Roadmap |

Live sources fire on every evaluate and again every 6 hours via a background refresh; roadmap sources are not implemented yet. Each brief shows which sources were available at scoring time.
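Expressed as configuration, those cadences might look like the sketch below; the values mirror the table, while the config shape itself is an assumption.

```ts
// Sketch of a per-source cadence config. Live sources also fire on
// every evaluate; null refresh = roadmap source, not yet implemented.
// windowDays is omitted where the table states no window.
interface SourceConfig {
  refreshEveryHours: number | null;
  windowDays?: number;
}

const SOURCES: Record<string, SourceConfig> = {
  reddit: { refreshEveryHours: 6, windowDays: 90 },
  github: { refreshEveryHours: 6 },
  hackerNews: { refreshEveryHours: 6, windowDays: 180 },
  stackOverflow: { refreshEveryHours: 6, windowDays: 180 },
  devTo: { refreshEveryHours: 6, windowDays: 90 },
  googleTrends: { refreshEveryHours: null, windowDays: 5 * 365 },
  productHunt: { refreshEveryHours: null, windowDays: 365 },
  competitorPricing: { refreshEveryHours: null, windowDays: 90 },
};
```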

Inter-rater agreement target

We hold our scoring against a hand-curated golden set of ideas with human-rated reference scores. The eval-harness gate is: verdict agreement ≥ 85% and VRS mean absolute error ≤ 8 across the set. A model upgrade or rubric change that misses the gate does not ship to users.
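The gate itself is simple arithmetic. A sketch, assuming the golden set and the model's results are aligned arrays; the data shapes are assumptions, the two thresholds are the ones stated above.

```ts
// Eval-harness gate: verdict agreement >= 85% and VRS mean absolute
// error <= 8 across the golden set. Assumes golden[i] and results[i]
// describe the same idea.
interface GoldenCase { referenceVrs: number; referenceVerdict: string }
interface ModelResult { vrs: number; verdict: string }

function gatePasses(golden: GoldenCase[], results: ModelResult[]): boolean {
  const n = golden.length;
  const agreement =
    golden.filter((g, i) => g.referenceVerdict === results[i].verdict).length / n;
  const mae =
    golden.reduce((sum, g, i) => sum + Math.abs(g.referenceVrs - results[i].vrs), 0) / n;
  return agreement >= 0.85 && mae <= 8;
}
```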

When you disagree with a score

Disagreement is information. Every brief lets you flag a dimension you think is mis-scored, with a short reason. Flags feed the calibration loop — they refine the anchors, surface ambiguity in the rubric, and (over time) personalise how your declared skill graph reweights specific dimensions. Disagreement does not retroactively change the verdict on the brief you flagged; it improves future briefs.
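A flag might be captured as a record like the sketch below. The shape is an assumption; the behaviour matches the text: flags feed future calibration and never rewrite the brief they were filed against.

```ts
// Sketch of a dimension-level disagreement flag.
type Anchor = 0 | 25 | 50 | 75 | 100;

interface ScoreFlag {
  briefId: string;
  dimension: string;       // e.g. "buyerReachability"
  scoredAnchor: Anchor;    // what the brief shows
  suggestedAnchor: Anchor; // what the user believes is right
  reason: string;          // short free-text justification
  createdAt: string;       // ISO timestamp
}
```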

What we deliberately do not claim

The most honest thing on this page is what isn't on it. You will never see Trigvale publish a predictive-accuracy claim: no "ideas we rate Build succeed N% of the time" headline, however good it would look in marketing.

An idea's outcome is dominated by founder execution, marketing, distribution, timing, and luck. The rubric score is one term in a very noisy equation. With realistic data volumes (hundreds, not millions), we cannot disentangle idea quality from those confounders, so any predictive-accuracy claim would be statistically dishonest. Trigvale's job is to expose assumptions, not to forecast success.

Audit trail

Every saved brief preserves the rubric version, the model version, the evidence sources at the time of scoring, and the score breakdown. If we update the rubric or change a model, your past briefs continue to render under the conditions they were scored in, with a marker if a re-score would now produce a different result.
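A sketch of what such a frozen record might hold; the field names are illustrative, not Trigvale's storage schema.

```ts
// Sketch: everything needed to re-render a brief under its original
// scoring conditions, plus a marker if a re-score would now differ.
interface AuditRecord {
  briefId: string;
  rubricVersion: string;
  modelVersion: string;
  evidenceSourcesAtScoring: string[];     // sources live at scoring time
  scoreBreakdown: Record<string, number>; // per-dimension anchor scores
  scoredAt: string;                       // ISO timestamp
  rescoreWouldDiffer?: boolean;           // shown as a marker in the UI
}
```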