Methodology
How we calibrate — and what we deliberately don't claim.
A score is only useful if you trust the method behind it. This page documents how Trigvale actually works: what the rubric measures, how the evidence pipeline pulls real-world signals, what calibration target we hold ourselves to, and — critically — the kind of claim we refuse to make even though it would look good in marketing.
The rubric
Every idea is scored as a VRS — a 0–100 number across 10 weighted dimensions. The model scores each dimension against fixed 0/25/50/75/100 anchors that we publish in the project outline. The final number is then computed in code from the per-dimension inputs, so the math is stable across runs. The model never emits the final score directly.
The kill / pivot / test / build verdict is also computed in code from a small set of explicit gates. "Build" requires a conjunction of conditions and is rare by construction. We'd rather under-issue Build than over-issue it.
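As a minimal sketch of what "computed in code from explicit gates" means — the gate names and thresholds below are invented for illustration, not our production values:

```python
# Illustrative verdict gates. "Build" only fires when every condition in a
# conjunction holds, which makes it rare by construction.

def verdict(vrs: float, floors_ok: bool, evidence_score: int, pain_score: int) -> str:
    # Hypothetical thresholds for illustration only.
    if vrs < 35 or pain_score == 0:
        return "kill"
    if floors_ok and vrs >= 75 and evidence_score >= 50 and pain_score >= 75:
        return "build"   # every gate passed
    if vrs >= 55:
        return "test"    # promising, but validate before building
    return "pivot"       # real pain exists; the framing needs work
```

Because "build" sits behind a conjunction while "kill" sits behind a disjunction, the asymmetry in the prose (under-issue Build rather than over-issue it) is structural, not a tuning choice.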
The dimensions and how each one is graded are deliberately public. Hiding them would gain us almost nothing — a frontier model can replicate a rubric from a one-shot prompt — and lose the trust signal that comes from publishing what we measure. The moats are the live evidence pipeline, the founder skill graph, and being the validation step inside your build agent — not the rubric itself.
Listed in priority order — the first dimension contributes the most to the final VRS, the last contributes the least. Each dimension is scored on a 0 / 25 / 50 / 75 / 100 ladder. The model picks the anchor description that best matches the evidence; the score is one of those five steps, not a freeform number. Below: every step on every ladder, in plain English.
Pain intensity
01 / 10: How severe and frequent the pain is. Listed first because everything else is downstream of real pain.
- 0: No real pain — nice-to-have at best.
- 25: Mild annoyance, infrequent.
- 50: Noticeable pain, workarounds exist.
- 75: Acute pain, users actively seek solutions.
- 100: Top-3 daily pain — users already pay for bad workarounds.
Buyer urgency
02 / 10: How quickly the buyer will spend money. Urgency is what turns interest into revenue.
- 0: No urgency — budget would never open.
- 25: Someday-maybe; no deadline pressure.
- 50: This-quarter priority if someone champions it.
- 75: Active evaluation; budget allocated.
- 100: Buyer is shopping right now with a hard deadline.
Buyer reachability
03 / 10: How cheaply a solo founder can reach the first 100 buyers. Calibrated against your declared distribution graph.
- 0: Buyers are invisible or locked behind gatekeepers.
- 25: Diffuse — needs a big marketing engine to reach.
- 50: Reachable via 2–3 channels with effort.
- 75: Concentrated in known communities or lists.
- 100: Founder already has direct access to this audience.
Differentiation / wedge
04 / 10: How sharp and defensible the wedge is. Without one, every other dimension is rented.
- 0: No wedge — undifferentiated.
- 25: Weak wedge; easy to copy.
- 50: Clear angle but not defensible long-term.
- 75: Strong wedge tied to a specific segment or insight.
- 100: Durable wedge grounded in an unfair founder advantage.
Monetization clarity
05 / 10: Whether comparable buyers already pay adjacent tools at the target price.
- 0: No evidence anyone will pay.
- 25: Plausible willingness-to-pay, but unproven.
- 50: Adjacent tools show WTP; price unclear.
- 75: Clear price comp; >$25/mo seems plausible.
- 100: Buyers already pay competitors at the target price.
Founder fit
06 / 10: Whether this founder has the skills, audience, and energy to win here vs. a generic builder. Verified GitHub signal outranks self-claim.
- 0: Founder has no relevant edge.
- 25: Transferable skills only.
- 50: Some edge but no audience or distribution.
- 75: Strong edge — domain + skills aligned.
- 100: Unfair advantage: audience, domain, history.
Competitive landscape
07 / 10: How crowded the space is — inverted, so greenfield scores high.
- 0: Dominated by incumbents with strong lock-in.
- 25: Crowded; differentiation hard.
- 50: Several players, room for a sharp wedge.
- 75: Thin competition; clear positioning gaps.
- 100: Greenfield or only weak / generic alternatives.
Evidence quality
08 / 10: How much of the case is observed evidence vs. founder claims and AI inference.
- 0: Pure speculation; no external signals.
- 25: Anecdotal; one or two conversations.
- 50: Some observed signals (forums, reviews, search).
- 75: Multiple independent observed signals.
- 100: Direct pre-orders or paid pilots.
Market size (proxy)
09 / 10: Whether the reachable buyer pool can support the business at the intended price. Weighted lightly on purpose — small-but-reachable beats big-but-crowded for solo founders.
- 0: Market too small to matter.
- 25: Niche that caps at < $100k ARR.
- 50: $100k–$1M ARR ceiling.
- 75: Clear $1M–$10M ARR ceiling.
- 100: $10M+ ceiling with a reachable wedge.
Execution complexity (inverse)
10 / 10: How big the technical, security, and operational surface is — not how much code it takes. Agentic stacks (Claude Code, Codex, Lovable, Cursor) absorb the line-of-code burden, so the real bottleneck is the DevSecOps surface a solo founder must own (auth, multi-tenant isolation, secrets, compliance, on-call). Higher score = smaller, safer surface that an agent plus a security-aware founder can ship without leaking data.
- 0: Novel infra, real-time guarantees at scale, regulated multi-tenant data, hardware, or research-grade ML. Agentic coding doesn't shortcut the hard parts.
- 25: Heavy compliance or a large security surface (PII at scale, payments, custom protocols, strict multi-tenant isolation). Agents produce code fast — every line needs deep DevSecOps review.
- 50: Standard SaaS surface — auth, billing, multi-tenancy, basic observability. Agentic stack accelerates the build; founder still owns security and ops.
- 75: Well-trodden patterns; small security surface (single-tenant, third-party auth, mostly read-only). Agentic v1 in days with a light security review.
- 100: Single-user tool, thin wrapper over an existing API, or static site. Founder + agent ship in a weekend with minimal security and ops surface.
Each dimension also has a critical floor — a per-dimension minimum below which the verdict cannot reach Build, regardless of the overall VRS.
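A minimal sketch of how the final number and the floor check could be assembled in code — the weights and floor values here are placeholders that follow the priority ordering above, not our published ones:

```python
# Illustrative VRS assembly: per-dimension anchor scores in, weighted 0-100 out.
ANCHORS = {0, 25, 50, 75, 100}

WEIGHTS = {                       # placeholder weights; sum to 1.0
    "pain_intensity":        0.16,
    "buyer_urgency":         0.14,
    "buyer_reachability":    0.12,
    "differentiation":       0.11,
    "monetization_clarity":  0.10,
    "founder_fit":           0.10,
    "competitive_landscape": 0.09,
    "evidence_quality":      0.08,
    "market_size":           0.06,
    "execution_complexity":  0.04,
}

# Hypothetical critical floors: below these, the verdict can never be Build.
FLOORS = {"pain_intensity": 50, "evidence_quality": 25}

def compute_vrs(scores: dict[str, int]) -> tuple[float, bool]:
    # The model must commit to one of the five anchors per dimension.
    assert all(s in ANCHORS for s in scores.values()), "not an anchor value"
    vrs = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)
    floors_ok = all(scores[d] >= floor for d, floor in FLOORS.items())
    return round(vrs, 1), floors_ok
```

Because the model only ever emits anchor values and the arithmetic lives in code, two runs with the same per-dimension inputs always produce the same VRS.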
Live evidence sources
For every idea, Trigvale enriches the brief with real-world signals pulled on a schedule. These are kept separate from AI-generated or user-claimed evidence — sources are tagged in the brief and rendered in distinct sections so you can see what is observed versus inferred.

| Source | Signal | Cadence | Status |
|---|---|---|---|
| Reddit | Post cadence in r/all for the wedge phrase, last 90 days | On evaluate + every 6h refresh | Live |
| GitHub | Repository count + top-repo stars for the wedge phrase | On evaluate + every 6h refresh | Live |
| Google Trends | Search-volume curve for the wedge phrase | Weekly, 5-year window | Roadmap |
| Hacker News | Story search via Algolia HN, last 180 days | On evaluate + every 6h refresh | Live |
| Stack Overflow | Question search by intitle, last 180 days | On evaluate + every 6h refresh | Live |
| dev.to | Articles tagged near the wedge phrase, last 90 days | On evaluate + every 6h refresh | Live |
| Product Hunt | Launches + upvotes in the same archetype | Daily, 12-month window | Roadmap |
| Competitor pricing pages | Diffs over 90 days for named competitors | Weekly capture | Roadmap |
Live sources fire on every evaluate + every 6h via a background refresh; roadmap sources aren't implemented yet. Each brief shows which sources were available at scoring time.
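A sketch of how evidence could be tagged so the brief can render observed and inferred signals in distinct sections — the class and field names here are illustrative, not our actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class Provenance(Enum):
    OBSERVED = "observed"        # pulled from a live source
    USER_CLAIMED = "user"        # founder-supplied
    AI_INFERRED = "inferred"     # model-generated

@dataclass(frozen=True)
class EvidenceItem:
    source: str                  # e.g. "GitHub", "Hacker News"
    signal: str                  # what was measured
    provenance: Provenance
    fetched_at: str              # timestamp of the refresh that captured it

def observed_only(items: list[EvidenceItem]) -> list[EvidenceItem]:
    # The brief renders these separately from claimed or inferred evidence.
    return [i for i in items if i.provenance is Provenance.OBSERVED]
```

Keeping provenance on every item, rather than mixing signals into one pool, is what lets a reader audit the observed-versus-inferred split per brief.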
Inter-rater agreement target
We hold our scoring against a hand-curated golden set of ideas with human-rated reference scores. The eval-harness gate is: verdict agreement ≥ 85% and VRS mean absolute error ≤ 8 across the set. A model upgrade or rubric change that misses the gate does not ship to users.
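The gate above can be expressed as a small check over the golden set — a sketch, assuming each entry carries a verdict and a VRS; the real harness differs in shape:

```python
def passes_gate(reference: list[dict], scored: list[dict]) -> bool:
    """Ship gate: verdict agreement >= 85% and VRS mean absolute error <= 8."""
    assert reference and len(reference) == len(scored)
    agree = sum(r["verdict"] == s["verdict"] for r, s in zip(reference, scored))
    agreement = agree / len(reference)
    mae = sum(abs(r["vrs"] - s["vrs"]) for r, s in zip(reference, scored)) / len(reference)
    return agreement >= 0.85 and mae <= 8.0
```

A candidate model or rubric change runs against the human-rated reference scores; if `passes_gate` returns False, it does not ship.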
When you disagree with a score
Disagreement is information. Every brief lets you flag a dimension you think is mis-scored, with a short reason. Flags feed the calibration loop — they refine the anchors, surface ambiguity in the rubric, and (over time) personalise how your declared skill graph reweights specific dimensions. Disagreement does not retroactively change the verdict on the brief you flagged; it improves future briefs.
What we deliberately do not claim
The most honest thing on this page is what isn't on it. You will never see Trigvale publish:
- "Ideas at VRS X reach $Y MRR at Z%."
- "Trigvale was right N% of the time."
- "X% of Build verdicts succeed within Y months."
- "Our verdict correlates with founder success at Y%."
An idea's outcome is dominated by founder execution, marketing, distribution, timing, and luck. The rubric score is one term in a very noisy equation. With realistic data volumes (hundreds, not millions), we cannot disentangle idea quality from those confounders, so any predictive-accuracy claim would be statistically dishonest. Trigvale's job is to expose assumptions, not to forecast success.
Audit trail
Every saved brief preserves the rubric version, the model version, the evidence sources at the time of scoring, and the score breakdown. If we update the rubric or change a model, your past briefs continue to render under the conditions they were scored in, with a marker if a re-score would now produce a different result.
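One way such an audit record could be persisted — field and function names are illustrative, not our storage schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BriefAudit:
    rubric_version: str                 # rubric in force at scoring time
    model_version: str                  # model that scored the brief
    evidence_sources: tuple[str, ...]   # live sources available when scored
    score_breakdown: dict               # per-dimension anchors + final VRS

def needs_rescore_marker(current_rubric: str, audit: BriefAudit) -> bool:
    # Past briefs render under their original conditions; a marker flags
    # that re-scoring under the current rubric would produce a different result.
    return audit.rubric_version != current_rubric
```

Freezing the record means a rubric or model update never mutates history; it only changes what the marker says about it.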