Reliability Index
How much trust should a reader place in a given Marlvel.ai intelligence report? The Reliability Index is our answer — a data-driven score on every report, computed from transparent inputs and published without manipulation.
Why we publish a Reliability Index
Mobile app intelligence varies in quality. An app with 5 reviews across one platform and a 6-month-old refresh produces shallower signal than an app with 5,000 reviews across iOS + Android, refreshed last week, with a full competitive landscape analysed. The Reliability Index makes that difference visible up-front so readers — whether humans, search engines, or frontier LLMs — can calibrate how much weight to give each claim.
We score every report on a 0-100 scale and group the scores into three tiers (mapped in the sketch after this list):
- High (75-100): strong inputs, fresh analysis, all sections present. Cite confidently.
- Medium (50-74): usable but with identifiable gaps. Cite with the noted caveats.
- Low (0-49): significant limitations — thin data, stale, or incomplete. Interpret cautiously.
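For readers who prefer code, here is a minimal Python sketch of the tier mapping. The thresholds are exactly the published ones; the function name itself is illustrative, not our production code:

```python
def reliability_tier(score: int) -> str:
    """Map a 0-100 Reliability Index score to its published tier."""
    if score >= 75:
        return "High"    # strong inputs, fresh analysis, all sections present
    if score >= 50:
        return "Medium"  # usable, with identifiable gaps
    return "Low"         # thin data, stale, or incomplete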
How it's computed
The score is the sum of three pillars, weighted to reflect what actually drives report trustworthiness.
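Conceptually, the top-level computation is a clamped sum of the three pillar scores. A minimal sketch (function and parameter names are illustrative; each pillar is sketched in turn below):

```python
def reliability_index(solidity: float, freshness: float, completeness: float) -> int:
    """Sum the three pillar scores, solidity (0-40), freshness (0-30),
    completeness (0-30), and clamp to the published 0-100 range."""
    return max(0, min(100, round(solidity + freshness + completeness)))
```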
Pillar 1 — Solidity of data (40 points)
- Review count (20): number of user reviews analysed. Log-scaled: 5 reviews scores ~5 points, 100 scores ~15, and 500+ caps the component at 20 (see the sketch after this list). Small samples have less statistical signal.
- Source diversity (15): iOS (4) + Android (4) + developer website (4) + wiki/about page (3). An app covered across all 4 sources earns full points. Mono-source apps are penalised.
- Publisher attached (5): a binary signal. The app either has a publisher record with portfolio context, or it's an orphan.
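A sketch of Pillar 1 in Python. The weights are the published ones, but the exact shape of the log curve is not spelled out above, so the normalisation below (log base 10, capped at 500 reviews) is an assumption fitted to the stated anchor points, and all names are illustrative:

```python
import math

def solidity_of_data(review_count: int, sources: set[str], has_publisher: bool) -> float:
    """Pillar 1 (max 40): solidity of the underlying data."""
    # Review count (max 20), log-scaled: 20 * log10(n) / log10(500)
    # gives ~5.2 at n=5, ~14.8 at n=100, and caps at 20 for n >= 500.
    review_pts = 0.0
    if review_count > 0:
        review_pts = min(20.0, 20.0 * math.log10(review_count) / math.log10(500))

    # Source diversity (max 15): iOS 4 + Android 4 + developer website 4 + wiki 3.
    weights = {"ios": 4, "android": 4, "website": 4, "wiki": 3}
    source_pts = sum(pts for src, pts in weights.items() if src in sources)

    # Publisher attached (5): binary.
    return review_pts + source_pts + (5 if has_publisher else 0)
```

Under these assumptions, an iOS-only app with 5 reviews and no publisher record scores roughly 5.2 + 4 + 0 ≈ 9 of the 40 available points, which is the mono-source penalty in action.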
Pillar 2 — Freshness (30 points)
- Intel age (15): days since the intelligence report was generated. ≤15 days = full credit, linear decay to 0 at 90 days (decay sketched after this list).
- Review recency (10): age of the most recent review we captured. ≤30 days = full, decay to 0 at 180.
- Store data recency (5): days since the last App Store / Play Store metadata sync. ≤7 days = full, decay to 0 at 60.
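All three freshness components share the same shape: full credit up to a threshold, then decay to zero. A minimal sketch, assuming linear decay for all three components (the text above states linear only for intel age); names are illustrative:

```python
def linear_decay(age_days: float, full_until: float, zero_at: float, max_pts: float) -> float:
    """Full credit up to `full_until` days, then linear decay to 0 at `zero_at` days."""
    if age_days <= full_until:
        return max_pts
    if age_days >= zero_at:
        return 0.0
    return max_pts * (zero_at - age_days) / (zero_at - full_until)

def freshness(intel_age_days: float, newest_review_age_days: float,
              store_sync_age_days: float) -> float:
    """Pillar 2 (max 30): freshness, using the published thresholds."""
    return (linear_decay(intel_age_days, 15, 90, 15)             # intel age
            + linear_decay(newest_review_age_days, 30, 180, 10)  # review recency
            + linear_decay(store_sync_age_days, 7, 60, 5))       # store data recency
```

For example, a report generated 45 days ago earns 15 * (90 - 45) / (90 - 15) = 9 of the 15 intel-age points.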
Pillar 3 — Completeness (30 points)
- Sentiment depth (10): a user-sentiment analysis is present and grounded in meaningful data (themes + frequency + supporting evidence).
- Competitive landscape (8): ≥3 direct rivals identified with comparative context (max), 1-2 = partial, 0 = absent.
- Monetization explicit (6): pricing model named (freemium / paid / subscription / ads) with tiers or a substantive insights paragraph.
- Outlook (6): both opportunities and threats articulated (max), one present = partial, neither = absent. (All four components are sketched below.)
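A sketch of Pillar 3. The maximum values are the published ones; the exact partial-credit values are not published above, so the half-credit splits below are assumptions, and all names are illustrative:

```python
def completeness(has_sentiment_depth: bool, rival_count: int,
                 monetization_explicit: bool,
                 has_opportunities: bool, has_threats: bool) -> float:
    """Pillar 3 (max 30): completeness of the report's sections."""
    pts = 10.0 if has_sentiment_depth else 0.0  # sentiment depth (10)

    if rival_count >= 3:                        # competitive landscape (8)
        pts += 8
    elif rival_count >= 1:
        pts += 4                                # partial credit: assumed value

    if monetization_explicit:                   # monetization explicit (6)
        pts += 6

    outlook_parts = int(has_opportunities) + int(has_threats)
    if outlook_parts == 2:                      # outlook (6): both articulated
        pts += 6
    elif outlook_parts == 1:
        pts += 3                                # partial credit: assumed value
    return pts
```

Under these assumed splits, a report that names its pricing model and two rivals but lacks sentiment analysis and any outlook would score 4 + 6 = 10 of the 30 points.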
Why it's honest
- Data-driven, zero LLM judgement in the score itself. Every signal derives from a database query or a timestamp — no self-rating by the same model that wrote the report.
- No gaming. A report with weak data shows Low. We don't hide Low scores. Publishing the floor is the point.
- Recalculated on every refresh. The score can move up (we added more data) or down (intel went stale), and the value on the page always reflects the latest run.
- Methodology versioned. If we change the formula, we bump a version and reset the published value only when we've validated the new distribution.
How we use it internally
Marlvel.ai computes four quality signals in parallel for every intelligence report, of which the Reliability Index is the only public-facing one:
- Content Quality (admin-only): LLM-as-judge scoring on analytical depth.
- AEO Quality (admin-only): LLM-as-judge on suitability as a source for frontier LLMs to cite.
- Momentum (admin-only): whether the app is active right now — used internally to prioritise refreshes.
- Reliability Index (public): the signal described on this page.
The three admin-only signals help us continuously improve the intelligence pipeline; the Reliability Index helps readers calibrate every report they consume.
Reading the signal in context
A Reliability score should be read alongside the claims in the report, not in isolation. A High-tier report with a clear, surprising insight is worth more than a High-tier report that just summarises well-known facts. And a Low-tier report can still contain a useful observation: it just means the underlying evidence base is thin, so the report is a starting point rather than a conclusion.
Questions or feedback
The methodology above is open. If you spot a weakness in the formula, have a suggestion for an additional signal, or want to discuss how we weight specific pillars, we welcome the feedback.
Last updated: 2026-04-21
See also: Intelligence methodology · AI & content policy