Methodology.
Every claim score is an output of a small, auditable system: a persistent evidence corpus, a scraped-and-annotated post corpus, and a deterministic scoring function. This page shows where each piece comes from so reviewers can challenge any individual decision.
01 Ingestion
Posts are scraped directly from X using a persistent-profile Playwright
session (a personal account logged in once and reused; no X API).
Each scraped post records the original source_url and an
archive_url pointing at web.archive.org for that exact post.
Scraping a handle takes ~30–60 seconds and returns the latest 40 posts,
plus profile metadata.
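A minimal sketch of what one scraped-post record might look like. Only source_url and archive_url are named above; the other field names and the dataclass itself are illustrative assumptions, not the pipeline's actual schema.

```python
from dataclasses import dataclass

@dataclass
class ScrapedPost:
    # source_url / archive_url are the fields named in the text;
    # post_id and text are assumed for illustration.
    post_id: str
    text: str
    source_url: str   # the original X post URL
    archive_url: str  # web.archive.org snapshot of that exact post

post = ScrapedPost(
    post_id="1234567890",
    text="Example claim text",
    source_url="https://x.com/handle/status/1234567890",
    archive_url="https://web.archive.org/web/2024/https://x.com/handle/status/1234567890",
)
```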
02 Claim extraction
For each post we extract a small structured object: claim_family,
speaker_stance (supports / refutes), strength_language,
scope_language, citation_signal,
absolutes_present, anecdote_signal,
acknowledges_uncertainty, acknowledges_conflicting,
sells_matching_product.
In the current iteration these are annotated by hand (Opus 4.7 in-session)
so we can iterate fast on the algorithm. The production path is one
claude -p call per account with an Opus model, emitting the
same structured JSON. Both paths feed the same downstream scorer.
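The field names above imply a per-post object of roughly this shape; the values shown are illustrative examples, not real annotations.

```python
# One extracted claim object per post; keys are the fields listed above.
claim = {
    "claim_family": "seed-oils-cause-inflammation",  # example value
    "speaker_stance": "supports",                    # supports / refutes
    "strength_language": "strong",
    "scope_language": "universal",
    "citation_signal": False,
    "absolutes_present": True,
    "anecdote_signal": False,
    "acknowledges_uncertainty": False,
    "acknowledges_conflicting": False,
    "sells_matching_product": True,
}
```

Because both the hand-annotated path and the `claude -p` path emit this same shape, the downstream scorer does not need to know which one produced it.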
03 Evidence cards
The truth corpus is a single evidence/cards.json file. Each
card declares one normalized claim family with:
- evidence_level (H1–H5)
- direction (supports / contradicts / mixed / insufficient)
- certainty_modifiers (GRADE-style: risk_of_bias, inconsistency, indirectness, imprecision, publication_bias)
- scope (population, intervention, outcome, not_supported_for)
- safety_flag (low / low_to_moderate / moderate / high)
- commercial_sensitivity (generic / supplement-sold-frequently / procedure-sold / protocol-sold)
- sources (list of {type, url, label} citations)
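Putting those fields together, one card in evidence/cards.json might look like the following. The claim family, scope wording, and source entry are invented examples; only the field names and enumerated values come from the list above.

```python
# Illustrative shape of a single evidence card (values are examples).
card = {
    "claim_family": "statins-reduce-cv-events",
    "evidence_level": "H1",           # H1–H5
    "direction": "supports",          # supports / contradicts / mixed / insufficient
    "certainty_modifiers": {          # GRADE-style
        "risk_of_bias": "low",
        "inconsistency": "low",
        "indirectness": "low",
        "imprecision": "low",
        "publication_bias": "low",
    },
    "scope": {
        "population": "adults with elevated cardiovascular risk",
        "intervention": "statin therapy",
        "outcome": "major cardiovascular events",
        "not_supported_for": ["low-risk young adults"],
    },
    "safety_flag": "low",             # low / low_to_moderate / moderate / high
    "commercial_sensitivity": "generic",
    "sources": [
        {"type": "guideline",
         "url": "https://www.uspreventiveservicestaskforce.org/",
         "label": "USPSTF"},
    ],
}
```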
04 Source hierarchy
Sources are prioritized in this order when assigning a level:
- Major guidelines — USPSTF, ACC/AHA, ESC/EAS, WHO, FDA, CDC.
- Cochrane systematic reviews — structured certainty grading.
- Landmark RCTs — NEJM / Lancet / JAMA class; large, blinded.
- High-quality meta-analyses / reviews — independent, pre-registered where possible.
- Large prospective cohorts — Framingham, UK Biobank, PURE, NHS/HPFS, EPIC.
- Regulator advisories — FDA / FTC / EFSA / NIEHS / NIH ODS fact sheets.
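The ordering above can be expressed as a simple rank lookup. This helper and its source-type labels are a sketch of the idea, not code from the pipeline.

```python
# Priority order from the hierarchy above; lower rank = consulted first.
SOURCE_PRIORITY = [
    "major_guideline",     # USPSTF, ACC/AHA, ESC/EAS, WHO, FDA, CDC
    "cochrane_review",
    "landmark_rct",        # NEJM / Lancet / JAMA class
    "meta_analysis",
    "prospective_cohort",  # Framingham, UK Biobank, PURE, NHS/HPFS, EPIC
    "regulator_advisory",  # FDA / FTC / EFSA / NIEHS / NIH ODS
]

def source_rank(source_type: str) -> int:
    """Rank 0 is highest priority; unknown source types sort last."""
    try:
        return SOURCE_PRIORITY.index(source_type)
    except ValueError:
        return len(SOURCE_PRIORITY)
```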
05 Domain coverage (v1)
06 Scoring
See Algorithm for the per-claim 100-point
rubric (A–F components) and the account aggregation formula. The two hard
caps are documented there too. None of the scoring is hidden in a model
prompt — it is deterministic Python in pipeline/score.py and
pipeline/aggregate.py.
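To make "deterministic, not hidden in a prompt" concrete, here is a minimal sketch of the shape such a scorer takes: a sum of A–F component points clipped to 100, with a hard cap applied on one condition. The component weights, the cap threshold, and the cap condition shown are assumptions for illustration (the real rubric and both caps are specified on the Algorithm page).

```python
def score_claim(components: dict[str, float],
                evidence_level: str,
                sells_matching_product: bool) -> float:
    """Toy version of a per-claim score: A–F components on a 100-point scale."""
    total = sum(components[k] for k in "ABCDEF")
    # Illustrative hard cap: an H1-contradicting claim paired with a matching
    # product for sale cannot score above a fixed ceiling (20 here is made up).
    if evidence_level == "H1" and sells_matching_product:
        total = min(total, 20.0)
    return max(0.0, min(100.0, total))
```

The point is auditability: given the same annotation and the same evidence card, the function always returns the same number.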
07 How to challenge a score
- Open the account's Analyzer page and find the claim whose score you dispute.
- Click View on X to confirm the post text we analyzed.
- Click Archive snapshot to confirm the post existed and that we did not mistranscribe its text.
- Look at the card's Sources list. If we mis-graded the evidence, cite a stronger source.
- Look at the per-component breakdown (A–F). If you disagree with scoping/strength/uncertainty annotation, cite the exact wording in the post.
Every score is a linear combination of transparently extracted features
against a published evidence card. If an input is wrong, the fix is
visible in the diff to cards.json or the post annotation.
08 Known limitations (v1)
- Post extraction scoped to the last 40 tweets per account; long-tail history not considered.
- Reposts detected but not scored (endorsement modeling is future work).
- Long threads collapsed into single-tweet scope.
- Media (images, video) is ignored — many substantive claims live in podcast clips and threads rather than single tweets.
- The commercial-overlap map is maintained manually per account. At scale this should derive from bio / linktree scrapes.
- "Anti-establishment rhetoric" is not a separate detector; the H1-with-selling cap partially approximates it.