Methodology

Plain-language guide to how the dashboard works. For formulas, edge cases, and design tradeoffs, see TECHNICAL-REFERENCE.md.

In one sentence

See which industries define your region and how current news is stressing or sparing them — and why.

The app answers three questions for a US state:

Structure — What does this economy depend on? (Census employment + specialization)
Events — What are people reading about, and does it touch those sectors? (GDELT + local RSS)
Effect — How is news pressure showing up, without flickering every refresh? (stored industry health)

What you see on screen (two layers)

The map overlay stacks macro context on top of industry intelligence. The two layers draw on different data, so do not read the GDP bar as “manufacturing GDP.”

┌─────────────────────────────────────────┐
│  REGION PICKER + STABILITY GAUGE         │  ← BLS + BEA macro (state-wide)
│  GDP · Unemployment · Inflation         │     relative 0–100 scores
├─────────────────────────────────────────┤
│  REGIONAL INDUSTRY PANEL                  │  ← Census CBP (structure)
│  · Specialized sectors (LQ ≥ 1.2)       │     + news shocks (health bars)
│  · Industry health 0–100 per sector       │
│  · Underrepresented sectors (LQ ≤ 0.8)    │
└─────────────────────────────────────────┘
         News feed / headlines below
         Score A on cards · Score B in panel

| What you see | What it means | Source |
|--------------|---------------|--------|
| Overall gauge (top) | State-wide economic context + news blend | BLS, BEA, GDELT/RSS |
| GDP / Unemployment / Inflation bars | How this state ranks vs other states right now | BLS + BEA (GDP = YoY growth %, not dollar size) |
| Industry health bars | Stress on your specialized sectors from news | Census structure + news shocks |
| LQ & employment share | Which industries this region is built around | US Census County Business Patterns |
| Impact index on news cards | How loud/intense a story cluster is today | GDELT or derived RSS metrics |

Architecture: three layers

flowchart TB
  subgraph L1["Layer 1 — Taxonomy"]
    NAICS["NAICS 2-digit sectors<br/>(Healthcare 62 = baseline)"]
  end

  subgraph L2["Layer 2 — Structure (Census)"]
    CBP["County Business Patterns<br/>employment by sector"]
    LQ["Location Quotient LQ<br/>Specialized if LQ ≥ 1.2"]
    SI["state_industries table<br/>industry_health starts at 100"]
  end

  subgraph L3["Layer 3 — News spark"]
    GDELT["GDELT events<br/>native CAMEO + Goldstein"]
    RSS["Curated RSS<br/>lexicon tone + derived CAMEO"]
    CLUSTER["Story clustering"]
    SHOCK["Shock router → subtract health<br/>once per story, then recover slowly"]
  end

  NAICS --> CBP --> LQ --> SI
  GDELT --> CLUSTER
  RSS --> CLUSTER
  CLUSTER --> SHOCK --> SI

Layer 1 is the sector vocabulary (manufacturing, healthcare, transportation, etc.).

Layer 2 is structure: Census tells us where each state is specialized. This changes slowly (monthly Census refresh).

Layer 3 is events: headlines become shocks that move industry health down when adverse, then ease back up when things quiet down.

How data flows (pipelines)

Two background jobs feed the database. The website only reads Supabase.

flowchart LR
  subgraph monthly["Monthly · economic indicators"]
    BLS["BLS unemployment<br/>+ regional CPI"]
    BEA["BEA state GDP growth"]
    EI["economic_indicators table"]
    BLS --> EI
    BEA --> EI
  end

  subgraph sixhr["Every ~6 hours · stability"]
    G["GDELT US events"]
    R["Curated RSS ingest"]
    C["Cluster + bias NLP"]
    CENSUS["Census CBP<br/>(bootstrap or CENSUS_REFRESH=1)"]
    SH["Shock router + recovery"]
    SS["stability_scores table"]
    NA["news_articles table"]

    G --> NA
    R --> NA
    NA --> C
    CENSUS --> SI2["state_industries"]
    C --> SH --> SI2
    EI --> SS
    NA --> SS
    SI2 --> SS
  end

  DB[(Supabase)] --> UI["Next.js dashboard"]
  EI --> DB
  SS --> DB
  NA --> DB
  SI2 --> DB

| Job | Command | Schedule |
|-----|---------|----------|
| Stability (news + scores + industry shocks) | npm run cron:stability | GitHub Actions ~every 6 hours |
| Shock routing only (verify) | npm run verify:industry-shocks | Manual (skips GDELT/RSS ingest) |
| Economic indicators (GDP, jobs, inflation) | npm run cron:economic-indicators | 1st of month, 08:00 UTC |

Secrets for Actions: map SUPABASE_URL → NEXT_PUBLIC_SUPABASE_URL, plus BLS_API_KEY, BEA_API_KEY, optional CENSUS_API_KEY. See repo workflows under .github/workflows/.

How news affects industry health (end-to-end)

This is the full path from headline to the health bars in the industry panel — ingest, enrichment, shock writer, recovery, and UI.

flowchart TD
  subgraph ingest ["Ingest every ~6h"]
    GDELT[GDELT events]
    RSS[Curated RSS]
    NA[news_articles]
    GDELT --> NA
    RSS --> NA
  end

  subgraph enrich [Enrichment]
    CAMEO["gdelt_cameo_root OR derived_cameo_root"]
    GEO[target_state_fips]
    CLUSTER[story clustering]
    IMPACT[cluster_impact_index]
    NA --> CAMEO
    NA --> GEO
    NA --> CLUSTER
    CLUSTER --> IMPACT
  end

  subgraph shock [Shock writer]
    ROUTE[routeShocksFromRecentNews]
    DEDUP["dedupe by event_cluster_id"]
    IDEM[shocks_applied idempotency]
    HEALTH[state_industries.industry_health]
    RECOVER[runIndustryHealthDecay daily gate]
    CAMEO --> ROUTE
    GEO --> ROUTE
    IMPACT --> ROUTE
    ROUTE --> DEDUP --> IDEM --> HEALTH
    RECOVER --> HEALTH
  end

  subgraph ui [UI]
    PANEL[RegionalIndustryPanel]
    HEALTH --> PANEL
  end

Pipeline order in code: GDELT ingest → RSS ingest → story clustering → Census baseline (if needed) → shock routing → daily recovery. See scripts/update-stability.ts.

Verifying shocks in production

After deploy or migration, confirm the path is live:

npm run verify:industry-shocks   # shock router + recovery only (~1 min)
npm run cron:stability           # full pipeline (~15+ min)

Expect on first run: shocksApplied > 0, shocksLoggedTotal rising, and some state_industries rows with industry_health < 100. A second verify run should show skippedAlreadyApplied ≈ prior total and shocksApplied: 0 (idempotency).

News → industry: the path in plain English

sequenceDiagram
  participant Headline
  participant Ingest as GDELT or RSS
  participant Enrich as Tone + CAMEO + region
  participant Cluster as Group similar stories
  participant Route as Map CAMEO → NAICS sector
  participant Health as industry_health

  Headline->>Ingest: New article row
  Ingest->>Enrich: GDELT native codes OR RSS dual-path
  Enrich->>Cluster: Jaccard title similarity (~96h window, 0.18 threshold)
  Cluster->>Route: One shock per cluster (idempotent)
  Route->>Health: Subtract points if adverse;<br/>fan out to specialized states if national
  Note over Health: Recovery +5% of deficit<br/>max once per 24h after quiet
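
The clustering step in the diagram compares titles by Jaccard similarity. The sketch below shows one way that comparison could look. Only the 0.18 threshold and the ~96h window come from this guide; the tokenizer and the names `tokenize`, `jaccard`, and `sameStory` are hypothetical and are not taken from lib/story-clustering.ts.

```typescript
// Values from this guide; everything else in this block is an illustrative assumption.
const SIMILARITY_THRESHOLD = 0.18;
const WINDOW_MS = 96 * 60 * 60 * 1000;

interface TitledArticle {
  title: string;
  publishedAt: Date;
}

// Lowercase the title and split on non-alphanumerics. Real code may also drop stopwords.
function tokenize(title: string): Set<string> {
  const tokens = title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
  return new Set(tokens);
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, defined here as 0 when both sets are empty.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Two articles are candidates for the same cluster if they are close in time and in wording.
function sameStory(a: TitledArticle, b: TitledArticle): boolean {
  const gapMs = Math.abs(a.publishedAt.getTime() - b.publishedAt.getTime());
  if (gapMs > WINDOW_MS) return false;
  return jaccard(tokenize(a.title), tokenize(b.title)) >= SIMILARITY_THRESHOLD;
}
```

A low threshold such as 0.18 tolerates heavily reworded headlines, at the cost of occasionally merging unrelated stories that share common words.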

GDELT vs curated RSS (same scoring idea, different origin)

| | GDELT | Curated RSS (NBC, NPR, BBC, …) |
|---|-----------|-------------------------------------|
| Event type & intensity | Native CAMEO + Goldstein from event DB | Derived from headline/lead keywords |
| Tone | GDELT AvgTone | Lexicon word counts → derived_avg_tone_lexicon |
| Provenance label | metrics_provenance = gdelt | derived_rss |
| UI rule | Show GDELT fields | Show derived_* fields; never fake zeros as real |

The two RSS enrichment steps (lexicon tone and derived CAMEO) read the same preprocessed slice (title + first ~3 sentences), so tone and event extraction see the same text.

Two scores — do not mix them up

  SCORE A (flash)              SCORE B (momentum)
  ─────────────────            ────────────────────
  "How loud is this            "How strained are this
   story cluster               state's specialized
   right now?"                 industries?"

  On news cards                In industry panel bars
  Spikes with volume           Moves slowly; recovers
  Log impact index             Stored industry_health

Score A — Cluster impact index

|avg Goldstein| × ln(articles + 1) × (1 + |avg tone|)

The index rises with article count, with Goldstein intensity in either direction (conflictual or cooperative), and with stronger tone. Shown on clusters and article cards. Every processed article row gets a cluster_impact_index (including intentional singletons with event_cluster_id = null and cluster_member_count = 1).
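
As a minimal sketch, the Score A formula above translates directly into code. The function name and parameter names are hypothetical, and the inputs are used as given; any rescaling of Goldstein or tone before this step is not covered here.

```typescript
// |avg Goldstein| × ln(articles + 1) × (1 + |avg tone|)
function clusterImpactIndex(avgGoldstein: number, articleCount: number, avgTone: number): number {
  return Math.abs(avgGoldstein) * Math.log(articleCount + 1) * (1 + Math.abs(avgTone));
}

// A singleton (one article) still gets a non-zero index when Goldstein is non-zero,
// because ln(1 + 1) = ln 2 > 0.
const singleton = clusterImpactIndex(-5, 1, 0);
const larger = clusterImpactIndex(-5, 20, -2);
console.log(singleton < larger);
```

The logarithm keeps a very large cluster from swamping the scale: going from 10 to 100 articles roughly doubles the volume factor rather than multiplying it by ten.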

Cluster processing status (read path)

| Status | Meaning |
|--------|---------|
| clustered | event_cluster_id set and cluster_member_count > 1 |
| singleton | processed one-off story (event_cluster_id null, cluster_member_count = 1) |
| missing_title | cannot cluster without a title |
| unprocessed | unexpected state (inspect row) |
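
The status rules in the table can be read as a small classifier. This is a sketch only: the `ArticleRow` shape is reduced to the three fields the table mentions, and the order of the checks (missing title first) is an assumption.

```typescript
type ClusterStatus = "clustered" | "singleton" | "missing_title" | "unprocessed";

// Hypothetical, reduced row shape; the real news_articles row has more columns.
interface ArticleRow {
  title: string | null;
  event_cluster_id: string | null;
  cluster_member_count: number | null;
}

function clusterStatus(row: ArticleRow): ClusterStatus {
  // Assumed precedence: a row without a title cannot have been clustered at all.
  if (row.title === null || row.title.trim() === "") return "missing_title";
  if (row.event_cluster_id !== null && (row.cluster_member_count ?? 0) > 1) return "clustered";
  if (row.event_cluster_id === null && row.cluster_member_count === 1) return "singleton";
  return "unprocessed"; // unexpected state, worth inspecting by hand
}
```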

Run npm run diagnose:clustering for a read-only coverage report.

Score B — Industry health

Starts at 100 per (state, NAICS sector).
Adverse news shocks subtract points (cooperative events do not add fake “bonus health”).
After 24h quiet, recovery closes 5% of the gap to 100, at most once per day per sector.
Each logical story applies once (shocks_applied table prevents cron reruns from stacking damage).
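
The four rules above can be sketched as two pure functions. This is an illustration under stated assumptions, not the code in lib/industry-health.ts: the in-memory `Set` stands in for the shocks_applied table, the shock key format is made up, and flooring health at 0 is an assumption.

```typescript
const FULL_HEALTH = 100;
const RECOVERY_RATE = 0.05;              // close 5% of the gap to 100
const QUIET_MS = 24 * 60 * 60 * 1000;    // 24h quiet period and recovery gate

interface SectorHealth {
  health: number;
  lastShockAt: number | null;     // epoch milliseconds
  lastRecoveryAt: number | null;  // epoch milliseconds
}

// Subtract points once per logical story; a repeated key is a no-op.
function applyShock(
  sector: SectorHealth,
  shockKey: string,
  points: number,
  applied: Set<string>,
  now: number,
): SectorHealth {
  if (applied.has(shockKey)) return sector; // cron rerun: no stacked damage
  applied.add(shockKey);
  return { ...sector, health: Math.max(0, sector.health - points), lastShockAt: now };
}

// Recover only after 24h without shocks, and at most once per 24h.
function recover(sector: SectorHealth, now: number): SectorHealth {
  const quiet = sector.lastShockAt === null || now - sector.lastShockAt >= QUIET_MS;
  const due = sector.lastRecoveryAt === null || now - sector.lastRecoveryAt >= QUIET_MS;
  if (!quiet || !due || sector.health >= FULL_HEALTH) return sector;
  const gap = FULL_HEALTH - sector.health;
  return { ...sector, health: sector.health + RECOVERY_RATE * gap, lastRecoveryAt: now };
}
```

Because each step closes a fraction of the remaining gap, recovery slows as health approaches 100: a sector at 80 regains 1 point on the first quiet day and 0.95 on the next.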

State gauge (top of panel) — flat blend: 20% GDP growth + 25% unemployment + 25% inflation + 30% average specialized industry health. Component bars show the macro inputs only (growth, jobs, inflation).
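
A minimal sketch of the flat blend, assuming all four inputs are already 0–100 scores. The function name is hypothetical, and rejecting an empty sector list is a choice made for this sketch; how the dashboard handles a state with no specialized sectors is not stated here.

```typescript
// Weights from the blend above: 20% GDP, 25% unemployment, 25% inflation, 30% industry health.
function stateGauge(
  gdpScore: number,
  unemploymentScore: number,
  inflationScore: number,
  specializedHealth: number[],
): number {
  if (specializedHealth.length === 0) {
    // Sketch-only behavior: refuse to guess a fallback value.
    throw new RangeError("need at least one specialized sector");
  }
  const total = specializedHealth.reduce((sum, h) => sum + h, 0);
  const avgHealth = total / specializedHealth.length;
  return 0.2 * gdpScore + 0.25 * unemploymentScore + 0.25 * inflationScore + 0.3 * avgHealth;
}

// Example: 0.2*80 + 0.25*60 + 0.25*40 + 0.3*80 = 16 + 15 + 10 + 24 = 65
console.log(stateGauge(80, 60, 40, [90, 70]));
```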

Where each number comes from

Macro header (not Census)

| Metric | Agency | Raw signal | Normalization |
|--------|--------|------------|---------------|
| GDP | BEA | State GDP year-over-year growth (%) | 0–100 vs average state growth (higher growth = higher score) |
| Unemployment | BLS | State unemployment rate % | Inverted 0–100 (lower rate = higher score) |
| Inflation | BLS | Regional CPI YoY (4 US regions → 50 states) | Inverted 0–100 (lower inflation = higher score) |

Census does not publish GDP, inflation, or unemployment rate. It does publish employment by industry — that powers the industry panel, not the header bars.

Industry structure (Census)

| Field | Meaning |
|-------|---------|
| LQ (Location Quotient) | Local share of jobs in a sector ÷ national share. ≥ 1.2 = Specialized. |
| emp_count / share | Jobs in that sector from County Business Patterns |
| industry_health | Layer 3 only: news-driven, not from Census |

LQ formula:

LQ = (state jobs in sector / state total jobs)
   ÷ (US jobs in sector / US total jobs)
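
The LQ formula and the 1.2 / 0.8 cutoffs used in the industry panel can be sketched as follows. The function names and the "typical" label for the middle band are hypothetical; returning null for a zero denominator is an assumption of this sketch.

```typescript
// LQ = (state jobs in sector / state total jobs) ÷ (US jobs in sector / US total jobs)
function locationQuotient(
  stateSectorJobs: number,
  stateTotalJobs: number,
  usSectorJobs: number,
  usTotalJobs: number,
): number | null {
  // Avoid dividing by zero when a total or the national sector count is missing.
  if (stateTotalJobs <= 0 || usTotalJobs <= 0 || usSectorJobs <= 0) return null;
  const stateShare = stateSectorJobs / stateTotalJobs;
  const nationalShare = usSectorJobs / usTotalJobs;
  return stateShare / nationalShare;
}

type Specialization = "specialized" | "underrepresented" | "typical";

function classifyLQ(lq: number): Specialization {
  if (lq >= 1.2) return "specialized";
  if (lq <= 0.8) return "underrepresented";
  return "typical"; // hypothetical label for 0.8 < LQ < 1.2
}

// A state with 15% of its jobs in a sector that holds 10% of US jobs has LQ ≈ 1.5.
const lq = locationQuotient(150, 1000, 1000, 10000);
console.log(lq !== null ? classifyLQ(lq) : "no data");
```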

Bootstrap: first cron run pulls Census if state_industries is empty. Set CENSUS_REFRESH=1 to force a refresh.

Trust labels (provenance)

We label metrics so zeros are not mistaken for “neutral news”:

| Label | Meaning |
|-------|---------|
| gdelt | Goldstein, tone, and CAMEO from GDELT event row |
| derived_rss | Computed from RSS text; stored in derived_* columns |
| unknown | No reliable signal; UI shows — |

Display helpers in lib/article-metric-source.ts pick the right column for each row.
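
To illustrate the idea, a display helper for tone might branch on the provenance label like this. It is a sketch, not the contents of lib/article-metric-source.ts: `displayTone`, the reduced row shape, and the `gdelt_avg_tone` column name are assumptions, while `metrics_provenance` and `derived_avg_tone_lexicon` come from this guide.

```typescript
type Provenance = "gdelt" | "derived_rss" | "unknown";

interface ArticleMetrics {
  metrics_provenance: Provenance;
  gdelt_avg_tone: number | null;            // hypothetical column name
  derived_avg_tone_lexicon: number | null;
}

const NO_SIGNAL = "\u2014"; // the dash the UI shows when there is no reliable signal

function displayTone(row: ArticleMetrics): string {
  let value: number | null = null;
  if (row.metrics_provenance === "gdelt") {
    value = row.gdelt_avg_tone;
  } else if (row.metrics_provenance === "derived_rss") {
    value = row.derived_avg_tone_lexicon;
  }
  // "unknown" rows and null values never render as 0, which would read as neutral news.
  return value === null ? NO_SIGNAL : value.toFixed(1);
}
```

Reading only the column that matches the label means a placeholder zero in the other column can never reach the screen.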

Rank thresholds

| Overall score | Label |
|---------------|-------|
| ≥ 66 | Stable |
| 33 – 65 | At Risk |
| < 33 | Unstable |
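
As a sketch, the thresholds map to a label like this. The function name is hypothetical. The table does not say how a fractional score between 65 and 66 is labeled; this sketch assumes it falls under At Risk.

```typescript
type RankLabel = "Stable" | "At Risk" | "Unstable";

function rankLabel(overallScore: number): RankLabel {
  if (overallScore >= 66) return "Stable";
  if (overallScore >= 33) return "At Risk"; // assumption: covers 33 up to, but not including, 66
  return "Unstable";
}

console.log(rankLabel(66), rankLabel(65), rankLabel(32));
```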

Scores are relative (state vs state at the same time), not an official government index.

Limitations (honest scope)

Macro lag — BLS monthly, BEA quarterly; industry health reacts faster via news.
Inflation is regional — four BLS CPI regions, not state-specific CPI.
GDELT & RSS are automated — machine coding and keyword CAMEO can miss nuance.
Consensus facts ≠ verified truth — “agreed wording” across outlets, not fact-checking.
Sparse states — few articles → gauge leans on macro; industry shocks need routable CAMEO.
Coarse CAMEO → NAICS routing — conflict events mostly map to transportation (48) or finance (52), not headline-specific sectors like manufacturing or healthcare. Health bars may move on a sector that does not match the story topic.
Two “overall” numbers: the top gauge (70% macro + 30% average specialized industry health) vs the panel's “Overall industry health” (specialized sectors only).

Design decisions (short)

| Topic | Choice | Why |
|-------|--------|-----|
| Story clustering | Title Jaccard (0.18 / 96h) | Fast, no paid embeddings on cron |
| Bias / factual gating | RoBERTa on Lambda | Replaced brittle word lists (2026) |
| Industry shocks | Once per cluster + DB idempotency | Prevents “permanently broken” sectors |
| GDELT CAMEO | Native EventRootCode persisted | RSS keyword map is fallback only |
| Census vs BLS/BEA | Both, different jobs | Census = structure; BLS/BEA = macro rates |

Extended rationale, alternatives, and formulas: TECHNICAL-REFERENCE.md.

File map (for developers)

| Concern | Primary modules |
|---------|-----------------|
| Macro fetch | lib/economic-fetcher.ts, scripts/update-economic-indicators.ts |
| Stability cron | scripts/update-stability.ts |
| Shock verify (manual) | scripts/verify-industry-shocks.ts |
| Census LQ | lib/census-cbp-lq.ts |
| News ingest | lib/gdelt-fetcher.ts, lib/ingest-us-outlet-rss.ts |
| RSS enrichment | lib/text-slice-enrichment.ts, lib/lexicon-tone.ts, lib/cameo-goldstein.ts |
| Clustering | lib/story-clustering.ts |
| Industry health math | lib/industry-health.ts |
| Shock routing | lib/shock-routing-logic.ts, lib/industry-shock-router.ts |
| UI data loaders | lib/state-industries-data.ts, app/page.tsx |

Spec documents: STABILITY_PIPELINE_SPEC.md, DUAL_PATH-ENRICHMENT.MD.

Pipeline Visuals

How a headline becomes industry health

The moving cards show where data is merged, filtered, routed, and stored before it changes a state's sector health.

Raw inputs

GDELT and RSS enter one table

GDELT gives event signals; RSS fills local coverage gaps. Both become normalized news rows.

Why cleaning matters

Events and articles do not match 1:1

One event can have many URLs, while one article can mention several events. We collapse before scoring.

Story unit

Route to a state or national pool

GDELT geo, source registry, and text clues resolve FIPS. Unclear stories stay national.

Industry routing

Map story meaning to an industry

Today CAMEO gives coarse NAICS hints. Planned embeddings will match story content to richer NAICS profiles.

Shock size

Impact uses intensity plus volume

Goldstein, tone, and article count create an impact index. Only adverse direction subtracts health.

Persistence

Apply once, then recover slowly

The shock key prevents repeated damage. Quiet sectors close 5% of the gap back to 100 each day.

Next pipeline

Story candidates replace raw rows

The planned unit is a canonical story with event candidates, state confidence, NAICS confidence, and provenance.