
Use Forbes 400 data to improve $100M+ AGI bracket synthesis #613

@MaxGhenis


Summary

Use Forbes 400 wealth data to improve the $100M+ AGI bracket in the disaggregated PUF. This builds on #606 (aggregate record disaggregation) by adding a public data source for the extreme top tail.

Motivation

The $100M+ AGI bucket (RECID 999999) contains 349 returns with roughly $135.7B in total AGI. Our current approach draws synthetic AGI from a truncated Pareto distribution, whereas the Forbes 400 provides individual-level wealth estimates for ~400 households that likely overlap heavily with this bucket. Using Forbes as a backbone would:

  • Ground the AGI distribution in observable (public) data rather than parametric assumptions
  • Enable industry-based archetype assignment (tech → LTCG-heavy, finance → mixed, real estate → partnership-heavy)
  • Provide a validation target: does our synthetic income distribution produce plausible wealth accumulation?
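For comparison, the current parametric baseline can be sketched as inverse-CDF sampling from a Pareto distribution truncated below at $100M. This is a minimal sketch, not the code from #606: the function name `sample_truncated_pareto`, the shape parameter `alpha`, and the upper bound are illustrative assumptions.

```python
import numpy as np


def sample_truncated_pareto(n, alpha, lower, upper, rng):
    """Draw n values from Pareto(alpha) truncated to [lower, upper] via the inverse CDF.

    Truncated CDF: F(x) = (1 - (lower / x)**alpha) / (1 - (lower / upper)**alpha).
    Solving F(x) = u for x gives the expression returned below.
    """
    u = rng.uniform(size=n)
    tail_mass = 1.0 - (lower / upper) ** alpha
    return lower * (1.0 - u * tail_mass) ** (-1.0 / alpha)


# Hypothetical parameters: alpha = 1.5, AGI capped at $50B.
rng = np.random.default_rng(0)
agi = sample_truncated_pareto(349, alpha=1.5, lower=100e6, upper=50e9, rng=rng)
```

Every draw depends on `alpha` and the cap, which is exactly the parametric assumption a Forbes backbone would replace.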

Approach

Step 1: Wealth-to-income model

Map Forbes net worth to expected income components:

  • LTCG: net_worth * realization_rate (calibrate to SOI realization rates by wealth class, ~2-5%)
  • Qualified dividends: net_worth * dividend_yield (~1-2% for diversified portfolios)
  • Partnership/S-corp: industry-dependent; real estate and finance allocate a larger share here
  • Wages: taken from proxy-filing compensation for public-company founders; assumed minimal otherwise
  • Interest: net_worth * risk_free_rate * bond_allocation
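A minimal sketch of the Step 1 mapping, assuming one set of rates for all households. The values are placeholders inside the ranges quoted above (or arbitrary where no range is given), and `IncomeAssumptions`, `income_components`, and `passthrough_rate` are hypothetical names. In practice the partnership/S-corp rate would be set per industry and wages would come from proxy filings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncomeAssumptions:
    """Hypothetical rates; placeholders, not calibrated values."""
    realization_rate: float = 0.03   # LTCG as share of net worth (issue cites ~2-5%)
    dividend_yield: float = 0.015    # qualified dividends (issue cites ~1-2%)
    passthrough_rate: float = 0.0    # partnership/S-corp; assumption, set per industry
    risk_free_rate: float = 0.04     # assumption
    bond_allocation: float = 0.05    # assumption: share of net worth held in bonds


def income_components(net_worth: float, wages: float = 0.0,
                      a: IncomeAssumptions = IncomeAssumptions()) -> dict:
    """Map one household's net worth to expected income components.

    AGI is approximated as the plain sum of these components, ignoring
    losses, other income types, and adjustments.
    """
    components = {
        "ltcg": net_worth * a.realization_rate,
        "qualified_dividends": net_worth * a.dividend_yield,
        "partnership_scorp": net_worth * a.passthrough_rate,
        "wages": wages,  # from proxy filings where known, else ~0
        "interest": net_worth * a.risk_free_rate * a.bond_allocation,
    }
    components["agi"] = sum(components.values())
    return components
```

Under these placeholders, `income_components(10e9)` yields about $300M of LTCG and about $470M of AGI.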

Step 2: Industry-to-archetype mapping

Forbes reports primary industry. Map to existing archetypes:

  • Technology → LTCG-heavy (founder stock sales)
  • Finance/Investments → mixed-financial
  • Real estate → partnership-heavy
  • Diversified → mixed-financial
  • Food/Beverage, Retail → wage-heavy or LTCG-heavy, depending on whether the owner is active in the business or a passive holder
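The mapping could be a lookup table plus a special case for consumer businesses. This sketch assumes Forbes-style industry labels such as "Finance & Investments" and "Fashion & Retail" (to be checked against the actual data), hypothetical archetype identifiers, a hypothetical `is_active` flag, and a mixed-financial fallback for unmapped industries. It also reads "active" as wage-heavy and "passive" as LTCG-heavy.

```python
# Hypothetical archetype identifiers; keys assumed to match Forbes industry labels.
ARCHETYPE_BY_INDUSTRY = {
    "Technology": "ltcg_heavy",              # founder stock sales
    "Finance & Investments": "mixed_financial",
    "Real Estate": "partnership_heavy",
    "Diversified": "mixed_financial",
}

# Industries whose archetype depends on whether the owner is active or passive.
ACTIVE_PASSIVE_INDUSTRIES = {"Food & Beverage", "Fashion & Retail"}


def assign_archetype(industry: str, is_active: bool = False,
                     default: str = "mixed_financial") -> str:
    """Return the income archetype for one Forbes household."""
    if industry in ACTIVE_PASSIVE_INDUSTRIES:
        # Assumption: active operators draw wages; passive holders realize gains.
        return "wage_heavy" if is_active else "ltcg_heavy"
    return ARCHETYPE_BY_INDUSTRY.get(industry, default)
```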

Step 3: Generate Forbes-derived synthetic records

For the $100M+ bucket:

  1. Use the Forbes 400 as the backbone for the top ~349 records (or the top N by estimated income)
  2. Apply wealth-to-income model to get AGI and component shares
  3. Rescale so weighted totals match the aggregate record's known totals
  4. Fill secondary variables using archetype donors (same as current approach)
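Items 1 and 3 of this step can be sketched as selecting the top households by estimated AGI and ratio-adjusting each income component to the aggregate record's totals. Assumptions: every selected record gets the same weight (`n_returns / n_records`), targets are weighted totals per component, and `forbes_backbone` is a hypothetical name. Components whose current total is zero are left unscaled.

```python
import numpy as np


def forbes_backbone(est_agi, components, targets, n_records, n_returns=349):
    """Select the top n_records households by estimated AGI and rescale each
    component so its weighted total matches the aggregate record.

    est_agi: (m,) estimated AGI per Forbes household
    components: dict of name -> (m,) component amounts
    targets: dict of name -> weighted total from the aggregate record
    """
    est_agi = np.asarray(est_agi, dtype=float)
    # Indices of the n_records largest estimated incomes, largest first.
    top = np.argsort(est_agi)[::-1][:n_records]
    # Assumption: each synthetic record represents an equal share of the returns.
    weights = np.full(len(top), n_returns / len(top))
    rescaled = {}
    for name, values in components.items():
        values = np.asarray(values, dtype=float)[top]
        current = float(np.sum(weights * values))
        # Ratio adjustment; a zero-total component cannot be rescaled.
        factor = targets[name] / current if current > 0 else 1.0
        rescaled[name] = values * factor
    return rescaled, weights
```

Scaling each component independently preserves within-component rankings but not each record's component mix; if total AGI is also a target, the recomputed sum will generally need one more adjustment.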

Step 4: Validation

  • Compare synthetic AGI distribution against ProPublica tax data (where available)
  • Check total Forbes 400 wealth vs. Fed SCF+ top wealth estimates
  • Verify income composition shares against SOI Table 1.4 for $10M+ bracket
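The composition check in the last bullet can be a comparison of component shares of AGI. This sketch assumes the SOI $10M+ shares have already been extracted into a dict (no values are filled in here) and uses a hypothetical 5-percentage-point tolerance; `share_gaps` and `within_tolerance` are hypothetical names. Since the SOI bracket is broader than $100M+, the comparison is only approximate.

```python
def share_gaps(synthetic_totals, synthetic_agi, target_shares):
    """Synthetic minus target share of AGI, by component.

    synthetic_totals: dict of component -> weighted total in the synthetic records
    synthetic_agi: weighted total AGI in the synthetic records
    target_shares: dict of component -> share of AGI (e.g. from the SOI table)
    """
    return {
        name: synthetic_totals.get(name, 0.0) / synthetic_agi - share
        for name, share in target_shares.items()
    }


def within_tolerance(gaps, tol=0.05):
    """True if every component share is within tol (absolute) of its target."""
    return all(abs(g) <= tol for g in gaps.values())
```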

Data sources

  • Forbes 400 list (annual, public): net worth + industry + source of wealth
  • Forbes real-time billionaires API (if available)
  • SOI realization rates by AGI class
  • Fed SCF+ for wealth-income relationship calibration

Challenges

  • Wealth ≠ income: Buy-Borrow-Die strategies mean many billionaires report low taxable income
  • Forbes net-worth estimates carry error bars of roughly ±20%
  • Need to handle unrealized gains carefully (they're wealth but not income until sold)
  • Some Forbes 400 members may not file US returns
  • Realization rate is the key parameter and varies enormously year-to-year
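Because the realization rate and the Forbes error bars both scale LTCG linearly, their combined effect can be tabulated directly. This sketch assumes the ~2-5% range from Step 1 and a uniform ±20% error applied to all net worths (real errors are household-specific and need not be symmetric); `ltcg_sensitivity` is a hypothetical name. It does not model buy-borrow-die behavior, which would push the effective rate lower.

```python
from itertools import product


def ltcg_sensitivity(net_worths, rates=(0.02, 0.03, 0.04, 0.05),
                     wealth_error=0.20):
    """Implied total LTCG over a grid of realization rates and wealth errors.

    Returns {(rate, wealth_multiplier): total_ltcg}, where the multiplier
    scales all Forbes net worths by (1 - err), 1, or (1 + err).
    """
    total_wealth = sum(net_worths)
    multipliers = (1.0 - wealth_error, 1.0, 1.0 + wealth_error)
    return {
        (rate, mult): total_wealth * mult * rate
        for rate, mult in product(rates, multipliers)
    }
```

The corners of this grid differ by a factor of 3.75 (5% × 1.2 vs. 2% × 0.8), so the rescaling in Step 3 carries most of the weight in pinning down totals.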

Depends on
