Open
Description
Summary
Use Forbes 400 wealth data to improve the $100M+ AGI bracket in the disaggregated PUF. This builds on #606 (aggregate record disaggregation) by adding a public data source for the extreme top tail.
Motivation
The $100M+ AGI bucket (RECID 999999) has 349 returns representing ~$135.7B in AGI. Our current approach generates synthetic AGI from a truncated Pareto distribution, but Forbes 400 provides individual-level wealth data for ~400 households that heavily overlap with this bucket. Using Forbes as a backbone would:
- Ground the AGI distribution in observable (public) data rather than parametric assumptions
- Enable industry-based archetype assignment (tech → LTCG-heavy, finance → mixed, real estate → partnership-heavy)
- Provide a validation target: does our synthetic income distribution produce plausible wealth accumulation?
Approach
Step 1: Wealth-to-income model
Map Forbes net worth to expected income components:
- LTCG: net_worth * realization_rate (calibrate to SOI realization rates by wealth class, ~2-5%)
- Qualified dividends: net_worth * dividend_yield (~1-2% for diversified portfolios)
- Partnership/S-corp: Industry-dependent; real estate and finance allocate more here
- Wages: Known from proxy filings for public company founders; assume minimal otherwise
- Interest: net_worth * risk_free_rate * bond_allocation
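A minimal sketch of the Step 1 mapping, to make the formulas concrete. The function name, the default parameter values, and the `passthrough_rate` parameter are all hypothetical: the issue only gives rough ranges for the realization rate and dividend yield, and says partnership/S-corp income is industry-dependent without giving a formula. Summing the components into AGI is also a simplification that ignores adjustments and other income sources.

```python
from dataclasses import dataclass


@dataclass
class IncomeComponents:
    ltcg: float
    qualified_dividends: float
    partnership_scorp: float
    wages: float
    interest: float

    @property
    def agi(self) -> float:
        # Simplification: treat AGI as the sum of the modeled components.
        return (self.ltcg + self.qualified_dividends + self.partnership_scorp
                + self.wages + self.interest)


def wealth_to_income(
    net_worth: float,
    realization_rate: float = 0.03,   # assumed; issue suggests ~2-5%
    dividend_yield: float = 0.015,    # assumed; issue suggests ~1-2%
    risk_free_rate: float = 0.04,     # assumed placeholder
    bond_allocation: float = 0.10,    # assumed placeholder
    passthrough_rate: float = 0.0,    # hypothetical, industry-dependent
    wages: float = 0.0,               # minimal unless known from proxy filings
) -> IncomeComponents:
    """Map a net worth estimate to expected income components."""
    return IncomeComponents(
        ltcg=net_worth * realization_rate,
        qualified_dividends=net_worth * dividend_yield,
        partnership_scorp=net_worth * passthrough_rate,
        wages=wages,
        interest=net_worth * risk_free_rate * bond_allocation,
    )


example = wealth_to_income(1e9)
```

Keeping every rate as an explicit argument makes it straightforward to recalibrate per year or per archetype, which matters because the realization rate dominates the result.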
Step 2: Industry-to-archetype mapping
Forbes reports primary industry. Map to existing archetypes:
- Technology → LTCG-heavy (founder stock sales)
- Finance/Investments → mixed-financial
- Real estate → partnership-heavy
- Diversified → mixed-financial
- Food/Beverage, Retail → wage-heavy or LTCG-heavy depending on active vs. passive
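The mapping above could be encoded as a small lookup, sketched here. The archetype identifiers, the exact Forbes industry strings, the `is_active` flag, and the fallback archetype are assumptions for illustration. The issue does not say which of active or passive maps to which archetype; this sketch assumes active owners are wage-heavy and passive owners are LTCG-heavy.

```python
# Hypothetical archetype identifiers; the real names live in the existing code.
INDUSTRY_TO_ARCHETYPE = {
    "Technology": "ltcg_heavy",
    "Finance/Investments": "mixed_financial",
    "Real estate": "partnership_heavy",
    "Diversified": "mixed_financial",
}

# Industries where the archetype depends on active vs. passive involvement.
ACTIVITY_DEPENDENT = {"Food/Beverage", "Retail"}


def assign_archetype(industry: str, is_active: bool = False,
                     default: str = "mixed_financial") -> str:
    """Return an archetype label for a Forbes primary industry."""
    if industry in ACTIVITY_DEPENDENT:
        # Assumption: active owners draw wages, passive owners realize gains.
        return "wage_heavy" if is_active else "ltcg_heavy"
    # Assumption: unmapped industries fall back to a default archetype.
    return INDUSTRY_TO_ARCHETYPE.get(industry, default)
```

For example, `assign_archetype("Retail", is_active=True)` returns `"wage_heavy"`, while an industry not listed in the issue falls through to the default.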
Step 3: Generate Forbes-derived synthetic records
For the $100M+ bucket:
- Use Forbes 400 as backbone for top ~349 records (or top N by estimated income)
- Apply wealth-to-income model to get AGI and component shares
- Rescale so weighted totals match the aggregate record's known totals
- Fill secondary variables using archetype donors (same as current approach)
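The rescaling step might look like the sketch below, which applies one proportional factor per variable so that weighted sums hit the aggregate record's totals. The record layout (a list of dicts), the function name, and the choice of a uniform per-variable factor are assumptions; the existing disaggregation code may use a different scheme.

```python
def rescale_to_totals(records, weights, targets):
    """Scale each variable so its weighted sum matches the target total.

    records: list of dicts mapping variable name -> amount
    weights: list of record weights, same length as records
    targets: dict mapping variable name -> known aggregate total
    """
    scaled = [dict(r) for r in records]  # copy so inputs are not mutated
    for var, target in targets.items():
        current = sum(w * r[var] for r, w in zip(records, weights))
        if current == 0:
            # Nothing to scale; leave the variable untouched.
            continue
        factor = target / current
        for r in scaled:
            r[var] *= factor
    return scaled


records = [{"ltcg": 30.0, "dividends": 10.0}, {"ltcg": 10.0, "dividends": 10.0}]
weights = [1.0, 1.0]
targets = {"ltcg": 80.0, "dividends": 10.0}
rescaled = rescale_to_totals(records, weights, targets)
```

Scaling each variable independently preserves the relative ranking of records within a variable but changes each record's component shares, so AGI should be recomputed after this step rather than scaled separately.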
Step 4: Validation
- Compare synthetic AGI distribution against ProPublica tax data (where available)
- Check total Forbes 400 wealth vs. Fed SCF+ top wealth estimates
- Verify income composition shares against SOI Table 1.4 for $10M+ bracket
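The composition check in the last bullet can be sketched as a comparison of weighted component shares against benchmark shares. The function names and record layout are hypothetical, and the benchmark shares must come from the SOI table itself; the numbers in the example are placeholders, not SOI values.

```python
def composition_shares(records, weights, components):
    """Weighted share of each income component in total modeled income."""
    totals = {
        c: sum(w * r[c] for r, w in zip(records, weights)) for c in components
    }
    grand_total = sum(totals.values())
    return {c: totals[c] / grand_total for c in components}


def share_gaps(synthetic, benchmark):
    """Synthetic share minus benchmark share, per component."""
    return {c: synthetic[c] - benchmark[c] for c in benchmark}


# Placeholder data only; real benchmark shares would be read from SOI.
records = [{"ltcg": 60.0, "wages": 20.0}, {"ltcg": 20.0, "wages": 0.0}]
weights = [1.0, 1.0]
shares = composition_shares(records, weights, ["ltcg", "wages"])
gaps = share_gaps(shares, {"ltcg": 0.7, "wages": 0.3})
```

A positive gap means the synthetic records over-represent that component relative to the benchmark. Because the benchmark bracket ($10M+) is broader than the $100M+ bucket, some gap is expected and the tolerance is a judgment call.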
Data sources
- Forbes 400 list (annual, public): net worth + industry + source of wealth
- Forbes real-time billionaires API (if available)
- SOI realization rates by AGI class
- Fed SCF+ for wealth-income relationship calibration
Challenges
- Wealth ≠ income: Buy-Borrow-Die strategies mean many billionaires report low taxable income
- Forbes estimates have ~20% error bars
- Need to handle unrealized gains carefully (they're wealth but not income until sold)
- Some Forbes 400 members may not file US returns
- Realization rate is the key parameter and varies enormously year-to-year
Depends on
- Improve top-tail income representation in enhanced CPS #606 (aggregate record disaggregation — in progress)