Zenodo analysis: material category section empty due to PQG schema change #57

@rdhyee

Problem

The Material Category Analysis section on tutorials/zenodo_isamples_analysis.qmd shows empty results after the column alias fix in PR #56.

Root cause

The Jan 2026 wide parquet (isamples_202601_wide_h3.parquet) stores material categories as p__has_material_category BIGINT[] — an array of row IDs (foreign keys) pointing to IdentifiedConcept nodes in the narrow format. The old export format had has_material_category as a plain string ("rock", "sediment", etc.).

The current fix maps has_material_category to NULL so the page loads without errors, but the material breakdown charts are empty.

Fix options

  1. Pre-compute a lookup table in a small parquet file mapping row IDs → concept labels (similar to facet_summaries.parquet), and join at query time
  2. Add denormalized string columns to a future wide parquet build (e.g., has_material_category_label VARCHAR)
  3. Rewrite queries to join wide + narrow at runtime (expensive for browser-based DuckDB-WASM)

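Option 1 is probably the best balance of effort vs. result: it needs no rebuild of the wide parquet (unlike option 2) and avoids the runtime wide + narrow join of option 3.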
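
A minimal sketch of the Option 1 idea, in plain Python rather than DuckDB so it runs without dependencies. The lookup contents, row IDs, source names, and function names are all hypothetical; only the column name `p__has_material_category` and the ID → label mapping come from the description above. In DuckDB the equivalent would presumably be an `UNNEST` of the array followed by a join against the lookup parquet.

```python
from collections import Counter, defaultdict

# Hypothetical lookup table: row ID -> concept label. Under Option 1 this would be
# pre-computed from the IdentifiedConcept nodes and shipped as a small parquet file.
CONCEPT_LABELS = {101: "rock", 102: "sediment"}

# Hypothetical wide rows. Each carries p__has_material_category as a list of row IDs
# (BIGINT[] in the parquet), not as a label string.
WIDE_ROWS = [
    {"source": "source_a", "p__has_material_category": [101]},
    {"source": "source_a", "p__has_material_category": [101, 102]},
    {"source": "source_b", "p__has_material_category": [102]},
    {"source": "source_b", "p__has_material_category": None},   # no category recorded
    {"source": "source_b", "p__has_material_category": [999]},  # ID absent from lookup
]

UNRESOLVED = "(unresolved)"


def resolve_labels(row, labels):
    """Map one row's array of concept row IDs to label strings."""
    ids = row.get("p__has_material_category") or []
    return [labels.get(concept_id, UNRESOLVED) for concept_id in ids]


def material_counts(rows, labels):
    """Overall label counts, i.e. the data behind the bar chart."""
    counts = Counter()
    for row in rows:
        counts.update(resolve_labels(row, labels))
    return counts


def material_counts_by_source(rows, labels):
    """Label counts per source, i.e. the data behind the drill-down."""
    by_source = defaultdict(Counter)
    for row in rows:
        by_source[row["source"]].update(resolve_labels(row, labels))
    return {source: dict(counts) for source, counts in by_source.items()}


if __name__ == "__main__":
    print(material_counts(WIDE_ROWS, CONCEPT_LABELS))
    print(material_counts_by_source(WIDE_ROWS, CONCEPT_LABELS))
```

Keeping an explicit bucket for unresolved IDs is a design choice in this sketch: it makes a stale or incomplete lookup file visible in the chart instead of silently dropping samples.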

Affected sections

  • Section 9: Material Category Analysis (bar chart, drill-down by source)
  • Any query referencing has_material_category as a string

Context
