Flowsheet ETL: import and sync flowsheet data from tubafrenzy #258
Open
jakebromberg wants to merge 5 commits into main
added 5 commits
March 21, 2026 20:00
Add legacy_release_id to library, legacy_entry_id to flowsheet, and legacy_show_id to shows tables with unique indexes. These columns map tubafrenzy IDs to Backend-Service IDs, enabling deduplication when the flowsheet ETL imports historical data and syncs ongoing entries. Update library-etl to populate legacy_release_id on insert and backfill existing rows that are missing it. Update the mirror middleware to persist legacy_entry_id after mirroring entries to tubafrenzy and legacy_show_id after mirroring shows.
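As a rough illustration of how a unique index on a legacy ID can act as a deduplication key, here is a minimal in-memory sketch. `FlowsheetTable`, `FlowsheetRow`, and `insertIfNew` are hypothetical names, not code from this PR. The sketch also assumes that rows without a legacy ID (NULL) never collide with each other, as with PostgreSQL's default unique-index behaviour; the actual database semantics are not stated here.

```typescript
// Hypothetical row shape; only the legacy_entry_id column name comes from the PR.
interface FlowsheetRow {
  legacy_entry_id: number | null;
  message: string;
}

// In-memory stand-in for a table with a unique index on legacy_entry_id.
class FlowsheetTable {
  private rows: FlowsheetRow[] = [];
  private seenLegacyIds = new Set<number>();

  // Returns true if the row was inserted, false if it was skipped as a
  // duplicate (comparable in spirit to an "insert or ignore on conflict").
  insertIfNew(row: FlowsheetRow): boolean {
    if (row.legacy_entry_id !== null) {
      if (this.seenLegacyIds.has(row.legacy_entry_id)) {
        return false; // already imported or already mirrored
      }
      this.seenLegacyIds.add(row.legacy_entry_id);
    }
    // Assumption: NULL legacy IDs are never treated as duplicates.
    this.rows.push(row);
    return true;
  }

  get size(): number {
    return this.rows.length;
  }
}
```

Under these assumptions, an entry that the mirror middleware has already tagged with its tubafrenzy ID would be skipped when the ETL later encounters the same ID.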
Implements the @wxyc/flowsheet-etl job with two modes:
- Bulk load: parses a MySQL dump file, imports ~71K shows and ~2.6M flowsheet entries
- Incremental sync: queries tubafrenzy via MirrorSQL for new shows and entries since last run

The ETL uses legacy_entry_id and legacy_show_id unique indexes for deduplication, ensuring entries mirrored from Backend-Service are not re-imported. Album IDs are resolved via the legacy_release_id mapping populated by the library-etl.

Includes MySQL dump parser (handles escaped strings, NULL, numeric values), data transformation layer (entry type mapping, timestamp conversion, string truncation), and unit tests for both modules.
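To make the dump-parsing step concrete, here is a minimal sketch of parsing the tuple list that follows `VALUES` in a MySQL dump `INSERT` statement. It is not the parser from this PR: `parseValues`, `SqlValue`, and `ESCAPES` are hypothetical names, and the sketch assumes only single-quoted strings (with backslash escapes or doubled quotes), `NULL`, and plain numeric literals.

```typescript
type SqlValue = string | number | null;

// Common MySQL backslash escapes; any other escaped character maps to itself.
const ESCAPES: Record<string, string> = {
  "0": "\0", n: "\n", r: "\r", t: "\t", b: "\b", Z: "\x1a",
};

// Parses "(1,'a',NULL),(2,'b',3.5)" into [[1,"a",null],[2,"b",3.5]].
function parseValues(input: string): SqlValue[][] {
  const rows: SqlValue[][] = [];
  let i = 0;
  while (i < input.length) {
    if (input.charAt(i) !== "(") { i++; continue; } // skip separators between tuples
    i++; // consume "("
    const row: SqlValue[] = [];
    for (;;) {
      if (input.charAt(i) === "'") {
        i++; // consume opening quote
        let s = "";
        for (;;) {
          if (i >= input.length) throw new Error("unterminated string");
          const ch = input.charAt(i);
          const next = input.charAt(i + 1);
          if (ch === "\\" && next !== "") {
            s += ESCAPES[next] ?? next; // backslash escape
            i += 2;
          } else if (ch === "'" && next === "'") {
            s += "'"; // doubled quote inside a string
            i += 2;
          } else if (ch === "'") {
            i++; // closing quote
            break;
          } else {
            s += ch;
            i++;
          }
        }
        row.push(s);
      } else {
        // Unquoted token: NULL or a numeric literal.
        let j = i;
        while (j < input.length && input.charAt(j) !== "," && input.charAt(j) !== ")") j++;
        const token = input.slice(i, j).trim();
        row.push(token.toUpperCase() === "NULL" ? null : Number(token));
        i = j;
      }
      const sep = input.charAt(i);
      i++;
      if (sep === ",") continue; // next value in the same tuple
      if (sep === ")") break;    // end of tuple
      throw new Error("malformed tuple near offset " + (i - 1));
    }
    rows.push(row);
  }
  return rows;
}
```

A character-by-character scan is used rather than splitting on commas, since commas and parentheses can appear inside quoted strings. A parser for a dump of this size would presumably also need to stream the file rather than hold it in memory, which this sketch does not attempt.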
Add unit tests verifying findExistingAlbum returns id + legacy_release_id when the album exists, returns null when it doesn't, and correctly returns legacy_release_id as null for albums that haven't been backfilled yet. Export findExistingAlbum for testability. Fix Prettier formatting across all new files.
Truncate show_name to 128 chars in transformShow (tubafrenzy allows 255, Backend allows 128). Validate show_id references against the set of imported shows before inserting entries, setting show_id to null for the 19 orphan entries that reference deleted shows.
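The two fixes in this commit can be sketched roughly as follows. Only `transformShow`, `show_name`, `show_id`, `legacy_show_id`, and the 128-character limit come from the commit message; the interfaces, `truncate`, and `resolveShowId` are hypothetical, and the real transform may map legacy show IDs to new Backend-Service IDs rather than passing them through.

```typescript
const MAX_SHOW_NAME_LENGTH = 128; // Backend limit stated in the commit message

// Hypothetical shapes for rows read from tubafrenzy.
interface LegacyShow {
  id: number;
  show_name: string | null;
}

interface LegacyEntry {
  id: number;
  show_id: number | null;
}

// Cuts a string to at most `max` UTF-16 code units; null passes through.
function truncate(value: string | null, max: number): string | null {
  return value === null ? null : value.slice(0, max);
}

function transformShow(show: LegacyShow): { legacy_show_id: number; show_name: string | null } {
  return {
    legacy_show_id: show.id,
    show_name: truncate(show.show_name, MAX_SHOW_NAME_LENGTH),
  };
}

// Returns the entry's show_id only if that show was actually imported;
// orphan references (shows deleted upstream) become null.
function resolveShowId(entry: LegacyEntry, importedShowIds: Set<number>): number | null {
  if (entry.show_id !== null && importedShowIds.has(entry.show_id)) {
    return entry.show_id;
  }
  return null;
}
```

Note that `slice` counts UTF-16 code units, so a name containing characters outside the Basic Multilingual Plane could be cut mid-character; whether that matters depends on the data.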
Summary
- Add legacy_release_id, legacy_entry_id, and legacy_show_id columns with unique indexes for deduplication between Backend-Service and tubafrenzy
- Update library-etl to populate legacy_release_id on insert and backfill existing rows
- Implement the @wxyc/flowsheet-etl job with bulk load (MySQL dump parser) and incremental sync (MirrorSQL) modes

Test plan
- Unit tests (tests/unit/jobs/flowsheet-etl/)
- @wxyc/database, @wxyc/library-etl, and @wxyc/flowsheet-etl

Closes #257