New condition for compressing data #148
Merged
KriFos1 merged 3 commits into Python-Ensemble-Toolbox:main on Mar 27, 2026
Adds a new keyword, compress_data, to the COMPRESS block of the input file. It specifies the data type that should be compressed. In the old version, the noise level estimated by the compression was never used.
Pull request overview
This PR updates the wavelet-compression flow to compress only the observation/prediction data type specified by a new compress_data keyword, and to use the compression-estimated noise when setting data variance.
Changes:
- Add `compress_data` to the sparse representation configuration extracted from the COMPRESS block.
- Switch compression triggers in observation/prediction organization from “size matches mask” to “datatype matches `compress_data`”.
- Override `DATAVAR` using wavelet-estimated noise (`est_noise**2`) for the compressed datatype.
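The name-based trigger described in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper `should_compress` and the datatype names are hypothetical, and only the `compress_data` key of `sparse_info` is taken from the review.

```python
from typing import Any, Mapping, Optional


def should_compress(datatype: str, sparse_info: Optional[Mapping[str, Any]]) -> bool:
    """Return True when `datatype` is the one named by `compress_data`.

    Hypothetical helper: it stands in for the inline condition used in the PR.
    """
    if sparse_info is None:  # no COMPRESS block in the input file
        return False
    return datatype == sparse_info.get('compress_data')


# Hypothetical datatype names, used only for illustration.
sparse_info = {'compress_data': 'sim2seis'}
selected = [d for d in ['wopr', 'sim2seis', 'wwpr'] if should_compress(d, sparse_info)]
print(selected)
```

Selecting by name means a datatype whose array happens to have the same size as the mask is no longer compressed by accident.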
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/pipt/misc_tools/extract_tools.py |
Adds compress_data to parsed sparse configuration. |
src/pipt/loop/ensemble.py |
Changes when obs data is compressed and when datavar is overridden by wavelet-estimated noise. |
src/pipt/loop/assimilation.py |
Updates forecast post-processing to select the compressed datatype by name and recompress predicted data. |
Comments suppressed due to low confidence (2)
src/pipt/loop/assimilation.py:486
- In `post_process_forecast`, `vintage` is reset to 0 inside the `for key in pred_data:` loop and then incremented, but it isn’t used after that increment. This looks like leftover logic from the previous mask/size-based condition and can be removed to avoid confusion about how vintages are tracked.

```python
# Reset vintage
vintage = 0
# Store according to sparse_info
if key == self.ensemble.sparse_info['compress_data'] and pred_data[key] is not None:
    # If first entry in pred_data_tmp
    if pred_data_tmp[i] is None:
        pred_data_tmp[i] = {key: pred_data[key]}
    else:
        pred_data_tmp[i][key] = pred_data[key]
    # Update vintage
    vintage += 1
```
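A sketch of the same selection with the unused counter dropped, as the comment suggests. The function name `collect_compressed_type`, the outer loop over `i`, and the sample data are assumptions made for illustration; only the inner condition mirrors the quoted snippet.

```python
from typing import Any, Dict, List, Optional


def collect_compressed_type(
    pred_data_list: List[Dict[str, Any]], compress_key: str
) -> List[Optional[Dict[str, Any]]]:
    """Keep only the `compress_key` entry of each prediction dict.

    Hypothetical helper: same selection as the quoted snippet, without `vintage`.
    """
    pred_data_tmp: List[Optional[Dict[str, Any]]] = [None] * len(pred_data_list)
    for i, pred_data in enumerate(pred_data_list):
        for key in pred_data:
            if key == compress_key and pred_data[key] is not None:
                if pred_data_tmp[i] is None:
                    pred_data_tmp[i] = {key: pred_data[key]}
                else:
                    pred_data_tmp[i][key] = pred_data[key]
    return pred_data_tmp


# Hypothetical prediction data: three assimilation steps, one datatype selected.
preds = [{'seis': [1.0, 2.0], 'wopr': [3.0]}, {'wopr': [4.0]}, {'seis': None}]
result = collect_compressed_type(preds, 'seis')
```

Steps that lack the selected datatype, or hold `None` for it, stay `None` in the result.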
src/pipt/loop/ensemble.py:524
`_org_data_var` now overrides `datavar` based solely on `datatype[j] == self.sparse_info['compress_data']` and then indexes `self.sparse_data[vintage]`. If the corresponding true-data compression didn’t run (e.g., non-`.npz` input / `N/A` / `use_ensemble` mode) or if `vintage` exceeds the number of stored `sparse_data` entries, this will raise `IndexError` and/or set a variance vector with a different length than `obs_data`. Add a guard that ensures the vintage exists and that the estimated-noise vector length matches the (compressed) observation length before overriding.

```python
if self.sparse_info is not None and self.datavar[i][datatype[j]] is not None and \
        datatype[j]==self.sparse_info['compress_data']:
    # compute var from sparse_data
    est_noise = np.power(self.sparse_data[vintage].est_noise, 2)
    self.datavar[i][datatype[j]] = est_noise # override the given value
    vintage = vintage + 1
```
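One possible shape for the guard requested above, as a hedged sketch. The helper `estimated_variance`, its `n_obs` argument, and the `SimpleNamespace` stand-in for the stored compression objects are hypothetical. Only the `est_noise` attribute and the squaring come from the quoted code.

```python
from types import SimpleNamespace
from typing import Optional, Sequence

import numpy as np


def estimated_variance(
    sparse_data: Optional[Sequence], vintage: int, n_obs: int
) -> Optional[np.ndarray]:
    """Return est_noise**2 for `vintage`, or None if the override is unsafe."""
    # Guard 1: compression never ran, or no entry was stored for this vintage.
    if sparse_data is None or vintage >= len(sparse_data):
        return None
    est_noise = np.asarray(sparse_data[vintage].est_noise, dtype=float)
    # Guard 2: the noise vector must match the compressed observation length.
    if est_noise.shape != (n_obs,):
        return None
    return np.power(est_noise, 2)


# Stand-in for one stored compression result (assumed structure).
sparse = [SimpleNamespace(est_noise=np.array([1.0, 2.0]))]
var_ok = estimated_variance(sparse, vintage=0, n_obs=2)
var_missing = estimated_variance(sparse, vintage=1, n_obs=2)
var_mismatch = estimated_variance(sparse, vintage=0, n_obs=3)
```

A caller would override `datavar` and advance `vintage` only when the result is not `None`, and otherwise keep the variance given in the input file.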
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
KriFos1 approved these changes on Mar 27, 2026