Streamline parcels-benchmarks#42

Open
VeckoTheGecko wants to merge 34 commits into main from improvements

Conversation

@VeckoTheGecko
Contributor

This PR reworks parcels-benchmarks in a way that (I hope) is much easier to work with. Follow the README and let me know what you think.

Changes:

  • Replaces the parcels_benchmarks internal package (which provided the CLI tool for adding dataset hashes etc.). Now instead:

    • An intake-xarray catalog is defined in catalogs/parcels-benchmarks/catalog.yml. The top of the file has a comment which contains the link to the ZIP to be downloaded.
      • This streamlines our approach, making it easier for the benchmarking scripts to go straight from data on disk to xarray dataset.
      • We can use other options available via intake.
      • This approach allows us to get familiar with intake which will likely be used for our HPC systems after v4 is released.
    • A script (scripts/download-catalog.py) downloads the data for a catalog and takes an output_dir (both via CLI args). This uses curl to download the dataset, then unzips all nested zip files (deleting the original zips). The script also copies the catalog file into the output_dir (which matters since the datasets in the catalog are defined relative to the catalog file).
      • If a catalog is already downloaded (i.e., if the folder already exists), it's skipped.
      • Pro: The use of curl makes this approach quite transparent: one can easily see download speeds and decide whether to cancel.
      • Con: There is no longer the concept of "known hashes"; this is something we can bring back in the future if we want [1]
    • Pixi is used, via the setup-data task, to download all the datasets.
      • This makes our data approach much more flexible should we want to change it in future
  • Requires a PARCELS_BENCHMARKS_DATA_FOLDER environment variable to be explicitly set, which then acts as the working space for the data. This environment variable is used in both the download and benchmarking code.
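To make the flow above concrete, here is a hedged sketch of how a benchmarking script might resolve the environment variable and open the copied catalog. `data_dir` is a hypothetical helper (not from the repo), the catalog path and dataset name are illustrative, and the intake calls are the standard `intake.open_catalog` / `.to_dask()` API:

```python
# Sketch: resolve the required PARCELS_BENCHMARKS_DATA_FOLDER environment
# variable, then open the catalog that download-catalog.py copied alongside
# the data. Helper and dataset names are illustrative, not from the repo.
import os
from pathlib import Path


def data_dir() -> Path:
    """Return the benchmark data folder, failing loudly if the env var is unset."""
    try:
        return Path(os.environ["PARCELS_BENCHMARKS_DATA_FOLDER"])
    except KeyError:
        raise RuntimeError(
            "Set PARCELS_BENCHMARKS_DATA_FOLDER before downloading or benchmarking"
        ) from None


# In a benchmarking script (requires intake and intake-xarray):
#   import intake
#   cat = intake.open_catalog(str(data_dir() / "parcels-benchmarks" / "catalog.yml"))
#   ds = cat["some_dataset"].to_dask()  # lazily-loaded xarray dataset
```

In practice the download itself is driven through the Pixi setup-data task, so the environment variable has to be set before invoking it.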

We needed the following things to ease development:

  • Download all datasets before running benchmarks

  • Make the download progress of datasets transparent

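The nested-unzip step described above can be sketched with the standard library alone; `unzip_nested` is a hypothetical name and the real scripts/download-catalog.py may differ in detail:

```python
# Sketch of the nested-unzip step: after curl fetches the top-level ZIP,
# extract every .zip found under the output directory next to itself,
# delete the archive, and repeat until no zip files remain.
# (unzip_nested is an assumed name, not taken from the repo.)
import zipfile
from pathlib import Path


def unzip_nested(root: Path) -> None:
    """Extract all .zip files under root in place, deleting each archive."""
    while zips := list(root.rglob("*.zip")):
        for zip_path in zips:
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(zip_path.parent)
            zip_path.unlink()  # remove the original archive
```

Looping until no zips remain handles arbitrarily deep nesting (a zip inside a zip) without recursion.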
Footnotes

  1. Given we are the sole owners of our data sources, I don't think this is a concern.

@VeckoTheGecko
Contributor Author

Not all the benchmarks are running yet. Once this is merged I'll fix the rest in #40.

Let me know what you think of this @fluidnumerics-joe

@VeckoTheGecko
Contributor Author

Oh, and since Parcels is now a submodule, I think you'll need to run `git submodule update --init --recursive` (if you aren't doing a fresh clone per the README).
