Just expect it to be provided in the datasets.json
All datasets are "example" datasets
With placeholder code. Will fix in another PR
Not all the benchmarks are running. Once this is merged I'll fix the rest in #40. Let me know what you think of this @fluidnumerics-joe
Oh, and since Parcels is now a submodule I think you'll need to do
This PR reworks parcels-benchmarks in a way that (I hope) is much easier to work with. Follow the README and let me know what you think.
Changes:
- Replaces the `parcels_benchmarks` internal package (which provided the CLI tool for adding dataset hashes etc.). Now instead:
  - An `intake-xarray` catalog is defined in `catalogs/parcels-benchmarks/catalog.yml`. The top of the file has a comment containing the link to the ZIP to be downloaded.
  - A script (`scripts/download-catalog.py`) downloads the data for a `catalog` and takes an `output_dir` (both via CLI args). This uses `curl` to download the dataset, and then unzips all nested zip files (deleting the original zips). The script also copies the catalog file into the `output_dir` (which is good since the datasets in the catalog are defined relative to the catalog file). Using `curl` here makes the approach quite transparent: one can easily see download speeds and decide to cancel.
  - A `setup-data` task, to download all the datasets.
- Requires a `PARCELS_BENCHMARKS_DATA_FOLDER` environment variable to be explicitly set, which then acts as the working space for the data. This environment variable is used in the download and benchmarking code.

We needed the following things to ease development:

- Download all datasets before running benchmarks
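The "unzip all nested zip files, deleting the originals" step could be sketched roughly as below. This is a minimal illustration of the idea, not the actual code in `scripts/download-catalog.py`; the function name is made up:

```python
import zipfile
from pathlib import Path


def extract_nested_zips(folder: Path) -> None:
    """Repeatedly unzip every .zip under `folder`, deleting each archive
    after extraction, until no zips remain (so nested zips are handled)."""
    while True:
        zips = list(folder.rglob("*.zip"))
        if not zips:
            break
        for zip_path in zips:
            with zipfile.ZipFile(zip_path) as zf:
                # Extract next to the archive so the relative layout
                # expected by the catalog file is preserved.
                zf.extractall(zip_path.parent)
            # Delete the original zip, as the PR description says.
            zip_path.unlink()
```

Each pass extracts whatever zips currently exist, so an archive that itself contains archives is handled on the next iteration.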
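Since `PARCELS_BENCHMARKS_DATA_FOLDER` must be set explicitly, the download and benchmarking code presumably resolves it along these lines (a hedged sketch; the helper name is hypothetical):

```python
import os
from pathlib import Path


def data_folder() -> Path:
    """Resolve the working space for benchmark data, failing loudly
    when PARCELS_BENCHMARKS_DATA_FOLDER is not set."""
    value = os.environ.get("PARCELS_BENCHMARKS_DATA_FOLDER")
    if value is None:
        raise RuntimeError(
            "Set the PARCELS_BENCHMARKS_DATA_FOLDER environment variable "
            "to the directory that should hold the benchmark datasets."
        )
    return Path(value).expanduser()
```

Raising immediately keeps the failure mode obvious, rather than silently downloading into a default location.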
Footnotes

1. Given we are the sole owners of our data sources, I don't think this is a concern.