Upstream VssStoreBuilder and VssStore to lightning-persister #4323
tnull wants to merge 13 commits into lightningdevkit:main
Conversation
🎉 This PR is now ready for review!
Force-pushed from 0817a96 to 2e8b7cd (Compare)
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #4323      +/-   ##
==========================================
- Coverage   86.18%   86.17%   -0.01%
==========================================
  Files         160      160
  Lines      107536   107536
  Branches   107536   107536
==========================================
- Hits        92680    92670      -10
- Misses      12231    12237       +6
- Partials     2625     2629       +4
```
Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
The former is done here lightningdevkit/vss-client#56, for
Force-pushed from 0b39d8d to c633078 (Compare)
Force-pushed from c633078 to a343406 (Compare)
Now rebased / updated to incorporate the latest VSS changes from LDK Node. Should be good for review. LDK Node draft PR: lightningdevkit/ldk-node#837
Force-pushed from d72fb40 to 124e08d (Compare)
Since the beginning of VSS we intended to upstream the corresponding `KVStore` implementation to `lightning-persister`. Here, we finally do it. Signed-off-by: Elias Rohrer <dev@tnull.de>
Bump `vss-client-ng` 0.4 -> 0.5:
- add `build_with_sigs_auth()` method
- rename `build()` to `build_with_lnurl()`
- replace `RandEntropySource` with `VssEntropySource` backed by LDK's `RandomBytes`
- drop `rand` from the `vss` feature in favor of `getrandom`
- re-export `vss_client` from the crate root
- remove `AuthProviderSetupFailed` error variant
- update tests to use `VssStoreBuilder`

Co-Authored-By: HAL 9000 Signed-off-by: Elias Rohrer <dev@tnull.de>
Fix test import path: `crate::io::test_utils` -> `crate::test_utils`. Co-Authored-By: HAL 9000
Stop `list_all_keys` on empty page tokens Handle both `None` and empty `next_page_token` values so VSS pagination terminates regardless of which sentinel the server uses. Co-Authored-By: HAL 9000
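The termination rule this commit describes can be illustrated with a minimal synchronous sketch. The types below are simplified stand-ins for the VSS list response (the real pagination is async and uses the protobuf `ListKeyVersionsResponse`):

```rust
// Simplified stand-in for the VSS list-keys response.
struct ListResponse {
    keys: Vec<String>,
    next_page_token: Option<String>,
}

// Collect keys across pages, treating both `None` and an empty
// `next_page_token` as the end-of-listing sentinel.
fn collect_all_keys(pages: &mut dyn Iterator<Item = ListResponse>) -> Vec<String> {
    let mut all = Vec::new();
    for page in pages {
        all.extend(page.keys);
        // Stop on either sentinel the server might use.
        match &page.next_page_token {
            None => break,
            Some(token) if token.is_empty() => break,
            Some(_) => {},
        }
    }
    all
}
```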
Reject missing VSS object values explicitly Convert malformed responses into `io::Error`s instead of panicking when `value` is missing from `get_object` replies. Co-Authored-By: HAL 9000
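The conversion this commit describes can be sketched as follows; `extract_value` is a hypothetical helper, not the actual `VssStore` code:

```rust
use std::io;

// Map a missing `value` field in a `get_object`-style response to an
// `io::Error` instead of unwrapping. `fetched` stands in for the optional
// value field of the VSS reply.
fn extract_value(fetched: Option<Vec<u8>>) -> Result<Vec<u8>, io::Error> {
    fetched.ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidData, "VSS response is missing the object value")
    })
}
```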
Force-pushed from 124e08d to dd583b0 (Compare)
Use an MSRV-compatible closure in `write_internal` Replace the async closure syntax in `write_internal` with the older `move || async move` pattern supported on Rust 1.75. Co-Authored-By: HAL 9000
Move owned buffers into `StorableBuilder` Pass the existing `Vec<u8>` to `StorableBuilder::build` instead of cloning it before every write. Co-Authored-By: HAL 9000
Use an MSRV-compatible closure in `remove_internal` Replace the async closure syntax in `remove_internal` with the older `move || async move` pattern supported on Rust 1.75. Co-Authored-By: HAL 9000
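The `move || async move` pattern these commits switch to can be shown in a small self-contained sketch. The `run_op` helper and the poll-once executor below are illustrative stand-ins, not code from the PR:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Minimal poll-once driver for futures that are immediately ready, so the
// example stays dependency-free.
fn poll_ready<F: Future>(fut: F) -> F::Output {
    const VTABLE: RawWakerVTable =
        RawWakerVTable::new(|p| RawWaker::new(p, &VTABLE), |_| {}, |_| {}, |_| {});
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    match pin!(fut).poll(&mut cx) {
        Poll::Ready(out) => out,
        Poll::Pending => panic!("example future should be immediately ready"),
    }
}

// Stand-in for a combinator that takes a future-producing closure.
async fn run_op<F, Fut>(op: F) -> u32
where
    F: FnOnce() -> Fut,
    Fut: Future<Output = u32>,
{
    op().await
}

fn demo() -> u32 {
    let captured = 41;
    // `async move || ...` closure syntax is unstable on Rust 1.75, so a plain
    // closure returning an async block is used instead.
    let fut = run_op(move || async move { captured + 1 });
    poll_ready(fut)
}
```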
Signed-off-by: Elias Rohrer <dev@tnull.de>
Wait for the VSS server and use `checkout@v4` Poll the server before running tests so the workflow does not race the startup path, and align the new job with the repository's current `actions/checkout` version. Co-Authored-By: HAL 9000
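The readiness poll can be sketched as a small shell function; the URL, retry budget, and the use of `curl` against a health endpoint are placeholders, not the workflow's actual values:

```shell
# Poll a URL until it answers, so the test run does not race server startup.
# $1 = URL to probe, $2 = max attempts (default 30, one second apart).
poll_server() {
  url="$1"
  attempts="${2:-30}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "server up after $i attempt(s)"
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  echo "server did not become ready" >&2
  return 1
}
```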
Drop the leftover workflow debug listing Remove the stray `ls -la` output so the VSS test job only emits the signal needed for debugging failures. Co-Authored-By: HAL 9000
Keep the V0 prefix and extraction logic aligned with stores that write empty-namespace keys without any `#` separator. Co-Authored-By: HAL 9000
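A hypothetical sketch of the key layout this commit keeps aligned: empty-namespace keys are stored bare, without any `#` separator, while namespaced keys carry a `namespace#key` prefix. The helper names and the single-level namespace are illustrative, not the actual `VssStore` helpers:

```rust
// Build a stored V0 key: empty-namespace keys get no `#` separator.
fn build_v0_key(namespace: &str, key: &str) -> String {
    if namespace.is_empty() {
        key.to_string()
    } else {
        format!("{}#{}", namespace, key)
    }
}

// Extract (namespace, key) from a stored V0 key; a key without any `#`
// separator belongs to the empty namespace.
fn extract_v0_key(stored: &str) -> (String, String) {
    match stored.split_once('#') {
        Some((ns, k)) => (ns.to_string(), k.to_string()),
        None => (String::new(), stored.to_string()),
    }
}
```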
Force-pushed from dd583b0 to 9686cb9 (Compare)
```toml
bitcoin = "0.32.2"
lightning = { version = "0.3.0", path = "../lightning" }
lightning-macros = { version = "0.2.0", path = "../lightning-macros", optional = true }
tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread"] }
```
Bug: The code uses tokio::sync::Mutex extensively (7+ call sites in vss_store.rs), but the tokio dependency only declares features = ["rt-multi-thread"]. The sync feature is required for tokio::sync::Mutex to be available.
This likely compiles today only because vss-client-ng transitively enables tokio/sync. If that crate ever changes its tokio feature requirements, this will break.
```diff
- tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread"] }
+ tokio = { version = "1.35", optional = true, default-features = false, features = ["rt-multi-thread", "sync"] }
```
```rust
	.worker_threads(INTERNAL_RUNTIME_WORKERS)
	.max_blocking_threads(INTERNAL_RUNTIME_WORKERS)
	.build()
	.unwrap();
```
Bug: .unwrap() will panic if the OS fails to create the runtime (e.g., resource limits, thread creation failure). Since this function returns io::Result<Self>, the error should be propagated instead:
```diff
- 	.unwrap();
+ 	.build()?;
```
```rust
#[test]
fn vss_read_write_remove_list_persist() {
	let vss_base_url = std::env::var("TEST_VSS_BASE_URL").unwrap();
	let mut rng = rng();
	let rand_store_id: String = (0..7).map(|_| rng.sample(Alphanumeric) as char).collect();
	let mut vss_seed = [0u8; 32];
	rng.fill_bytes(&mut vss_seed);
	let vss_xprv = Xpriv::new_master(bitcoin::Network::Testnet, &vss_seed).unwrap();
	let vss_store = VssStoreBuilder::new(vss_xprv, vss_base_url, rand_store_id)
		.build_with_fixed_headers(HashMap::new())
		.unwrap();
	do_read_write_remove_list_persist(&vss_store);
```
Bug: This test will panic at runtime. VssStore::new (line 128) and all KVStoreSync methods (lines 207, 238, 272, 290) use tokio::task::block_in_place, which panics if called from outside a multi-threaded tokio runtime. This #[test] runs on a plain OS thread with no tokio runtime context.
The second test (vss_read_write_remove_list_persist_in_runtime_context) correctly uses #[tokio::test(flavor = "multi_thread", worker_threads = 1)].
Either make this test #[tokio::test(flavor = "multi_thread")] as well, or handle the non-runtime case in the constructor/sync methods:
```rust
fn block_on_internal<F: Future>(runtime: &tokio::runtime::Runtime, fut: F) -> F::Output {
	match tokio::runtime::Handle::try_current() {
		Ok(_) => tokio::task::block_in_place(|| runtime.handle().block_on(fut)),
		Err(_) => runtime.handle().block_on(fut),
	}
}
```

If the intent is that VssStore always requires a tokio runtime context (even for sync usage), this needs to be documented on the struct/builder, and this test should be updated.
```rust
	let wallet_data_present = !response.key_versions.is_empty();
	if wallet_data_present {
		// If the wallet data is present, it means we're not running for the first time.
		Ok(VssSchemaVersion::V0)
```
Potential data loss: When V0 is detected, the schema version is NOT written to storage. This means every future startup re-runs the detection heuristic (checking for bdk_wallet# prefix). If a V0 store ever loses all its bdk_wallet data (e.g., wallet reset, selective key deletion) while retaining other data (channel monitors, etc.), the next startup will misclassify it as a fresh V1 store. This causes:
- All existing V0 keys become inaccessible (V1 uses obfuscated namespace prefixes, V0 uses plaintext)
- Decryption fails (V1 uses the store key as AAD, V0 uses empty AAD)
Consider persisting `VssSchemaVersion::V0` to storage (like V1 does) to make the detection durable:

```rust
if wallet_data_present {
	// Persist V0 schema version so future startups don't rely on the heuristic
	let schema_version = VssSchemaVersion::V0;
	let encoded_version = schema_version.encode();
	let storable_builder = StorableBuilder::new(VssEntropySource(entropy_source));
	let vss_version = -1;
	let aad = store_key.as_bytes();
	let storable = storable_builder.build(encoded_version, vss_version, &data_encryption_key, aad);
	let request = PutObjectRequest {
		store_id: store_id.clone(),
		global_version: None,
		transaction_items: vec![KeyValue { key: store_key, version: vss_version, value: storable.encode_to_vec() }],
		delete_items: vec![],
	};
	// Best-effort write; if it fails, we'll re-detect next time
	let _ = client.put_object(&request).await;
	Ok(schema_version)
} else {
```

```rust
	// `PeerManager::process_pending_events` -> `ChannelManager::get_and_clear_pending_msg_events`
	// would deadlock when trying to acquire sync `Mutex` locks that are held by the thread
	// currently being blocked waiting on the VSS operation to finish.
	internal_runtime: Option<tokio::runtime::Runtime>,
```
Hmm, does it make sense to upstream this hack or should we go ahead and split the store struct into an async and sync version? I assume pretty soon LDK Node will only need/want the async version anyway?
> Hmm, does it make sense to upstream this hack or should we go ahead and split the store struct into an async and sync version? I assume pretty soon LDK Node will only need/want the async version anyway?

Yes, we plan to eventually, but a) not quite sure on the timeline, b) not quite sure if we consider async_persist fully ready-to-go yet? I guess we could also consider only upstreaming an async version, though that would mean we can't directly switch to it downstream?
> b) not quite sure if we consider async_persist fully ready-to-go yet?

We intend to support it fully in 0.3, and ideally LDK Node will switch to it when using 0.3.

> I guess we could also consider only upstreaming an async version, though that would mean we can't directly switch to it downstream?

LDK Node could still use the async version and do the wrapping with an internal runtime before calling into the async version from sync code, no?
```rust
	Ok(keys)
}

async fn execute_locked_write<
```
Can we DRY this up with the filesystem store at all?
Hmm, one option we discussed a few months ago was to offer a generic wrapper/adapter that does all the necessary write-locking so that the behavior doesn't need to be replicated in each implementation / by the user. We decided against that at the time, but maybe it's worth revisiting, in particular given lightningdevkit/ldk-node#692 will likely introduce yet another version of this for the async-trait adapter.
I imagine we could even force the correct behavior through use of a slightly different trait (something like WriteOrderedKVStore) that the wrapper/adapter implements and that VssStore could eventually implement directly when we switch to use the actual key-level versioning?
Any thoughts (cc @joostjager)?
```rust
	// Check if any `bdk_wallet` data was written by listing keys under the respective
	// (unobfuscated) prefix.
	const V0_BDK_WALLET_PREFIX: &str = "bdk_wallet#";
	let request = ListKeyVersionsRequest {
```
Hmm, its a bit weird to upstream LDK Node-specific migration logic.
Yeah, I somewhat agree, though I think the schema versioning is really a property/feature of VssStore, not so much of LDK Node? I guess we could find a way to not do the auto-migration in new, but even if we keep this specific v0 -> v1 migration step in LDK Node, we still might want to keep the notion of schema versions in VssStore?
> we still might want to keep the notion of schema versions in VssStore?

Makes sense, though I'm also not really sure this shouldn't belong in vss-client.
> > we still might want to keep the notion of schema versions in VssStore?
>
> Makes sense, though I'm also not really sure this shouldn't belong in vss-client.

Hmm, vss-client has no notion of the schema/KVStore/etc. being used, especially since it doesn't have a dependency on lightning?
🔔 1st Reminder Hey @tankyleo! This PR has been waiting for your review.
Since the beginning of VSS we intended to upstream the corresponding `KVStore` implementation to `lightning-persister`. Here, we finally do it. I'll put this in draft until lightningdevkit/ldk-node#755 lands so we can directly upstream the version already upgraded for pubkey-auth.