Skip to content

Memory and async manager improvements (support embedded)#833

Open
jsmnbom wants to merge 17 commits intobuttplugio:masterfrom
jsmnbom:embedded
Open

Memory and async manager improvements (support embedded)#833
jsmnbom wants to merge 17 commits intobuttplugio:masterfrom
jsmnbom:embedded

Conversation

@jsmnbom
Copy link

@jsmnbom jsmnbom commented Mar 7, 2026

Hi, I'm developing an embedded device for controlling devices using buttplug.io. These are the changes I've had to make to buttplug.io to make things work downstream. I have tried to clean them up and explain as best as I can why stuff is changed.

Feel like to cherry pick or even wholesale ignore this PR btw. I just figured I should at least attempt to upstream my changes rather than trying to keep a fork up to date.

The most critical changes are improvements to memory usage (esp in the device manager), AsyncManager as a trait, and the ability to serialize the DeviceConfigurationManager internals directly.

My data from doing the memory improvements in pasted below. But it basically went from using 10MB to 1,2MB, and with a lot less churn.

The AsyncManager replacement is needed since tokio isn't really able to run on embedded. Rather than creating a new feature specifically for my target board (and bring in a bunch more dependencies), I made it so that downstream crates can register which AsyncManager to use. They just need to make sure to do it before any other buttplug code is run.

The last change about serialization of DeviceConfigurationManager internals, is because using json for storing the device configs ends up being not great on embedded systems since it's not a very efficient format to store nor decode. I think using it as the primary choice in the library is still the correct choice overall, but with these changes it is possible for me to use a build script like: https://gist.github.com/jsmnbom/1be695db3b08725847669bd9a39d520d and then decode it like so:

static BUTTPLUG_DATA: &[u8] = include_bytes!("./data/data.bin.gz");

#[derive(Debug, Deserialize)]
pub struct ButtplugData {
  base_communication_specifiers: HashMap<CompactString, Vec<ProtocolCommunicationSpecifier>>,
  base_device_definitions: Vec<(ServerDeviceDefinition, Vec<BaseDeviceIdentifier>)>,
  base_device_definitions_count: usize,
}

impl ButtplugData {
  pub fn load() -> anyhow::Result<DeviceConfigurationManagerBuilder> {
    let data = miniz_oxide::inflate::decompress_to_vec(BUTTPLUG_DATA)
      .map_err(|_| anyhow::anyhow!("Decompression error!"))?;
    let data: SelfDescribed<Self> = postcard::from_bytes(&data)
      .map_err(|e| anyhow::anyhow!("Postcard deserialization error: {:?}", e))?;
    let data = data.0;

    let base_communication_specifiers = data.base_communication_specifiers;
    let mut base_device_definitions = HashMap::with_capacity(data.base_device_definitions_count);
    for (def, ids) in data.base_device_definitions {
      let def = Arc::new(def);
      for id in ids {
        base_device_definitions.insert(id, def.clone());
      }
    }

    Ok(DeviceConfigurationManagerBuilder::new(
      base_communication_specifiers,
      base_device_definitions,
    ))
  }
}

With this data.gz.bin ends up being 57 kB which is a lot less than the original json file, and it's a lot faster and more efficient to load too.

Smaller changes include not using regex. This is simply because it's a pretty heavy dependency that ends up going unused if jsonschema is also unused. Thereby saving quite a lot of binary size.

small note: I believe there may be an issue with the way the runtime features of buttplug_core is propagated currently. I think the other crates depending on buttplug_core with default features means that the tokio runtime is technically always turned on.

let dcm = load_protocol_configs(&None, &None, false)
  .unwrap()
  .finish()
  .unwrap()
152,582 memory events (80,169 allocs, 72,413 frees)
Memory allocated: 52.81 MB
Memory freed: 42.91 MB
Overall change: 9.9 MB

Optimize device configuration loading

  • Make JSON validation return the parsed value to avoid double parsing.
  • Allow parsing of hashmaps to DeviceConfigurationManagerBuilder to avoid cloning.
  • Other techniques to avoid cloning of data during loading.
  • Add a version.rs file generated by build.rs to avoid parsing the config version from JSON at runtime.
85,641 memory events (46,468 allocs, 39,173 frees)
Memory allocated: 29.02 MB
Memory freed: 20.54 MB
Overall change: 8679.99 KB

Use better data structures for device configuration management

ServerDeviceAttributes::features: BTreeMap -> LiteMap
ServerDeviceDefinition::features: BTreeMap -> LiteMap

85,514 memory events (46,336 allocs, 39,178 frees)
Memory allocated: 12.44 MB
Memory freed: 10.63 MB
Overall change: 1851.4 KB

Use of CompactStr + custom RangeInclusive + InputCommandType as enumflags2::bitflags

77,712 memory events (40,577 allocs, 37,135 frees)
Memory allocated: 10.15 MB
Memory freed: 9049.7 KB
Overall change: 1345.34 KB

Use Arc for ServerDeviceDefinition in DeviceConfigurationManager::base_device_definitions to avoid cloning the definition for each identifier.

76,250 memory events (40,113 allocs, 36,137 frees)
Memory allocated: 9306.27 KB.
Memory freed: 8110.62 KB.
Overall change: 1195.66 KB.

Decrease cloning when using ServerDeviceDefinitionBuilder.

73,221 memory events (38,600 allocs, 34,621 frees)
Memory allocated: 7641.52 KB
Memory freed: 6446 KB
Overall change: 1195.52 KB

jsmnbom added 10 commits March 7, 2026 15:48
- AsyncManager is now a trait. The used async runtime is determined by the feature flag, but users can also set a global async manager at runtime if they want to use a custom async runtime or if the default one doesn't work for their use case.
- All async_manager::spawn calls now take a tracing::Span parameter to ensure proper tracability of async tasks.
- The span names for each task may not be perfect atm, but should still be better than none.
- The `spawn_with_handle` function has been removed since it was never used.
- Since sleep functions depend on the async runtime, they have been moved to the AsyncManager trait as well.
- Moved to using tracing::Instrument directly instead of the tracing_futures crate.
- Move usages of tokio::spawn to async_manager::spawn, except in intiface-engine which is assumed to be tokio runtime only.

Unsure how to handle nintendo joycons, since it has feature gates, so it has been left mostly alone.

LLM disclosure: LLMs were very mildly used to assist with this refactor, primarily just autocomplete for changing the spawn and sleep calls. I have tested the changes using cargo test and a simple in-process client connection - i have not been able to make wasm work, but i have tested a custom AsyncManager implementation.
- Make JSON validation return the parsed value to avoid double parsing.
- Allow parsing of hashmaps to DeviceConfigurationManagerBuilder to avoid cloning.
- Other techniques to avoid cloning of data during loading.
- Add a version.rs file generated by build.rs to avoid parsing the config version from JSON at runtime.
…e_device_definitions to avoid cloning the definition for each identifier.
…tations.

While the regex crate is normally fulled in anyways by jsonschema, it is possible to use buttplug_server without message serialization (in process connector) and also without load_protocol_configs. In those cases, the regex crate is not needed and removing it from these protocol implementations saves about 100KB of binary size on embedded targets. The regexes in question were also very simple and easily replaced with basic string parsing.

If regex is needed in the future for more complex parsing, it should ideally be added as regex_lite.
Otherwise ServerDeviceDefinitions can't be serialized
@CLAassistant
Copy link

CLAassistant commented Mar 7, 2026

CLA assistant check
All committers have signed the CLA.

@jsmnbom
Copy link
Author

jsmnbom commented Mar 10, 2026

Turns out rust is kinda bad at optimizing default trait implementations, and some methods on ProtocolHandler was duplicated 135 times in the final optimized binary...

@qdot
Copy link
Member

qdot commented Mar 11, 2026

This is gonna take a while to bring in because there's a lot going on here, but it's really interesting work so as I said on discord, totally willing to work with you here. This may need to be broken down into sub-PRs though because getting through all of this in one chunk is going to be a LOT.

Before we get to planning that though, how are we gonna handle compatibility and testing for this in the future? Do you have any sort of structured tests that can surface when new code or PRs will require optimization?

jsmnbom added 5 commits March 12, 2026 23:04
…HashMap lookup, as the number of protocols is small and this avoids the need for dynamic dispatch and heap allocation when retrieving protocol identifiers.
Removing the manual serialization of original_error makes it possible to compile buttplug without json at all (or at least with json function elided at link time)

Care should be taken to ensure that the error_message field is still populated with a useful message, as the original_error field will not be serialized.

So basically not sure this actually needs to go upstream.
@jsmnbom
Copy link
Author

jsmnbom commented Mar 14, 2026

This is gonna take a while to bring in because there's a lot going on here, but it's really interesting work so as I said on discord, totally willing to work with you here. This may need to be broken down into sub-PRs though because getting through all of this in one chunk is going to be a LOT.

Yeah that makes sense. Do you have any idea where it would be best to start?

Before we get to planning that though, how are we gonna handle compatibility and testing for this in the future? Do you have any sort of structured tests that can surface when new code or PRs will require optimization?

I am unsure how regression tests are currently implemented. Ideally i guess we could have CI comment on PRs with the data.

For now I've (with the use of agentic AI tho goddanm is it stupid sometimes :P) written a test that outputs a table of memory usage.

Before this PR:

> cargo test -p buttplug_tests --test test_memory -- --show-output 
┌───────────────────────────────────────────────────────────────────┬────────────┬────────────┬─────────────┬────────────┬────────────┬─────────────┐
│ Span                                                              │                  Own                  │                 Total                 │
│                                                                   │     +      │     -      │      Δ      │     +      │     -      │      Δ      │
├───────────────────────────────────────────────────────────────────┼────────────┼────────────┼─────────────┼────────────┼────────────┼─────────────┤
│ memory_test                                                       │        0 B │        0 B │        +0 B │  53.08 MiB │  42.93 MiB │  +10.15 MiB │
│     DeviceConfigurationManager                                    │        0 B │   9.82 MiB │   -9.82 MiB │  52.81 MiB │  42.91 MiB │   +9.90 MiB │
│         load_protocol_configs                                     │  42.67 MiB │  32.70 MiB │   +9.96 MiB │  42.67 MiB │  32.70 MiB │   +9.96 MiB │
│         DeviceConfigurationManagerBuilder.finish                  │  10.14 MiB │ 394.39 KiB │   +9.75 MiB │  10.14 MiB │ 394.39 KiB │   +9.75 MiB │
│     ServerDeviceManager                                           │        0 B │       65 B │       -65 B │ 172.22 KiB │  10.25 KiB │ +161.98 KiB │
│         ServerDeviceManagerBuilder::new                           │      192 B │        0 B │      +192 B │      192 B │        0 B │      +192 B │
│         builder.comm_manager(BtlePlugCommunicationManagerBuilder) │       65 B │        0 B │       +65 B │       65 B │        0 B │       +65 B │
│         ServerDeviceManagerBuilder.finish                         │ 171.97 KiB │  10.18 KiB │ +161.79 KiB │ 171.97 KiB │  10.18 KiB │ +161.79 KiB │
│     ButtplugServerBuilder                                         │       79 B │       15 B │       +64 B │  32.88 KiB │       15 B │  +32.87 KiB │
│         ButtplugServerBuilder.finish                              │  32.80 KiB │        0 B │  +32.80 KiB │  32.80 KiB │        0 B │  +32.80 KiB │
│     ButtplugClientInProcessConnector                              │        0 B │        0 B │        +0 B │   3.88 KiB │        0 B │   +3.88 KiB │
│         ButtplugInProcessClientConnectorBuilder::default          │        0 B │        0 B │        +0 B │        0 B │        0 B │        +0 B │
│         ButtplugClientInProcessConnectorBuilder.server            │        0 B │        0 B │        +0 B │        0 B │        0 B │        +0 B │
│         ButtplugInProcessClientConnectorBuilder.finish            │   3.88 KiB │        0 B │   +3.88 KiB │   3.88 KiB │        0 B │   +3.88 KiB │
│     ButtplugClient                                                │        0 B │        0 B │        +0 B │  73.18 KiB │  12.03 KiB │  +61.15 KiB │
│         ButtplugClient::new                                       │  48.90 KiB │        0 B │  +48.90 KiB │  48.90 KiB │        0 B │  +48.90 KiB │
│         ButtplugClient.connect                                    │  18.06 KiB │   6.06 KiB │  +12.00 KiB │  24.28 KiB │  12.03 KiB │  +12.25 KiB │
│             InProcessClientConnectorEventSenderLoop               │        0 B │        0 B │        +0 B │        0 B │        0 B │        +0 B │
│             Client Loop Span                                      │   6.22 KiB │   5.63 KiB │      +599 B │   6.22 KiB │   5.98 KiB │      +247 B │
│                 Buttplug Server Message                           │        0 B │      248 B │      -248 B │        0 B │      248 B │      -248 B │
│                 Buttplug Server Message                           │        0 B │      104 B │      -104 B │        0 B │      104 B │      -104 B │
└───────────────────────────────────────────────────────────────────┴────────────┴────────────┴─────────────┴────────────┴────────────┴─────────────┘

After:

> cargo test -p buttplug_tests --test test_memory -- --show-output 
┌───────────────────────────────────────────────────────────────────┬────────────┬──────────┬─────────────┬────────────┬───────────┬─────────────┐
│ Span                                                              │                 Own                 │                Total                 │
│                                                                   │     +      │    -     │      Δ      │     +      │     -     │      Δ      │
├───────────────────────────────────────────────────────────────────┼────────────┼──────────┼─────────────┼────────────┼───────────┼─────────────┤
│ memory_test                                                       │        0 B │      0 B │        +0 B │   7.94 MiB │  6.36 MiB │   +1.58 MiB │
│     DeviceConfigurationManager                                    │        0 B │      0 B │        +0 B │   7.71 MiB │  6.35 MiB │   +1.36 MiB │
│         load_protocol_configs                                     │   7.71 MiB │ 6.35 MiB │   +1.36 MiB │   7.71 MiB │  6.35 MiB │   +1.36 MiB │
│         DeviceConfigurationManagerBuilder.finish                  │        0 B │      0 B │        +0 B │        0 B │       0 B │        +0 B │
│     ServerDeviceManager                                           │        0 B │     65 B │       -65 B │ 125.12 KiB │     228 B │ +124.89 KiB │
│         ServerDeviceManagerBuilder::new                           │      192 B │      0 B │      +192 B │      192 B │       0 B │      +192 B │
│         builder.comm_manager(BtlePlugCommunicationManagerBuilder) │       65 B │      0 B │       +65 B │       65 B │       0 B │       +65 B │
│         ServerDeviceManagerBuilder.finish                         │ 123.47 KiB │      0 B │ +123.47 KiB │ 124.86 KiB │     163 B │ +124.71 KiB │
│             BtlePlugCommunicationManager::adapter_task            │   1.40 KiB │    163 B │   +1.24 KiB │   1.40 KiB │     163 B │   +1.24 KiB │
│             ServerDeviceManagerEventLoop                          │        0 B │      0 B │        +0 B │        0 B │       0 B │        +0 B │
│     ButtplugServerBuilder                                         │       79 B │     15 B │       +64 B │  32.88 KiB │      15 B │  +32.87 KiB │
│         ButtplugServerBuilder.finish                              │  32.80 KiB │      0 B │  +32.80 KiB │  32.80 KiB │       0 B │  +32.80 KiB │
│     ButtplugClientInProcessConnector                              │        0 B │      0 B │        +0 B │   3.88 KiB │       0 B │   +3.88 KiB │
│         ButtplugInProcessClientConnectorBuilder::default          │        0 B │      0 B │        +0 B │        0 B │       0 B │        +0 B │
│         ButtplugClientInProcessConnectorBuilder.server            │        0 B │      0 B │        +0 B │        0 B │       0 B │        +0 B │
│         ButtplugInProcessClientConnectorBuilder.finish            │   3.88 KiB │      0 B │   +3.88 KiB │   3.88 KiB │       0 B │   +3.88 KiB │
│     ButtplugClient                                                │        0 B │      0 B │        +0 B │  71.96 KiB │ 11.87 KiB │  +60.08 KiB │
│         ButtplugClient::new                                       │  48.90 KiB │      0 B │  +48.90 KiB │  48.90 KiB │       0 B │  +48.90 KiB │
│         ButtplugClient.connect                                    │  16.84 KiB │ 5.90 KiB │  +10.94 KiB │  23.05 KiB │ 11.87 KiB │  +11.18 KiB │
│             InProcessClientConnectorEventSenderLoop               │        0 B │      0 B │        +0 B │        0 B │       0 B │        +0 B │
│             ClientLoop                                            │   6.22 KiB │ 5.63 KiB │      +599 B │   6.22 KiB │  5.98 KiB │      +247 B │
│                 Buttplug Server Message                           │        0 B │    248 B │      -248 B │        0 B │     248 B │      -248 B │
│                 Buttplug Server Message                           │        0 B │    104 B │      -104 B │        0 B │     104 B │      -104 B │
└───────────────────────────────────────────────────────────────────┴────────────┴──────────┴─────────────┴────────────┴───────────┴─────────────┘

It's not perfect, but it does give a pretty great overview of where allocations are happening.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants