Memory and async manager improvements (support embedded)#833
Memory and async manager improvements (support embedded)#833jsmnbom wants to merge 17 commits intobuttplugio:masterfrom
Conversation
- AsyncManager is now a trait. The used async runtime is determined by the feature flag, but users can also set a global async manager at runtime if they want to use a custom async runtime or if the default one doesn't work for their use case. - All async_manager::spawn calls now take a tracing::Span parameter to ensure proper tracability of async tasks. - The span names for each task may not be perfect atm, but should still be better than none. - The `spawn_with_handle` function has been removed since it was never used. - Since sleep functions depend on the async runtime, they have been moved to the AsyncManager trait as well. - Moved to using tracing::Instrument directly instead of the tracing_futures crate. - Move usages of tokio::spawn to async_manager::spawn, except in intiface-engine which is assumed to be tokio runtime only. Unsure how to handle nintendo joycons, since it has feature gates, so it has been left mostly alone. LLM disclosure: LLMs were very mildly used to assist with this refactor, primarily just autocomplete for changing the spawn and sleep calls. I have tested the changes using cargo test and a simple in-process client connection - i have not been able to make wasm work, but i have tested a custom AsyncManager implementation.
- Make JSON validation return the parsed value to avoid double parsing. - Allow parsing of hashmaps to DeviceConfigurationManagerBuilder to avoid cloning. - Other techniques to avoid cloning of data during loading. - Add a version.rs file generated by build.rs to avoid parsing the config version from JSON at runtime.
…bitflags for InputCommandType.
…e_device_definitions to avoid cloning the definition for each identifier.
…tations. While the regex crate is normally fulled in anyways by jsonschema, it is possible to use buttplug_server without message serialization (in process connector) and also without load_protocol_configs. In those cases, the regex crate is not needed and removing it from these protocol implementations saves about 100KB of binary size on embedded targets. The regexes in question were also very simple and easily replaced with basic string parsing. If regex is needed in the future for more complex parsing, it should ideally be added as regex_lite.
Otherwise ServerDeviceDefinitions can't be serialized
|
Turns out rust is kinda bad at optimizing default trait implementations, and some methods on ProtocolHandler was duplicated 135 times in the final optimized binary... |
|
This is gonna take a while to bring in because there's a lot going on here, but it's really interesting work so as I said on discord, totally willing to work with you here. This may need to be broken down into sub-PRs though because getting through all of this in one chunk is going to be a LOT. Before we get to planning that though, how are we gonna handle compatibility and testing for this in the future? Do you have any sort of structured tests that can surface when new code or PRs will require optimization? |
…HashMap lookup, as the number of protocols is small and this avoids the need for dynamic dispatch and heap allocation when retrieving protocol identifiers.
Removing the manual serialization of original_error makes it possible to compile buttplug without json at all (or at least with json function elided at link time) Care should be taken to ensure that the error_message field is still populated with a useful message, as the original_error field will not be serialized. So basically not sure this actually needs to go upstream.
Yeah that makes sense. Do you have any idea where it would be best to start?
I am unsure how regression tests are currently implemented. Ideally i guess we could have CI comment on PRs with the data. For now I've (with the use of agentic AI tho goddanm is it stupid sometimes :P) written a test that outputs a table of memory usage. Before this PR: > cargo test -p buttplug_tests --test test_memory -- --show-output
┌───────────────────────────────────────────────────────────────────┬────────────┬────────────┬─────────────┬────────────┬────────────┬─────────────┐
│ Span │ Own │ Total │
│ │ + │ - │ Δ │ + │ - │ Δ │
├───────────────────────────────────────────────────────────────────┼────────────┼────────────┼─────────────┼────────────┼────────────┼─────────────┤
│ memory_test │ 0 B │ 0 B │ +0 B │ 53.08 MiB │ 42.93 MiB │ +10.15 MiB │
│ DeviceConfigurationManager │ 0 B │ 9.82 MiB │ -9.82 MiB │ 52.81 MiB │ 42.91 MiB │ +9.90 MiB │
│ load_protocol_configs │ 42.67 MiB │ 32.70 MiB │ +9.96 MiB │ 42.67 MiB │ 32.70 MiB │ +9.96 MiB │
│ DeviceConfigurationManagerBuilder.finish │ 10.14 MiB │ 394.39 KiB │ +9.75 MiB │ 10.14 MiB │ 394.39 KiB │ +9.75 MiB │
│ ServerDeviceManager │ 0 B │ 65 B │ -65 B │ 172.22 KiB │ 10.25 KiB │ +161.98 KiB │
│ ServerDeviceManagerBuilder::new │ 192 B │ 0 B │ +192 B │ 192 B │ 0 B │ +192 B │
│ builder.comm_manager(BtlePlugCommunicationManagerBuilder) │ 65 B │ 0 B │ +65 B │ 65 B │ 0 B │ +65 B │
│ ServerDeviceManagerBuilder.finish │ 171.97 KiB │ 10.18 KiB │ +161.79 KiB │ 171.97 KiB │ 10.18 KiB │ +161.79 KiB │
│ ButtplugServerBuilder │ 79 B │ 15 B │ +64 B │ 32.88 KiB │ 15 B │ +32.87 KiB │
│ ButtplugServerBuilder.finish │ 32.80 KiB │ 0 B │ +32.80 KiB │ 32.80 KiB │ 0 B │ +32.80 KiB │
│ ButtplugClientInProcessConnector │ 0 B │ 0 B │ +0 B │ 3.88 KiB │ 0 B │ +3.88 KiB │
│ ButtplugInProcessClientConnectorBuilder::default │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ButtplugClientInProcessConnectorBuilder.server │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ButtplugInProcessClientConnectorBuilder.finish │ 3.88 KiB │ 0 B │ +3.88 KiB │ 3.88 KiB │ 0 B │ +3.88 KiB │
│ ButtplugClient │ 0 B │ 0 B │ +0 B │ 73.18 KiB │ 12.03 KiB │ +61.15 KiB │
│ ButtplugClient::new │ 48.90 KiB │ 0 B │ +48.90 KiB │ 48.90 KiB │ 0 B │ +48.90 KiB │
│ ButtplugClient.connect │ 18.06 KiB │ 6.06 KiB │ +12.00 KiB │ 24.28 KiB │ 12.03 KiB │ +12.25 KiB │
│ InProcessClientConnectorEventSenderLoop │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ Client Loop Span │ 6.22 KiB │ 5.63 KiB │ +599 B │ 6.22 KiB │ 5.98 KiB │ +247 B │
│ Buttplug Server Message │ 0 B │ 248 B │ -248 B │ 0 B │ 248 B │ -248 B │
│ Buttplug Server Message │ 0 B │ 104 B │ -104 B │ 0 B │ 104 B │ -104 B │
└───────────────────────────────────────────────────────────────────┴────────────┴────────────┴─────────────┴────────────┴────────────┴─────────────┘After: > cargo test -p buttplug_tests --test test_memory -- --show-output
┌───────────────────────────────────────────────────────────────────┬────────────┬──────────┬─────────────┬────────────┬───────────┬─────────────┐
│ Span │ Own │ Total │
│ │ + │ - │ Δ │ + │ - │ Δ │
├───────────────────────────────────────────────────────────────────┼────────────┼──────────┼─────────────┼────────────┼───────────┼─────────────┤
│ memory_test │ 0 B │ 0 B │ +0 B │ 7.94 MiB │ 6.36 MiB │ +1.58 MiB │
│ DeviceConfigurationManager │ 0 B │ 0 B │ +0 B │ 7.71 MiB │ 6.35 MiB │ +1.36 MiB │
│ load_protocol_configs │ 7.71 MiB │ 6.35 MiB │ +1.36 MiB │ 7.71 MiB │ 6.35 MiB │ +1.36 MiB │
│ DeviceConfigurationManagerBuilder.finish │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ServerDeviceManager │ 0 B │ 65 B │ -65 B │ 125.12 KiB │ 228 B │ +124.89 KiB │
│ ServerDeviceManagerBuilder::new │ 192 B │ 0 B │ +192 B │ 192 B │ 0 B │ +192 B │
│ builder.comm_manager(BtlePlugCommunicationManagerBuilder) │ 65 B │ 0 B │ +65 B │ 65 B │ 0 B │ +65 B │
│ ServerDeviceManagerBuilder.finish │ 123.47 KiB │ 0 B │ +123.47 KiB │ 124.86 KiB │ 163 B │ +124.71 KiB │
│ BtlePlugCommunicationManager::adapter_task │ 1.40 KiB │ 163 B │ +1.24 KiB │ 1.40 KiB │ 163 B │ +1.24 KiB │
│ ServerDeviceManagerEventLoop │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ButtplugServerBuilder │ 79 B │ 15 B │ +64 B │ 32.88 KiB │ 15 B │ +32.87 KiB │
│ ButtplugServerBuilder.finish │ 32.80 KiB │ 0 B │ +32.80 KiB │ 32.80 KiB │ 0 B │ +32.80 KiB │
│ ButtplugClientInProcessConnector │ 0 B │ 0 B │ +0 B │ 3.88 KiB │ 0 B │ +3.88 KiB │
│ ButtplugInProcessClientConnectorBuilder::default │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ButtplugClientInProcessConnectorBuilder.server │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ButtplugInProcessClientConnectorBuilder.finish │ 3.88 KiB │ 0 B │ +3.88 KiB │ 3.88 KiB │ 0 B │ +3.88 KiB │
│ ButtplugClient │ 0 B │ 0 B │ +0 B │ 71.96 KiB │ 11.87 KiB │ +60.08 KiB │
│ ButtplugClient::new │ 48.90 KiB │ 0 B │ +48.90 KiB │ 48.90 KiB │ 0 B │ +48.90 KiB │
│ ButtplugClient.connect │ 16.84 KiB │ 5.90 KiB │ +10.94 KiB │ 23.05 KiB │ 11.87 KiB │ +11.18 KiB │
│ InProcessClientConnectorEventSenderLoop │ 0 B │ 0 B │ +0 B │ 0 B │ 0 B │ +0 B │
│ ClientLoop │ 6.22 KiB │ 5.63 KiB │ +599 B │ 6.22 KiB │ 5.98 KiB │ +247 B │
│ Buttplug Server Message │ 0 B │ 248 B │ -248 B │ 0 B │ 248 B │ -248 B │
│ Buttplug Server Message │ 0 B │ 104 B │ -104 B │ 0 B │ 104 B │ -104 B │
└───────────────────────────────────────────────────────────────────┴────────────┴──────────┴─────────────┴────────────┴───────────┴─────────────┘It's not perfect, but it does give a pretty great overview of where allocations are happening. |
…span memory usage.
Hi, I'm developing an embedded device for controlling devices using buttplug.io. These are the changes I've had to make to buttplug.io to make things work downstream. I have tried to clean them up and explain as best as I can why stuff is changed.
Feel like to cherry pick or even wholesale ignore this PR btw. I just figured I should at least attempt to upstream my changes rather than trying to keep a fork up to date.
The most critical changes are improvements to memory usage (esp in the device manager), AsyncManager as a trait, and the ability to serialize the DeviceConfigurationManager internals directly.
My data from doing the memory improvements in pasted below. But it basically went from using 10MB to 1,2MB, and with a lot less churn.
The AsyncManager replacement is needed since tokio isn't really able to run on embedded. Rather than creating a new feature specifically for my target board (and bring in a bunch more dependencies), I made it so that downstream crates can register which AsyncManager to use. They just need to make sure to do it before any other buttplug code is run.
The last change about serialization of DeviceConfigurationManager internals, is because using json for storing the device configs ends up being not great on embedded systems since it's not a very efficient format to store nor decode. I think using it as the primary choice in the library is still the correct choice overall, but with these changes it is possible for me to use a build script like: https://gist.github.com/jsmnbom/1be695db3b08725847669bd9a39d520d and then decode it like so:
With this data.gz.bin ends up being 57 kB which is a lot less than the original json file, and it's a lot faster and more efficient to load too.
Smaller changes include not using regex. This is simply because it's a pretty heavy dependency that ends up going unused if jsonschema is also unused. Thereby saving quite a lot of binary size.
small note: I believe there may be an issue with the way the runtime features of buttplug_core is propagated currently. I think the other crates depending on buttplug_core with default features means that the tokio runtime is technically always turned on.
Optimize device configuration loading
Use better data structures for device configuration management
ServerDeviceAttributes::features: BTreeMap -> LiteMap
ServerDeviceDefinition::features: BTreeMap -> LiteMap
Use of CompactStr + custom RangeInclusive + InputCommandType as enumflags2::bitflags
Use Arc for ServerDeviceDefinition in DeviceConfigurationManager::base_device_definitions to avoid cloning the definition for each identifier.
Decrease cloning when using ServerDeviceDefinitionBuilder.