[Blog] Fluss Rust SDK introduction blog #2934
fresh-borzoni wants to merge 1 commit into apache:main
cc @luoyuxia @leekeiabstraction Please take a look at the overall structure. Diagrams and visuals are pending until we are happy with the framing and content.
4b6efb3 to 072b613
leekeiabstraction left a comment
Thank you for the great post! I've added comments.
When you write a record, the call is synchronous: the record gets queued into a per-bucket batch without touching the network. A background sender task picks up ready batches and ships them as RPCs to the responsible TabletServers. This follows the same pattern as both the Fluss Java client and Kafka producers.
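The hand-off described in this paragraph can be illustrated with a toy model built only on the Rust standard library. This is a sketch under stated assumptions, not the fluss-rs implementation: `Accumulator`, `append`, and `start` are hypothetical names, a thread stands in for the background sender task, and collecting batches into a vector stands in for the RPC.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a batch of records bound for one bucket.
type Batch = Vec<String>;

/// Toy accumulator: `append` only touches memory, never the network.
struct Accumulator {
    batch_size: usize,
    open: HashMap<u32, Batch>,   // one open batch per bucket
    ready: Sender<(u32, Batch)>, // hand-off to the sender thread
}

impl Accumulator {
    /// Queues a record; a batch is handed off once it reaches `batch_size`.
    fn append(&mut self, bucket: u32, record: &str) {
        let batch = self.open.entry(bucket).or_default();
        batch.push(record.to_string());
        if batch.len() >= self.batch_size {
            let full = std::mem::take(batch);
            // This sketch ignores the error case where the sender thread is gone.
            let _ = self.ready.send((bucket, full));
        }
    }
}

/// Spawns the sender thread. It returns every batch it "shipped"
/// once the accumulator (and with it the channel) is dropped.
fn start(batch_size: usize) -> (Accumulator, JoinHandle<Vec<(u32, Batch)>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut shipped = Vec::new();
        for item in rx {
            // A real client would issue an RPC to a TabletServer here.
            shipped.push(item);
        }
        shipped
    });
    let acc = Accumulator { batch_size, open: HashMap::new(), ready: tx };
    (acc, handle)
}
```

Note that this model ships only full batches. A partially filled batch stays in `open` until it fills, because the timeout path is left out to keep the sketch short.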
The caller gets back a `WriteResultFuture`. Await it to block until the server confirms, or drop it for fire-and-forget. Either way, the server acknowledges the write with acks=all by default, so dropping the future skips the client-side wait, not the durability guarantee.
> not the durability guarantee.

Arguably, end-to-end durability is affected if writes keep failing and being retried because of a transient error and the client is then restarted. I can imagine that users who care strongly about end-to-end durability would want to handle a failed future, e.g. by writing to a local dead-letter queue. Advanced users will most likely understand the implication, but this may not be something we want to risk being misconstrued.
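To make the concern concrete, here is a minimal sketch of the handling a durability-minded caller might add. It rests on assumptions: the acknowledgement is modelled as a standard-library channel receiver rather than the real `WriteResultFuture`, and `confirm_or_park` and the in-memory dead-letter queue are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical stand-in for a write acknowledgement.
/// This is NOT the fluss-rs `WriteResultFuture`.
type Ack = Receiver<Result<(), String>>;

/// Waits for the acknowledgement and parks the record in a local
/// dead-letter queue if the write failed or was never answered.
/// Returns true when the write was confirmed.
fn confirm_or_park(
    record: &str,
    ack: Ack,
    dead_letters: &mut Vec<(String, String)>,
) -> bool {
    let outcome = match ack.recv() {
        Ok(result) => result,
        // The answering side vanished first, e.g. the client shut down.
        Err(_) => Err("no acknowledgement received".to_string()),
    };
    match outcome {
        Ok(()) => true,
        Err(reason) => {
            // Keep the record and the reason so it can be replayed later.
            dead_letters.push((record.to_string(), reason));
            false
        }
    }
}
```

A caller that drops the acknowledgement instead never reaches the failure branch, which is the gap the comment above points at.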
Batches ship automatically when they fill up or after a short timeout (100ms by default), so `flush()` isn't needed for data to reach the server. It's there for when you need to confirm that everything in flight has landed. If the write buffer fills up, new writes block until space frees up rather than silently consuming unbounded memory.
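The two rules in this paragraph (ship on full or on timeout, block when the buffer is full) can be sketched with the standard library alone. The names `batch_ready` and `would_block` are hypothetical, the rule that an empty batch is never shipped is an assumption, and a bounded `sync_channel` stands in for the client's write buffer.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};
use std::time::Duration;

/// A batch is ready when it is full, or when it is non-empty and has
/// waited at least `linger` (100ms by default, per the post).
fn batch_ready(len: usize, capacity: usize, age: Duration, linger: Duration) -> bool {
    len >= capacity || (len > 0 && age >= linger)
}

/// Fills a bounded buffer of `slots` entries and reports whether one
/// more write would have to wait for space.
fn would_block(slots: usize) -> bool {
    // `_rx` is kept alive so the channel stays open but is never drained.
    let (tx, _rx) = sync_channel::<u8>(slots);
    for _ in 0..slots {
        tx.send(0).unwrap(); // fits in the buffer, returns immediately
    }
    // A plain `send` would block here; `try_send` reports the condition.
    matches!(tx.try_send(0), Err(TrySendError::Full(_)))
}
```

The bounded channel is what turns a full buffer into a blocked writer instead of unbounded memory growth.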
Fluss has two table types (primary key tables and log tables), and the Rust core has a writer for each: `UpsertWriter` for keyed upserts and deletes, `AppendWriter` for append-only log writes. Both support idempotent delivery, and `AppendWriter` can also accept Arrow `RecordBatch` directly if you already have columnar data.
Maybe mention partial updates as well?
We built fluss-rust on this same idea. A single Rust core implements the full Fluss client protocol (Protobuf-based RPC, record batching with backpressure, background I/O, Arrow serialization, idempotent writes, SASL authentication) and exposes it to three languages:
- **Rust**: directly, as the `fluss-rs` crate
- **Python**: via [PyO3](https://pyo3.rs), the Rust-Python bridge
- **C++**: via [CXX](https://cxx.rs), the Rust-C++ bridge
To give a sense of proportion: the Rust core is roughly 40k lines, while the Python binding is around 5k and the C++ binding around 6k. The bindings handle type conversion, async runtime bridging, and memory ownership at the language boundary, but all the protocol logic, batching, Arrow codec, and retry handling live in the shared core.
I wonder if we should include a diagram here. The section reads well, and a diagram would reinforce the message (and capture attention!).
IMO the diagram could feature the Fluss and Rust mascots and replace the banner. That would be more informative, and it would also read and share well on sites like LinkedIn.
The first is **DataFusion integration**. The Rust core already produces Arrow RecordBatches, which is exactly what DataFusion's table provider interface expects. Wiring the two together would let users run SQL queries directly over Fluss data from Rust or Python, without going through Flink.
The second is a **Fluss gateway service** built on top of the Rust core. Not every environment can load a native library. A lightweight Rust-based gateway could expose Fluss over HTTP or gRPC, making it accessible from any language or tool that can make a network call. The Rust SDK gives us the right foundation for that: a single process that handles the protocol, batching, and connection management, and serves multiple clients over a simple API.
Additionally, I think we should mention here that the community is moving towards merging fluss-rust into fluss to streamline the release and development process. That is a strong signal of the community's commitment to fluss-rust's continued development.