
feat: add rust kernels library for loading kernels#421

Open
drbh wants to merge 2 commits intomainfrom
add-kernels-rs

Conversation

Collaborator

@drbh drbh commented Mar 31, 2026

This PR adds a new client library for loading HF kernels in Rust. It allows TVM FFI-based kernels to be called from Rust and optionally integrates with candle for a better tensor UX.

Example usage with candle:

use candle_core::{Device, Tensor};
use kernels::Result;
use kernels::candle::CallKernel;

fn main() -> Result<()> {
    let activation = kernels::candle::get_kernel("drbh/relu-tvm", 1)?;
    let device = activation.device()?;
    println!("Backend: {}", activation.backend());

    let x = Tensor::new(&[-1.0f32, 2.0, -3.0, 4.0, -0.5, 0.0, 1.5, -2.5], &device)?;
    let y = Tensor::zeros_like(&x)?;
    activation.call("relu", &[&y, &x])?;

    let result = y.to_vec1::<f32>()?;
    let expected = Tensor::new(&*x.to_vec1::<f32>()?, &Device::Cpu)?
        .relu()?
        .to_vec1::<f32>()?;

    println!("Input:    {:?}", x.to_vec1::<f32>()?);
    println!("TVM FFI:  {result:?}");
    println!("Candle:   {expected:?}");
    assert_eq!(result, expected);
    println!("OK");
    Ok(())
}

Repo with candle and non-candle examples: https://github.com/drbh/hf-kernels-rust

@drbh drbh marked this pull request as ready for review April 1, 2026 15:01
Comment on lines +74 to +93
pub fn detect_cuda_version() -> Option<String> {
    cuda_version_from_smi().or_else(cuda_version_from_nvcc)
}

fn cuda_version_from_smi() -> Option<String> {
    let output = Command::new("nvidia-smi").output().ok()?;
    if !output.status.success() {
        return None;
    }
    let stdout = String::from_utf8_lossy(&output.stdout);
    let rest = stdout.split("CUDA Version:").nth(1)?;
    Some(rest.split_whitespace().next()?.to_string())
}

fn cuda_version_from_nvcc() -> Option<String> {
    let output = Command::new("nvcc").arg("--version").output().ok()?;
    let stdout = String::from_utf8_lossy(&output.stdout);
    let after = stdout.split("release ").nth(1)?;
    Some(after.split(',').next()?.trim().to_string())
}
Member


This may not be the same as the library that a framework is compiled against and dynamically loads. Also, nvidia-smi gives the driver library version, not the CUDA runtime version. We need to get it from cudart, e.g. see:

def _get_cuda() -> Optional[CUDA]:

libloading seems to be the most widely used library for dlopen:

https://github.com/nagisa/rust_libloading/

Comment on lines +15 to +24
pub fn candle_device(self) -> Result<Device> {
    match self {
        BackendKind::Cpu => Ok(Device::Cpu),
        #[cfg(feature = "candle-cuda")]
        BackendKind::Cuda => Device::new_cuda(0).map_err(Into::into),
        #[cfg(not(feature = "candle-cuda"))]
        BackendKind::Cuda => Ok(Device::Cpu),
        BackendKind::Xpu => Ok(Device::Cpu),
    }
}
Member


I think this can be TryFrom<BackendDevice> for Device. Not 100% sure if it works with the coherence rules, since it's a different mod in the same crate. But I think it should.

Comment on lines +26 to +34
pub fn candle_supported(self) -> Self {
    match self {
        #[cfg(feature = "candle-cuda")]
        BackendKind::Cuda => BackendKind::Cuda,
        #[cfg(not(feature = "candle-cuda"))]
        BackendKind::Cuda => BackendKind::Cpu,
        other => other,
    }
}
Member


The function name is not very descriptive, maybe to_candle_supported?

Comment on lines +43 to +44
    #[allow(unreachable_patterns)]
    _ => BackendKind::Cpu,
Member


I think it would be better to explicitly enumerate the other variants here, so that we can rely on exhaustiveness checking when new variants get added?

Also, as written, if Candle returns a device type that we don't support, it silently falls back to Cpu, which would select kernels that are not compatible with the actual device type?

Comment on lines +75 to +79
macro_rules! ptr {
    ($v:expr) => {
        Ok(unsafe { $v.as_ptr().add(offset) as *mut c_void })
    };
}
Member


Remove, make explicit.

Comment on lines +101 to +107
macro_rules! ptr {
    ($slice:expr) => {{
        let view = $slice.slice(offset..);
        let (device_ptr, _sync) = view.device_ptr(&stream);
        Ok(device_ptr as *mut c_void)
    }};
}
Member


I think rather than a macro, this could be a trait + impl? At least I think with a generic type it should work with one implementation for all cases?

// Tensors are passed to the kernel as DLPack pointers directly into
// candle's storage - no copies for contiguous tensors.
pub trait CallKernel {
    fn call(&self, func_name: &str, args: &[&Tensor]) -> Result<()>;
Member


What if there are non-tensor arguments, e.g. boolean options, epsilon floats, etc.?

Comment on lines +183 to +187
let device_type = match kind {
    BackendKind::Cpu => tvm_ffi::DL_CPU,
    BackendKind::Cuda => tvm_ffi::DL_CUDA,
    BackendKind::Xpu => tvm_ffi::DL_ONEAPI,
};
Member


Seems like this could use a From implementation outside the function?
