Add BufferView: safe sub-range ndarray access for kernels #445

Draft
alanray-tech wants to merge 1 commit into Genesis-Embodied-AI:main from alanray-tech:feature/buffer-view

Conversation

@alanray-tech alanray-tech commented Apr 1, 2026

Summary

BufferView provides a safe, zero-copy sub-range view into an ndarray for kernel arguments. It rewrites view[i] to arr[offset + i] at AST-translation time, requiring no IR modifications.

In debug mode (debug=True), it inserts runtime bounds assertions that report the kernel name, thread ID, file and line for every frame in the callstack.

Usage example:

import quadrants as qd
import numpy as np

qd.init(arch=qd.cuda, debug=True)

@qd.func
def func(view: qd.template(), idx: qd.i32):
    """Leaf function: performs the actual OOB access."""
    view[idx] = 99.0

@qd.kernel
def kernel(v: qd.types.buffer_view(qd.f32)):
    for i in range(v.count):
        if i == 0:
            func(v, 16)

N = 32
data = qd.ndarray(qd.f32, shape=(N,))
data.from_numpy(np.zeros(N, dtype=np.float32))

view = qd.BufferView(data, 0, 16)
kernel(view)

Output — the OOB access is caught at runtime with a clear diagnostic:

[Quadrants] version 0.4.6, llvm 22.1.0, commit 27c34a25, win, python 3.13.12
[Quadrants] Starting on arch=cuda
quadrants.lang.exception.QuadrantsAssertionError:
BufferView Out Of Range: kernel[kernel] tid=0, got index 16 (offset=0, count=16).
Callstack:
kernel (1_buffer_view_oob.py:13)
  func (1_buffer_view_oob.py:8)
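
For readers who want the semantics without a GPU, the offset rewrite and the debug-mode check can be pictured with a small host-side stand-in. This is only a sketch: `BufferViewSketch` is a hypothetical name, it runs on plain NumPy rather than inside a kernel, and it is not the class added by this PR.

```python
import numpy as np


class BufferViewSketch:
    """Hypothetical host-side stand-in for a sub-range view; not the Quadrants class."""

    def __init__(self, arr, offset, count):
        if offset < 0 or count < 0 or offset + count > len(arr):
            raise ValueError("view does not fit inside the underlying array")
        self.arr, self.offset, self.count = arr, offset, count

    def _check(self, i):
        # Mirrors the debug-mode assertion: the index must lie in [0, count).
        if not 0 <= i < self.count:
            raise IndexError(
                f"BufferView Out Of Range: got index {i} "
                f"(offset={self.offset}, count={self.count})."
            )

    def __getitem__(self, i):
        self._check(i)
        return self.arr[self.offset + i]  # view[i] becomes arr[offset + i]

    def __setitem__(self, i, value):
        self._check(i)
        self.arr[self.offset + i] = value


data = np.zeros(32, dtype=np.float32)
view = BufferViewSketch(data, 8, 16)
view[0] = 99.0  # writes data[8]; no copy is made
```

In this sketch, `view[16]` raises because 16 is not a valid index for a view of 16 elements, even though `data[8 + 16]` is still inside the underlying array. That is the "logical" out-of-bounds case that a plain array bounds check would miss.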

BufferView also composes with higher-level abstractions. For example, the Block COO Sparse matrix library in qipc (IPC on Quadrants) can use BufferView internally for its Triplet.MatrixView, BCOO.MatrixView, and Dense.VectorView types, enabling safe sub-range kernel dispatch:

import quadrants as qd
import numpy as np

qd.init(arch=qd.cuda, debug=True)

from qipc import Triplet, BCOO, Dense, mat3d

N_BLOCKS = 4
N_TRIPLETS = 7

triplet = Triplet.Matrix()
triplet.reshape(N_BLOCKS, N_BLOCKS)
triplet.reserve_triplets(N_TRIPLETS)
triplet.resize_triplets(N_TRIPLETS)
triplet.clear()


# ── Kernel uses Triplet.write(view, I, Triplet.Entry(...)) ──
@qd.kernel
def assemble_diagonal(view: Triplet.MatrixView, n_diag: qd.i32):
    for I in range(n_diag):
        block = mat3d(0.0)
        for k in range(3):
            block[k, k] = 2.0
        Triplet.write(view, I, Triplet.Entry(row=I, col=I, block=block))


@qd.kernel
def assemble_offdiag(view: Triplet.MatrixView, n_offdiag: qd.i32):
    for I in range(n_offdiag):
        block = mat3d(0.0)
        for k in range(3):
            block[k, k] = -1.0
        Triplet.write(view, I, Triplet.Entry(row=I, col=I + 1, block=block))

Files changed

| File | Change |
| --- | --- |
| python/quadrants/lang/buffer_view.py | New: BufferView class |
| python/quadrants/types/buffer_view_type.py | New: BufferViewType annotation |
| python/quadrants/lang/impl.py | BufferView dispatch in subscript / assign |
| python/quadrants/lang/_func_base.py | BufferViewType param handling |
| python/quadrants/lang/_template_mapper_hotpath.py | BufferViewType cache key |
| python/quadrants/lang/ast/.../function_def_transformer.py | AST decomposition for BufferViewType |
| python/quadrants/lang/__init__.py | Export BufferView |
| python/quadrants/types/__init__.py | Export buffer_view_type |
| python/quadrants/types/enums.py | Better error message |

Design notes

  • Zero C++ changes. All features are implemented purely in the Python AST transformation and runtime layers. No modifications to the IR, compiler passes, or pybind11 bindings.
  • BufferView bounds checking uses qd_assert at the AST level (Python-side), gated by cfg.debug. It does not use the IR-level CheckOutOfBound pass — this is intentional to avoid coupling with the global check_out_of_bound flag and to keep the implementation self-contained.
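
To illustrate what a Python-side subscript rewrite can look like, here is a minimal sketch built on the standard library `ast` module (Python 3.9+). It is not the transformer in this PR: `ViewSubscriptRewriter` and the names `view`, `arr`, and `offset` are hypothetical, and the real implementation works inside the Quadrants AST pipeline rather than on source strings.

```python
import ast


class ViewSubscriptRewriter(ast.NodeTransformer):
    """Rewrite `<view>[i]` into `<arr>[<offset> + i]` for one known view name."""

    def __init__(self, view_name, arr_name, offset_name):
        self.view_name = view_name
        self.arr_name = arr_name
        self.offset_name = offset_name

    def visit_Subscript(self, node):
        self.generic_visit(node)  # rewrite nested subscripts first
        if isinstance(node.value, ast.Name) and node.value.id == self.view_name:
            # Swap the subscripted name and shift the index by the offset.
            node.value = ast.Name(id=self.arr_name, ctx=ast.Load())
            node.slice = ast.BinOp(
                left=ast.Name(id=self.offset_name, ctx=ast.Load()),
                op=ast.Add(),
                right=node.slice,
            )
        return node


source = "view[i] = view[i + 1] * 2.0"
tree = ViewSubscriptRewriter("view", "arr", "offset").visit(ast.parse(source))
rewritten = ast.unparse(ast.fix_missing_locations(tree))
print(rewritten)  # the same statement, expressed on `arr` with shifted indices
```

Because the rewrite happens before any lowering, both loads and stores go through the same path, which is one way to read the "no IR modifications" claim above.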

Test plan

BufferView provides a safe, zero-copy sub-range view into an ndarray
for kernel arguments. It rewrites view[i] to arr[offset + i] at
AST-translation time with zero IR changes.

In debug mode, inserts runtime bounds assertions with full callstack
diagnostics (kernel name, thread ID, file:line per frame).

Can be passed directly as a kernel parameter via
qd.types.buffer_view(dtype), which auto-decomposes into
(ndarray, offset, count) at compile time.

Minor: improve boundary enum error message to list valid options.
@alanray-tech alanray-tech force-pushed the feature/buffer-view branch from 7013b74 to 1d2fa27 Compare April 1, 2026 14:42
@duburcqa
Contributor

duburcqa commented Apr 1, 2026

This API looks terrible. I'm strongly against merging this.

@alanray-tech
Author

> This API looks terrible. I'm strongly against merging this.

any better API for buffer view or just remove buffer view?

@duburcqa
Contributor

duburcqa commented Apr 1, 2026

> any better API for buffer view or just remove buffer view?

Yes! this:

N = 32
data = qd.ndarray(qd.f32, shape=(N,))
data.from_numpy(np.zeros(N, dtype=np.float32))
slice = data[:16]
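
One way to connect this slicing syntax to the (ndarray, offset, count) triple from the PR description is to normalize the slice with Python's built-in `slice.indices`. The following is only a sketch under that assumption: `slice_to_offset_count` is a hypothetical helper, not part of Quadrants, and it covers contiguous 1-D slices only.

```python
def slice_to_offset_count(s, length):
    """Map a unit-stride slice over `length` elements to (offset, count)."""
    # slice.indices clamps the bounds and resolves None and negative values.
    start, stop, step = s.indices(length)
    if step != 1:
        raise ValueError("only contiguous (step == 1) slices map to a sub-range view")
    return start, max(0, stop - start)


N = 32
offset, count = slice_to_offset_count(slice(None, 16), N)  # data[:16]
tail = slice_to_offset_count(slice(-8, None), N)           # data[-8:]
```

Here `data[:16]` maps to offset 0 and count 16, the same sub-range as `qd.BufferView(data, 0, 16)` in the PR description.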

@duburcqa
Contributor

duburcqa commented Apr 1, 2026

Triplet.MatrixView should not be passed directly to kernels. We should expose a thin wrapper around it that is consistent with the existing qd ndarray / field API.

@duburcqa
Contributor

duburcqa commented Apr 1, 2026

> Triplet.write(view, I, Triplet.Entry(row=I, col=I + 1, block=block))

This API is also terrible. It should look like standard numpy code. Then it is the IR translation layer (well, probably an AST transform instead) that should take care of all the boilerplate, translating native Python style into the actual low-level Python and/or assembly instructions.

@alanray-tech
Author

> Triplet.write(view, I, Triplet.Entry(row=I, col=I + 1, block=block))
>
> This API is also terrible. It should look like standard numpy code. Then it is the IR translation layer (well, probably an AST transform instead) that should take care of all the boilerplate, translating native Python style into the actual low-level Python and/or assembly instructions.

Agreed. What I want to achieve is an automatic check on the triplet index (I) and the submatrix index (row, col), to prevent any possible "logical out of bound" behaviour. I would appreciate it if you could design the API (I'm not so familiar with the Pythonic way of coding).

@duburcqa
Contributor

duburcqa commented Apr 1, 2026

Ok let's work together on this starting from next week? Sounds like a nice project :)

@alanray-tech
Author

sounds good.

@alanray-tech
Author

alanray-tech commented Apr 1, 2026

To summarize my concerns:

  1. BufferView should separate ownership from accessibility, and serve as an abstraction for subsystem access.
  2. Avoid logical out-of-bound accesses through structured access (like the TripletMatrix API).
  3. Better debug info for humans and agents.
  4. Can we manually label a buffer view as const or non-const, to prevent unintended writes?

We should be cautious at every point in a large numerical system.
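
As a point of comparison for item 4, NumPy already lets a view be marked read-only without affecting the array that owns the memory. The snippet below shows that NumPy behaviour only. Whether and how Quadrants would expose a const view is an open question in this thread, and nothing here is Quadrants API.

```python
import numpy as np

data = np.zeros(32, dtype=np.float32)

const_view = data[8:24]             # zero-copy view sharing memory with `data`
const_view.flags.writeable = False  # mark only this view as read-only

data[8] = 1.0                       # the owner can still write
first = const_view[0]               # the view observes the owner's write

try:
    const_view[0] = 2.0             # writing through the const view is rejected
except ValueError as err:
    write_error = str(err)
```

The useful property is that constness belongs to the view, not to the buffer: the owner keeps write access while the subsystem that receives the view does not.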

@hughperkins hughperkins marked this pull request as draft April 1, 2026 21:23