[Type] Tensor 1-16 (#545)

Draft
hughperkins wants to merge 217 commits into main from hp/tensor-stork-16


@hughperkins
Collaborator

Issue: #


First step of the flexible-tensors series (hp/tensor-stork-N): introduce a
Backend IntEnum with FIELD=0 and NDARRAY=1 that subsequent PRs will use to
drive a per-tensor backend choice on the upcoming qd.tensor() factory.

This PR ships the enum only — no factory, no layout, no kernel integration.

Adds:
- python/quadrants/_flexible.py with the Backend IntEnum.
- Re-export through quadrants/__init__.py so users access it as qd.Backend.
- tests/python/test_flexible_backend.py covering symbol export, IntEnum
  semantics, lookup by name/value, distinct members, and rejection of
  invalid values.
- docs/source/user_guide/flexible_tensors.md seeded with a one-section
  user-facing description of qd.Backend, registered in the Core Concepts
  toctree of user_guide/index.md.

Tests pass locally; sphinx make html succeeds with no new warnings on the
flexible_tensors.md page.
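For reference, a minimal self-contained sketch of the enum this PR introduces (member names and values taken from the description above; no quadrants import needed, so this is illustration only):

```python
from enum import IntEnum

class Backend(IntEnum):
    """Per-tensor backend selector (sketch of quadrants._flexible.Backend)."""
    FIELD = 0    # SNode-backed qd.field storage
    NDARRAY = 1  # qd.ndarray storage

# The IntEnum semantics the test file exercises:
assert Backend.FIELD == 0 and Backend.NDARRAY == 1
assert Backend["NDARRAY"] is Backend.NDARRAY   # lookup by name
assert Backend(0) is Backend.FIELD             # lookup by value
```

Because it is an IntEnum, plain ints 0/1 compare equal to the members, which is what lets later PRs coerce raw int values at the factory boundary.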
Adds qd.tensor(dtype, shape, *, backend=Backend.FIELD, **kwargs) — a thin
dispatcher over qd.field and qd.ndarray that selects the underlying allocator
via the qd.Backend enum. Extra kwargs pass through verbatim, so backend-
specific options (e.g. order= for fields) keep working.

Default backend is Backend.FIELD to match existing Quadrants behaviour.

Adds:
- tensor() in python/quadrants/_flexible.py with a _coerce_backend helper
  that gives a clear error on invalid backend values.
- Re-export through quadrants/__init__.py.
- tests/python/test_flexible_factory.py: 10 tests covering default backend,
  explicit FIELD/NDARRAY selection, IntEnum coercion of int values, dtype
  propagation, int-shape normalisation, invalid-backend error, kwargs pass-
  through, and end-to-end kernel round-trip on each backend.
- docs/source/user_guide/flexible_tensors.md: new "Allocating a tensor with
  qd.tensor()" section with a runnable example for each backend.
- See-also link from tensor_types.md to the new page.

All tests pass; sphinx build clean.
Element-typed companions to qd.tensor: dispatch over qd.Vector.field /
qd.Vector.ndarray and qd.Matrix.field / qd.Matrix.ndarray via the same
backend= keyword. kwargs pass through verbatim.

Adds:
- tensor_vec(n, dtype, shape, *, backend=, **kwargs) and
  tensor_mat(n, m, dtype, shape, *, backend=, **kwargs).
- Re-export through quadrants/__init__.py.
- 11 tests covering type-equivalence to the underlying Vector/Matrix
  factories on each backend, invalid-backend rejection, and end-to-end
  kernel round-trips for vec and mat on both backends.
- "Vector and matrix tensors" section in the user guide.

All tests pass; sphinx build clean.
Returns the kernel-argument annotation appropriate for a given backend:
qd.template() instance for FIELD, qd.types.ndarray() instance for NDARRAY.

Mirrors the Genesis ``V_ANNOTATION = qd.types.ndarray() if use_ndarray
else qd.template`` pattern as a single first-class call so users can pick a
backend in one place and reuse the annotation across kernels.

Adds:
- tensor_annotation(backend) in _flexible.py.
- Re-export through __init__.py.
- 6 tests: return type per backend, invalid-backend rejection, IntEnum
  coercion, end-to-end kernel use on each backend.
- "Annotating kernel arguments" section in the user guide.

All tests pass; sphinx build clean.
needs_grad already passes through the factory **kwargs (PRs 2-3); this PR
makes that contract explicit with tests and a user-guide section.

Adds:
- 6 tests covering grad allocation, primal+grad kernel round-trip on field
  and ndarray scalar backends, and grad on tensor_vec/tensor_mat field.
- "Gradients" section in the user guide showing the needs_grad= usage on
  both backends with a runnable example.

Documented limitation: qd.Vector.ndarray and qd.Matrix.ndarray do not
currently accept needs_grad — that's an upstream Quadrants limitation and
is noted in a code comment inside the test file.

All tests pass; sphinx build clean.
Adds a layout= tuple kwarg to qd.tensor() that picks the physical memory
nesting order. The tuple is a permutation of range(ndim), outermost axis
first: layout=(1, 0) on a 2-D tensor means transposed storage, equivalent
to order='ji'.

Phase 2 ships the field side only. Non-identity layout on the ndarray
backend raises NotImplementedError; identity layouts (None / range(ndim))
work on both backends. Full ndarray support requires the AST subscript
rewrite, which lands in PR 8/13.

Adds:
- _layout_to_order() validator + translator (ValueError on length / non-
  permutation, returns None for identity).
- layout= and order=-rejection wiring in tensor().
- 41 tests: identity & default behaviour on both backends, all rank-3 and
  rank-4 permutations on field, kernel canonical-indexing round-trip,
  rejection of bad layouts, rejection of order= kwarg, NotImplementedError
  on ndarray non-identity, layout= + needs_grad= composition.
- "Controlling physical layout" section in the user guide with a clear
  caveat about ndarray support coming later.

All tests pass; sphinx build clean.
…trip

Test-only PR closing the gap left by PRs 5 and 6: the combination of
layout= and needs_grad= was checked at allocation time, but not yet
exercised through a kernel write/read on the .grad buffer with a non-
identity layout.

This pins down the pre-impl POC Q3b finding (grad SNode inherits the
primal's axis_seq) so any upstream Quadrants regression in this area
surfaces immediately.

Adds 8 tests:
- rank-2 transposed-storage primal+grad kernel round-trip
- rank-3 (2, 0, 1) primal+grad kernel round-trip
- all 6 rank-3 layout permutations with primal+grad written and read

All tests pass; no production-code or doc changes.
…write

Plumbs an optional canonical-axis permutation (``_qd_layout``) through the
ndarray kernel-argument flow and into a single AST hook in
``build_Subscript`` that permutes user-supplied canonical indices into
physical-storage order before forwarding.

Touched files:
- ``lang/any_array.py``: AnyArray gains a ``_qd_layout`` attribute that
  defaults to None (legacy / identity behaviour) and propagates verbatim
  through the ``.grad`` property.
- ``lang/kernel_arguments.py``: ``decl_ndarray_arg`` accepts a layout
  kwarg and threads it into the AnyArray it returns.
- ``lang/_template_mapper_hotpath.py``: ``_extract_arg`` for
  Ndarray / AnyArray / external-array branches now appends a trailing
  layout slot to the features tuple. None for legacy arrays. The slot
  becomes part of the kernel cache key automatically.
- ``lang/ast/ast_transformers/function_def_transformer.py``: unpacks
  the new feature slot and forwards it to ``decl_ndarray_arg``.
- ``lang/ast/ast_transformer.py``: ``build_Subscript`` permutes
  ``node.slice.ptr`` when ``node.value.ptr._qd_layout`` is set. None
  / identity layouts are no-ops, so existing IR is byte-identical.
- ``_flexible.py``: private ``_with_layout(ndarray, layout)`` helper that
  tags an existing Ndarray. Used by the new tests; not part of the
  public API. The user-facing ``qd.tensor(..., backend=NDARRAY, layout=...)``
  enable lands in PR 13.

Tests:
- 14 new tests in ``test_flexible_ndarray_layout_subscript.py``: untagged
  unaffected, identity layout byte-identical to no tag, rank-2 transpose
  matching transposed storage, all rank-3 permutations end-to-end,
  AugAssign through layout, grad inheriting layout, _with_layout
  validation.
- Full Quadrants suite still passes: 2853 passed, 172 skipped, 4 xfailed,
  1 xpassed (test_typing.py and test_pyi_stubs.py have pre-existing
  unrelated environment failures).

Drive-bys (these arguably belong in earlier PRs but consolidating here
to avoid an N-way force-push rebase across the stacked branches):
- ``tests/python/test_api.py``: extend the public-symbol allow-list with
  Backend, tensor, tensor_annotation, tensor_mat, tensor_vec.
- ``tests/python/test_flexible_factory.py``: PR 2's ``order=`` pass-
  through test was forward-incompatible with PR 6's ``order=`` rejection;
  switched to ``offset=``.

All flexible-tensors tests pass; sphinx build clean.
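The permutation the ``build_Subscript`` hook applies is small: since physical axis k stores canonical axis layout[k], the index forwarded at physical slot k must be the user's canonical index for axis layout[k]. A pure-Python sketch (function name hypothetical):

```python
def permute_indices(indices, layout):
    """Canonical -> physical index permutation (sketch of the AST hook).

    Physical axis k stores canonical axis layout[k], so the subscript
    forwarded at physical position k is the user's index for canonical
    axis layout[k].  layout=None (untagged / identity) is a no-op.
    """
    if layout is None:
        return tuple(indices)
    return tuple(indices[axis] for axis in layout)
```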
… ndarrays

Test-only PR pinning down the trickier subscript-rewrite paths that PR 8
covered only at the basic-AugAssign level:

- All standard operators on the same canonical cell (+=, -=, *=, //=, %=)
  through one composite kernel.
- Each AugAssign operator in its own kernel: +=, -=, *=, //=, %=, &=, |=, ^=
- Read-and-write of the same canonical index in one statement
  (``x[i, j] = x[i, j] * 2 + x[i, j]``).
- Neighbour dependence along a canonical axis (cumulative scan).
- Mixed layout-tagged + untagged ndarrays in the same kernel — the rewrite
  must apply only to the tagged operand.
- Three layout-tagged operands consumed by one composite expression.

11 tests, all pass.
…atterns

Test-only PR. Investigation found that in-kernel rebinding (``y = x``) is
**not** supported by Quadrants for any ndarray — that's an upstream
limitation that raises ``QuadrantsTypeError: Invalid constant scalar
data type: AnyArray`` regardless of layout. The test file's docstring
spells this out explicitly.

This PR pins down the aliasing patterns that Quadrants does support and
that flexible-tensors layout metadata must propagate through:

1. Same Ndarray passed twice to the same kernel: both AnyArrays get the
   same layout via the runtime feature tuple.
2. Same Ndarray reused across two consecutive kernel calls: layout
   persists.
3. Repeated ``.grad`` access inside a single kernel: each access
   inherits layout from the parent AnyArray.
4. Same Ndarray bound to two kernels with different parameter names but
   compatible annotations: layout travels with the value, not the
   annotation.
5. Tagged + untagged ndarrays in the same kernel: layout isolation per
   argument.
6. Two separately-allocated tagged ndarrays: independence.

6 tests, all pass.
Test-only PR. PR 8's parametrized rank-3 test exercises every permutation
on a single canonical cell. PR 11 widens that:

- Rank 4: every permutation (24), full-grid value comparison.
- Rank 5: 5 representative permutations (identity, full reverse, inner
  swap, cyclic shift, adjacent pair swaps), full-grid checks (32 cells).
- Rank 6: 3 representative permutations, full-grid checks (64 cells).
- Rank 4 + AugAssign + needs_grad on a non-trivial layout.
- Rank 4 cross-check: tagged-with-layout vs direct-with-permuted-iteration
  produce the same physical buffer.

Quadrants supports up to ``quadrants_max_num_indices=12``, so 6-D is well
within the safe range; 5-D and 6-D ndranges become expensive at larger
sizes, so each axis is kept at 2.

34 tests, all pass.
Test-only PR. PR 8 plumbed _qd_layout through TemplateMapper features,
making it part of the kernel cache key automatically. This file pins
that contract via direct Kernel._primal.mapper.mapping inspection.

Why this matters: if two different layouts shared a single compiled
kernel, the AST subscript rewrite would fire exactly once (for the
layout chosen at first compile), and subsequent calls with a different
layout would silently mis-index or crash.

6 tests:
- Two layouts on the same kernel produce two cache entries.
- Untagged (None) vs identity-tagged ((0, 1)) are distinct cache entries
  (documents this for future "normalise identity to None" decisions).
- Re-using the same layout reuses the cache entry.
- Switching back and forth does not pollute the cache (still 2 entries).
- Layout slot is the trailing element of the per-arg feature tuple.
- Distinct kernels keep their mappers separate.

All pass.
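The contract reduces to "different trailing layout slot, different cache entry". A toy model of that lookup (names and feature-tuple contents hypothetical; only the keying behaviour is the point):

```python
# Toy model of the compiled-kernel cache keyed by per-argument feature
# tuples whose trailing slot is the layout (or None for untagged arrays).
compiled_cache = {}

def lookup_or_compile(arg_features):
    key = tuple(arg_features)
    if key not in compiled_cache:
        compiled_cache[key] = f"compiled#{len(compiled_cache)}"
    return compiled_cache[key]

base = ("ndarray", "f32", 2)                    # stand-in dtype/rank features
k_transposed = lookup_or_compile(base + ((1, 0),))  # non-identity layout
k_identity = lookup_or_compile(base + ((0, 1),))    # explicit identity tag
k_untagged = lookup_or_compile(base + (None,))      # legacy untagged array
```

As documented above, the explicit identity tag and the untagged None slot are deliberately distinct entries, and re-using a layout must reuse its entry.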
Removes the PR-6-era NotImplementedError gate. With the AST-rewrite
plumbing landed in PR 8 and the cache-key contract pinned in PR 12,
the public factory now wires non-identity layouts straight through to
ndarray-backed tensors.

Behaviour:
- shape= is the **canonical** shape the user indexes inside kernels.
- The factory allocates the underlying ndarray at the *physical*
  (permuted) shape: physical[k] = canonical[layout[k]].
- The instance is auto-tagged with _qd_layout, so kernel subscripts
  x[i, j, ...] are translated to physical access by build_Subscript.
- Identity layouts (None or range(ndim)) collapse to no tag — same as
  the FIELD path — so untagged + identity-tagged stay byte-identical.
- order= still forbidden as a kwarg (single source of truth: layout=).
- ValueError up front for wrong-length / non-permutation layouts.

Tests (test_flexible_factory_layout_ndarray.py, 18 cases):
- No-tag / identity collapse
- Physical shape and tag presence after rank-2 and rank-3 calls
- Validation (length, permutation, order kwarg)
- Rank-2 transpose matches the no-layout reference under transpose
- Rank-2 explicit value spot-checks
- Rank-3 every permutation parametrized
- AugAssign through factory-allocated tensors
- needs_grad layout inheritance
- Cache-key distinction (factory-tagged path)

Drive-by: PR-6 NotImplementedError tests in test_flexible_layout.py
flipped to assert the factory now succeeds, with depth coverage
delegated to the new file. All 180 flexible-tensors tests pass.

Doc: flexible_tensors.md "Controlling physical layout" gains an
"`layout=` on the ndarray backend" subsection with a worked example
and a note about the current physical-shape `tensor.shape` quirk.
Adds a polymorphic kernel-argument annotation that dispatches at call
time based on the runtime value type. The same @qd.kernel can now
accept either a Field (treated like qd.template()) or a flexible-tensor
Ndarray / AnyArray (treated like qd.types.ndarray()), with each branch
producing its own cache entry.

Why: Q2 of the pre-impl POCs showed that today's Genesis usage is
homogeneous-per-run (so qd.tensor_annotation(backend) covers it), but
backend-sweep benchmarks and library code that doesn't want to know
how its callers allocate tensors need the same kernel object alive
for both branches. PR 14 unblocks that without forcing the cost on
homogeneous callers.

Implementation:
- _flexible.py:
  - _TensorTAnnotation(Template): subclass so the upfront slot
    detection in _func_base.py registers it as a template slot.
  - tensor_t = _TensorTAnnotation(): module singleton, exported via
    __all__ (and from quadrants/__init__.py).
  - _TENSOR_T_FIELD_MARKER / _TENSOR_T_NDARRAY_MARKER: cache-key
    salts to disambiguate the two branches.
- _template_mapper_hotpath.py:
  - In _extract_arg, special-case _TensorTAnnotation before the
    template path. Dispatches by isinstance(arg, (Ndarray, AnyArray)):
    ndarray branch reuses the ndarray_type feature path (5-tuple)
    prefixed with NDARRAY marker; field branch falls through to the
    template path prefixed with FIELD marker.
- function_def_transformer.py:
  - _decl_and_create_variable special-cases _TensorTAnnotation,
    reads the marker from this_arg_features, and dispatches to either
    decl_ndarray_arg or the template global_vars lookup.
- _func_base.py:
  - Allows isinstance(annotation, template) (catches Template
    subclasses including tensor_t) in the up-front kernel-argument
    annotation validator.
  - In _recursive_set_args, retargets needed_arg_type from
    _TensorTAnnotation to a default ndarray_type.NdarrayType when v
    is an Ndarray (non-Ndarray values follow the template no-op
    launch path).

Tests (tests/python/test_flexible_tensor_t.py, 9 cases):
- Singleton identity + Template-subclass invariant
- Accepts ndarray, ndarray with non-identity layout, field
- Same kernel object accepts both backends and produces 2 cache entries
- Re-using the same backend reuses the cache entry
- Layouts on the same backend stay distinct in the cache
- tensor_t is exposed via the qd namespace

Drive-by:
- test_api.py: tensor_t added to the public-API allow-list.
- _func_base.py: extra `isinstance(annotation, template)` branch
  in the annotation validator (needed for any Template subclass).

Docs: flexible_tensors.md gains a "Polymorphic kernel arguments"
section explaining when to reach for tensor_t vs tensor_annotation,
with a worked example.

Full regression on cluster: 2951 passed, 0 failed.
The intro paragraph linked to `#fields` and `#ndarrays`, which never
existed as headings on this page. The next paragraph already points
readers at `tensor_types.md` for the underlying primitives, so just
demote the broken links to plain code spans to satisfy
markdown-link-check.

Made-with: Cursor
Drops the 'flexible' prefix from filenames and identifiers introduced
in this branch series so the user-visible names are simply 'tensor'.
Also strips PR-N back-references that will be meaningless once these
PRs land. Touches only files owned by this series (no changes to
external/ or unrelated tests).
- black/ruff/pylint cleanups in python/quadrants/_tensor.py and the
  three new tensor test files (plus the import-block ordering nudged
  by the rename in python/quadrants/__init__.py). The pylint disable
  for the intentional late imports is hoisted to the module level so
  it does not force black to balloon the import lines (which then
  re-broke ruff).
- Replace the broken `#fields` / `#ndarrays` in-page anchors in
  tensor.md with a single link to tensor_types.md, which is the
  actual page where field vs ndarray are described.

Made-with: Cursor
…r-stork-7

# Conflicts:
#	python/quadrants/_tensor.py
…or-stork-14

# Conflicts:
#	docs/source/user_guide/tensor.md
#	python/quadrants/__init__.py
#	python/quadrants/_tensor.py
#	tests/python/test_api.py
Adds smoke tests that round-trip qd.Vector.tensor and qd.Matrix.tensor
through a kernel signature annotated with qd.Tensor on both the field
and ndarray backends. This pins the documented contract that qd.Tensor
is the single polymorphic annotation across all factory variants
(scalar, vector, matrix) and both backends.
Make every Python accessor on a layout-tagged Ndarray return the
*canonical* view: shape, to_numpy(), from_numpy(), to_dlpack() (and
therefore from_dlpack -> torch). _qd_layout becomes a purely internal
performance hint; Genesis's qd_to_python / qd_to_torch / qd_to_numpy
keep working unchanged.

Quadrants changes:

- _ndarray.py: factor out _invert_layout / _is_identity_layout helpers,
  apply np.transpose(arr, invperm(layout)) at the tail of
  _ndarray_to_numpy, and the symmetric permute on _ndarray_from_numpy
  (with shape validation now against canonical shape). to_dlpack
  passes _qd_layout through to the C++ binding.
- dlpack_funcs.{h,cpp} + export_lang.cpp: ndarray_to_dlpack accepts an
  optional layout vector and exposes a canonical shape with permuted
  strides — strided DLPack export, no data movement.
- _ndarray_pickle.py: serialise the canonical shape so the round-trip
  works in canonical terms (layout tag still intentionally dropped).
- docs/source/user_guide/tensor.md: new "Interop with NumPy and PyTorch"
  section pinning the canonical-view contract.

Tests:

- test_tensor_layout_interop.py (new): parametrized over both backends
  and a representative layout set, covering to_numpy / from_numpy /
  to_dlpack round-trips, grad accessors, identity-layout no-op paths,
  and a Genesis-shaped (n_dofs, _B) + layout=(1, 0) smoke test.
- existing layout test files (factory/aliasing/augassign/subscript/
  higher_rank/grad): assertions migrated from the old physical-view
  contract to the new canonical-view contract.
…rnel

`_make_fill_kernel` was using `qd.grouped(x)`, which yields *physical*
indices on a layout-tagged ndarray. Combined with the canonical->physical
AST rewrite on `x[I]`, this produced a double-permutation: the kernel
ended up writing physical-flat values, and `to_numpy()`'s canonical
transpose then read them back at the wrong positions.

Switch to `qd.grouped(qd.ndrange(*shape))` so `I` is a canonical
multi-index and the AST rewrite handles the physical translation.
…ernel

Two fixes for tests that still leaked the old physical-view contract:

1. test_tensor_layout_interop._make_fill_kernel: the previous attempt
   switched from `qd.grouped(x)` to `qd.grouped(qd.ndrange(*shape))` to
   iterate the canonical index space. But the AST rewrite at
   `build_Subscript` only fires when the subscript arity matches
   `_qd_layout` length, so `x[I]` (single Vector index) bypasses the
   canonical->physical permutation and writes at canonical positions
   into the smaller physical buffer — silently OOB on permuted layouts,
   producing 75% partially-correct, 25% scrambled output.

   Switch to explicit `x[i, j] = ...` / `x[i, j, k] = ...` (one kernel
   per rank), matching the working pattern every other layout test in
   the suite uses. The AST rewrite then sees a 2- or 3-arg subscript
   that matches the layout length and applies the permutation.

2. test_tensor_annotation.test_tensor_accepts_ndarray_with_layout:
   asserted the old physical-view shape `(N, M)` and physical indexing
   `arr[3, 2]`. Updated to canonical `(M, N)` / `arr[2, 3]`.
build_Subscript only applied the canonical->physical layout permutation
when the subscript arity matched len(_qd_layout). The single-Vector
form ``x[I]`` (where I comes from ``qd.grouped(qd.ndrange(...))`` or
``qd.grouped(x)``) bypassed the rewrite and wrote at canonical indices
into the smaller physical buffer -- silently OOB on non-square
permuted layouts.

Detect a single Matrix/Vector index whose rank matches len(layout),
unpack into N scalar component subscripts, then permute. Both the
Matrix (python backend) and Expr-with-tensor-shape (real kernels)
forms are handled.

Add regression tests covering both ``qd.grouped(qd.ndrange(...))`` and
``qd.grouped(x)`` sources, for rank 2 / 3 across both backends, plus a
cross-check that ``x[I]`` and ``x[i, j]`` produce byte-identical
output on the same layout-tagged tensor.
…nsors

Pair with the build_Subscript fix: now that ``ndarray[I]`` permutes I
canonical->physical when ``ndarray`` is layout-tagged, the bridge
kernels ``ndarray_to_ext_arr`` and ``ext_arr_to_ndarray`` must iterate
the *untagged* numpy operand so I stays canonical and the AST rewrite
routes it to the right physical position. Iterating ``grouped(ndarray)``
would yield physical indices that the rewrite would then incorrectly
re-permute, scrambling the copy.

With the bridge kernels canonical-driven, ``_ndarray_to_numpy`` /
``_ndarray_from_numpy`` no longer need a python-side ``np.transpose``
fixup -- they just allocate / validate at the canonical shape and let
the kernel do the right thing. Untagged ndarrays see canonical ==
physical and pay no extra cost.
- ``qd.Matrix`` only has one ``tensor`` classmethod; remove the
  duplicate from the expected list so the sorted comparison matches
  the dedup'd ``dir()`` output.
- ``qd.Ndarray`` and its subclasses now expose ``shape`` as a
  ``@property`` (canonical-view contract for layout-tagged ndarrays);
  add it to the expected lists for ``Ndarray``, ``ScalarNdarray``,
  ``MatrixNdarray``, ``VectorNdarray``.
- _with_layout now tags the companion grad ndarray when needs_grad=True,
  so kernel code reading x.grad[...] uses the same canonical->physical
  AST rewrite as x[...]. Drops the explicit grad propagation in
  qd.tensor() since _with_layout handles it centrally.
- build_struct_for: on `for I in qd.grouped(layout_tagged_ndarray)`,
  reorder the runtime-delivered physical loop indices into canonical
  order before binding I, so x[I] round-trips correctly through
  build_Subscript's permutation.
- Skip field-backend dlpack tests with non-identity layout (pre-existing
  SNode-order limitation; field tensors that need dlpack must use
  identity order or the ndarray backend).
- Fix test_layout_field_kernel_canonical_indexing_rank2 to pin the
  field backend explicitly (default is now NDARRAY).
Seven new tests in tests/python/test_tensor_layout_interop.py:

- grouped-struct-for rank-3 all permutations (catches bugs that rank-2
  self-inverse layouts hide, e.g. confusing layout with invperm).
- .grad.to_numpy() rank-3: guards grad-tag propagation beyond rank 2.
- xfail multi-target `for i, j in x` on layout-tagged ndarray, pinning
  the documented limitation so it flips red if lifted.
- pickle round-trip: canonical shape is preserved, _qd_layout is
  intentionally dropped (per 8.7 of the design doc).
- fill(val) and copy_from(src) round-trip on layout-tagged ndarrays.
- .grad.to_dlpack() canonical-view, exercising the permuted-strides
  code path on the grad buffer.
- Mixed kernel args: layout-tagged + untagged ndarray in the same
  kernel (the Genesis migration pattern).
Field never supported pickling and adding it requires re-allocating
SNodes after the runtime is materialized (problematic). The easier path
to symmetry is to remove ``__reduce__`` from ``Ndarray`` and document
that neither backend supports pickle. Linesearch (the immediate Genesis
migration target) doesn't pickle.

Removes:
- ``Ndarray.__reduce__`` and the ``_ndarray_pickle`` import
- ``python/quadrants/lang/_ndarray_pickle.py``
- ``tests/python/test_pickle.py`` (9 upstream pickle tests)
- ``test_pickle_layout_tagged_ndarray_roundtrip_drops_layout`` from the
  layout-interop test file (added in stork-15)

Users who need to persist tensor data should ``to_numpy()`` and pickle
the resulting array; reconstruct on the other side via ``from_numpy()``.
Previously ``field_to_dlpack`` (C++) rejected fields whose SNode chain
placed axes in any order other than i, j, k, ..., while
``ndarray_to_dlpack`` (also C++) honoured the layout permutation and
exposed a *canonical* view via permuted strides. That made
``tensor.to_dlpack()`` an asymmetric operation between the two
backends and broke the "freely switch backend / layout" contract:
``qd.tensor(..., backend=qd.Backend.FIELD, layout=(1, 0))`` would raise
where the same allocation under ``Backend.NDARRAY`` would not.

Field side:
- Replace the validate-only ``validate_axis_ordering`` with
  ``extract_memory_layout_order``, which walks the SNode chain root ->
  place and returns the canonical-axis index at each successive memory
  axis (outermost first). For ``order='ji'`` this yields ``{1, 0}``.
- ``field_to_dlpack`` now consumes that vector exactly the same way
  ``ndarray_to_dlpack`` consumes its ``layout`` argument: build physical
  shape + strides, then expose canonical shape + permuted strides via
  the inverse permutation. Element axes (n, m for VectorField /
  MatrixField) sit innermost and are passed through unchanged.
- Reject SNode chains whose memory-layout vector is not a permutation
  of {0, ..., ndim-1} (non-contiguous axis identifiers like qd.i + qd.l).

Tests:
- ``test_dlpack_non_sequenced_axes`` previously asserted RuntimeError
  for a (i, k, j)-ordered field; flip it to assert the canonical
  ``(3, 4, 2)`` shape with a non-contiguous stride layout.
- ``test_to_dlpack_canonical_shape_rank{2,3}`` and
  ``test_genesis_shaped_dofs_batch_layout`` no longer skip the field
  backend for non-identity layouts — they assert the same canonical
  view on both backends.
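The two C++ pieces can be modelled in a few lines of Python. Both helpers here are illustrative stand-ins, not the real signatures: the order-string walk models what `extract_memory_layout_order` recovers from the SNode chain, and `canonical_strides` models the "canonical shape + permuted strides" DLPack export rule:

```python
def extract_memory_layout_order(order, ndim):
    """Modelled from a field order string: canonical-axis index at each
    successive memory axis, outermost first; 'ji' -> (1, 0)."""
    axes = tuple("ijkl".index(ch) for ch in order)
    if sorted(axes) != list(range(ndim)):
        raise ValueError("memory-layout vector is not a permutation")
    return axes

def canonical_strides(physical_shape, layout, itemsize):
    """DLPack export rule (sketch): strides of the canonical view over a
    C-contiguous physical buffer, via the inverse permutation."""
    strides, acc = [0] * len(physical_shape), itemsize
    for k in range(len(physical_shape) - 1, -1, -1):
        strides[k] = acc
        acc *= physical_shape[k]
    inv = [0] * len(layout)
    for p, c in enumerate(layout):
        inv[c] = p
    return tuple(strides[inv[c]] for c in range(len(layout)))
```

E.g. a canonical (2, 3) float32 tensor stored transposed has physical shape (3, 2) and physical strides (8, 4); the canonical view exposes shape (2, 3) with strides (4, 8) and no data movement.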
Closes the remaining FIELD-vs-NDARRAY surface gaps so a single
``qd.tensor(...)`` call lets downstream code switch backend (and layout)
freely:

- ``Ndarray.to_torch(device=None)`` / ``Ndarray.from_torch(arr)`` —
  thin wrappers around the existing ``ndarray_to_ext_arr`` /
  ``ext_arr_to_ndarray`` bridge kernels (the kernels accept torch
  tensors via the same external-array interface as numpy arrays).
  Layout-tagged ndarrays produce a canonical view because the bridge
  kernels iterate the untagged external buffer canonically.
- ``MatrixNdarray.to_torch`` / ``MatrixNdarray.from_torch`` and
  ``VectorNdarray.to_torch`` / ``VectorNdarray.from_torch`` —
  parallel methods built on a new ``_ndarray_matrix_to_torch`` /
  ``_ndarray_matrix_from_torch`` helper pair that mirrors the
  existing matrix-numpy helpers (they just allocate / accept a torch
  tensor instead of a numpy array and dispatch the same
  ``ndarray_matrix_to_ext_arr`` / ``ext_arr_to_ndarray_matrix``
  kernels).
- ``Ndarray.to_numpy(dtype=None)`` — accepts the same optional dtype
  cast ``Field.to_numpy`` already supports.
- ``Ndarray.layout`` and ``Field.layout`` — public read-only property
  returning the canonical-axis-permutation tuple (or ``None`` for
  identity). Symmetric introspection accessor; downstream code can
  branch on layout without knowing which backend produced the tensor.
  ``qd.tensor(..., backend=qd.Backend.FIELD, layout=...)`` now stashes
  the permutation on the resulting field so ``Field.layout`` reports
  it (the SNode chain still encodes it physically; this is purely an
  introspection convenience for the unified factory).
- Drops the now-vestigial ``Ndarray.layout = Layout.AOS`` data
  attribute (it was only consumed by the deleted ``_ndarray_pickle.py``;
  the Python-level ``layout`` attribute on Mesh is unrelated and kept).

Tests: ``test_api.py`` updated to expect ``layout`` on every
Ndarray/Field subclass, and ``from_torch`` / ``to_torch`` on every
``Ndarray`` subclass.
``for i, j in layout_tagged_x`` previously delivered the runtime's
*physical* loop indices straight into the user names, so ``i`` ended up
holding the canonical-axis-1 value when ``_qd_layout = (1, 0)``. The
grouped form (``for I in qd.grouped(x)``) was already canonicalised in
stork-15; this commit closes the multi-target gap.

In ``build_struct_for``'s non-grouped branch, when the iter target
carries a non-identity ``_qd_layout``, allocate hidden physical
``Expr`` slots, pass those to ``begin_frontend_struct_for`` (so the
runtime can fill them with physical indices), and create the
user-visible names bound to ``phys_vars[invperm[canonical_idx]]``.
Symmetric to the canonical->physical rewrite in :func:`build_Subscript`.
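
The physical-to-canonical re-binding can be sketched as a plain function
(hypothetical helper; the real code allocates hidden ``Expr`` slots
inside ``build_struct_for``):

```python
def bind_user_targets(phys_vars, perm):
    """Map hidden physical loop slots to user-visible canonical names.

    perm[k] is the canonical axis carried by memory axis k, so the
    canonical axis a lives in physical slot invperm[a].
    """
    inv = [0] * len(perm)
    for k, a in enumerate(perm):
        inv[a] = k
    return tuple(phys_vars[inv[a]] for a in range(len(perm)))
```

With ``_qd_layout = (1, 0)`` and physical slots ``("p0", "p1")``, the
user's ``i`` binds ``p1`` and ``j`` binds ``p0``:
``bind_user_targets(("p0", "p1"), (1, 0)) == ("p1", "p0")``.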

Layout tagging now flows through to fields too (stashed by
``qd.tensor()`` for ``Backend.FIELD`` in the previous commit), so the
fix applies uniformly to both backends — verified by parameterising
the previously-xfail test over ``BACKENDS`` and all rank-2 layouts.

Replaces ``test_multi_target_struct_for_on_layout_tagged_ndarray_xfail``
with the new ``test_multi_target_struct_for_on_layout_tagged_tensor``.
Adds ``test_tensor_backend_symmetry.py`` — a focused suite that pins
the contract that the entire user-facing tensor surface behaves
identically on ``Backend.FIELD`` and ``Backend.NDARRAY`` for any layout
(identity or non-identity).

Each fixed asymmetry from §8.9 of the design doc gets a parametrised
test (backend × layout):
- ``tensor.layout`` reports the user-supplied permutation (or None).
- ``to_torch`` / ``from_torch`` round-trip with canonical-view
  semantics regardless of layout.
- ``to_numpy(dtype=...)`` accepts the dtype kwarg on both backends.
- ``pickle.dumps(tensor)`` raises symmetrically on both backends.
- ``qd.tensor(..., order=...)`` is rejected on both backends (ditto
  any unknown kwarg) — the field-only ``order=`` escape hatch is
  closed off in the unified factory.
- ``needs_grad=True`` works on both backends.
- ``tensor.shape`` is canonical on both backends.

If a future change re-introduces an asymmetry, one of these tests
will fail loudly.
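
The backend × layout grid driving the suite can be sketched as follows
(hypothetical stand-in names; the real suite parametrises with
``pytest.mark.parametrize``):

```python
import itertools

# Stand-ins for qd.Backend members and the rank-2 layouts under test.
BACKENDS = ("FIELD", "NDARRAY")
LAYOUTS = (None, (0, 1), (1, 0))

def symmetry_cases():
    # Every contract check (layout property, to_torch/from_torch,
    # to_numpy(dtype=...), ...) runs once per (backend, layout) pair.
    return list(itertools.product(BACKENDS, LAYOUTS))
```

Each asymmetry fix thus gets six concrete test cases, so a regression on
either backend under any layout fails a specific parametrised test.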
Updates the "Interop with NumPy and PyTorch" section to:
- list ``to_torch(device=...)`` / ``from_torch(...)`` / ``layout`` /
  ``to_numpy(dtype=...)`` alongside the existing accessors,
- explicitly note that the surface is identical on both backends so
  switching ``backend=`` requires no other call-site change,
- show ``a.layout == (1, 0)`` as an introspection example,
- show a ``to_torch`` / ``from_torch`` round-trip.

Removes the prior implicit "field-only" claim about ``to_dlpack``;
both backends now expose a canonical view via permuted strides under
non-identity layout.
When ``qd.tensor()`` started stashing ``_qd_layout`` on the resulting
Field (so ``Field.layout`` could introspect symmetrically with
``Ndarray.layout``), the existing canonical->physical AST rewrites in
``build_Subscript`` and ``build_struct_for`` started firing on fields
too — and double-permuting them. Fields have no need for the rewrite:
their SNode hierarchy already translates canonical indices to permuted
physical addresses at the IR level via the ``order=`` keyword.

Adds an ``isinstance(value, Ndarray)`` gate at every layout-rewrite
site (one in ``build_Subscript``, two in ``build_struct_for``). Layout-
tagged fields now flow through the legacy IR path unchanged; ndarrays
get the same canonical-view treatment they had in stork-15.

Caught by ``test_to_torch_canonical_view_round_trips[layout1-field]``
on the cluster.
The previous attempt to gate the canonical->physical AST rewrite on
``isinstance(node.value.ptr, Ndarray)`` didn't work: ``node.value.ptr``
is an IR-level expression object, not the original Ndarray, so the
isinstance check missed and the rewrite was silently skipped on
ndarrays too — caught by ``test_from_torch_canonical_round_trips``
where two canonical positions collided into the same physical offset
because the layout tag was being ignored.

Switch to attribute-name-based gating instead:
- Ndarrays continue to use ``_qd_layout``, which is what
  ``build_Subscript`` / ``build_struct_for`` look for. Reverts the
  ``isinstance`` check at every rewrite site.
- Fields use a separate ``_qd_field_layout`` attribute, set by
  ``qd.tensor()`` for ``Backend.FIELD`` and read by ``Field.layout``.
  The AST never sees ``_qd_layout`` on a field, so it never tries to
  double-permute their already-canonical IR.

This keeps ``Field.layout`` and ``Ndarray.layout`` symmetric at the
Python user level while keeping the IR rewrite strictly ndarray-only.

Caught by cluster runs of test_tensor_backend_symmetry.py.
- _ndarray_pickle.py: drop unreachable Layout.SOA check (Ndarray.layout is
  now a permutation tuple/None property, never a Layout enum).
- test_pickle.py: drop the now-infeasible test_pickle_soa_raises (cannot
  set the read-only layout property; the SOA branch is dead code).
- test_tensor_backend_symmetry.py: split test_pickle_raises_on_both_backends
  into test_pickle_ndarray_works (round-trip) + test_pickle_field_raises.
  Pre-existing asymmetry preserved per scope clarification; symmetric
  pickle is now planned for the Tensor wrapper (§8.11).
- _tensor.py: use setattr to tag _qd_field_layout on field / grad so
  pyright doesn't flag the dynamic attribute on pybind classes
  (reportAttributeAccessIssue / reportOptionalMemberAccess).
- args_hasher.py: guard len(obj.shape) with `obj.shape or ()` now
  that Ndarray.shape can return None during _reset (reportArgumentType).
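
The guard is the usual None-coalescing idiom (minimal sketch, assuming
the hasher only needs the rank):

```python
def shape_rank(shape):
    # Ndarray.shape may be None while the runtime is mid-_reset; fall
    # back to an empty tuple so len() stays well-defined.
    return len(shape or ())
```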

Made-with: Cursor