[Type] Tensor 1-16 (#545)

Draft
hughperkins wants to merge 217 commits into main from hp/tensor-stork-16


@hughperkins
Collaborator

Issue: #


First step of the flexible-tensors series (hp/tensor-stork-N): introduce a
Backend IntEnum with FIELD=0 and NDARRAY=1 that subsequent PRs will use to
drive a per-tensor backend choice on the upcoming qd.tensor() factory.

This PR ships the enum only — no factory, no layout, no kernel integration.

Adds:
- python/quadrants/_flexible.py with the Backend IntEnum.
- Re-export through quadrants/__init__.py so users access it as qd.Backend.
- tests/python/test_flexible_backend.py covering symbol export, IntEnum
  semantics, lookup by name/value, distinct members, and rejection of
  invalid values.
- docs/source/user_guide/flexible_tensors.md seeded with a one-section
  user-facing description of qd.Backend, registered in the Core Concepts
  toctree of user_guide/index.md.

Tests pass locally; sphinx make html succeeds with no new warnings on the
flexible_tensors.md page.
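For reference, a minimal self-contained sketch of the enum this PR introduces (member names and values taken from the description above; no quadrants import needed, so this is illustration only):

```python
from enum import IntEnum

class Backend(IntEnum):
    """Per-tensor backend selector (sketch of quadrants._flexible.Backend)."""
    FIELD = 0    # SNode-backed qd.field storage
    NDARRAY = 1  # qd.ndarray storage

# The IntEnum semantics the test file exercises:
assert Backend.FIELD == 0 and Backend.NDARRAY == 1
assert Backend["NDARRAY"] is Backend.NDARRAY   # lookup by name
assert Backend(0) is Backend.FIELD             # lookup by value
```

Because it is an IntEnum, plain ints 0/1 compare equal to the members, which is what lets later PRs coerce raw int values at the factory boundary.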
Adds qd.tensor(dtype, shape, *, backend=Backend.FIELD, **kwargs) — a thin
dispatcher over qd.field and qd.ndarray that selects the underlying allocator
via the qd.Backend enum. Extra kwargs pass through verbatim, so backend-
specific options (e.g. order= for fields) keep working.

Default backend is Backend.FIELD to match existing Quadrants behaviour.

Adds:
- tensor() in python/quadrants/_flexible.py with a _coerce_backend helper
  that gives a clear error on invalid backend values.
- Re-export through quadrants/__init__.py.
- tests/python/test_flexible_factory.py: 10 tests covering default backend,
  explicit FIELD/NDARRAY selection, IntEnum coercion of int values, dtype
  propagation, int-shape normalisation, invalid-backend error, kwargs pass-
  through, and end-to-end kernel round-trip on each backend.
- docs/source/user_guide/flexible_tensors.md: new "Allocating a tensor with
  qd.tensor()" section with a runnable example for each backend.
- See-also link from tensor_types.md to the new page.

All tests pass; sphinx build clean.
Element-typed companions to qd.tensor: dispatch over qd.Vector.field /
qd.Vector.ndarray and qd.Matrix.field / qd.Matrix.ndarray via the same
backend= keyword. kwargs pass through verbatim.

Adds:
- tensor_vec(n, dtype, shape, *, backend=, **kwargs) and
  tensor_mat(n, m, dtype, shape, *, backend=, **kwargs).
- Re-export through quadrants/__init__.py.
- 11 tests covering type-equivalence to the underlying Vector/Matrix
  factories on each backend, invalid-backend rejection, and end-to-end
  kernel round-trips for vec and mat on both backends.
- "Vector and matrix tensors" section in the user guide.

All tests pass; sphinx build clean.
Returns the kernel-argument annotation appropriate for a given backend:
qd.template() instance for FIELD, qd.types.ndarray() instance for NDARRAY.

Mirrors the Genesis ``V_ANNOTATION = qd.types.ndarray() if use_ndarray
else qd.template`` pattern as a single first-class call so users can pick a
backend in one place and reuse the annotation across kernels.

Adds:
- tensor_annotation(backend) in _flexible.py.
- Re-export through __init__.py.
- 6 tests: return type per backend, invalid-backend rejection, IntEnum
  coercion, end-to-end kernel use on each backend.
- "Annotating kernel arguments" section in the user guide.

All tests pass; sphinx build clean.
needs_grad already passes through the factory **kwargs (PRs 2-3); this PR
makes that contract explicit with tests and a user-guide section.

Adds:
- 6 tests covering grad allocation, primal+grad kernel round-trip on field
  and ndarray scalar backends, and grad on tensor_vec/tensor_mat field.
- "Gradients" section in the user guide showing the needs_grad= usage on
  both backends with a runnable example.

Documented limitation: qd.Vector.ndarray and qd.Matrix.ndarray do not
currently accept needs_grad — that's an upstream Quadrants limitation and
is noted in a code comment inside the test file.

All tests pass; sphinx build clean.
Adds a layout= tuple kwarg to qd.tensor() that picks the physical memory
nesting order. The tuple is a permutation of range(ndim), outermost axis
first: layout=(1, 0) on a 2-D tensor means transposed storage, equivalent
to order='ji'.

Phase 2 ships the field side only. Non-identity layout on the ndarray
backend raises NotImplementedError; identity layouts (None / range(ndim))
work on both backends. Full ndarray support requires the AST subscript
rewrite, which lands in PR 8/13.

Adds:
- _layout_to_order() validator + translator (ValueError on length / non-
  permutation, returns None for identity).
- layout= and order=-rejection wiring in tensor().
- 41 tests: identity & default behaviour on both backends, all rank-3 and
  rank-4 permutations on field, kernel canonical-indexing round-trip,
  rejection of bad layouts, rejection of order= kwarg, NotImplementedError
  on ndarray non-identity, layout= + needs_grad= composition.
- "Controlling physical layout" section in the user guide with a clear
  caveat about ndarray support coming later.

All tests pass; sphinx build clean.
…trip

Test-only PR closing the gap left by PRs 5 and 6: the combination of
layout= and needs_grad= was checked at allocation time, but not yet
exercised through a kernel write/read on the .grad buffer with a non-
identity layout.

This pins down the pre-impl POC Q3b finding (grad SNode inherits the
primal's axis_seq) so any upstream Quadrants regression in this area
surfaces immediately.

Adds 8 tests:
- rank-2 transposed-storage primal+grad kernel round-trip
- rank-3 (2, 0, 1) primal+grad kernel round-trip
- all 6 rank-3 layout permutations with primal+grad written and read

All tests pass; no production-code or doc changes.
…write

Plumbs an optional canonical-axis permutation (``_qd_layout``) through the
ndarray kernel-argument flow and into a single AST hook in
``build_Subscript`` that permutes user-supplied canonical indices into
physical-storage order before forwarding.

Touched files:
- ``lang/any_array.py``: AnyArray gains a ``_qd_layout`` attribute that
  defaults to None (legacy / identity behaviour) and propagates verbatim
  through the ``.grad`` property.
- ``lang/kernel_arguments.py``: ``decl_ndarray_arg`` accepts a layout
  kwarg and threads it into the AnyArray it returns.
- ``lang/_template_mapper_hotpath.py``: ``_extract_arg`` for
  Ndarray / AnyArray / external-array branches now appends a trailing
  layout slot to the features tuple. None for legacy arrays. The slot
  becomes part of the kernel cache key automatically.
- ``lang/ast/ast_transformers/function_def_transformer.py``: unpacks
  the new feature slot and forwards it to ``decl_ndarray_arg``.
- ``lang/ast/ast_transformer.py``: ``build_Subscript`` permutes
  ``node.slice.ptr`` when ``node.value.ptr._qd_layout`` is set. None
  / identity layouts are no-ops, so existing IR is byte-identical.
- ``_flexible.py``: private ``_with_layout(ndarray, layout)`` helper that
  tags an existing Ndarray. Used by the new tests; not part of the
  public API. The user-facing ``qd.tensor(..., backend=NDARRAY, layout=...)``
  enable lands in PR 13.

Tests:
- 14 new tests in ``test_flexible_ndarray_layout_subscript.py``: untagged
  unaffected, identity layout byte-identical to no tag, rank-2 transpose
  matching transposed storage, all rank-3 permutations end-to-end,
  AugAssign through layout, grad inheriting layout, _with_layout
  validation.
- Full Quadrants suite still passes: 2853 passed, 172 skipped, 4 xfailed,
  1 xpassed (test_typing.py and test_pyi_stubs.py have pre-existing
  unrelated environment failures).

Drive-bys (these arguably belong in earlier PRs but consolidating here
to avoid an N-way force-push rebase across the stacked branches):
- ``tests/python/test_api.py``: extend the public-symbol allow-list with
  Backend, tensor, tensor_annotation, tensor_mat, tensor_vec.
- ``tests/python/test_flexible_factory.py``: PR 2's ``order=`` pass-
  through test was forward-incompatible with PR 6's ``order=`` rejection;
  switched to ``offset=``.

All flexible-tensors tests pass; sphinx build clean.
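The permutation the ``build_Subscript`` hook applies is small: since physical axis k stores canonical axis layout[k], the index forwarded at physical slot k must be the user's canonical index for axis layout[k]. A pure-Python sketch (function name hypothetical):

```python
def permute_indices(indices, layout):
    """Canonical -> physical index permutation (sketch of the AST hook).

    Physical axis k stores canonical axis layout[k], so the subscript
    forwarded at physical position k is the user's index for canonical
    axis layout[k].  layout=None (untagged / identity) is a no-op.
    """
    if layout is None:
        return tuple(indices)
    return tuple(indices[axis] for axis in layout)
```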
… ndarrays

Test-only PR pinning down the trickier subscript-rewrite paths that PR 8
covered only at the basic-AugAssign level:

- All standard operators on the same canonical cell (+=, -=, *=, //=, %=)
  through one composite kernel.
- Each AugAssign operator in its own kernel: +=, -=, *=, //=, %=, &=, |=, ^=
- Read-and-write of the same canonical index in one statement
  (``x[i, j] = x[i, j] * 2 + x[i, j]``).
- Neighbour dependence along a canonical axis (cumulative scan).
- Mixed layout-tagged + untagged ndarrays in the same kernel — the rewrite
  must apply only to the tagged operand.
- Three layout-tagged operands consumed by one composite expression.

11 tests, all pass.
…atterns

Test-only PR. Investigation found that in-kernel rebinding (``y = x``) is
**not** supported by Quadrants for any ndarray — that's an upstream
limitation that raises ``QuadrantsTypeError: Invalid constant scalar
data type: AnyArray`` regardless of layout. The test file's docstring
spells this out explicitly.

This PR pins down the aliasing patterns that Quadrants does support and
that flexible-tensors layout metadata must propagate through:

1. Same Ndarray passed twice to the same kernel: both AnyArrays get the
   same layout via the runtime feature tuple.
2. Same Ndarray reused across two consecutive kernel calls: layout
   persists.
3. Repeated ``.grad`` access inside a single kernel: each access
   inherits layout from the parent AnyArray.
4. Same Ndarray bound to two kernels with different parameter names but
   compatible annotations: layout travels with the value, not the
   annotation.
5. Tagged + untagged ndarrays in the same kernel: layout isolation per
   argument.
6. Two separately-allocated tagged ndarrays: independence.

6 tests, all pass.
Test-only PR. PR 8's parametrized rank-3 test exercises every permutation
on a single canonical cell. PR 11 widens that:

- Rank 4: every permutation (24), full-grid value comparison.
- Rank 5: 5 representative permutations (identity, full reverse, inner
  swap, cyclic shift, adjacent pair swaps), full-grid checks (32 cells).
- Rank 6: 3 representative permutations, full-grid checks (64 cells).
- Rank 4 + AugAssign + needs_grad on a non-trivial layout.
- Rank 4 cross-check: tagged-with-layout vs direct-with-permuted-iteration
  produce the same physical buffer.

Quadrants supports up to ``quadrants_max_num_indices=12``, so 6-D is well
within the safe range; 5-D and 6-D ndranges become expensive at larger
sizes, so each axis is kept at 2.

34 tests, all pass.
Test-only PR. PR 8 plumbed _qd_layout through TemplateMapper features,
making it part of the kernel cache key automatically. This file pins
that contract via direct Kernel._primal.mapper.mapping inspection.

Why this matters: if two different layouts shared a single compiled
kernel, the AST subscript rewrite would fire exactly once (for the
layout chosen at first compile), and subsequent calls with a different
layout would silently mis-index or crash.

6 tests:
- Two layouts on the same kernel produce two cache entries.
- Untagged (None) vs identity-tagged ((0, 1)) are distinct cache entries
  (documents this for future "normalise identity to None" decisions).
- Re-using the same layout reuses the cache entry.
- Switching back and forth does not pollute the cache (still 2 entries).
- Layout slot is the trailing element of the per-arg feature tuple.
- Distinct kernels keep their mappers separate.

All pass.
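The contract reduces to "different trailing layout slot, different cache entry". A toy model of that lookup (names and feature-tuple contents hypothetical; only the keying behaviour is the point):

```python
# Toy model of the compiled-kernel cache keyed by per-argument feature
# tuples whose trailing slot is the layout (or None for untagged arrays).
compiled_cache = {}

def lookup_or_compile(arg_features):
    key = tuple(arg_features)
    if key not in compiled_cache:
        compiled_cache[key] = f"compiled#{len(compiled_cache)}"
    return compiled_cache[key]

base = ("ndarray", "f32", 2)                    # stand-in dtype/rank features
k_transposed = lookup_or_compile(base + ((1, 0),))  # non-identity layout
k_identity = lookup_or_compile(base + ((0, 1),))    # explicit identity tag
k_untagged = lookup_or_compile(base + (None,))      # legacy untagged array
```

As documented above, the explicit identity tag and the untagged None slot are deliberately distinct entries, and re-using a layout must reuse its entry.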
Removes the PR-6-era NotImplementedError gate. With the AST-rewrite
plumbing landed in PR 8 and the cache-key contract pinned in PR 12,
the public factory now wires non-identity layouts straight through to
ndarray-backed tensors.

Behaviour:
- shape= is the **canonical** shape the user indexes inside kernels.
- The factory allocates the underlying ndarray at the *physical*
  (permuted) shape: physical[k] = canonical[layout[k]].
- The instance is auto-tagged with _qd_layout, so kernel subscripts
  x[i, j, ...] are translated to physical access by build_Subscript.
- Identity layouts (None or range(ndim)) collapse to no tag — same as
  the FIELD path — so untagged + identity-tagged stay byte-identical.
- order= still forbidden as a kwarg (single source of truth: layout=).
- ValueError up front for wrong-length / non-permutation layouts.

Tests (test_flexible_factory_layout_ndarray.py, 18 cases):
- No-tag / identity collapse
- Physical shape and tag presence after rank-2 and rank-3 calls
- Validation (length, permutation, order kwarg)
- Rank-2 transpose matches the no-layout reference under transpose
- Rank-2 explicit value spot-checks
- Rank-3 every permutation parametrized
- AugAssign through factory-allocated tensors
- needs_grad layout inheritance
- Cache-key distinction (factory-tagged path)

Drive-by: PR-6 NotImplementedError tests in test_flexible_layout.py
flipped to assert the factory now succeeds, with depth coverage
delegated to the new file. All 180 flexible-tensors tests pass.

Doc: flexible_tensors.md "Controlling physical layout" gains an
"`layout=` on the ndarray backend" subsection with a worked example
and a note about the current physical-shape `tensor.shape` quirk.
Adds a polymorphic kernel-argument annotation that dispatches at call
time based on the runtime value type. The same @qd.kernel can now
accept either a Field (treated like qd.template()) or a flexible-tensor
Ndarray / AnyArray (treated like qd.types.ndarray()), with each branch
producing its own cache entry.

Why: Q2 of the pre-impl POCs showed that today's Genesis usage is
homogeneous-per-run (so qd.tensor_annotation(backend) covers it), but
backend-sweep benchmarks and library code that doesn't want to know
how its callers allocate tensors need the same kernel object alive
for both branches. PR 14 unblocks that without forcing the cost on
homogeneous callers.

Implementation:
- _flexible.py:
  - _TensorTAnnotation(Template): subclass so the upfront slot
    detection in _func_base.py registers it as a template slot.
  - tensor_t = _TensorTAnnotation(): module singleton, exported via
    __all__ (and from quadrants/__init__.py).
  - _TENSOR_T_FIELD_MARKER / _TENSOR_T_NDARRAY_MARKER: cache-key
    salts to disambiguate the two branches.
- _template_mapper_hotpath.py:
  - In _extract_arg, special-case _TensorTAnnotation before the
    template path. Dispatches by isinstance(arg, (Ndarray, AnyArray)):
    ndarray branch reuses the ndarray_type feature path (5-tuple)
    prefixed with NDARRAY marker; field branch falls through to the
    template path prefixed with FIELD marker.
- function_def_transformer.py:
  - _decl_and_create_variable special-cases _TensorTAnnotation,
    reads the marker from this_arg_features, and dispatches to either
    decl_ndarray_arg or the template global_vars lookup.
- _func_base.py:
  - Allows isinstance(annotation, template) (catches Template
    subclasses including tensor_t) in the up-front kernel-argument
    annotation validator.
  - In _recursive_set_args, retargets needed_arg_type from
    _TensorTAnnotation to a default ndarray_type.NdarrayType when v
    is an Ndarray (non-Ndarray values follow the template no-op
    launch path).

Tests (tests/python/test_flexible_tensor_t.py, 9 cases):
- Singleton identity + Template-subclass invariant
- Accepts ndarray, ndarray with non-identity layout, field
- Same kernel object accepts both backends and produces 2 cache entries
- Re-using the same backend reuses the cache entry
- Layouts on the same backend stay distinct in the cache
- tensor_t is exposed via the qd namespace

Drive-by:
- test_api.py: tensor_t added to the public-API allow-list.
- _func_base.py: extra `isinstance(annotation, template)` branch
  in the annotation validator (needed for any Template subclass).

Docs: flexible_tensors.md gains a "Polymorphic kernel arguments"
section explaining when to reach for tensor_t vs tensor_annotation,
with a worked example.

Full regression on cluster: 2951 passed, 0 failed.
The intro paragraph linked to `#fields` and `#ndarrays`, which never
existed as headings on this page. The next paragraph already points
readers at `tensor_types.md` for the underlying primitives, so just
demote the broken links to plain code spans to satisfy
markdown-link-check.

Made-with: Cursor
Drops the 'flexible' prefix from filenames and identifiers introduced
in this branch series so the user-visible names are simply 'tensor'.
Also strips PR-N back-references that will be meaningless once these
PRs land. Touches only files owned by this series (no changes to
external/ or unrelated tests).
- black/ruff/pylint cleanups in python/quadrants/_tensor.py and the
  three new tensor test files (plus the import-block ordering nudged
  by the rename in python/quadrants/__init__.py). The pylint disable
  for the intentional late imports is hoisted to the module level so
  it does not force black to balloon the import lines (which then
  re-broke ruff).
- Replace the broken `#fields` / `#ndarrays` in-page anchors in
  tensor.md with a single link to tensor_types.md, which is the
  actual page where field vs ndarray are described.

Made-with: Cursor
…r-stork-7

# Conflicts:
#	python/quadrants/_tensor.py
…or-stork-14

# Conflicts:
#	docs/source/user_guide/tensor.md
#	python/quadrants/__init__.py
#	python/quadrants/_tensor.py
#	tests/python/test_api.py
Adds smoke tests that round-trip qd.Vector.tensor and qd.Matrix.tensor
through a kernel signature annotated with qd.Tensor on both the field
and ndarray backends. This pins the documented contract that qd.Tensor
is the single polymorphic annotation across all factory variants
(scalar, vector, matrix) and both backends.
Make every Python accessor on a layout-tagged Ndarray return the
*canonical* view: shape, to_numpy(), from_numpy(), to_dlpack() (and
therefore from_dlpack -> torch). _qd_layout becomes a purely internal
performance hint; Genesis's qd_to_python / qd_to_torch / qd_to_numpy
keep working unchanged.

Quadrants changes:

- _ndarray.py: factor out _invert_layout / _is_identity_layout helpers,
  apply np.transpose(arr, invperm(layout)) at the tail of
  _ndarray_to_numpy, and the symmetric permute on _ndarray_from_numpy
  (with shape validation now against canonical shape). to_dlpack
  passes _qd_layout through to the C++ binding.
- dlpack_funcs.{h,cpp} + export_lang.cpp: ndarray_to_dlpack accepts an
  optional layout vector and exposes a canonical shape with permuted
  strides — strided DLPack export, no data movement.
- _ndarray_pickle.py: serialise the canonical shape so the round-trip
  works in canonical terms (layout tag still intentionally dropped).
- docs/source/user_guide/tensor.md: new "Interop with NumPy and PyTorch"
  section pinning the canonical-view contract.

Tests:

- test_tensor_layout_interop.py (new): parametrized over both backends
  and a representative layout set, covering to_numpy / from_numpy /
  to_dlpack round-trips, grad accessors, identity-layout no-op paths,
  and a Genesis-shaped (n_dofs, _B) + layout=(1, 0) smoke test.
- existing layout test files (factory/aliasing/augassign/subscript/
  higher_rank/grad): assertions migrated from the old physical-view
  contract to the new canonical-view contract.
…rnel

`_make_fill_kernel` was using `qd.grouped(x)`, which yields *physical*
indices on a layout-tagged ndarray. Combined with the canonical->physical
AST rewrite on `x[I]`, this produced a double-permutation: the kernel
ended up writing physical-flat values, and `to_numpy()`'s canonical
transpose then read them back at the wrong positions.

Switch to `qd.grouped(qd.ndrange(*shape))` so `I` is a canonical
multi-index and the AST rewrite handles the physical translation.
…ernel

Two fixes for tests that still leaked the old physical-view contract:

1. test_tensor_layout_interop._make_fill_kernel: the previous attempt
   switched from `qd.grouped(x)` to `qd.grouped(qd.ndrange(*shape))` to
   iterate the canonical index space. But the AST rewrite at
   `build_Subscript` only fires when the subscript arity matches
   `_qd_layout` length, so `x[I]` (single Vector index) bypasses the
   canonical->physical permutation and writes at canonical positions
   into the smaller physical buffer — silently OOB on permuted layouts,
   producing 75% partially-correct, 25% scrambled output.

   Switch to explicit `x[i, j] = ...` / `x[i, j, k] = ...` (one kernel
   per rank), matching the working pattern every other layout test in
   the suite uses. The AST rewrite then sees a 2- or 3-arg subscript
   that matches the layout length and applies the permutation.

2. test_tensor_annotation.test_tensor_accepts_ndarray_with_layout:
   asserted the old physical-view shape `(N, M)` and physical indexing
   `arr[3, 2]`. Updated to canonical `(M, N)` / `arr[2, 3]`.
build_Subscript only applied the canonical->physical layout permutation
when the subscript arity matched len(_qd_layout). The single-Vector
form ``x[I]`` (where I comes from ``qd.grouped(qd.ndrange(...))`` or
``qd.grouped(x)``) bypassed the rewrite and wrote at canonical indices
into the smaller physical buffer -- silently OOB on non-square
permuted layouts.

Detect a single Matrix/Vector index whose rank matches len(layout),
unpack into N scalar component subscripts, then permute. Both the
Matrix (python backend) and Expr-with-tensor-shape (real kernels)
forms are handled.

Add regression tests covering both ``qd.grouped(qd.ndrange(...))`` and
``qd.grouped(x)`` sources, for rank 2 / 3 across both backends, plus a
cross-check that ``x[I]`` and ``x[i, j]`` produce byte-identical
output on the same layout-tagged tensor.
…nsors

Pair with the build_Subscript fix: now that ``ndarray[I]`` permutes I
canonical->physical when ``ndarray`` is layout-tagged, the bridge
kernels ``ndarray_to_ext_arr`` and ``ext_arr_to_ndarray`` must iterate
the *untagged* numpy operand so I stays canonical and the AST rewrite
routes it to the right physical position. Iterating ``grouped(ndarray)``
would yield physical indices that the rewrite would then incorrectly
re-permute, scrambling the copy.

With the bridge kernels canonical-driven, ``_ndarray_to_numpy`` /
``_ndarray_from_numpy`` no longer need a python-side ``np.transpose``
fixup -- they just allocate / validate at the canonical shape and let
the kernel do the right thing. Untagged ndarrays see canonical ==
physical and pay no extra cost.
- ``qd.Matrix`` only has one ``tensor`` classmethod; remove the
  duplicate from the expected list so the sorted comparison matches
  the dedup'd ``dir()`` output.
- ``qd.Ndarray`` and its subclasses now expose ``shape`` as a
  ``@property`` (canonical-view contract for layout-tagged ndarrays);
  add it to the expected lists for ``Ndarray``, ``ScalarNdarray``,
  ``MatrixNdarray``, ``VectorNdarray``.
- _with_layout now tags the companion grad ndarray when needs_grad=True,
  so kernel code reading x.grad[...] uses the same canonical->physical
  AST rewrite as x[...]. Drops the explicit grad propagation in
  qd.tensor() since _with_layout handles it centrally.
- build_struct_for: on `for I in qd.grouped(layout_tagged_ndarray)`,
  reorder the runtime-delivered physical loop indices into canonical
  order before binding I, so x[I] round-trips correctly through
  build_Subscript's permutation.
- Skip field-backend dlpack tests with non-identity layout (pre-existing
  SNode-order limitation; field tensors that need dlpack must use
  identity order or the ndarray backend).
- Fix test_layout_field_kernel_canonical_indexing_rank2 to pin the
  field backend explicitly (default is now NDARRAY).
Seven new tests in tests/python/test_tensor_layout_interop.py:

- grouped-struct-for rank-3 all permutations (catches bugs that rank-2
  self-inverse layouts hide, e.g. confusing layout with invperm).
- .grad.to_numpy() rank-3: guards grad-tag propagation beyond rank 2.
- xfail multi-target `for i, j in x` on layout-tagged ndarray, pinning
  the documented limitation so it flips red if lifted.
- pickle round-trip: canonical shape is preserved, _qd_layout is
  intentionally dropped (per 8.7 of the design doc).
- fill(val) and copy_from(src) round-trip on layout-tagged ndarrays.
- .grad.to_dlpack() canonical-view, exercising the permuted-strides
  code path on the grad buffer.
- Mixed kernel args: layout-tagged + untagged ndarray in the same
  kernel (the Genesis migration pattern).
Field never supported pickling and adding it requires re-allocating
SNodes after the runtime is materialized (problematic). The easier path
to symmetry is to remove ``__reduce__`` from ``Ndarray`` and document
that neither backend supports pickle. Linesearch (the immediate Genesis
migration target) doesn't pickle.

Removes:
- ``Ndarray.__reduce__`` and the ``_ndarray_pickle`` import
- ``python/quadrants/lang/_ndarray_pickle.py``
- ``tests/python/test_pickle.py`` (9 upstream pickle tests)
- ``test_pickle_layout_tagged_ndarray_roundtrip_drops_layout`` from the
  layout-interop test file (added in stork-15)

Users who need to persist tensor data should ``to_numpy()`` and pickle
the resulting array; reconstruct on the other side via ``from_numpy()``.
Previously ``field_to_dlpack`` (C++) rejected fields whose SNode chain
placed axes in any order other than i, j, k, ..., while
``ndarray_to_dlpack`` (also C++) honoured the layout permutation and
exposed a *canonical* view via permuted strides. That made
``tensor.to_dlpack()`` an asymmetric operation between the two
backends and broke the "freely switch backend / layout" contract:
``qd.tensor(..., backend=qd.Backend.FIELD, layout=(1, 0))`` would raise
where the same allocation under ``Backend.NDARRAY`` would not.

Field side:
- Replace the validate-only ``validate_axis_ordering`` with
  ``extract_memory_layout_order``, which walks the SNode chain root ->
  place and returns the canonical-axis index at each successive memory
  axis (outermost first). For ``order='ji'`` this yields ``{1, 0}``.
- ``field_to_dlpack`` now consumes that vector exactly the same way
  ``ndarray_to_dlpack`` consumes its ``layout`` argument: build physical
  shape + strides, then expose canonical shape + permuted strides via
  the inverse permutation. Element axes (n, m for VectorField /
  MatrixField) sit innermost and are passed through unchanged.
- Reject SNode chains whose memory-layout vector is not a permutation
  of {0, ..., ndim-1} (non-contiguous axis identifiers like qd.i + qd.l).

Tests:
- ``test_dlpack_non_sequenced_axes`` previously asserted RuntimeError
  for a (i, k, j)-ordered field; flip it to assert the canonical
  ``(3, 4, 2)`` shape with a non-contiguous stride layout.
- ``test_to_dlpack_canonical_shape_rank{2,3}`` and
  ``test_genesis_shaped_dofs_batch_layout`` no longer skip the field
  backend for non-identity layouts — they assert the same canonical
  view on both backends.
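The two C++ pieces can be modelled in a few lines of Python. Both helpers here are illustrative stand-ins, not the real signatures: the order-string walk models what `extract_memory_layout_order` recovers from the SNode chain, and `canonical_strides` models the "canonical shape + permuted strides" DLPack export rule:

```python
def extract_memory_layout_order(order, ndim):
    """Modelled from a field order string: canonical-axis index at each
    successive memory axis, outermost first; 'ji' -> (1, 0)."""
    axes = tuple("ijkl".index(ch) for ch in order)
    if sorted(axes) != list(range(ndim)):
        raise ValueError("memory-layout vector is not a permutation")
    return axes

def canonical_strides(physical_shape, layout, itemsize):
    """DLPack export rule (sketch): strides of the canonical view over a
    C-contiguous physical buffer, via the inverse permutation."""
    strides, acc = [0] * len(physical_shape), itemsize
    for k in range(len(physical_shape) - 1, -1, -1):
        strides[k] = acc
        acc *= physical_shape[k]
    inv = [0] * len(layout)
    for p, c in enumerate(layout):
        inv[c] = p
    return tuple(strides[inv[c]] for c in range(len(layout)))
```

E.g. a canonical (2, 3) float32 tensor stored transposed has physical shape (3, 2) and physical strides (8, 4); the canonical view exposes shape (2, 3) with strides (4, 8) and no data movement.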
Closes the remaining FIELD-vs-NDARRAY surface gaps so a single
``qd.tensor(...)`` call lets downstream code switch backend (and layout)
freely:

- ``Ndarray.to_torch(device=None)`` / ``Ndarray.from_torch(arr)`` —
  thin wrappers around the existing ``ndarray_to_ext_arr`` /
  ``ext_arr_to_ndarray`` bridge kernels (the kernels accept torch
  tensors via the same external-array interface as numpy arrays).
  Layout-tagged ndarrays produce a canonical view because the bridge
  kernels iterate the untagged external buffer canonically.
- ``MatrixNdarray.to_torch`` / ``MatrixNdarray.from_torch`` and
  ``VectorNdarray.to_torch`` / ``VectorNdarray.from_torch`` —
  parallel methods built on a new ``_ndarray_matrix_to_torch`` /
  ``_ndarray_matrix_from_torch`` helper pair that mirrors the
  existing matrix-numpy helpers (they just allocate / accept a torch
  tensor instead of a numpy array and dispatch the same
  ``ndarray_matrix_to_ext_arr`` / ``ext_arr_to_ndarray_matrix``
  kernels).
- ``Ndarray.to_numpy(dtype=None)`` — accepts the same optional dtype
  cast ``Field.to_numpy`` already supports.
- ``Ndarray.layout`` and ``Field.layout`` — public read-only property
  returning the canonical-axis-permutation tuple (or ``None`` for
  identity). Symmetric introspection accessor; downstream code can
  branch on layout without knowing which backend produced the tensor.
  ``qd.tensor(..., backend=qd.Backend.FIELD, layout=...)`` now stashes
  the permutation on the resulting field so ``Field.layout`` reports
  it (the SNode chain still encodes it physically; this is purely an
  introspection convenience for the unified factory).
- Drops the now-vestigial ``Ndarray.layout = Layout.AOS`` data
  attribute (it was only consumed by the deleted ``_ndarray_pickle.py``;
  the Python-level ``layout`` attribute on Mesh is unrelated and kept).

Tests: ``test_api.py`` updated to expect ``layout`` on every
Ndarray/Field subclass, and ``from_torch`` / ``to_torch`` on every
``Ndarray`` subclass.
``for i, j in layout_tagged_x`` previously delivered the runtime's
*physical* loop indices straight into the user names, so ``i`` ended up
holding the canonical-axis-1 value when ``_qd_layout = (1, 0)``. The
grouped form (``for I in qd.grouped(x)``) was already canonicalised in
stork-15; this commit closes the multi-target gap.

In ``build_struct_for``'s non-grouped branch, when the iter target
carries a non-identity ``_qd_layout``, allocate hidden physical
``Expr`` slots, pass those to ``begin_frontend_struct_for`` (so the
runtime can fill them with physical indices), and create the
user-visible names bound to ``phys_vars[invperm[canonical_idx]]``.
Symmetric to the canonical->physical rewrite in :func:`build_Subscript`.
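
The physical-to-canonical re-binding can be sketched as a plain function
(hypothetical helper; the real code allocates hidden ``Expr`` slots
inside ``build_struct_for``):

```python
def bind_user_targets(phys_vars, perm):
    """Map hidden physical loop slots to user-visible canonical names.

    perm[k] is the canonical axis carried by memory axis k, so the
    canonical axis a lives in physical slot invperm[a].
    """
    inv = [0] * len(perm)
    for k, a in enumerate(perm):
        inv[a] = k
    return tuple(phys_vars[inv[a]] for a in range(len(perm)))
```

With ``_qd_layout = (1, 0)`` and physical slots ``("p0", "p1")``, the
user's ``i`` binds ``p1`` and ``j`` binds ``p0``:
``bind_user_targets(("p0", "p1"), (1, 0)) == ("p1", "p0")``.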

Layout tagging now flows through to fields too (stashed by
``qd.tensor()`` for ``Backend.FIELD`` in the previous commit), so the
fix applies uniformly to both backends — verified by parameterising
the previously-xfail test over ``BACKENDS`` and all rank-2 layouts.

Replaces ``test_multi_target_struct_for_on_layout_tagged_ndarray_xfail``
with the new ``test_multi_target_struct_for_on_layout_tagged_tensor``.
Adds ``test_tensor_backend_symmetry.py`` — a focused suite that pins
the contract that the entire user-facing tensor surface behaves
identically on ``Backend.FIELD`` and ``Backend.NDARRAY`` for any layout
(identity or non-identity).

Each fixed asymmetry from §8.9 of the design doc gets a parametrised
test (backend × layout):
- ``tensor.layout`` reports the user-supplied permutation (or None).
- ``to_torch`` / ``from_torch`` round-trip with canonical-view
  semantics regardless of layout.
- ``to_numpy(dtype=...)`` accepts the dtype kwarg on both backends.
- ``pickle.dumps(tensor)`` raises symmetrically on both backends.
- ``qd.tensor(..., order=...)`` is rejected on both backends (ditto
  any unknown kwarg) — the field-only ``order=`` escape hatch is
  closed off in the unified factory.
- ``needs_grad=True`` works on both backends.
- ``tensor.shape`` is canonical on both backends.

If a future change re-introduces an asymmetry, one of these tests
will fail loudly.
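
The backend × layout grid driving the suite can be sketched as follows
(hypothetical stand-in names; the real suite parametrises with
``pytest.mark.parametrize``):

```python
import itertools

# Stand-ins for qd.Backend members and the rank-2 layouts under test.
BACKENDS = ("FIELD", "NDARRAY")
LAYOUTS = (None, (0, 1), (1, 0))

def symmetry_cases():
    # Every contract check (layout property, to_torch/from_torch,
    # to_numpy(dtype=...), ...) runs once per (backend, layout) pair.
    return list(itertools.product(BACKENDS, LAYOUTS))
```

Each asymmetry fix thus gets six concrete test cases, so a regression on
either backend under any layout fails a specific parametrised test.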
Updates the "Interop with NumPy and PyTorch" section to:
- list ``to_torch(device=...)`` / ``from_torch(...)`` / ``layout`` /
  ``to_numpy(dtype=...)`` alongside the existing accessors,
- explicitly note that the surface is identical on both backends so
  switching ``backend=`` requires no other call-site change,
- show ``a.layout == (1, 0)`` as an introspection example,
- show a ``to_torch`` / ``from_torch`` round-trip.

Removes the prior implicit "field-only" claim about ``to_dlpack``;
both backends now expose a canonical view via permuted strides under
non-identity layout.
When ``qd.tensor()`` started stashing ``_qd_layout`` on the resulting
Field (so ``Field.layout`` could introspect symmetrically with
``Ndarray.layout``), the existing canonical->physical AST rewrites in
``build_Subscript`` and ``build_struct_for`` started firing on fields
too — and double-permuting them. Fields have no need for the rewrite:
their SNode hierarchy already translates canonical indices to permuted
physical addresses at the IR level via the ``order=`` keyword.

Adds an ``isinstance(value, Ndarray)`` gate at every layout-rewrite
site (one in ``build_Subscript``, two in ``build_struct_for``). Layout-
tagged fields now flow through the legacy IR path unchanged; ndarrays
get the same canonical-view treatment they had in stork-15.

Caught by ``test_to_torch_canonical_view_round_trips[layout1-field]``
on the cluster.
The previous attempt to gate the canonical->physical AST rewrite on
``isinstance(node.value.ptr, Ndarray)`` didn't work: ``node.value.ptr``
is an IR-level expression object, not the original Ndarray, so the
isinstance check missed and the rewrite was silently skipped on
ndarrays too — caught by ``test_from_torch_canonical_round_trips``
where two canonical positions collided into the same physical offset
because the layout tag was being ignored.

Switch to attribute-name-based gating instead:
- Ndarrays continue to use ``_qd_layout``, which is what
  ``build_Subscript`` / ``build_struct_for`` look for. Reverts the
  ``isinstance`` check at every rewrite site.
- Fields use a separate ``_qd_field_layout`` attribute, set by
  ``qd.tensor()`` for ``Backend.FIELD`` and read by ``Field.layout``.
  The AST never sees ``_qd_layout`` on a field, so it never tries to
  double-permute their already-canonical IR.

This keeps ``Field.layout`` and ``Ndarray.layout`` symmetric at the
Python user level while keeping the IR rewrite strictly ndarray-only.

Caught by cluster runs of test_tensor_backend_symmetry.py.
- _ndarray_pickle.py: drop unreachable Layout.SOA check (Ndarray.layout is
  now a permutation tuple/None property, never a Layout enum).
- test_pickle.py: drop the now-infeasible test_pickle_soa_raises (cannot
  set the read-only layout property; the SOA branch is dead code).
- test_tensor_backend_symmetry.py: split test_pickle_raises_on_both_backends
  into test_pickle_ndarray_works (round-trip) + test_pickle_field_raises.
  Pre-existing asymmetry preserved per scope clarification; symmetric
  pickle is now planned for the Tensor wrapper (§8.11).
- _tensor.py: use setattr to tag _qd_field_layout on field / grad so
  pyright doesn't flag the dynamic attribute on pybind classes
  (reportAttributeAccessIssue / reportOptionalMemberAccess).
- args_hasher.py: guard len(obj.shape) with `obj.shape or ()` now
  that Ndarray.shape can return None during _reset (reportArgumentType).
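
The guard is the usual None-coalescing idiom (minimal sketch, assuming
the hasher only needs the rank):

```python
def shape_rank(shape):
    # Ndarray.shape may be None while the runtime is mid-_reset; fall
    # back to an empty tuple so len() stays well-defined.
    return len(shape or ())
```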

Made-with: Cursor