Verify safety of str iter functions (Challenge 22) by jrey8343 · Pull Request #557 · model-checking/verify-rust-std

jrey8343 · 2026-03-15T20:36:12Z

Summary

Unbounded verification of 16 safe functions containing unsafe code in library/core/src/str/iter.rs, plus a safety contract for Bytes::__iterator_get_unchecked, using Kani with loop contracts (-Z loop-contracts).

Functions Verified

Function	Impl for	Unsafe Ops
`next`	`Chars`	`next_code_point` + `char::from_u32_unchecked`
`advance_by`	`Chars`	`self.iter.advance_by(N).unwrap_unchecked()` (3 call sites)
`next_back`	`Chars`	`next_code_point_reverse` + `char::from_u32_unchecked`
`as_str`	`Chars`	`from_utf8_unchecked(self.iter.as_slice())`
`get_end`	`SplitInternal`	`get_unchecked(self.start..self.end)`
`next`	`SplitInternal`	`get_unchecked(self.start..a)`
`next_inclusive`	`SplitInternal`	`get_unchecked(self.start..b)`
`next_back`	`SplitInternal`	`get_unchecked(b..self.end)`, `get_unchecked(self.start..self.end)`
`next_back_inclusive`	`SplitInternal`	`get_unchecked(b..self.end)`, `get_unchecked(self.start..self.end)`
`remainder`	`SplitInternal`	`get_unchecked(self.start..self.end)`
`next`	`MatchIndicesInternal`	`get_unchecked(start..end)`
`next_back`	`MatchIndicesInternal`	`get_unchecked(start..end)`
`next`	`MatchesInternal`	`get_unchecked(a..b)`
`next_back`	`MatchesInternal`	`get_unchecked(a..b)`
`remainder`	`SplitAsciiWhitespace`	`from_utf8_unchecked(&self.inner.iter.iter.v)`
`__iterator_get_unchecked`	`Bytes`	Contract: `#[requires(idx < self.0.len())]`

Coverage

18 proof harnesses covering all 16 functions plus the __iterator_get_unchecked contract.

Unbounded Verification Approach

All harnesses run without #[kani::unwind(N)] — verification is unbounded:

Chars methods: next_code_point and next_code_point_reverse are branch-based (no loops), so no unwind bounds needed. advance_by is abstracted under #[cfg(kani)] to eliminate its 3 while loops while exercising the same unwrap_unchecked unsafe operations with nondeterministic but valid byte counts.
SplitInternal / MatchIndicesInternal / MatchesInternal: These call CharSearcher::next_match() / next_match_back() which contain loops over the haystack. These searcher methods are abstracted under #[cfg(kani)] in pattern.rs as nondeterministic overapproximations — they return valid (start, end) pairs within haystack bounds without looping. The get_unchecked safety depends only on indices being in bounds, which the abstraction guarantees for any string length.
SplitAsciiWhitespace::remainder: Verified on a fresh iterator (no next() call needed) since the unsafe from_utf8_unchecked safety depends only on the invariant that self.inner.iter.iter.v is a valid UTF-8 subslice, maintained from the original &str input.

Key Techniques

#[cfg(kani)] nondeterministic abstractions in pattern.rs for CharSearcher, MultiCharEqSearcher, and StrSearcher — eliminates all searcher loops while preserving the interface contract
#[cfg(kani)] abstraction of Chars::advance_by — exercises unwrap_unchecked with valid nondeterministic byte counts, proving safety for any string length
Symbolic char inputs — harnesses use kani::any() chars constrained to ASCII, covering all valid code paths

Assumptions Used

Per challenge spec:

Safety and functional correctness of all functions in slice module
Safety and functional correctness of all functions in pattern.rs
Functional correctness of validations.rs functions per UTF-8 spec
All iterators derive from valid UTF-8 strings

All 18 harnesses pass with no --unwind and no #[kani::unwind].

Resolves #279

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

Add 17 Kani verification harnesses for all unsafe operations in library/core/src/str/iter.rs: Chars: next, next_back, advance_by (small + CHUNK_SIZE branch), as_str SplitInternal: get_end, next, next_inclusive, next_back, next_back (terminator path), next_back_inclusive, remainder MatchIndicesInternal: next, next_back MatchesInternal: next, next_back SplitAsciiWhitespace: remainder Bytes: __iterator_get_unchecked (safety contract proof) Techniques: - Symbolic char via kani::any::<char>() with encode_utf8 for full Unicode scalar value coverage (Chars harnesses) - Symbolic ASCII char patterns with 2-byte haystack for Split/Match harnesses covering match and no-match paths - Concrete 33-byte string for advance_by CHUNK_SIZE=32 branch

- Remove all #[kani::unwind(N)] from harnesses - Abstract Chars::advance_by under #[cfg(kani)] to eliminate loops - Bring CharSearcher/MultiCharEqSearcher/StrSearcher nondeterministic abstractions from Ch21 pattern.rs for next_match/next_match_back - Use symbolic char inputs instead of literal strings - Simplify SplitAsciiWhitespace harness to avoid slice iteration loops - All harnesses now use nondeterministic overapproximation instead of bounded loop unwinding Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1. Fix indentation in #[cfg(not(kani))] advance_by block to pass rustfmt 2. Remove incorrect assertions in check_split_internal_get_end harness - the nondeterministic next_match abstraction overapproximates, so we only verify safety of get_unchecked, not functional correctness Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds Kani-focused abstractions and proof harnesses to support unbounded verification (no unwind bounds) of the safety of str iterator methods that contain unsafe operations, primarily by eliminating/bypassing internal search loops under #[cfg(kani)].

Changes:

Added #[cfg(kani)] nondeterministic abstractions in str/pattern.rs to avoid unbounded loops in Searcher implementations (including Two-Way search) during Kani runs.
Added a #[cfg(kani)] abstraction of Chars::advance_by in str/iter.rs and introduced a #[cfg(kani)] verify module with multiple Kani proof harnesses.
Introduced/used a Kani contract style (#[requires(...)]) and a harness to exercise the Bytes::__iterator_get_unchecked safety precondition.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 10 comments.

File	Description
`library/core/src/str/pattern.rs`	Adds Kani-only loop abstractions for several `Searcher` implementations (char search, multi-char predicates, str search, two-way search) to enable unbounded verification.
`library/core/src/str/iter.rs`	Adds a Kani-only abstraction for `Chars::advance_by` and introduces Kani proof harnesses for iterator safety contracts.

Copilot · 2026-03-31T22:24:40Z

library/core/src/str/pattern.rs

+                kani::assume(a >= self.finger);
+                kani::assume(a <= self.finger_back); // avoid overflow in a + w
+                kani::assume(w <= self.finger_back - a);
+                self.finger = a + w;
+                Some((a, self.finger))


In the #[cfg(kani)] abstraction for CharSearcher::next_match, a is not constrained to a UTF-8 char boundary. This can violate the CharSearcher/Searcher safety invariant that returned indices (and self.finger on exit) are on char boundaries, which is required because other methods (e.g., CharSearcher::next uses haystack.get_unchecked(old_finger..finger_back)) rely on that invariant and can cause UB if it’s broken.

Constrain a (and therefore a + w) to self.haystack.is_char_boundary(...), and ensure self.finger is left on a boundary before returning Some/None.

Copilot · 2026-03-31T22:24:40Z

library/core/src/str/pattern.rs

+                let w: usize = kani::any();
+                kani::assume(w >= 1 && w <= 4);
+                kani::assume(old_finger + w <= self.finger_back);
+                self.finger = old_finger + w;


In the #[cfg(kani)] abstraction for CharSearcher::next_reject, advancing self.finger by an unconstrained w (1..=4) can leave finger on a non–UTF-8 boundary. Because CharSearcher is an unsafe impl Searcher, this violates the required invariant that finger is on a char boundary when methods return, and can lead to UB in subsequent calls (e.g., CharSearcher::next uses haystack.get_unchecked(old_finger..finger_back) which requires boundaries).

Update the abstraction to only choose new finger positions that satisfy self.haystack.is_char_boundary(new_finger) (and keep finger_back unchanged), or alternatively pick a boundary index directly and assign it to finger.

Suggested change

let w: usize = kani::any();

kani::assume(w >= 1 && w <= 4);

kani::assume(old_finger + w <= self.finger_back);

self.finger = old_finger + w;

let new_finger: usize = kani::any();

// Choose a new position strictly after old_finger, within the remaining

// search range, and constrained to be a valid UTF-8 char boundary.

kani::assume(new_finger > old_finger);

kani::assume(new_finger <= self.finger_back);

kani::assume(self.haystack.is_char_boundary(new_finger));

self.finger = new_finger;

Copilot · 2026-03-31T22:24:40Z

library/core/src/str/pattern.rs

+                let w = self.utf8_size();
+                kani::assume(a >= self.finger);
+                kani::assume(a <= self.finger_back); // avoid overflow in a + w
+                kani::assume(w <= self.finger_back - a);


In the #[cfg(kani)] abstraction for CharSearcher::next_match_back, a is not constrained to a UTF-8 char boundary, and the returned (a, a + w) may therefore violate the Searcher contract (indices must be on UTF-8 boundaries). This is especially risky because str APIs like trim_*_matches use next_match/_reject results with unsafe get_unchecked on &str, which is UB if boundaries are not respected.

Add kani::assume(self.haystack.is_char_boundary(a)) and kani::assume(self.haystack.is_char_boundary(a + w)), and ensure self.finger_back is left on a boundary before returning.

Suggested change

kani::assume(w <= self.finger_back - a);

kani::assume(w <= self.finger_back - a);

kani::assume(self.haystack.is_char_boundary(a));

kani::assume(self.haystack.is_char_boundary(a + w));

Copilot · 2026-03-31T22:24:41Z

library/core/src/str/pattern.rs

+                let w: usize = kani::any();
+                kani::assume(w >= 1 && w <= 4);
+                kani::assume(self.finger + w <= old_finger_back);
+                self.finger_back = old_finger_back - w;


In the #[cfg(kani)] abstraction for CharSearcher::next_reject_back, self.finger_back is decremented by an unconstrained w (1..=4) without ensuring the new value is a UTF-8 boundary. Since CharSearcher’s unsafe impl Searcher/ReverseSearcher requires fingers to be on boundaries when methods return, this can break subsequent &str slicing invariants and cause UB.

Constrain the updated finger_back to satisfy self.haystack.is_char_boundary(self.finger_back) (and keep it within [finger, old_finger_back]).

Suggested change

let w: usize = kani::any();

kani::assume(w >= 1 && w <= 4);

kani::assume(self.finger + w <= old_finger_back);

self.finger_back = old_finger_back - w;

// Choose a nondeterministic new finger_back that:

// - stays within [self.finger, old_finger_back),

// - advances by at most one UTF-8 code point (<= MAX_LEN_UTF8 bytes),

// - and is a valid UTF-8 boundary.

let new_finger_back: usize = kani::any();

kani::assume(self.finger <= new_finger_back);

kani::assume(new_finger_back < old_finger_back);

kani::assume(old_finger_back - new_finger_back <= MAX_LEN_UTF8);

kani::assume(self.haystack.is_char_boundary(new_finger_back));

self.finger_back = new_finger_back;

Copilot · 2026-03-31T22:24:41Z

library/core/src/str/pattern.rs

+        {
+            if kani::any() {
+                let i: usize = kani::any();
+                let char_len: usize = kani::any();
+                kani::assume(char_len >= 1 && char_len <= 4);
+                kani::assume(i <= self.haystack.len());
+                kani::assume(char_len <= self.haystack.len() - i);
+                Some((i, i + char_len))
+            } else {


The #[cfg(kani)] abstractions for MultiCharEqSearcher::{next_match,next_reject,next_match_back,next_reject_back} return arbitrary (i, i + char_len) ranges without constraining i/i+char_len to UTF-8 char boundaries (and without coupling them to char_indices). Because this is an unsafe impl Searcher, returning non-boundary indices violates the Searcher contract and can lead to UB in callers that use these indices with unsafe get_unchecked on &str.

At minimum, add assumptions that both ends are self.haystack.is_char_boundary(...) (and in-bounds). Ideally, also advance internal state in a way that preserves progress/consistency with the real implementation.

Copilot · 2026-03-31T22:24:41Z

library/core/src/str/pattern.rs

+            let crit_pos_back: usize = kani::any();
+            kani::assume(crit_pos_back < needle_len);
+            let period: usize = kani::any();
+            kani::assume(period >= 1 && period <= needle_len);


In TwoWaySearcher::new’s #[cfg(kani)] abstraction, period is constrained to <= needle_len, but in the real implementation the long-period case sets period = max(crit_pos, needle_len - crit_pos) + 1, which can be needle_len + 1. If this abstraction is intended to be an overapproximation of reachable states, this constraint is too strong and may exclude real states.

Relax the assumption to permit the full range of period values that can occur (e.g., allow needle_len + 1 in the long-period case, or avoid constraining period beyond what is actually required by downstream Kani abstractions).

Suggested change

kani::assume(period >= 1 && period <= needle_len);

kani::assume(period >= 1 && period <= needle_len + 1);

Copilot · 2026-03-31T22:24:41Z

library/core/src/str/pattern.rs

+        // Under Kani, abstract away maximal_suffix computation which has deeply
+        // nested loops intractable for CBMC. Instead, produce a nondeterministic
+        // TwoWaySearcher satisfying the type invariant. This is sound because:
+        // - All TwoWaySearcher code is safe Rust (no UB possible regardless of field values)


The soundness note for the Kani abstraction says “All TwoWaySearcher code is safe Rust (no UB possible regardless of field values)”, but this file contains unsafe code in the two-way search implementation (e.g., small_slice_eq does raw pointer reads). If the argument is that the Kani build never executes those unsafe paths due to further abstraction, the comment should say that explicitly.

Please adjust the comment to avoid claiming the algorithm is entirely safe Rust, or clearly state which unsafe code is avoided under #[cfg(kani)] and why that makes the abstraction safe for the properties being verified.

Suggested change

// - All TwoWaySearcher code is safe Rust (no UB possible regardless of field values)

// - Under #[cfg(kani)], we only rely on the structural invariants of

// TwoWaySearcher and the safe operations performed on its fields; the

// unsafe helper routines used in the full two-way implementation (such

// as those doing raw pointer reads) are not exercised by the Kani

// abstractions and verification harnesses

Copilot · 2026-03-31T22:24:42Z

library/core/src/str/iter.rs

+            let bytes_len = self.iter.len();
+            let bytes_consumed: usize = kani::any();
+            kani::assume(bytes_consumed <= bytes_len);
+            if bytes_consumed > 0 {
+                // SAFETY: bytes_consumed <= bytes_len = self.iter.len()
+                unsafe { self.iter.advance_by(bytes_consumed).unwrap_unchecked() };
+            }
+            // Nondeterministically determine how many chars were consumed.
+            // Each char is 1-4 bytes, so chars_consumed is between
+            // ceil(bytes_consumed/4) and bytes_consumed.
+            let chars_consumed: usize = kani::any();
+            kani::assume(chars_consumed <= remainder);
+            NonZero::new(remainder - chars_consumed).map_or(Ok(()), Err)


The #[cfg(kani)] implementation of Chars::advance_by advances the underlying byte iterator by an arbitrary bytes_consumed without ensuring it lands on a UTF-8 char boundary. Chars methods (e.g., next/as_str) rely on the invariant that self.iter always points to valid UTF-8 boundaries; breaking it can make later next_code_point / from_utf8_unchecked calls UB.

Constrain bytes_consumed to a char boundary within the remaining string (e.g., derive let rem = unsafe { from_utf8_unchecked(self.iter.as_slice()) } and assume(rem.is_char_boundary(bytes_consumed)) before advancing), so the abstraction preserves the iterator’s safety invariants.

Copilot · 2026-03-31T22:24:42Z

library/core/src/str/iter.rs

+        let c: char = kani::any();
+        let mut buf = [0u8; 4];
+        let s = c.encode_utf8(&mut buf);
+        let mut chars = s.chars();
+        let result = chars.next();
+        assert_eq!(result, Some(c));
+        assert!(chars.next().is_none());


The Kani harnesses build s from a single char (encode_utf8 into a 4-byte buffer), so the input string length is always in 1..=4 bytes. That means these proofs don’t exercise behaviors that depend on longer haystacks (e.g., larger index values / more iteration steps), despite the PR description claiming the verification holds for “any string length”.

If the intent is to argue length-independence via the nondeterministic searcher abstractions, consider either (a) documenting that the harnesses intentionally restrict to very small strings, or (b) constructing a symbolic &str with variable length (even if bounded) so the proof actually ranges over more lengths.

Copilot · 2026-03-31T22:24:42Z

library/core/src/str/pattern.rs

+
+

There are two consecutive blank lines introduced right before the #[cfg(kani)] pub mod verify block. This looks accidental and adds noise to the diff.

Please remove the extra blank lines to keep formatting consistent with the surrounding code.

Suggested change

jrey8343 and others added 3 commits February 7, 2026 14:17

Apply upstream rustfmt formatting via check_rustc.sh --bless

6ccb55d

jrey8343 requested a review from a team as a code owner March 15, 2026 20:36

feliperodri added the Challenge Used to tag a challenge label Mar 19, 2026

feliperodri requested a review from Copilot March 31, 2026 22:17

Copilot started reviewing on behalf of feliperodri March 31, 2026 22:18 View session

Copilot AI reviewed Mar 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Verify safety of str iter functions (Challenge 22)#557

Verify safety of str iter functions (Challenge 22)#557
jrey8343 wants to merge 4 commits intomodel-checking:mainfrom
jrey8343:challenge-22-str-iter

jrey8343 commented Mar 15, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Copilot AI Mar 31, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

-                let w: usize = kani::any();
-                kani::assume(w >= 1 && w <= 4);
-                kani::assume(old_finger + w <= self.finger_back);
-                self.finger = old_finger + w;
+                let new_finger: usize = kani::any();
+                // Choose a new position strictly after old_finger, within the remaining
+                // search range, and constrained to be a valid UTF-8 char boundary.
+                kani::assume(new_finger > old_finger);
+                kani::assume(new_finger <= self.finger_back);
+                kani::assume(self.haystack.is_char_boundary(new_finger));
+                self.finger = new_finger;

-                let w: usize = kani::any();
-                kani::assume(w >= 1 && w <= 4);
-                kani::assume(self.finger + w <= old_finger_back);
-                self.finger_back = old_finger_back - w;
+                // Choose a nondeterministic new finger_back that:
+                // - stays within [self.finger, old_finger_back),
+                // - advances by at most one UTF-8 code point (<= MAX_LEN_UTF8 bytes),
+                // - and is a valid UTF-8 boundary.
+                let new_finger_back: usize = kani::any();
+                kani::assume(self.finger <= new_finger_back);
+                kani::assume(new_finger_back < old_finger_back);
+                kani::assume(old_finger_back - new_finger_back <= MAX_LEN_UTF8);
+                kani::assume(self.haystack.is_char_boundary(new_finger_back));
+                self.finger_back = new_finger_back;

	kani::assume(period >= 1 && period <= needle_len);
	kani::assume(period >= 1 && period <= needle_len + 1);

-        // - All TwoWaySearcher code is safe Rust (no UB possible regardless of field values)
+        // - Under #[cfg(kani)], we only rely on the structural invariants of
+        //   TwoWaySearcher and the safe operations performed on its fields; the
+        //   unsafe helper routines used in the full two-way implementation (such
+        //   as those doing raw pointer reads) are not exercised by the Kani
+        //   abstractions and verification harnesses

Conversation

jrey8343 commented Mar 15, 2026

Summary

Functions Verified

Coverage

Unbounded Verification Approach

Key Techniques

Assumptions Used

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants