Skip to content

Regex support in Python→Laurel translation#623

Merged
MikaelMayer merged 19 commits intomainfrom
feat-laurel-regex
Mar 24, 2026
Merged

Regex support in Python→Laurel translation#623
MikaelMayer merged 19 commits intomainfrom
feat-laurel-regex

Conversation

@ssomayyajula
Copy link
Contributor

Description of changes:

Add re.compile, re.match, re.search, re.fullmatch for Python->Laurel

  • Laurel prelude: from_regex constructor on Any, re_Match composite type with bodiless methods.
  • Core prelude: re_to_regex normalizer (str|Pattern→regex), re_fullmatch_bool/re_match_bool/re_search_bool with correct anchoring, re_compile, re_fullmatch, re_match, re_search with specs.
  • PyFactory: re_compile_str with concreteEval calling pythonRegexToCore.
  • ReToCore: fixed bug — added LMonoTy type annotations to all regex expression builders so concreteEval output survives the SMT encoder (which rejects unannotated ops). The old pipeline never triggered this because the old Python→Core translator panics on re.match and DiffTestCore round-trips through text which re-annotates everything.
  • LaurelToCoreTranslator: map Laurel UserDefined "regex" to Core regex sort.
  • StrataMain: pass ReFactory to Laurel pipeline's verifyCore.

Test: end-to-end test_regex.py covering fullmatch/match/search with literal and compiled patterns, character classes, quantifiers, alternation, dot, and anchoring differences between modes.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add re.compile, re.match, re.search, re.fullmatch flowing through
Laurel to Core's SMT-LIB string theory.

Laurel prelude: from_regex constructor on Any, re_Match composite type
with uninterpreted method stubs.

Core prelude: re_to_regex normalizer (str|Pattern→regex),
re_fullmatch_bool/re_match_bool/re_search_bool with correct anchoring,
re_compile, re_fullmatch, re_match, re_search with disjunctive requires.

PyFactory: re_compile_str with concreteEval calling pythonRegexToCore.

ReToCore: fix latent bug — add LMonoTy type annotations to all regex
expression builders so concreteEval output survives the SMT encoder
(which rejects unannotated ops). The old pipeline never triggered this
because the old Python→Core translator panics on re.match and
DiffTestCore round-trips through text which re-annotates everything.

LaurelToCoreTranslator: map Laurel UserDefined "regex" to Core regex sort.
StrataMain: pass ReFactory to Laurel pipeline's verifyCore.

Test: end-to-end test_regex.py covering fullmatch/match/search with
literal and compiled patterns, character classes, quantifiers,
alternation, dot, and anchoring differences between modes.
@ssomayyajula ssomayyajula requested a review from a team March 19, 2026 23:15
shigoel and others added 5 commits March 21, 2026 16:43
Replace single re_compile_str (hardcoded .fullmatch mode) with three
mode-specific factory functions: re_fullmatch_str, re_match_str,
re_search_str. Each compiles the pattern with the correct MatchMode
so anchors (^/$) are handled properly.

re.compile is now a semantic no-op (returns the pattern string unchanged).
The match mode is applied at the point where matching happens, not at
compile time. This fixes incorrect behavior for patterns with explicit
anchors used across different match modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add re_pattern_error factory function that surfaces genuinely malformed
  patterns as RePatternError, modeling Python's re.error. Unimplemented
  features return NoError (Python would succeed; sound over-approximation
  comes from mode-specific factory staying uninterpreted).
- Prelude checks re_pattern_error first and returns exception(err) for
  pattern errors before attempting the match.
- Add RePatternError variant to the Laurel Error datatype.
- Remove dead reCompileFunc (PyReCompile) — never wired into either
  pipeline.
- Replace old candidate-translation comment with architecture overview
  documenting the double-parse methodology.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ~130 new test assertions covering anchors across all modes,
  escaped metacharacters, colon patterns, multi-char anchor patterns,
  anchors inside alternation, compile+anchor interaction, and more.
- Add 9 expected-failure tests (currently unknown due to Laurel pipeline
  SAT-finding limitation) documenting properties the solver should
  disprove but cannot.
- Remove from_regex constructor from Any datatype — unused since
  re.compile is now a no-op returning the pattern string.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace manual .tcons construction with mty[regex], mty[string → regex],
etc. for readability.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add type annotations (mty[Error], mty[string → Error]) to
  RePatternError/NoError constructors in rePatternErrorFunc concreteEval,
  fixing SMT encoding errors for malformed patterns.
- Add malformed pattern tests in two styles:
  - m != None (passes): proves exception value is produced
  - try/except (unknown): idiomatic Python pattern; VC simplifies to
    (assert true) after concreteEval but quantified prelude axioms make
    SAT-finding intractable for the solver.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MikaelMayer
MikaelMayer previously approved these changes Mar 23, 2026
Copy link
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor test update.

…regex

# Conflicts:
#	StrataTest/Languages/Python/run_py_analyze_sarif.py
… feat-laurel-regex

# Conflicts:
#	StrataTest/Languages/Python/run_py_analyze_sarif.py
@ssomayyajula ssomayyajula force-pushed the feat-laurel-regex branch 2 times, most recently from 72f7902 to 98d20a1 Compare March 23, 2026 17:51
MikaelMayer added a commit that referenced this pull request Mar 23, 2026
- Handle Init.boolFalse as none in DDM option parser (fixes parsing
  of pyspec ion files generated with Bool default values)
- containsKey(kwargs, key) translates to parameter presence check
  (!Any..isfrom_none(param)) instead of Dict lookup
- Generate required-parameter assertions for kwargs fields without
  defaults
- Disable regexMatch until PR #623 lands (emit assert true)
The regex forward declarations in the DDM prelude are before
CoreOnlyDelimiter and thus stripped by coreOnlyFromRuntimeCorePart.
Pass ReFactory to Core.typeCheck so the regex function signatures
are available during type checking, matching what pyAnalyzeLaurel
does at verification time.
# Conflicts:
#	StrataTest/Languages/Python/run_py_analyze_sarif.py
Define PythonFactory (Core.Factory + ReFactory) and use it in
pyAnalyzeToGotoCommand and pyAnalyzeLaurelToGotoCommand. Add a
factory parameter to typeCheckCore and inlineCoreToGotoFiles
(defaulting to Core.Factory) so non-Python callers are unaffected.
Copy link
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well-structured PR. The architecture — deferring regex compilation to match time so the correct MatchMode is applied — is a clean design that avoids the re.compile mode-ambiguity problem. The type-annotation fix in ReToCore.lean is well-motivated and the abbreviations (reTy, s2r, etc.) keep the builders readable. The pythonRegexToCoreEraseTypes test helper is a nice way to keep existing #guard_msgs output stable. Test coverage is thorough.

A few observations below.

Add regexType DDM op to LaurelGrammar.st, map it to .TCore "regex" in
ConcreteToAbstractTreeTranslator, and remove the special case in
LaurelToCoreTranslator. Add comment in PyFactory on irrelevant mode
in rePatternErrorFunc.
MikaelMayer
MikaelMayer previously approved these changes Mar 23, 2026
joscoh
joscoh previously approved these changes Mar 24, 2026
MikaelMayer
MikaelMayer previously approved these changes Mar 24, 2026
Allow any Core type to be used as a Laurel type via 'Core <name>' syntax,
replacing the hardcoded 'regex' keyword. The name is passed through to
HighType.TCore for translation to Core.
@ssomayyajula ssomayyajula dismissed stale reviews from MikaelMayer and joscoh via 1d8a4aa March 24, 2026 15:52
@MikaelMayer MikaelMayer dismissed shigoel’s stale review March 24, 2026 16:16

Reviewer agreed to dismiss review as concerns were addressed

@MikaelMayer MikaelMayer added this pull request to the merge queue Mar 24, 2026
Merged via the queue into main with commit a16fc61 Mar 24, 2026
15 checks passed
@MikaelMayer MikaelMayer deleted the feat-laurel-regex branch March 24, 2026 16:31
MikaelMayer added a commit that referenced this pull request Mar 24, 2026
- Merge main to get PR #623 regex support
- Enable regexMatch in specExprToLaurel using re_search_bool
- Add regex precondition to Storage.py test (Bucket must match ^[a-z0-9-]+$)
- Fix PR #623 bug: move regex forward declarations (re_fullmatch_str,
  re_search_str, etc.) after CoreOnlyDelimiter so they're included in
  the Core prelude
- Skip inlining prelude procedures to avoid type checking errors
MikaelMayer added a commit that referenced this pull request Mar 24, 2026
- Merge main to get PR #623 regex support
- Enable regexMatch in specExprToLaurel using re_search_bool
- Replace verification test with Core program inspection test
  (verifyCore requires native binary, not available in interpreted mode)
- Skip inlining prelude procedures in StrataMain
- Fix arity mismatch for opaque methods with NoneType return
MikaelMayer added a commit that referenced this pull request Mar 24, 2026
…e move

- Untranslatable preconditions now emit assert Hole (nondeterministic)
  instead of assert true
- Revert regex forward declaration move (causes duplicate on CLI path)
- Keep regex translation enabled (re_search_bool)
- Test uses programmatic check (verifyCore blocked by PR #623 type
  checker issue with re_fullmatch_str forward declarations)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants