Inliner, Break Optimization and Fixing UCS Tail Calls#423
Inliner, Break Optimization and Fixing UCS Tail Calls#423AnsonYeung wants to merge 34 commits intohkust-taco:hkmc2from
Conversation
LPTK
left a comment
There was a problem hiding this comment.
The logic looks a bit more complex than I expected. I am curious: what is the main source of complexity?
| import semantics.Elaborator.State | ||
|
|
||
| object Inliner: | ||
| object TermSymbolPath: |
There was a problem hiding this comment.
Pls document why we need this strange extractor.
There was a problem hiding this comment.
Since the Call node takes in a path, this allow matching for a specific function or method. However method can be virtual and also the this parameter needs to be adjusted for inlining and so method is not actually inlined right now. I added this as the doc:
// Reference to a function body can occur as a.f or f, this handles both cases.
There was a problem hiding this comment.
This will also mathc virtual calls, right? But presumably virtual functions won't be in the mapping we're maintaining so that's fine?
There was a problem hiding this comment.
Right now method is not considered at all for inlining so it's fine. However I think later if a class have 2 methods f and g where f calls g, then we need to know that g is not virtual / final in order to inline. Inlining methods should be somewhat important since most functions in a program will be methods.
There was a problem hiding this comment.
Yes, we can consider method inlining later. Following the previous version of MLscript, I intend to make methods not virtual by default, requiring a virtual modifier to make them so.
The main complexity imo is that function definition and usage is not in order, which requires at least 2 traversals. The function could be mutually recursive for example. Even in this case, one function could be inlined into another if one of the function is only used once or is small enough. Other complexities are symbol renaming, and spread arguments. Multiple chained calls to multiple param list is not supported for inlining currently. A global function symbol to definition map would simplify this a bit, it would mean no more Define node for functions however. |
Oh, this makes me think about that old GHC paper where they had what looked like a reasonable solution: Secrets of the Glasgow Haskell Compiler inliner. It would be great if you could see whether you could implement something like their solution (sec. 4.2). Would also be a nice paper to present in a group meeting/seminar.
Would it help to make
Can't you just define a reusable pass that creates that mapping? We could use it in other places as well, so that would be useful. We can't really do away with |
hkmc2/shared/src/main/scala/hkmc2/codegen/deforest/Rewrite.scala
Outdated
Show resolved
Hide resolved
hkmc2/shared/src/main/scala/hkmc2/codegen/BlockSimplifier.scala
Outdated
Show resolved
Hide resolved
|
When I try to run the tests locally, I get in Does inlining remove tail-recursion optimization opportunities? |
|
I'm not sure whether SymbolRefresher really handles everything. The logic is quite weird since a symbol can be introduced in many weird ways. |
There was a problem hiding this comment.
Pull request overview
This PR introduces an IR-level inliner in the codegen optimization pipeline and adds control-flow cleanups (notably eliminating redundant breaks/labels) to keep generated IR/JS smaller and to avoid UCS consequent-sharing patterns that accidentally move tail calls out of tail position (addressing the UCS part of #435). It also wires the feature into diff-test configuration with :noInline / :inlineThreshold directives and updates many golden test outputs accordingly.
Changes:
- Add configurable inlining (
Config.Inliner) and run it fromBlockSimplifier, with diff-test directives to toggle/parameterize it. - Add/extend control-flow simplifications (useless-
breakremoval, match/begin simplifications) and UCS normalization tweaks to preserve tail positions under sharing. - Update a large set of expected JS/IR outputs and add new inliner-focused test coverage.
Reviewed changes
Copilot reviewed 71 out of 71 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| hkmc2DiffTests/src/test/scala/hkmc2/MLsDiffMaker.scala | Adds :noInline and :inlineThreshold directives and wires them into Config.inlining. |
| hkmc2DiffTests/src/test/scala/hkmc2/JSBackendDiffMaker.scala | Adjusts BlockSimplifier invocation to provide required contextual parameters. |
| hkmc2/shared/src/test/mlscript/wasm/ControlFlow.mls | Updates wasm golden output for new control-flow behavior. |
| hkmc2/shared/src/test/mlscript/ucs/patterns/RestTuple.mls | Updates JS golden output after break/label optimization changes. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/SimplePairMatches.mls | Updates JS golden output after UCS/codegen control-flow changes. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls | Updates golden outputs; adds a regression-style case around optimization behavior. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/DeduplicationWhile.mls | Updates while + deduplication golden output. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls | Updates golden outputs for redundant break removal in switches/matches. |
| hkmc2/shared/src/test/mlscript/ucs/general/JoinPoints.mls | Updates golden outputs reflecting simpler join-point codegen. |
| hkmc2/shared/src/test/mlscript/ucs/general/BooleanPatterns.mls | Updates switch golden output after break elision. |
| hkmc2/shared/src/test/mlscript/tailrec/TailRecOpt.mls | Updates tailrec golden output; shows interaction with inlining outputs. |
| hkmc2/shared/src/test/mlscript/optim/AbortivePrefix.mls | Updates IR golden output for abortive-prefix control flow representation. |
| hkmc2/shared/src/test/mlscript/opt/DeadVarRemoval.mls | Updates optimized IR golden output (inlined temp/value naming/structure). |
| hkmc2/shared/src/test/mlscript/opt/DeadSelRemoval.mls | Updates golden output after dead-branch simplification. |
| hkmc2/shared/src/test/mlscript/opt/DeadObjRemoval.mls | Updates golden output after inlining affects trivial call sites. |
| hkmc2/shared/src/test/mlscript/llir/ControlFlow.mls | Updates LLIR golden output for simplified match branches. |
| hkmc2/shared/src/test/mlscript/llir/BasicCpp.mls | Updates C++ golden output for match/default codegen changes. |
| hkmc2/shared/src/test/mlscript/lifter/Loops.mls | Updates golden output after control-flow simplification in lifted code. |
| hkmc2/shared/src/test/mlscript/lifter/FunInFun.mls | Updates golden output after control-flow simplification in nested functions. |
| hkmc2/shared/src/test/mlscript/invalml/InvalMLCodeGen.mls | Updates golden output (redundant breaks removed). |
| hkmc2/shared/src/test/mlscript/handlers/StackSafety.mls | Updates golden output (breaks removed in handler-generated state machines). |
| hkmc2/shared/src/test/mlscript/handlers/SavedVars.mls | Updates golden output (breaks removed in handler-generated state machines). |
| hkmc2/shared/src/test/mlscript/handlers/RecursiveHandlers.mls | Updates golden output (breaks removed in handler-generated state machines). |
| hkmc2/shared/src/test/mlscript/handlers/NoStackSafety.mls | Updates golden output; shows inlining effects even under warning scenarios. |
| hkmc2/shared/src/test/mlscript/handlers/NonLocalReturns.mls | Updates expected diagnostics/output; adds TODO note about inliner interaction. |
| hkmc2/shared/src/test/mlscript/handlers/Effects.mls | Updates handler golden output for break removal and line shifts. |
| hkmc2/shared/src/test/mlscript/handlers/Debugging.mls | Updates golden stack traces/line numbers and handler control-flow output. |
| hkmc2/shared/src/test/mlscript/deforest/todos.mls | Updates deforest golden output after symbol refresh/flow changes. |
| hkmc2/shared/src/test/mlscript/deforest/simple.mls | Updates deforest golden output (name hygiene/ids affected by new passes). |
| hkmc2/shared/src/test/mlscript/deforest/nestedMatch.mls | Updates deforest golden output (name/id shifts). |
| hkmc2/shared/src/test/mlscript/deforest/determinism.mls | Updates deforest golden output after symbol refreshing and simplification. |
| hkmc2/shared/src/test/mlscript/deforest/clashes.mls | Updates deforest golden output (inlined value naming). |
| hkmc2/shared/src/test/mlscript/deforest/basic.mls | Updates deforest golden output; reflects symbol-refresh and match simplifications. |
| hkmc2/shared/src/test/mlscript/codegen/Throw.mls | Updates JS golden output for inlining behavior around throw. |
| hkmc2/shared/src/test/mlscript/codegen/SwitchSpecialization.mls | Updates JS golden output after redundant break removal. |
| hkmc2/shared/src/test/mlscript/codegen/ScopedBlocksAndHandlers.mls | Updates JS golden output; shows inlining affecting handler-local lambdas. |
| hkmc2/shared/src/test/mlscript/codegen/SanityChecks.mls | Updates sanitized JS golden output; shows inlining at call sites. |
| hkmc2/shared/src/test/mlscript/codegen/PartialApps.mls | Updates JS golden output (parameter hygiene changes). |
| hkmc2/shared/src/test/mlscript/codegen/ParamClasses.mls | Updates JS golden output for inlined trivial wrapper calls. |
| hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls | Updates golden output after join-point/break optimization. |
| hkmc2/shared/src/test/mlscript/codegen/InnerNameHygiene.mls | Disables inlining for name-hygiene test stability. |
| hkmc2/shared/src/test/mlscript/codegen/Inliner.mls | New test file covering inliner behavior and limits. |
| hkmc2/shared/src/test/mlscript/codegen/InlineLambdas.mls | Disables inlining to keep lambda-specific expectations stable. |
| hkmc2/shared/src/test/mlscript/codegen/Getters.mls | Disables inlining to keep getter-related expectations stable. |
| hkmc2/shared/src/test/mlscript/codegen/FunInClass.mls | Updates golden output for inlining within class method bodies / closures. |
| hkmc2/shared/src/test/mlscript/codegen/Functions.mls | Disables inlining for a specific initialization-order regression test. |
| hkmc2/shared/src/test/mlscript/codegen/FirstClassFunctionTransform.mls | Disables inlining to keep :checkIR repro stable; updates line numbers. |
| hkmc2/shared/src/test/mlscript/codegen/CaseShorthand.mls | Updates JS golden output (scrutinee naming hygiene). |
| hkmc2/shared/src/test/mlscript/block-staging/Syntax.mls | Updates staged IR golden output after begin/match simplification. |
| hkmc2/shared/src/test/mlscript/basics/MemberProjections.mls | Disables inlining for a member-projection error repro; updates golden output. |
| hkmc2/shared/src/test/mlscript/basics/CompanionModules_Functions.mls | Updates expected runtime error output after codegen changes. |
| hkmc2/shared/src/test/mlscript/backlog/ToTriage.mls | Disables inlining for a tail-position triage case; updates line numbers. |
| hkmc2/shared/src/test/mlscript/backlog/TailRecFailure.mls | Updates lowered IR golden output demonstrating tailrec fix under sharing. |
| hkmc2/shared/src/test/mlscript/backlog/Lifter.mls | Disables inlining for a lifter-related repro; updates line numbers. |
| hkmc2/shared/src/test/mlscript/apps/parsing/CamlLightTest.mls | Moves “example of” output blocks; updates expected transcript positioning. |
| hkmc2/shared/src/test/mlscript-compile/Predef.mjs | Updates compiled JS runtime output (notably equals) to reflect new codegen. |
| hkmc2/shared/src/test/mlscript-compile/apps/parsing/Lexer.mls | Enables lifting and rewrites some calls to preserve tail-call optimization. |
| hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala | Avoids introducing break/result-temp when continuation is already return/throw (tail-position preservation). |
| hkmc2/shared/src/main/scala/hkmc2/MLsCompiler.scala | Adjusts BlockSimplifier invocation to provide proper contextual scoping. |
| hkmc2/shared/src/main/scala/hkmc2/Config.scala | Adds Config.Inliner and new inlining config field with default. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/wasm/text/WatBuilder.scala | Treats missing match default as nop during lowering. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/SymbolRefresher.scala | New utility for refreshing symbols during transformations/inlining. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/Printer.scala | Adds printing support for AssignDynField. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala | Updates comment explaining mayRet in lowering context. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/js/JSBuilder.scala | Switch/match codegen tweaks: tolerate no-default, and avoid redundant break in abortive arms. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/deforest/Rewrite.scala | Replaces ad-hoc symbol refresh logic with SymbolRefresher and tightens assumptions. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/BufferableTransform.scala | Refreshes/rebinds parameter symbols while rewriting bufferable classes. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/BlockTraverser.scala | Traverses disambiguation term symbols in Value.Ref for correctness of analyses. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/BlockTransformer.scala | Routes Value.Ref local rewriting through applyLocal to support custom remappers. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/BlockSimplifier.scala | Adds inliner and label/break optimizations; drives inlining based on Config.inlining. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala | Simplifies Match/Begin construction when defaults/arms are empty and when rest is empty. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // * Remove useless break | ||
| case Break(label) if tailLabels.contains(label) => | ||
| tl.log(s"Break ${label} is eliminated: current tail label list is ${tailLabels}") | ||
| registerChange | ||
| End() |
There was a problem hiding this comment.
tl.log(...) is used here, but tl is not in scope. BlockSimplifier currently takes an unnamed contextual TL, so this will not compile. Either name the contextual parameter (e.g., (using ..., tl: TL)) or replace these calls with summon[TL].log(...) (and similarly for the other tl.log usages in this file).
There was a problem hiding this comment.
tl is a methods defined in hkmc2/shared/src/main/scala/utils/TraceLogger.scala.
| def matchArgs(args: List[Arg], params: ParamList): Option[List[(VarSymbol, Result)]] = | ||
| if args.exists(_.spread.isDefined) then | ||
| // we require a precise match when any arg is a spread arg | ||
| if params.restParam.isEmpty then return N | ||
| if args.exists(_.spread.exists(!_.isEager)) then return N | ||
| val pairs = args.zip(params.params.iterator.map((_, false)) ++ params.restParam.map((_, true))) | ||
| if pairs.exists((arg, param) => arg.spread.isDefined =/= param._2) then return N | ||
| S(pairs.map((arg, param) => (param._1.sym, arg.value))) | ||
| else |
There was a problem hiding this comment.
matchArgs claims to require a “precise match when any arg is a spread arg”, but it never checks that args.size matches the expected arity (fixed params + the single rest param). Because zip truncates to the shorter list, extra call arguments can be silently ignored and missing call arguments won’t be detected, which can lead to incorrect inlining/misbinding. Add an explicit size/shape check before zipping (and reject calls with more than one spread).
There was a problem hiding this comment.
@AnsonYeung What happens when there is a length mismatch and zip drops some elements from either side? Is the call semantics of the non-inlined version preserved? Please add a comment about that as well as regression tests for both cases.
| case bms: BlockMemberSymbol => | ||
| val newBms = new BlockMemberSymbol(bms.nme, Nil, bms.nameIsMeaningful) | ||
| newBms.tsym = bms.tsym.map: t => | ||
| val newOwner = t.owner.map(o => existingMapping.getOrElse(o, o).asInstanceOf) |
There was a problem hiding this comment.
newOwner is computed via existingMapping.getOrElse(o, o).asInstanceOf (context-inferred cast). If existingMapping ever contains a non-TermSymbol mapping for an owner, this will throw at runtime. Consider using a type-safe match (e.g., existingMapping.get(o).collect { case ts: TermSymbol => ts }.getOrElse(o)) so unexpected mappings are handled explicitly.
| val newOwner = t.owner.map(o => existingMapping.getOrElse(o, o).asInstanceOf) | |
| val newOwner = t.owner.map: o => | |
| existingMapping.get(o) match | |
| case Some(ts: TermSymbol) => ts | |
| case _ => o |
There was a problem hiding this comment.
The code proposed by Copilot does seem better?
LPTK
left a comment
There was a problem hiding this comment.
Nice work! The code looks pretty clean, although still under-documented.
Please address the comments, and otherwise LGTM.
| @@ -0,0 +1,242 @@ | |||
| :js | |||
|
|
|||
| :sjs | |||
There was a problem hiding this comment.
Why not use :soir consistently, instead of :sjs? Here, and everywhere else in the PR.
| //│ }; | ||
| //│ —————————————————| Output |————————————————————————————————————————————————————————————————————————— | ||
|
|
||
| :soir |
There was a problem hiding this comment.
| :soir | |
| // * TODO: support spread arguments like this one | |
| :soir |
| // * Remove useless break | ||
| case Break(label) if tailLabels.contains(label) => | ||
| tl.log(s"Break ${label} is eliminated: current tail label list is ${tailLabels}") | ||
| registerChange | ||
| End() |
There was a problem hiding this comment.
tl is a methods defined in hkmc2/shared/src/main/scala/utils/TraceLogger.scala.
| def matchArgs(args: List[Arg], params: ParamList): Option[List[(VarSymbol, Result)]] = | ||
| if args.exists(_.spread.isDefined) then | ||
| // we require a precise match when any arg is a spread arg | ||
| if params.restParam.isEmpty then return N | ||
| if args.exists(_.spread.exists(!_.isEager)) then return N | ||
| val pairs = args.zip(params.params.iterator.map((_, false)) ++ params.restParam.map((_, true))) | ||
| if pairs.exists((arg, param) => arg.spread.isDefined =/= param._2) then return N | ||
| S(pairs.map((arg, param) => (param._1.sym, arg.value))) | ||
| else |
There was a problem hiding this comment.
@AnsonYeung What happens when there is a length mismatch and zip drops some elements from either side? Is the call semantics of the non-inlined version preserved? Please add a comment about that as well as regression tests for both cases.
| case bms: BlockMemberSymbol => | ||
| val newBms = new BlockMemberSymbol(bms.nme, Nil, bms.nameIsMeaningful) | ||
| newBms.tsym = bms.tsym.map: t => | ||
| val newOwner = t.owner.map(o => existingMapping.getOrElse(o, o).asInstanceOf) |
There was a problem hiding this comment.
The code proposed by Copilot does seem better?
| tailLabels += lbl | ||
| val result = applyBlock(bod) | ||
| tailLabels -= lbl |
There was a problem hiding this comment.
Why not use freshLabelCtx/nestLabelCtx?
| if (scrut2 is scrut) && | ||
| (arms2 is arms) && | ||
| (dflt2 is dflt) && (rst2 is rst) |
There was a problem hiding this comment.
| if (scrut2 is scrut) && | |
| (arms2 is arms) && | |
| (dflt2 is dflt) && (rst2 is rst) | |
| if (scrut2 is scrut) | |
| && (arms2 is arms) | |
| && (dflt2 is dflt) | |
| && (rst2 is rst) |
| if (sub2 is sub) && (rst2 is rst) then b else Begin(sub2, rst2) | ||
|
|
||
| case Match(scrut, arms, dflt, _: End) => super.applyBlock(b) | ||
| case Match(scrut, arms, dflt, rst) => |
There was a problem hiding this comment.
Instead of duplicating this error-prone code, can we just modify BlockTransformer so that it uses an applyMatchArmBlock in these positions, that we can easily override? Try to keep it DRY.
| ): | ||
| def isPrivate = !symbolsToPreserve.contains(defn.sym) | ||
|
|
||
| def canBeInlineEliminated = |
There was a problem hiding this comment.
Is this doc correct?
| def canBeInlineEliminated = | |
| /** Whether the definition can be removed 1from the program after it has been inlined. */ | |
| def canBeInlineEliminated = |
or please update with a more accurate doc.
| def isPrivate = !symbolsToPreserve.contains(defn.sym) | ||
|
|
||
| def canBeInlineEliminated = | ||
| isPrivate && !isMethod && useCount <= 1 && !hasNakedRef && !isLoopBreaker |
There was a problem hiding this comment.
Can't we eliminate functions used several times?
This PR implements inlining. As naive inlining generates unnecessary labels and break, this PR adds break optimization to optimize away useless breaks. The UCS part of #435 is addressed.
Method inlining is not enabled with this PR.