[OrcJIT] Add LLVM ORC JIT v2 dynamic object loader addon #254
cyx-6 merged 32 commits into apache:main
Conversation
Summary of Changes (Gemini Code Assist): Hello @yaoyaoding, I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a significant new capability to TVM-FFI by integrating LLVM's ORC JIT v2. The integration enables dynamic loading and execution of C++ object files directly from Python, offering a flexible and efficient way to extend TVM-FFI's functionality at runtime. The new addon provides a clear Python API for managing JIT compilation contexts and dynamic libraries, making it easier for developers to incorporate custom C++ logic without recompiling their applications.
Code Review
This pull request introduces a new addon package, tvm-ffi-orcjit, to enable dynamic loading of object files using LLVM's ORC JIT v2. The overall design is robust, with a clear separation between the C++ backend and the Python frontend. The implementation demonstrates a deep understanding of LLVM's JIT capabilities, including a clever workaround for the __dso_handle issue. The tests are comprehensive and cover important scenarios like symbol conflicts.
I have identified a few areas for improvement, primarily concerning inconsistencies in documentation, build scripts, and a potential bug in the CMake configuration. There is also a mismatch between a Python API and its C++ backend. Addressing these points will enhance the quality and usability of this new addon.
addons/tvm-ffi-orcjit/CMakeLists.txt (outdated)

```cmake
target_link_libraries(
  tvm_ffi_orcjit
  PUBLIC tvm_ffi
  PRIVATE LLVM
```
```python
def link_against(self, *libraries: DynamicLibrary) -> None:
    """Link this library against other dynamic libraries.

    Sets the search order for symbol resolution. Symbols not found in this library
    will be searched in the linked libraries in the order specified.

    Parameters
    ----------
    *libraries : DynamicLibrary
        One or more dynamic libraries to link against.

    Examples
    --------
    >>> session = create_session()
    >>> lib_utils = session.create_library()
    >>> lib_utils.add("utils.o")
    >>> lib_main = session.create_library()
    >>> lib_main.add("main.o")
    >>> lib_main.link_against(lib_utils)  # main can call utils symbols

    """
    handles = [lib._handle for lib in libraries]
    self._link_func(self._handle, *handles)
```
The link_against method accepts *libraries, which implies it can link against multiple libraries. However, the C++ backend function orcjit.DynamicLibraryLinkAgainst only supports linking against a single library at a time. This will cause a TypeError if more than one library is passed.
To fix this, you should either update the Python method to accept only one library or modify the C++ backend to handle multiple libraries. Given the current C++ implementation, changing the Python API is the most direct solution.
```python
def link_against(self, library: DynamicLibrary) -> None:
    """Link this library against another dynamic library.

    Sets the search order for symbol resolution. Symbols not found in this library
    will be searched in the linked library.

    Parameters
    ----------
    library : DynamicLibrary
        The dynamic library to link against.

    Examples
    --------
    >>> session = create_session()
    >>> lib_utils = session.create_library()
    >>> lib_utils.add("utils.o")
    >>> lib_main = session.create_library()
    >>> lib_main.add("main.o")
    >>> lib_main.link_against(lib_utils)  # main can call utils symbols

    """
    self._link_func(self._handle, library._handle)
```
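The arity mismatch can be illustrated with a small Python stand-in; the backend function and handle values here are placeholders, not the real FFI objects:

```python
# Stand-in for the single-library C++ backend function described in the
# review (hypothetical signature mirroring orcjit.DynamicLibraryLinkAgainst).
def _backend_link_against(handle, other_handle):
    return (handle, other_handle)

# A wrapper that splats *libraries into a two-argument callee: fine for one
# library, but a second library produces a TypeError, as the review predicts.
def link_against(handle, *libraries):
    return _backend_link_against(handle, *libraries)

print(link_against("main", "utils"))   # works: exactly one library
try:
    link_against("main", "utils", "extra")
except TypeError as e:
    print("TypeError:", e)
```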
addons/tvm-ffi-orcjit/CMakeLists.txt (outdated)

```cmake
separate_arguments(LLVM_DEFINITIONS_LIST NATIVE_COMMAND ${LLVM_DEFINITIONS})
add_definitions(${LLVM_DEFINITIONS_LIST})
```
addons/tvm-ffi-orcjit/README.md (outdated)

```python
from tvm_ffi_orcjit import ObjectLoader

# Create a loader instance
loader = ObjectLoader()

# Load an object file
loader.load("example.o")

# Get and call a function
add_func = loader.get_function("simple_add")
result = add_func(1, 2)
print(f"Result: {result}")  # Output: Result: 3
```

### Incremental Loading

Load multiple object files and access functions from all of them:

```python
from tvm_ffi_orcjit import ObjectLoader

loader = ObjectLoader()

# Load first object file
loader.load("math_ops.o")
add = loader.get_function("simple_add")

# Load second object file - functions from first remain accessible
loader.load("string_ops.o")
concat = loader.get_function("string_concat")

# Both functions work
print(add(10, 20))               # From math_ops.o
print(concat("Hello", "World"))  # From string_ops.o
```
The usage examples in this README appear to be outdated. They refer to an ObjectLoader class which is not present in the current implementation. The correct API, as demonstrated in examples/quick-start/run.py, uses create_session, session.create_library, and lib.add(). Please update the examples to reflect the current API to avoid confusion for new users.
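For comparison, a sketch of the session-based API shape as named in this review comment (`create_session`, `create_library`, `add`, `get_function`); the object file name is a placeholder, and the import is guarded because this is an illustration rather than a guaranteed installation:

```python
# Hedged sketch of the API shape described in this review comment; the
# names and the file "example.o" are taken from the review text, and
# tvm_ffi_orcjit may not be installed, so the import is guarded.
try:
    from tvm_ffi_orcjit import create_session
except ImportError:
    create_session = None

if create_session is not None:
    session = create_session()
    lib = session.create_library()
    lib.add("example.o")                  # load a compiled object file
    func = lib.get_function("simple_add")
```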
addons/tvm-ffi-orcjit/README.md (outdated)

### Direct Module Access

You can also use TVM-FFI's `load_module` directly (`.o` files are automatically handled):

```python
import tvm_ffi

# Load object file as a module
module = tvm_ffi.load_module("example.o")

# Get function
func = module.get_function("my_function")
result = func(arg1, arg2)
```
This section on "Direct Module Access" suggests that tvm_ffi.load_module("example.o") is supported. However, the implementation does not seem to register a module loader for .o files. If this feature is not yet implemented, it would be best to remove this section or clearly mark it as a future capability to prevent user confusion.
```cpp
    int (*)(void* handle, const TVMFFIAny* args, int32_t num_args, TVMFFIAny* rv);
auto c_func = reinterpret_cast<TVMFFISafeCallType>(symbol);

return Function::FromPacked([c_func, name](PackedArgs args, Any* rv) {
```
Refer to existing LibraryModule impl.
One general question: how do we plan to manage the versions of these add-ons?

I think most likely they can evolve independently for now and stay optional.
```cpp
[](const ORCJITExecutionSession& session, const String& name) -> ObjectRef {
  return session->CreateDynamicLibrary(name);
})
.def("orcjit.DynamicLibraryAdd",
```
Because a dynamic library is only a Module and not an explicit subclass, let us avoid exposing these functions via def; instead we can overload GetFunction in ORCJITDynamicLibrary to return set-link-order, AddFile, etc.
This PR has been outstanding for a while, and looks indeed super interesting! I don't really have the expertise to review, unfortunately. @yaoyaoding @cyx-6 What's the status of it? Happy to get it in.

@cyx-6 is working on this PR right now, currently on support for CUDA kernels with ORC JIT v2.
Add ORC JIT-based module loading addon (tvm-ffi-orcjit) with:
- LLJIT wrapper exposing packed functions via TVM FFI Module interface
- Init/fini handling for ELF (.init_array/.fini_array), Mach-O (__mod_init_func/__mod_term_func), and COFF (.CRT$XC*/.CRT$XT*)
- COFF .pdata/.xdata relocation stripping for Windows JITLink compat
- Windows DLL import symbol resolution and PLT stub generation
- Static LLVM linking with bundled liborc_rt for manylinux/macOS/Windows
- CI workflow (Linux x86_64/aarch64, macOS arm64, Windows AMD64) with cibuildwheel and multi-compiler test coverage
- Comprehensive tests: error handling, type variety, cross-library linking, ctor/dtor ordering across platforms and compilers
- Quick-start example with C and C++ variants
- Documentation and kernel library guide
```cpp
// macOS: MachOPlatform handles init/fini natively via jit_->initialize()/deinitialize().
auto& objlayer = jit_->getObjLinkingLayer();
static_cast<llvm::orc::ObjectLinkingLayer&>(objlayer).addPlugin(
    std::make_unique<InitFiniPlugin>(GetRef<ORCJITExecutionSession>(this)));
```
The ExecutionSession will leak: the session ref count will never reach zero.
The chain is: when you do GetRef<ORCJITExecutionSession>(this), it creates a session reference that is moved into InitFiniPlugin::session_ and owned by the ObjectLinkingLayer, which is owned by the LLJIT, which is owned by the session.
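The lifetime chain can be mimicked in Python with weak references; this is only an analogy for the C++ ref-count cycle (class names here are illustrative, and the behavior relies on CPython's reference counting):

```python
import gc
import weakref

gc.disable()  # make pure refcount behavior observable (no cycle collector)

class Plugin:
    def __init__(self, session, strong):
        # A strong back-reference models the ref created by GetRef<...>(this);
        # a weakref models a raw pointer whose lifetime is bounded by the owner.
        self.session = session if strong else weakref.ref(session)

class Session:
    def __init__(self, strong_plugin):
        self.plugin = Plugin(self, strong=strong_plugin)

# Strong back-reference: session -> plugin -> session; refcount never hits zero.
s = Session(strong_plugin=True)
probe = weakref.ref(s)
del s
leaked = probe() is not None  # True: the cycle keeps the session alive

# Weak back-reference: dropping the last external ref frees the session at once.
s = Session(strong_plugin=False)
probe = weakref.ref(s)
del s
freed = probe() is None       # True: freed immediately

gc.enable()
gc.collect()
print(leaked, freed)
```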
```cpp
// Try to get the symbol - return NullOpt if not found
if (void* symbol = GetSymbol(symbol_name)) {
  // Wrap C function pointer as tvm-ffi Function
  TVMFFISafeCallType c_func = reinterpret_cast<TVMFFISafeCallType>(symbol);
```
Since the c_func returned by GetSymbol points to JIT-allocated code pages, which are owned by the LLJIT, which is in turn owned by a session, referencing c_func after the dylib and session are destroyed would be dangerous behavior.
I think it might be better to take a look at src/ffi/extra/library_module.cc and use its method:

```cpp
Module self_strong_ref = GetRef<Module>(this);
```
```cpp
if (void** ctx_addr = reinterpret_cast<void**>(GetSymbol(ffi::symbol::tvm_ffi_library_ctx))) {
  *ctx_addr = this;
}
Module::VisitContextSymbols([this](const ffi::String& name, void* symbol) {
```
The GetSymbol calls seem repetitive. Since each GetSymbol call will trigger the initializers, it will init too many times.
Actually, in RunPendingInitializers, the initializers are only called once and then removed.
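The drain-once behavior described here can be sketched with a toy Python model (not the actual C++ implementation):

```python
class PendingInitializers:
    """Toy model of drain-once semantics: each registered initializer runs
    exactly once, even if run_pending() is invoked repeatedly."""

    def __init__(self):
        self._pending = {}  # dylib name -> list of initializer callables

    def add(self, dylib, fn):
        self._pending.setdefault(dylib, []).append(fn)

    def run_pending(self, dylib):
        # Remove the entries before running them, so a second call is a
        # no-op until new object files register more initializers.
        for fn in self._pending.pop(dylib, []):
            fn()

calls = []
inits = PendingInitializers()
inits.add("libA", lambda: calls.append("init_a"))
inits.run_pending("libA")
inits.run_pending("libA")  # drained: nothing runs a second time
inits.add("libA", lambda: calls.append("init_b"))  # new file, new initializer
inits.run_pending("libA")
print(calls)  # ['init_a', 'init_b']
```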
```cpp
std::string symbol_name = symbol::tvm_ffi_symbol_prefix + std::string(name);

// Try to get the symbol - return NullOpt if not found
if (void* symbol = GetSymbol(symbol_name)) {
```
Same here: RunPendingInitializers might be triggered too many times.
Same as above: the initializers are only called once and then removed.
```cpp
if (is_init) {
  session_->AddPendingInitializer(&jit_dylib, entry);
} else {
  session_->AddPendingDeinitializer(&jit_dylib, entry);
```
There might be a data race if pending_initializers_ and pending_deinitializers_ are unordered_maps, especially during materialization, which happens lazily.
LLVM's LLJIT defaults to InPlaceTaskDispatcher, which means materialization runs synchronously in the calling thread. Everything is sequential in the same thread, so there is no concurrent access to the maps. A custom concurrent TaskDispatcher could introduce races, but that is not the configuration used here.
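As a rough analogy for why in-place dispatch keeps the maps safe (this toy model is not LLVM's TaskDispatcher API):

```python
# Toy model: an "in-place" dispatcher runs each task synchronously in the
# calling thread, so tasks that mutate shared maps never overlap in time.
class InPlaceDispatcher:
    def dispatch(self, task):
        task()  # run immediately, same thread; no concurrency, no locks needed

pending = {}
dispatcher = InPlaceDispatcher()

# Materialization-style tasks mutating the shared map, strictly in order.
dispatcher.dispatch(lambda: pending.setdefault("libA", []).append("ctor"))
dispatcher.dispatch(lambda: pending.pop("libA"))
print(pending)  # {}: both tasks ran sequentially
```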
…tion
- Break reference cycle: InitFiniPlugin now stores a raw pointer to ORCJITExecutionSessionObj instead of a ref-counted ORCJITExecutionSession. The plugin lifetime is bounded by LLJIT, which is owned by the session, so the raw pointer is always valid.
- Capture a strong self-ref (Module) in the Function lambda returned by GetFunction, preventing use-after-free if the dylib/session is destroyed while a caller still holds the Function.
```cpp
namespace ffi {
namespace orcjit {

inline void call_llvm(llvm::Error err, const std::string& context_msg = "") {
```
Minor: CamelCase this as CallLLVM; it also needs documentation. To keep it lazy, we likely want to keep TVM_FFI_ORCJIT_LLVM_CALL as a macro, like TVM_FFI_CUDA_CALL.
```python
    return c, cxx


def _build_all(llvm_prefix: str) -> None:
```
Would be good to cross-check if we can leverage (enhance) tvm_ffi.cpp.build_inline here (cc @yaoyaoding).
```python
from __future__ import annotations

import argparse
```
Would be good to cross-check if we can run this as a normal pytest: build via tvm_ffi.cpp.build_inline in a temp dir and then load and test.
tvm_ffi.cpp.build is now refactored, matching the torch API and supporting object files as input.
```powershell
  -y
if ($LASTEXITCODE -ne 0) { throw "micromamba create failed" }

# Build static zstd from source.
```
Consider moving this to addons/tvm-ffi-orcjit/scripts/install_llvm.ps1. BTW, if we are using it for a GitHub workflow, it can likely be simplified via the setup-miniconda action:
https://github.com/apache/tvm/blob/main/.github/actions/setup/action.yml#L21
The scripts were removed in the new workflow.
```bash
# Install LLVM from conda-forge using micromamba.
# Usage: bash tools/install_llvm.sh [version]
# version defaults to LLVM_VERSION env var, then 22.1.0
set -ex
```
Simplify via the setup-miniconda action.
```python
concat = lib.get_function("concat")
result = concat("Hello, ", "World!")
print(f"concat('Hello, ', 'World!') = '{result}'")
assert result == "Hello, World!", f"Expected 'Hello, World!', got '{result}'"
```
Move the check into a sub-function so we don't need an explicit del here.
Extracted to a sub-function; no explicit del is needed now.
```python
# Use the installed package if available; fall back to source tree for editable dev
try:
    from tvm_ffi_orcjit import ExecutionSession
```
To keep things simple, let us just assume tvm_ffi_orcjit is installed.
…sion
- Item 3: move null checks before GetSymbol/VisitContextSymbols in constructor
- Item 5: make dylib_counter_ std::atomic<int> for thread safety
- Item 6: add comment explaining intentional LoadLibraryA without FreeLibrary
- Item 7: fix docstrings referencing non-existent create_session()
- Item 8: pass list(libraries) instead of tuple in set_link_order
- Item 9: use importlib.metadata.version() instead of hardcoded __version__
Cool! I saw your PR to LLVM.
If we load the MSVC CRT into the process, will the COFFPlatform work?
The MSVC CRT DLLs are already loaded in the host process; our DLLImportDefinitionGenerator resolves symbols from vcruntime140, ucrtbase, msvcp140, etc. The problem is that COFFPlatform requires a COFF ORC runtime library (orc_rt) to be built and loaded. Looking at the LLVM source, COFFPlatform needs to resolve roughly ten runtime entry points. The COFF ORC runtime does exist in compiler-rt. In principle, if someone completes the COFF ORC runtime (especially the exception interop), COFFPlatform could work. But that effort has been stalled for 2+ years in LLVM. Our current approach (the custom init/fini plugin) avoids depending on COFFPlatform.
Explain why repeated jit_->initialize() calls on macOS do not cause refcount imbalance: LLJIT tracks InitializedDylib and switches from dlopen to dlupdate after the first call. Similarly, RunPendingInitializers on Linux/Windows is idempotent (it drains the map and is a no-op until new object files arrive).
Move LLVM/zstd installation from CIBW_BEFORE_ALL scripts to a cached host-level step, and replace zstd source builds with conda-forge zstd-static package. Delete standalone install scripts.
Remove ZSTD_SOURCE_DIR fallback (now always find_library from conda prefix). Add CXX_VISIBILITY_PRESET hidden to match tvm-ffi convention.
Thank you for the clarification! Looks like we are at the edge of ORC JIT's current development and support coverage. It's pretty cool that you finally made it work.
The PR looks good to me now, thanks @cyx-6!
```cpp
int c;
int *d_a, *d_b, *d_c;
cudaMalloc(&d_a, sizeof(int));
cudaMalloc(&d_b, sizeof(int));
```
A better canonical example would be passing a DLTensor in from the Python side via PyTorch.
/gemini review
Code Review
This pull request introduces the tvm-ffi-orcjit addon, which enables dynamic loading and execution of compiled object files using LLVM ORC JIT v2. The implementation includes a C++ backend for session and library management, a Python wrapper, and a custom plugin to handle platform-specific static constructors and destructors. Additionally, the tvm_ffi.cpp build system is enhanced to support compiling sources into relocatable object files and handling pure C files. Review feedback focuses on minor grammatical corrections in the documentation and improving the accuracy of error messages regarding library search paths.
```markdown
relocations from COFF objects before JITLink graph building, working around a
JITLink limitation with COMDAT section symbols.

Please refers to [ORCJIT_PRIMER.md](./ORCJIT_PRIMER.md) to learn more about object file, linking, llvm orcjit v2, and how the addon works.
```
Minor grammatical correction: `refers` should be `refer`.

```suggestion
Please refer to [ORCJIT_PRIMER.md](./ORCJIT_PRIMER.md) to learn more about object file, linking, llvm orcjit v2, and how the addon works.
```
```python
raise RuntimeError(
    f"Could not find {_LIB_NAME}. "
    f"Searched in {_LIB_PATH} and site-packages. "
    f"Please ensure the package is installed correctly."
)
```
The error message is slightly misleading: it states that it searched in site-packages, but the code only iterates through the paths defined in _LIB_PATH. To avoid confusion for users debugging loading issues, it would be better to list the actual paths that were searched.

```suggestion
raise RuntimeError(
    f"Could not find {_LIB_NAME}. "
    f"Searched in: {_LIB_PATH}. "
    f"Please ensure the package is installed correctly."
)
```
Thanks @cyx-6! The PR is LGTM.
- Extract build logic from conftest.py into utils.py with build_test_objects() that compiles to /tmp
- test_basic.py calls utils directly at module level
- Delete conftest.py, run_all_tests.py, __init__.py
- CI workflow runs pytest + example directly
…workflow
- Extract orcjit build+test logic into .github/actions/build-orcjit-wheel/
- Add path-filtered orcjit job to ci_test.yml
- Add publish_orcjit_wheel.yml for PyPI publishing
- Remove standalone tvm_ffi_orcjit.yml (superseded)
This PR introduces a new addon package, tvm-ffi-orcjit, that enables dynamic loading of TVM-FFI exported object files (.o) at runtime using LLVM's ORC JIT v2 engine. The addon provides a Python API for loading compiled object files and calling the tvm-ffi functions defined in them.
The API is organized around three main concepts: execution sessions, dynamic libraries, and `.o` files containing TVM-FFI exported functions.

Usage Example
For incremental loading, you can add multiple object files to the same session. See the tests for more examples.
TODO