fix(test): use sys_yield() instead of sys_sleep() in balance system test #199
Merged
MRNIU merged 3932 commits into Simple-XX:main on Mar 20, 2026
…e interrupt controllers Interrupt members

Move arch-specific singleton type aliases from shared kernel.h into per-arch directories, and convert interrupt controller singletons (PlicSingleton, ApicSingleton) into private members of each arch's Interrupt class, following the existing aarch64 pattern where Gic is already an Interrupt member.

- Move Pl011Singleton to src/arch/aarch64/include/pl011_singleton.h
- Move SerialSingleton to file-local scope in x86_64/early_console.cpp
- Move Ns16550aSingleton to file-local scope in riscv64/interrupt_main.cpp
- Add Plic plic_ member to riscv64 Interrupt with InitPlic() deferred init
- Add Apic apic_ member to x86_64 Interrupt with InitApic() deferred init
- Move APIC creation from ArchInit() to InterruptInit() (boot order fix)
- Remove arch-specific #includes and #ifdefs from kernel.h

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
Since Interrupt is used as etl::singleton (only one instance), static class members are semantically equivalent to non-static members. Remove static to eliminate the need for out-of-class definitions in .cpp files.

- aarch64: interrupt_handlers -> interrupt_handlers_ (non-static member)
- riscv64: interrupt_handlers_, exception_handlers_ (drop static + defs)
- x86_64: interrupt_handlers_, idts_ (drop static + defs, keep alignas)

The alignas(4096) on x86_64 members propagates correctly through etl::singleton via uninitialized_buffer_of<T>, which uses alignas(etl::alignment_of<T>::value) on its storage.

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
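The alignment argument in this commit message can be checked in isolation. The sketch below is a simplified model, not the ETL implementation: `InterruptLike`, `UninitializedStorage`, `LazySingleton`, and `IsPageAligned` are hypothetical names introduced here. It only assumes that the storage buffer is declared with the alignment of `T`, as the commit message describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for a class that owns a page-aligned table as a
// non-static member (the role idts_ plays in the commit message).
struct InterruptLike {
  alignas(4096) std::uint64_t idts_[256];
  int handler_count_ = 0;
};

// An alignas on a member raises the alignment of the enclosing type.
static_assert(alignof(InterruptLike) == 4096);

// Hypothetical raw storage that takes its alignment from T.
template <typename T>
struct UninitializedStorage {
  alignas(alignof(T)) unsigned char bytes[sizeof(T)];
};

// The storage inherits the 4096-byte requirement from the member.
static_assert(alignof(UninitializedStorage<InterruptLike>) == 4096);

// Minimal lazily constructed singleton built on that storage (C++17).
template <typename T>
class LazySingleton {
 public:
  static T& Create() {
    // Placement-new the object into the aligned static buffer.
    instance_ = ::new (static_cast<void*>(storage_.bytes)) T();
    return *instance_;
  }
  static T& Instance() { return *instance_; }

 private:
  static inline UninitializedStorage<T> storage_{};
  static inline T* instance_ = nullptr;
};

// Helper used to check an address against the page size.
inline bool IsPageAligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 4096 == 0;
}
```

Calling `LazySingleton<InterruptLike>::Create()` and passing the address of `idts_` to `IsPageAligned` is one way to confirm at run time what the two `static_assert`s already establish at compile time.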
…irectories Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
- C1: Add ReapTask(current) for orphan tasks in Exit() to prevent TCB leak
- C2: Start FSM after default-constructing TCB in Clone() to avoid null deref
- I2: Use STATE_ID constant in StateExited::on_event(MsgReap)
- I3: Move GetStatus() implementation from header to .cpp file
- I4: Enqueue idle task in kReady state, then transition to kRunning
- M2: Restore dropped @todo SIGCHLD comment in exit.cpp
- M5: Add [[nodiscard]] attribute to GetStatus()

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…_router Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…ification Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…ork injection Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…config override Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
U-Boot's image.h requires openssl/evp.h for FIT image signing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…uild OP-TEE's build system requires aarch64-linux-gnu-cpp which was not symlinked via update-alternatives. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
Add kSyscallSchedGetaffinity and kSyscallSchedSetaffinity constants for all architectures. Add dispatcher cases for sys_kill, sys_sigaction, sys_sigprocmask, sys_sched_getaffinity, and sys_sched_setaffinity. Implement sys_kill, sys_sigaction, and sys_sigprocmask function bodies that delegate to TaskManager signal methods. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
- signal_test: SIGTERM/SIGKILL default, SIG_IGN, sigprocmask, error paths
- affinity_test: get/set affinity syscalls, cross-task, error paths
- tick_test: tick increment, sleep timing, runtime tracking
- zombie_reap_test: zombie reaping, orphan reparenting, multi-child Wait
- stress_test: 20 concurrent tasks, wait non-child, rapid create-exit

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
…w clone

- Split single serial job into parallel build-riscv64 + build-aarch64 jobs
- Add dev-image.yml workflow to build/push dev container to GHCR
- Replace devcontainers/ci per-step with container: for shared container
- Use shallow clone (fetch-depth: 1) and shallow submodules (--depth 1)
- Add CMake build cache via actions/cache
- Reduce system test runs to 3 for PRs (10 for push/release)
- Add concurrency group to cancel superseded runs
- Upgrade codecov-action v3->v4, actions-gh-pages v3->v4

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
The riscv64 system test step only checked the cmake exit code, which is always 0 even when individual tests fail inside QEMU. Align with the aarch64 approach: capture output to file, grep for "Failed: 0" to determine pass/fail. Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
Pull request overview
This PR fixes scheduler load-balancing visibility in the balance system test by replacing sleep-based worker loops with yield-based loops, and it also introduces broader kernel scheduling/tasking improvements (work stealing, signal plumbing, test-suite expansion) alongside removing x86_64-related build/tooling paths.
Changes:
- Update balance system test workers to use `sys_yield()` and longer runtimes so tasks remain visible to `Balance()` and survive multiple balance intervals.
- Implement/enable core work-stealing (`TaskManager::Balance()`), improve wait/block/wakeup semantics, and add basic signal support + new system tests.
- Remove x86_64 toolchain/build paths and update docs/CI/devcontainer to reflect RISC-V + AArch64 focus.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tools/x86_64_qemu_virt.its.in | Removed x86_64 FIT template |
| tools/x86_64_boot_scr.txt | Removed x86_64 U-Boot boot script |
| tools/README.md | Removed x86_64 tooling references |
| tools/Dockerfile | Removed old tools Dockerfile |
| tools/.pre-commit-config.yaml.in | Adjusted commented clang-tidy filters |
| tests/unit_test/balance_test.cpp | Added unit tests for RR queue stealing primitives |
| tests/unit_test/README.md | Removed x86_64 unit test reference |
| tests/unit_test/CMakeLists.txt | Added balance_test.cpp to unit tests |
| tests/system_test/yield_test.cpp | Added sys_yield() system test |
| tests/system_test/wait_system_test.cpp | Adjusted child PID handling; renamed entry to wait_test() |
| tests/system_test/tick_test.cpp | Added tick/sleep/runtime tracking system tests |
| tests/system_test/thread_group_system_test.cpp | Refactored thread group tests; renamed entry to thread_group_test() |
| tests/system_test/system_test.h | Expanded test registry, updated QEMU exit, added includes |
| tests/system_test/stress_test.cpp | Added stress system tests (many tasks / wait errors / churn) |
| tests/system_test/spinlock_test.cpp | Tightened SMP barrier semantics and return value |
| tests/system_test/ramfs_system_test.cpp | Renamed to ramfs_test() and updated messages |
| tests/system_test/main.cpp | Expanded test list; improved runner PID capture; added primary-boot guard |
| tests/system_test/fork_test.cpp | Added fork system tests |
| tests/system_test/exit_system_test.cpp | Renamed to exit_test(); adjusted local PID allocation patterns |
| tests/system_test/ctor_dtor_test.cpp | Removed AArch64 FPU setup call |
| tests/system_test/clone_system_test.cpp | Renamed to clone_test(); reap children to avoid runner miscount |
| tests/system_test/balance_test.cpp | Added/updated balance system test using sys_yield() |
| tests/system_test/affinity_test.cpp | Added affinity system tests |
| tests/system_test/CMakeLists.txt | Added many new system tests; removed x86_64 QEMU flags branch |
| tests/integration_test/aarch64_minimal/main.cpp | Updated description; removed local FPU setup routine |
| tests/integration_test/CMakeLists.txt | Removed x86_64 QEMU boot flags branch |
| tests/AGENTS.md | Updated unit-test invocation docs |
| src/task/wakeup.cpp | Refactored wakeup to support per-core wake and added WakeupOne() |
| src/task/wait.cpp | Reworked wait locking/blocking to avoid lost wakeups; added ECHILD error |
| src/task/tick_update.cpp | Call Balance() every 64 ticks |
| src/task/task_manager.cpp | Implemented TaskManager::Balance() work stealing |
| src/task/sleep.cpp | Check pending signals after wake |
| src/task/signal.cpp | Added basic signal support in TaskManager |
| src/task/schedule.cpp | Added kernel_thread_bootstrap(); moved scheduler_started set under lock |
| src/task/mutex.cpp | Improved mutex lock path to avoid lost wakeups; use WakeupOne() |
| src/task/include/task_manager.hpp | Added signal/wakeup APIs and doc; moved GetCurrentCpuSched() |
| src/task/include/task_fsm.hpp | Added atomic cached state for cross-core safe reads |
| src/task/include/task_control_block.hpp | Added SignalState to task aux data |
| src/task/include/scheduler_base.hpp | Declared kernel_thread_bootstrap() for arch entry stubs |
| src/task/exit.cpp | Adjusted exit wake ordering and reparent timing |
| src/task/block.cpp | Added Block(CpuSchedData&, ...) overload to avoid lost wakeups |
| src/task/CMakeLists.txt | Added signal.cpp to build |
| src/task/AGENTS.md | Updated docs to reflect Balance implementation |
| src/syscall.cpp | Added signal + affinity syscalls to dispatcher and implementations |
| src/memory/memory.cpp | Added BmallocLock for allocator thread-safety |
| src/memory/include/virtual_memory.hpp | Updated supported arch list to remove x86_64 mention |
| src/main.cpp | Removed ad-hoc test tasks; added primary-boot guard |
| src/libc/sk_stdlib.c | Removed x86_64/SSE gating for strtod |
| src/libc/include/sk_stdlib.h | Removed x86_64/SSE gating for strtof docs block |
| src/include/syscall.hpp | Added signal/affinity syscall numbers and APIs; removed x86_64 numbering |
| src/include/signal.hpp | Added signal definitions and SignalState |
| src/include/kernel_config.hpp | Increased task/scheduler capacity constants |
| src/include/interrupt_base.h | Updated doc to remove x86_64/APIC mention |
| src/include/expected.hpp | Removed APIC error codes; added signal error codes |
| src/filesystem/vfs/open.cpp | Call FileOps::Open() to prepare FS-specific handle |
| src/filesystem/vfs/include/vfs_types.hpp | Added default FileOps::Open() hook and doc tweaks |
| src/filesystem/fatfs/include/fatfs.hpp | Added FatFsFileOps::Open() override declaration |
| src/filesystem/fatfs/fatfs.cpp | Treat FR_EXIST on mkdir as success; implement FatFS file open hook |
| src/arch/x86_64/timer.cpp | Removed x86_64 timer stub |
| src/arch/x86_64/syscall.cpp | Removed x86_64 syscall stub |
| src/arch/x86_64/switch.S | Removed x86_64 switch stub |
| src/arch/x86_64/macro.S | Removed x86_64 macro stub |
| src/arch/x86_64/interrupt_main.cpp | Removed x86_64 interrupt implementation |
| src/arch/x86_64/interrupt.cpp | Removed x86_64 interrupt implementation |
| src/arch/x86_64/interrupt.S | Removed x86_64 trap return stub |
| src/arch/x86_64/include/sipi.h | Removed x86_64 SIPI header |
| src/arch/x86_64/include/interrupt.h | Removed x86_64 interrupt header |
| src/arch/x86_64/early_console.cpp | Removed x86_64 early console |
| src/arch/x86_64/boot.S | Removed x86_64 boot code |
| src/arch/x86_64/backtrace.cpp | Removed x86_64 backtrace |
| src/arch/x86_64/arch_main.cpp | Removed x86_64 arch init |
| src/arch/x86_64/apic/io_apic.cpp | Removed x86_64 IO APIC driver |
| src/arch/x86_64/apic/include/io_apic.h | Removed x86_64 IO APIC header |
| src/arch/x86_64/apic/include/apic.h | Removed x86_64 APIC header |
| src/arch/x86_64/apic/apic.cpp | Removed x86_64 APIC implementation |
| src/arch/x86_64/apic/README.md | Removed x86_64 APIC docs |
| src/arch/x86_64/apic/CMakeLists.txt | Removed x86_64 APIC build target |
| src/arch/riscv64/timer.cpp | Call CheckPendingSignals() from timer tick |
| src/arch/riscv64/switch.S | Route new thread entry through kernel_thread_bootstrap() |
| src/arch/riscv64/macro.S | Reduced saved trap/callee registers (removed FP saves) |
| src/arch/riscv64/link.ld | Comment formatting update |
| src/arch/riscv64/interrupt.S | Updated trap context offset usage |
| src/arch/aarch64/timer.cpp | Call CheckPendingSignals() from timer tick |
| src/arch/aarch64/switch.S | Route new thread entry through kernel_thread_bootstrap() |
| src/arch/aarch64/macro.S | Reduced trap/callee context sizes (removed FP/SIMD saves) |
| src/arch/aarch64/link.ld | Comment formatting update |
| src/arch/aarch64/interrupt_main.cpp | Handle spurious IRQ IDs explicitly |
| src/arch/aarch64/interrupt.cpp | Changed EOIR write ordering |
| src/arch/aarch64/interrupt.S | Updated trap context offsets for return path |
| src/arch/README.md | Updated to reflect only riscv64/aarch64 support |
| src/arch/CMakeLists.txt | Removed x86_64 subdir linkage |
| src/arch/AGENTS.md | Updated arch structure/docs after x86_64 removal |
| src/CMakeLists.txt | Removed x86_64 QEMU boot flags branch |
| docs/docker.md | Updated commands to riscv64 defaults; removed x86_64 QEMU mention |
| cmake/x86_64-gcc.cmake | Removed x86_64 toolchain file |
| cmake/functions.cmake | Ensure /srv/tftp exists before linking files |
| cmake/compile_config.cmake | Removed x86_64-specific compile/link options |
| cmake/3rd.cmake | Removed x86_64 U-Boot defconfig |
| README_ENG.md | Updated docs to remove x86_64 support claims |
| README.md | Updated docs to remove x86_64 support claims |
| CMakePresets.json | Removed x86_64 preset; updated QEMU flags/device |
| AGENTS.md | Updated repo-level docs for two-arch support |
| 3rd/cpu_io | Updated cpu_io submodule revision |
| .gitignore | Added .claude ignore |
| .github/workflows/workflow.yml | Reworked CI to riscv64/aarch64 builds + repeated system tests + publish step |
| .github/workflows/dev-image.yml | Added workflow to build/push devcontainer image |
| .devcontainer/devcontainer.json | Removed x86_64 assembly extension recommendation |
| .devcontainer/Dockerfile | Updated devcontainer packages; removed x86_64 toolchain/QEMU; ensure /srv/tftp |
Comments suppressed due to low confidence (2)
tests/system_test/main.cpp:1
- Using `test_and_set(std::memory_order_acquire)` does not publish the primary core's initialization to secondary cores (there is no matching release operation on the write). For a one-time init guard, use `std::memory_order_acq_rel` (or `release` on the writer and `acquire` on readers) so other cores reliably observe initialization effects.

tests/system_test/balance_test.cpp:1
- The PR title/description focus on a narrow test fix (swap `sys_sleep()` → `sys_yield()` in balance system test), but the diff also includes substantial changes: implementing `TaskManager::Balance()`, adding a signal subsystem, resizing kernel limits, adding many new system tests, changing CI, and removing x86_64 support/tooling. Consider updating the PR description/title to reflect the full scope, or splitting into smaller PRs so the balance-test fix can be reviewed and landed independently.
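The memory-ordering remark on tests/system_test/main.cpp can be illustrated with a small guard. This is a sketch of the "release on the writer, acquire on readers" variant the comment mentions, not the code in the PR: the `boot_guard` namespace, `EnterBoot`, `ReadConfig`, and the separate `init_done` flag are all assumptions made for the example.

```cpp
#include <atomic>

namespace boot_guard {

// Claimed by whichever core arrives first (hypothetical primary-boot guard).
inline std::atomic_flag claimed = ATOMIC_FLAG_INIT;
// Set only after initialization has finished.
inline std::atomic<bool> init_done{false};
// Plain, non-atomic data that the primary core initializes once.
inline int shared_config = 0;

// Returns true for the caller that performed the initialization.
inline bool EnterBoot(int value) {
  if (!claimed.test_and_set(std::memory_order_acq_rel)) {
    shared_config = value;  // one-time initialization
    // Release store: publishes the write above to acquire readers.
    init_done.store(true, std::memory_order_release);
    return true;
  }
  // Secondary path: the acquire load pairs with the release store, so
  // shared_config is visible once the loop exits.
  while (!init_done.load(std::memory_order_acquire)) {
  }
  return false;
}

inline int ReadConfig() { return shared_config; }

}  // namespace boot_guard
```

The second flag is a deliberate design choice in this sketch: the initialization happens after `test_and_set`, so the store that publishes it has to come after the initialization as well.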
Workers in tight sys_yield() loops create extreme scheduling pressure, triggering a recursive spinlock panic on sched_lock. The window between Schedule()'s UnLock (interrupts restored) and switch_to allows a timer interrupt to re-enter the scheduler path.

Fix: batch 10 yields (visible to Balance()) with 1ms sleeps (reduces lock contention). Total lifetime still covers multiple Balance() cycles.

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
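The batching described in this commit message could look roughly like the loop below. It is a sketch only: `sys_yield_stub` and `sys_sleep_stub` are hypothetical counters standing in for the kernel's `sys_yield()` and `sys_sleep()`, and the number of batches is an arbitrary parameter rather than the value used in the test.

```cpp
#include <cstdint>

// Hypothetical counters standing in for the real syscalls.
struct SyscallCounters {
  int yields = 0;
  int sleeps = 0;
};
inline SyscallCounters g_counters;

inline void sys_yield_stub() { ++g_counters.yields; }
inline void sys_sleep_stub(std::uint64_t ms) {
  (void)ms;  // the stub only counts calls
  ++g_counters.sleeps;
}

// Worker body: 10 yields keep the task in the ready queue (visible to
// Balance()), then one 1 ms sleep relieves pressure on the scheduler lock.
inline void BatchedWorker(int batches) {
  constexpr int kYieldsPerBatch = 10;
  for (int b = 0; b < batches; ++b) {
    for (int i = 0; i < kYieldsPerBatch; ++i) {
      sys_yield_stub();
    }
    sys_sleep_stub(1);
  }
}
```

With this shape, the task spends most of its lifetime runnable while still leaving the ready queue once per batch.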
Previously CI grepped for 'Failed: 0', which matches even when tests time out (e.g. 'Failed: 0 | Timeout: 1'), silently passing a hung test.

Now the kernel test runner prints 'RESULT: ALL TESTS PASSED' only when every test passes with no failures and no timeouts. CI greps for this marker instead, correctly catching all failure modes:

- Test assertion failures (no marker printed)
- Kernel PANIC/deadlock (no output at all, timeout 300 kills QEMU)
- Individual test hangs (runner marks as Timeout, marker not printed)

Signed-off-by: Niu Zhihong <zhihong@nzhnb.com>
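The difference between the two CI checks can be modeled in a few lines. In the sketch below, the marker string and the 'Failed: 0 | Timeout: 1' fragment come from the commit message; the `TestSummary` struct, the rest of the summary line layout, and the function names are assumptions made for illustration.

```cpp
#include <string>

// Hypothetical tally kept by a test runner.
struct TestSummary {
  int passed;
  int failed;
  int timed_out;
};

// Builds the summary text; the marker is appended only for a clean run.
inline std::string FormatSummary(const TestSummary& s) {
  std::string out = "Passed: " + std::to_string(s.passed) +
                    " | Failed: " + std::to_string(s.failed) +
                    " | Timeout: " + std::to_string(s.timed_out);
  if (s.failed == 0 && s.timed_out == 0) {
    out += "\nRESULT: ALL TESTS PASSED";
  }
  return out;
}

// Old check: looks for the substring "Failed: 0" only.
inline bool OldCiAccepts(const std::string& log) {
  return log.find("Failed: 0") != std::string::npos;
}

// New check: looks for the explicit success marker.
inline bool NewCiAccepts(const std::string& log) {
  return log.find("RESULT: ALL TESTS PASSED") != std::string::npos;
}
```

A run with one timeout and no failures is accepted by `OldCiAccepts` but rejected by `NewCiAccepts`, which is the gap the commit closes.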
Summary
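- Replace `sys_sleep()` with `sys_yield()` so that tasks stay visible to the load balancer

Root cause
`Balance()` judges load by calling `GetQueueSize()` on each core's scheduler ready queue. However, `sys_sleep()` moves the task out of the ready queue and into a separate `sleeping_tasks` priority queue, which leads to:
- `sys_sleep(10)` → the task leaves `ready_queue_` and enters `sleeping_tasks`
- `Balance()` runs every 64 ticks and checks `GetQueueSize()` → returns 0 (sleeping tasks are not visible)
- `cores_used < 2` → the test fails

Fix
- `imbalance_worker`: `sys_sleep(10)` → `sys_yield()`, iterations 20 → 2000
- `affinity_pinned_worker`: `sys_sleep(10)` → `sys_yield()`, iterations 10 → 1000
- After `sys_yield()` calls `Schedule()`, the task stays in the ready queue and is visible to `Balance()`

Testing
- `make SimpleKernel` builds successfully (riscv64)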
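The root cause can be reproduced with a toy model of two per-core queues. This is a sketch under simplifying assumptions, not the kernel's `TaskManager::Balance()`: `ToyCore`, `ToyBalance`, and the "steal when the gap is at least 2" rule are hypothetical, and only the idea that the balancer looks at the ready-queue size is taken from the description.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

// Toy per-core scheduler state: ready tasks and sleeping tasks are kept in
// separate containers, and only the ready queue counts as load.
struct ToyCore {
  std::deque<int> ready_queue;
  std::vector<int> sleeping_tasks;

  std::size_t GetQueueSize() const { return ready_queue.size(); }
  // A task that sleeps is parked outside the ready queue.
  void Sleep(int pid) { sleeping_tasks.push_back(pid); }
  // A task that yields goes back to the tail of the ready queue.
  void Yield(int pid) { ready_queue.push_back(pid); }
};

// Moves one task from the busiest core to the idlest one when the ready-queue
// sizes differ by at least 2. Returns true if a task was migrated.
inline bool ToyBalance(std::vector<ToyCore>& cores) {
  if (cores.empty()) return false;
  std::size_t busiest = 0;
  std::size_t idlest = 0;
  for (std::size_t i = 1; i < cores.size(); ++i) {
    if (cores[i].GetQueueSize() > cores[busiest].GetQueueSize()) busiest = i;
    if (cores[i].GetQueueSize() < cores[idlest].GetQueueSize()) idlest = i;
  }
  if (cores[busiest].GetQueueSize() < cores[idlest].GetQueueSize() + 2) {
    return false;  // nothing visible to steal
  }
  cores[idlest].ready_queue.push_back(cores[busiest].ready_queue.back());
  cores[busiest].ready_queue.pop_back();
  return true;
}
```

If four tasks on core 0 all sleep, `ToyBalance` sees two empty ready queues and migrates nothing; if the same four tasks yield instead, the imbalance is visible and one task moves to core 1.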