Skip to content

loader,smp: Make sure MMU is enabled on all cores for cache coherency.#465

Open
bruelc wants to merge 1 commit intoseL4:mainfrom
bruelc:master-cpusync
Open

loader,smp: Make sure MMU is enabled on all cores for cache coherency.#465
bruelc wants to merge 1 commit intoseL4:mainfrom
bruelc:master-cpusync

Conversation

@bruelc
Copy link
Copy Markdown

@bruelc bruelc commented Apr 1, 2026

Hello,

This fixes an SMP boot issue observed on the STM32MP2 platform: core 0 does not start because print_lock is never seen as released by the secondary cores. The issue is related to cachability, which is not yet enabled at this stage.

Ensure all cores are in the same coherency domain before using cached shared data by enable MMU on all cores before using shared datas
Thus moved arch_mmu_enable() out of start_kernel().

Ensure all cores are in the same coherency domain before using cached
shared data.
Thus moved arch_mmu_enable() out of start_kernel().

Signed-off-by: Christian Bruel <christian.bruel@foss.st.com>
@bruelc bruelc mentioned this pull request Apr 1, 2026
Copy link
Copy Markdown
Contributor

@midnightveil midnightveil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would be correct to me, but I'd like to confirm it against the ARM ARM first.

What is the shareability of these mappings, do you know? Hopefully we set it up as ISH.

Maybe I misunderstood @KurtWu10 but I was under the impression that this would be fine as we only disabled the MMU & L1 cache (or actually thinking about it I guess we also disable the cache itself. I was asking about how it works it you just disable the MMU, not the cache too)).

This probably would affect spin table booting too, though I'm inclined to leave it as is as nothing but raspberry pi uses it (looking at Linux it does a cache clean & invalidate to PoC there)

(Not something to fix here) this does make me realise I could have switched to the kernel page tables instead of disabling the uboot ones, as that would stop the loader loading things over the current page tables since it knows to not load over itself. Although we probably want to clean caches then.

@KurtWu10
Copy link
Copy Markdown
Contributor

KurtWu10 commented Apr 1, 2026

Initially, I thought this was correct, but now I am not sure whether enabling MMU is required for cache coherency.

According to the Arm ARM D8.2.12 The effects of disabling an address translation stage, assuming that the effective value of HCR_EL2.DC is 0, it is said that

($\textsf{R}_\textsf{WFZPW}$) ... the stage 1 translation assigns one of the following attributes to memory accesses:

  • For a data access, the Device-nGnRnE memory attribute.

then according to B2.3 Ordering requirements defined by the formal concurrency model,

In this section ..., all of the following are assumed:

  • One of the following applies:
    • ...
    • All PEs are part of the Full System domain and all Memory Effects are to Normal, Inner Non-Cacheable, Outer Non-Cacheable memory or Device memory as long as all barriers specify the Full System domain and there are no TLB invalidations.
  • All Memory Effects are to Normal, Inner Write-Back, Outer Write-Back memory. Behaviors in Ordering requirements defined by the formal concurrency model also apply to Memory Effects to memory with attributes other than Normal, Inner Write-Back, Outer Write-Back (for example, Device or Inner Write-Through, Outer Write-Through) as long as the behavior of the instruction which generated the Memory Effect is guaranteed for the memory type.

My interpretation is, with MMU disabled, data memory accesses use the device attribute, and cache coherency for them is provided for load-acquire and store-release instructions.


The reasoning above assumes that the effective value of HCR_EL2.DC is 0. This might not be always true, as only the switch_to_el1 function sets the value. By default, the field is said to reset to an unknown value.

@bruelc
Copy link
Copy Markdown
Author

bruelc commented Apr 2, 2026

My interpretation is, with MMU disabled, data memory accesses use the device attribute, and cache coherency for them is provided for load-acquire and store-release instructions.

I think you are correct here: cacheability and coherency can indeed apply regardless of MMU, as long as the cores are in the same shareability domain and cache attributes are consistent.

My mistake in the description was that I confused “enabling the caches” with “enabling the MMU”, because both are currently done inside enable_mmu().

A possible refinement would be to separate cache activation from the enable_mmu() with a new function called before accessing shared data. Which makes now 2 possible fixes :

  • Keep the existing patch, update the comments accordingly, and retitle the PR to mention cache instead of MMU
  • Keep the current implementation but split enable_mmu() with a new enable_cache() called before accessing the sync datas.

I am fine with either option

Thank you for raising this inconsistency!

@bruelc
Copy link
Copy Markdown
Author

bruelc commented Apr 2, 2026

This seems like it would be correct to me, but I'd like to confirm it against the ARM ARM first.

What is the shareability of these mappings, do you know? Hopefully we set it up as ISH.

In #464 I set inner shareability in the memory translation attributes

Maybe I misunderstood @KurtWu10 but I was under the impression that this would be fine as we only disabled the MMU & L1 cache (or actually thinking about it I guess we also disable the cache itself. I was asking about how it works it you just disable the MMU, not the cache too)).

That the point, we could disable the MMU and only enable the cache. But in this situation, I don't know what the default shareability settings. I tested on STM32MP and this works, but I'm afraid we could fall into SoC or undefined behavior.

Making sure attributes in the translation tables are set (thus with MMU enabled) is IMHO the best way to avoid ambiguity

This probably would affect spin table booting too, though I'm inclined to leave it as is as nothing but raspberry pi uses it (looking at Linux it does a cache clean & invalidate to PoC there)

(Not something to fix here) this does make me realise I could have switched to the kernel page tables instead of disabling the uboot ones, as that would stop the loader loading things over the current page tables since it knows to not load over itself. Although we probably want to clean caches then.

@midnightveil
Copy link
Copy Markdown
Contributor

Yes, I agree. I think it makes sense to turn MMU and caches on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants