HEAPLab/embedded_parallel_programming

Embedded Parallel Programming

This repository contains the replication package of the research paper "Parallel Programming Models on Microcontrollers". The following sections illustrate where the patches to the OpenMP and SYCL runtimes can be found, and how to reproduce the experiments on a Raspberry Pi Pico RP2040.

Patches to OpenMP and SYCL runtimes to support microcontrollers

The OpenMP runtime (libgomp for GCC, libomp for LLVM) is built together with the compiler, so installing the arm-miosix-eabi-gcc compiler also installs the corresponding runtime.

The compiler patches required to run OpenMP on microcontrollers are in the openmp-runtime-patches directory.

As for SYCL, the AdaptiveCpp runtime is separate from the compiler and lives in the AdaptiveCpp-Embedded git submodule. The modifications required to adapt AdaptiveCpp to microcontrollers are in that submodule's git commit history.

Dependencies

To install the necessary dependencies, you can run:

sudo apt install cmake ninja-build python3.12

or the equivalent command for your package manager on non-Debian Linux distributions.

Optionally, also install ccache to cache the miosix-llvm build:

sudo apt install ccache

Install picotool: https://github.com/raspberrypi/picotool

Install the Raspberry Pi fork of openocd: https://github.com/raspberrypi/openocd

Remember to install picotool's udev rules and to add your user to the plugdev and dialout groups so that you can call picotool and screen /dev/ttyACM0 without sudo. Also create the file /etc/udev/rules.d/60-openocd.rules containing

SUBSYSTEM=="usb", ATTRS{idVendor}=="2e8a", ATTRS{idProduct}=="000c", MODE="0666", TAG+="uaccess"

to call openocd without sudo. Reboot for the changes to take effect.
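The group and udev setup described above can be sketched with the shell helpers below. This is only a sketch under assumptions: the function names openocd_rule and write_openocd_rule are hypothetical, the rule text is copied from this README, and the plugdev and dialout groups are assumed to already exist on your distribution.

```shell
#!/bin/sh
# Hypothetical helper: print the udev rule given in this README.
openocd_rule() {
    printf '%s\n' 'SUBSYSTEM=="usb", ATTRS{idVendor}=="2e8a", ATTRS{idProduct}=="000c", MODE="0666", TAG+="uaccess"'
}

# Hypothetical helper: write the rule to the file given as $1.
# The default is the path named in this README; writing there requires root.
write_openocd_rule() {
    target="${1:-/etc/udev/rules.d/60-openocd.rules}"
    openocd_rule > "$target"
}

# Typical use (commented out because it needs root privileges):
#   openocd_rule | sudo tee /etc/udev/rules.d/60-openocd.rules
#   sudo usermod -aG plugdev,dialout "$USER"
```

Passing a different path to write_openocd_rule lets you inspect the generated file before copying it into /etc/udev/rules.d.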

Installation

Open a shell in the root directory of this repository

Fetch the Miosix and AdaptiveCpp submodules with git submodule update --init --recursive

Access the compiler directory with cd miosix-kernel/miosix/_tools/compiler

Install the Miosix gcc compiler (either use the binary release provided as part of this repository, or build it from source using the installation scripts provided together with the kernel in the miosix-kernel/miosix/_tools/compiler directory):

sh MiosixToolchainInstaller9.2.0mp3.4.run

Compile and install the Miosix LLVM compiler. The install script uses ccache by default: if you did not install ccache, set USE_CCACHE=0 inside the script, otherwise it will stop early.

cd llvm-18
./download.sh
./install-script.sh
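Before launching install-script.sh, it may help to check whether ccache is actually available, since the script stops early without it. The helper below is a minimal sketch; the function name ccache_status is hypothetical, and it only reports what to do rather than editing the script, because the exact form of the USE_CCACHE line inside the script is not shown here.

```shell
#!/bin/sh
# Report whether ccache is on PATH, so you know whether USE_CCACHE=0 is needed.
ccache_status() {
    if command -v ccache >/dev/null 2>&1; then
        echo "ccache found: the default setting can stay"
    else
        echo "ccache not found: set USE_CCACHE=0 inside install-script.sh"
    fi
}

ccache_status
```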

Compile and install AdaptiveCpp:

cd ../../../../../AdaptiveCpp-Embedded
mkdir build
cd build
cmake -DCOMPILE_FOR_MICROCONTROLLERS=ON \
      -DCMAKE_INSTALL_PREFIX=/opt/miosix-adaptivecpp \
      -DCMAKE_TOOLCHAIN_FILE=../../miosix-kernel/miosix/cmake/Toolchains/clang.cmake \
      -DCMAKE_C_FLAGS="-mcpu=cortex-m0plus -mthumb" \
      -DCMAKE_CXX_FLAGS="-mcpu=cortex-m0plus -mthumb" ..
make -j`nproc`
sudo make install
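After the steps above, a quick sanity check can confirm that the tools named in this README are reachable. This is a sketch under assumptions: check_tool and check_dir are hypothetical helpers, the ninja-build package is assumed to provide a binary called ninja, and the Miosix installer is assumed to put arm-miosix-eabi-gcc on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is on PATH (always succeeds).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

# Hypothetical helper: report whether a directory exists (always succeeds).
check_dir() {
    if [ -d "$1" ]; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

# Tools mentioned in the Dependencies and Installation sections.
for tool in cmake ninja picotool openocd arm-miosix-eabi-gcc; do
    check_tool "$tool"
done

# Install prefix passed to cmake when building AdaptiveCpp.
check_dir /opt/miosix-adaptivecpp
```

A MISSING line does not necessarily mean the installation failed; it may only mean the tool was installed outside your PATH.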

Tests

Open a shell in the root directory of this repository and access the tests directory with cd tests. Note that make <target>_program flashes the microcontroller. You can see the program output on the serial port to which the microcontroller is connected.

Test that Miosix is working:

cd miosix
mkdir build
cd build
cmake ..
make -j`nproc`
make main_program
cd ..

The output should be similar to this:

Starting Kernel... Ok
Miosix v3.0devel1 (rp2040_raspberry_pi_pico, Jun  5 2025 15:53:43)
Mounting MountpointFs as / ... Ok
Mounting DevFs as /dev ... Ok
Mounting Fat32Fs as /sd ... Failed
OS Timer freq = 48000000 Hz
Available heap 259344 out of 266888 Bytes
Hello world!

Test that OpenMP is working:

cd openmp
mkdir build
cd build
cmake ..
make -j`nproc`
make main_program
cd ..

The output should be similar to this:

Starting Kernel... Ok
Miosix v3.0devel1 (rp2040_raspberry_pi_pico, Jun  5 2025 16:32:39)
Mounting MountpointFs as / ... Ok
Mounting DevFs as /dev ... Ok
Mounting Fat32Fs as /sd ... Failed
OS Timer freq = 48000000 Hz
Available heap 249096 out of 256728 Bytes
Hello from the main thread! Using 2 threads.
Iteration 0 executed by thread 0
Iteration 5 executed by thread 1
Iteration 1 executed by thread 0
Iteration 6 executed by thread 1
Iteration 2 executed by thread 0
Iteration 7 executed by thread 1
Iteration 3 executed by thread 0
Iteration 8 executed by thread 1
Iteration 4 executed by thread 0
Iteration 9 executed by thread 1
Total parallel time: 6.00904
Iteration 0 executed by thread 0
Iteration 1 executed by thread 0
Iteration 2 executed by thread 0
Iteration 3 executed by thread 0
Iteration 4 executed by thread 0
Iteration 5 executed by thread 0
Iteration 6 executed by thread 0
Iteration 7 executed by thread 0
Iteration 8 executed by thread 0
Iteration 9 executed by thread 0
Total sequential time: 10.04
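As a rough sanity check on the sample output above, you can compute the ratio between the sequential and the parallel time. The helper below is a sketch; the function name speedup is hypothetical, and the two numbers are simply the ones printed in the sample output, so your own run will likely differ.

```shell
#!/bin/sh
# Ratio of sequential to parallel time; both arguments must use the same unit.
speedup() {
    # $1 = sequential time, $2 = parallel time
    LC_ALL=C awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

speedup 10.04 6.00904   # prints 1.67
```

With the two threads reported in the output, a ratio somewhat below 2 is what one would expect.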

Test that SYCL is working:

cd sycl-library-only
mkdir build
cd build
cmake .. -DCMAKE_CXX_STANDARD=17
make -j`nproc`
make main_program
cd ..

The output should be similar to this:

Starting Kernel... Ok
Miosix v3.0devel1 (rp2040_raspberry_pi_pico, Jun  5 2025 16:43:12)
Mounting MountpointFs as / ... Ok
Mounting DevFs as /dev ... Ok
Mounting Fat32Fs as /sd ... Failed
OS Timer freq = 48000000 Hz
Available heap 235848 out of 255920 Bytes
Using 2 cores
Device: AdaptiveCpp OpenMP host device
[AdaptiveCpp Warning] This application uses SYCL buffers; the SYCL buffer-accessor model is well-known to introduce unnecessary overheads. Please consider migrating to the SYCL2020 USM model, in particular device USM (sycl::malloc_device) combined with in-order queues for more performance. See the AdaptiveCpp performance guide for more information: 
https://github.com/AdaptiveCpp/AdaptiveCpp/blob/develop/doc/performance.md
Total time: 0.838656

C:
200 200 200 200 200 
200 200 200 200 200 
200 200 200 200 200 
200 200 200 200 200 
200 200 200 200 200 

matmul successfully completed
Stack memory statistics.
Size: 16384
Used (current/max): 1792/3080
Free (current/min): 14592/13304
Heap memory statistics.
Size: 255920
Used (current/max): 202096/202672
Free (current/min): 53824/53248

Benchmarks

Open a shell in the root directory of this repository and access the polybench directory with cd benchmarks/polybench

Connect the RP2040 to the host using a debug probe, or a second RP2040 acting as a debugger, as explained in https://datasheets.raspberrypi.com/pico/getting-started-with-pico.pdf

Open a separate shell in the current directory, start openocd and leave it running:

openocd -f ../../miosix-kernel/miosix/arch/cortexM0plus_rp2040/rp2040_raspberry_pi_pico/openocd.cfg

Identify the device in your filesystem and set its path inside run.py, modifying the line

device = "/dev/ttyACM0"
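To find the path to put in run.py, you can list the serial devices currently present. The snippet below is a minimal sketch that assumes the board shows up as a /dev/ttyACM* node, as the default /dev/ttyACM0 suggests; the function name list_serial_candidates is hypothetical.

```shell
#!/bin/sh
# Print every /dev/ttyACM* node that exists; prints nothing if there is none.
list_serial_candidates() {
    for dev in /dev/ttyACM*; do
        # An unmatched glob stays literal, so check that the path really exists.
        if [ -e "$dev" ]; then
            echo "$dev"
        fi
    done
}

list_serial_candidates
```

Running it once with the board unplugged and once with it plugged in shows which entry belongs to the board.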

Install python requirements with python3.12 -m pip install -r requirements.txt

Run the benchmarks with python3.12 run.py

Observe the results in the results directory

After this process, the additional binary bench-runner-rt is also available in the following directories:

  • benchmarks/polybench/openmp/build_gcc
  • benchmarks/polybench/openmp/build_clang
  • benchmarks/polybench/sycl/build_gcc
  • benchmarks/polybench/sycl/build_clang

You can flash these additional binaries to run the benchmarks co-scheduled with two real-time threads and observe that no deadlines are missed.
