Originally named Execution Process Metrics Collector
A set of Python programs to monitor, collect, and digest metrics of a given Linux process or command line, and its descendants. Initially developed for ELIXIR STEERS.
treecript: Process Tree Metrics Transcriptor
```
treecript/
├── treecript/               # Core Python package — all program logic lives here
├── installation/            # Constraints and requirements files for reproducible installs
├── legacy/                  # Deprecated Bash scripts kept for historical reference
├── sample-series/           # Real metrics from a WfExS workflow execution, used in documentation examples
├── sample-charts/           # Pre-generated charts from the sample series, embedded in this README
├── sample-work-to-measure/  # Example scripts showing how to set up and run a measurement
├── sample_cpuinfo/          # Example /proc/cpuinfo files for testing the TDP finder programs
├── onboarding/              # Full worked example with metrics, charts and a step-by-step walkthrough
├── sample/                  # Legacy single-process sample from 2018 (pre-treecript era)
└── tests/                   # Unit tests
```
| Directory | Description |
|---|---|
| `treecript/` | Core Python package — aggregator, collector, parser, plotter, TDP finder |
| `installation/` | Per-version constraints files and requirements for reproducible installs |
| `legacy/` | Deprecated Bash scripts superseded by the Python programs |
| `sample-series/` | Real metrics collected from a WfExS workflow run, used throughout this README as examples |
| `sample-charts/` | Pre-generated chart outputs (SVG/PDF/PNG) from the sample series |
| `sample-work-to-measure/` | Ready-to-use scripts to download and run example workloads to measure |
| `sample_cpuinfo/` | Example `/proc/cpuinfo` files (Intel and AMD) for testing `cpuinfo-tdp-finder.py` and `modelname-tdp-finder.py` without needing a real machine |
| `onboarding/` | Self-contained worked example: a full metrics collection, chart generation and aggregation walkthrough for new users |
| `sample/` | Legacy single-process sample from 2018, predating the current process-tree approach |
| `tests/` | Unit tests for the core collector module |
- Linux OS (Ubuntu recommended)
- Python 3.9 or newer
- Git
| I want to... | Use |
|---|---|
| Keep things simple and already have Python installed | Option 1 — pip + venv |
| Already use conda or manage multiple projects/environments | Option 2 — Conda |
| Work on an HPC or shared cluster environment (e.g. BSC) | Option 2 — Conda |
| Work on a machine with a corporate or university firewall | Either — both have firewall notes in their respective sections |
The repository ships per-version constraints files under the installation/ directory to ensure a working set of dependencies. Pick the one that matches your setup:
| Situation | Constraints file to use |
|---|---|
| Native Linux, Python 3.9 | installation/constraints-3.9.txt |
| Native Linux, Python 3.10 | installation/constraints-3.10.txt |
| Native Linux, Python 3.11 | installation/constraints-3.11.txt |
| Native Linux, Python 3.12 | installation/constraints-3.12.txt |
| Ubuntu 22.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-22.04-wsl.txt |
| Ubuntu 24.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-24.04-wsl.txt |
WSL = Windows Subsystem for Linux — Ubuntu running inside Windows rather than directly on hardware. If you are running Ubuntu natively on your machine, use the plain constraints file. To check:
```shell
uname -r   # if the output contains "microsoft" or "WSL", you are on WSL
```
To check your Python version:
```shell
python3 --version
```
Use this if you already have Python installed on your system and don't use conda. This is the lightest option — it creates an isolated Python environment using only tools that come built into Python, with no additional software required.
```shell
# 1. Create a virtual environment
python3 -m venv TREECRIPT

# 2. Activate it
source TREECRIPT/bin/activate

# 3. Upgrade pip and wheel
pip install --upgrade pip wheel

# 4. Download the constraints file for your Python version (adjust filename as needed)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt

# 5. Install treecript with constraints
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec
```
Network issues? If you are behind a corporate or university firewall (e.g. Fortiguard), add `--no-check-certificate` to the `wget` command.
To deactivate the environment:
```shell
deactivate
```
To reactivate later:
```shell
source TREECRIPT/bin/activate
```
Use this if you already work with Anaconda or Miniconda, or if you prefer conda for managing environments across multiple projects. Conda handles both Python and system-level dependencies, which makes it particularly well suited for HPC or shared computing environments.
Miniconda is a minimal conda installer — it gives you the conda command and Python without bundling hundreds of extra packages like the full Anaconda distribution does.
```shell
# Download installer (add --no-check-certificate if behind a firewall)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

# Run the installer
bash miniconda.sh
# - Accept the license
# - Accept the default install location
# - Answer "yes" when asked to update your shell profile

# Apply changes to current session
source ~/.bashrc

# Verify
conda --version
```
Network issues? If `repo.anaconda.com` is blocked by your network, use Miniforge instead — it is functionally identical to Miniconda but downloads from GitHub and defaults to the `conda-forge` channel, which is a better fit for the scientific packages treecript needs:
```shell
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh
```
```shell
# 1. Create a clean environment with Python 3.10
conda create -n treecript python=3.10 -y

# 2. Activate it
conda activate treecript

# 3. Download the constraints file (adjust filename for your Python version / OS)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt
# or for WSL Ubuntu 22.04:
# wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10_Ubuntu-22.04-wsl.txt

# 4. Install treecript and all dependencies in one shot
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec
```
To deactivate:
```shell
conda deactivate
```
To remove the environment entirely:
```shell
conda deactivate
conda remove -n treecript --all -y
```
Run this after either installation method to confirm all dependencies are working correctly:
```shell
python -c "
import psutil; print('psutil OK:', psutil.__version__)
import docker; print('docker OK:', docker.__version__)
import pandas; print('pandas OK:', pandas.__version__)
import networkx; print('networkx OK:', networkx.__version__)
import matplotlib; print('matplotlib OK:', matplotlib.__version__)
import adjustText; print('adjustText OK:', adjustText.__version__)
import treecript; print('treecript OK')
"
```
```shell
# 1. Collect metrics for a command
execution-metrics-collector.py ~/my_metrics my_command --arg1 --arg2

# 2. Plot time series charts
plotGraph.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_charts/

# 3. Find your CPU's TDP
tdp-finder.py ~/my_metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv

# 4. Aggregate and estimate energy consumption
metrics-aggregator.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_agg/ 28.0
```
`execution-metrics-collector.py` runs a command and monitors it and all its child processes:
```shell
execution-metrics-collector.py {base_metrics_directory} {command} {args...}
```
Internally this launches the command, captures its PID, and calls `process-metrics-collector.py` with a sampling period of 1 second:
```shell
process-metrics-collector.py {pid} {base_metrics_directory} {sample_period}
```
Example:
```shell
execution-metrics-collector.py ~/metrics python myscript.py --input data.txt
```
`plotGraph.py` generates line charts for each monitored process, comparing time series of CPU, memory, I/O and other metrics:
```shell
plotGraph.py {metrics_directory} {output_directory}
```
Example:
```shell
plotGraph.py ~/metrics/2025_01_01-00_00-12345/ ~/charts/
```
Three programs are available depending on what information you have:
```shell
tdp-finder.py {metrics_directory} {csv_files...}
```
Example:
```shell
tdp-finder.py ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv
```
Use `-q` for quiet mode (outputs only the TDP value, useful for scripting):
```shell
tdp-finder.py -q ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv
# Output: 28.0
```
Does not require a metrics directory:
```shell
cpuinfo-tdp-finder.py /proc/cpuinfo cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv
```
Or from a saved copy of `/proc/cpuinfo` — the `sample_cpuinfo/` directory contains example files for Intel and AMD processors you can use for testing:
```shell
cpuinfo-tdp-finder.py sample_cpuinfo/cpuinfo-amd.txt cpu-spec-dataset_Josua/dataset/*.csv
```
```shell
modelname-tdp-finder.py "11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz" cpu-spec-dataset_Josua/dataset/*.csv
modelname-tdp-finder.py "AMD EPYC 7742 64-Core Processor" cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv
```
`metrics-aggregator.py` digests the collected time series and estimates energy consumption per process subtree. It requires the CPU TDP value in Watts.
```shell
metrics-aggregator.py {metrics_directory} {output_directory} {TDP_watts} [command_filter]
```
The optional `command_filter` argument filters results to show only processes whose command matches the string (e.g. `"docker run"` to focus on Docker steps).

Example using the included sample series:
```shell
metrics-aggregator.py sample-series/Wetlab2Variations_metrics/2025_05_20-02_19-14001/ dest_directory 28.0 "docker run"
```
The run produces:

- A table of energy consumption per task (printed to stdout)
- `graph.pdf` / `graph.svg` — process call graph as a tree
- `spiral-graph.pdf` / `spiral-graph.svg` — process call graph as a spiral
- `consumptions.pdf` / `consumptions.svg` — barplot of task energy and duration
- `timeline.pdf` / `timeline.svg` — lollipop chart of task start, duration, and end
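To build intuition for what an energy estimate from a TDP figure looks like, the common approximation multiplies the TDP by each sample's CPU share and the sampling interval. This is a hedged sketch of the general idea only; the exact formula `metrics-aggregator.py` implements may differ.

```python
def estimate_energy_joules(cpu_percent_samples, tdp_watts, sample_period_s=1.0):
    """Rough energy estimate for one process over its sampled lifetime.

    Approximation: each sample contributes
    TDP (W) * (CPU% / 100) * sample period (s) joules.
    Illustrative only; not the exact formula used by metrics-aggregator.py.
    """
    return sum(
        tdp_watts * (cpu / 100.0) * sample_period_s
        for cpu in cpu_percent_samples
    )

# Three 1-second samples at 100%, 50% and 25% CPU, with a 28 W TDP:
energy = estimate_energy_joules([100.0, 50.0, 25.0], tdp_watts=28.0)
print(energy, "J")  # 28*1 + 28*0.5 + 28*0.25 = 49.0 J
```

Divide by 3.6e6 to convert joules to kWh if you want to compare against utility figures.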
The TDP programs require one or more CPU specification datasets to look up processor TDP values. Three sources are supported:

Recommended — JosuaCarl fork (better column names):
```shell
git clone https://github.com/JosuaCarl/cpu-spec-dataset cpu-spec-dataset_Josua
```
Alternative — original felixsteinke repo:
```shell
git clone https://github.com/felixsteinke/cpu-spec-dataset
```
CPUBenchmark scrape (good coverage for AMD server CPUs):
```shell
python -m treecript.tdp_sources cpumark_table.csv
```
You can pass multiple sources to the TDP programs and they will be tried in order:
```shell
tdp-finder.py ~/metrics/dir/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv
```
Each `execution-metrics-collector.py` run creates a subdirectory named after the start timestamp and PID. It contains:
| File | Description |
|---|---|
| `reference_pid.txt` | PID of the root process being monitored |
| `sampling-rate-seconds.txt` | Sampling rate in seconds (usually 1) |
| `pids.txt` | Table of all spawned processes with timestamps and parent PIDs |
| `agg_metrics.tsv` | Time series of aggregated metrics across all processes |
| `metrics-{pid}_{create_time}.csv` | Per-process time series metrics |
| `command-{pid}_{create_time}.txt` | Linearized command line for each process |
| `command-{pid}_{create_time}.json` | JSON representation of the command line |
| `cpu_details.json` | Physical CPU information from `/proc/cpuinfo` |
| `core_affinity.json` | Processor-to-core-to-CPU mapping derived from `/proc/cpuinfo` |
| Column | Description |
|---|---|
| `Time` | Sample timestamp |
| `PID` | Process ID |
| `Virt` | Virtual memory size (matches `top` VIRT) |
| `Res` | Resident set size — non-swapped physical memory (matches `top` RES) |
| `CPU` | CPU utilization as a percentage (can exceed 100% for multithreaded processes) |
| `Memory` | RSS memory as a percentage of total physical system memory |
| `TCP connections` | Number of open TCP connections |
| `Thread Count` | Number of threads (non-cumulative) |
| `User` | Time spent in user mode (seconds) |
| `System` | Time spent in kernel mode (seconds) |
| `Children_User` | User time of child processes (always 0 on Windows/macOS) |
| `Children_System` | System time of child processes (always 0 on Windows/macOS) |
| `IO` | Time waiting for blocking I/O (Linux only) |
| `uss` | Unique Set Size — memory freed if this process terminated now |
| `swap` | Memory swapped out to disk |
| `processor_num` | Number of unique CPU processors used |
| `core_num` | Number of unique CPU cores used |
| `cpu_num` | Number of unique physical CPUs used |
| `processor_ids` | IDs of CPU processors used (space-separated) |
| `core_ids` | IDs of CPU cores used (space-separated) |
| `cpu_ids` | IDs of physical CPUs used (space-separated) |
| `process_status` | Process status string (e.g. sleeping, running) |
| `read_count` | Cumulative number of read syscalls |
| `write_count` | Cumulative number of write syscalls |
| `read_bytes` | Bytes physically read from disk (cumulative) |
| `write_bytes` | Bytes physically written to disk (cumulative) |
| `read_chars` | Bytes passed to read syscalls (cumulative, Linux only) |
| `write_chars` | Bytes passed to write syscalls (cumulative, Linux only) |
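These per-process CSVs are plain text, so they can be digested with the standard `csv` module; this sketch computes peak resident memory, assuming a numeric `Res` column as listed above (values are in whatever unit the collector wrote).

```python
import csv
import io

def peak_res(csv_text):
    """Peak resident set size (Res column) across all samples.

    Sketch only: assumes the CSV header includes a numeric 'Res'
    column, as described in the column table above.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return max(float(row["Res"]) for row in reader)

# Two fabricated samples; the second has the larger resident set:
sample = "Time,PID,Virt,Res,CPU\n0,42,1000,300,50\n1,42,1000,450,75\n"
print(peak_res(sample))  # 450.0
```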
Each row is a 1-second sample across all monitored processes combined:
| Column | Description |
|---|---|
| Timestamp | Sample time |
| Number of PIDs | Processes monitored at that moment |
| Threads | Total thread count |
| Processors | Number of distinct CPU processors in use |
| Cores | Number of distinct CPU cores in use |
| Physical CPUs | Number of distinct physical CPUs in use |
| CPU IDs | IDs of physical CPUs (space-separated) |
| User memory | Total user memory across all processes |
| Swap memory | Total swap memory across all processes |
| Read ops | Total read operations |
| Write ops | Total write operations |
| Read bytes | Bytes physically read |
| Write bytes | Bytes physically written |
| Read chars | Bytes passed to read syscalls |
| Write chars | Bytes passed to write syscalls |
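If the aggregated I/O counters are cumulative like their per-process counterparts, per-interval rates are obtained by differencing consecutive samples. An illustrative snippet, not part of treecript:

```python
def per_interval_deltas(cumulative):
    """Turn a cumulative counter series into per-sample increments.

    Useful for columns such as read/write bytes, where each row
    carries a running total rather than a rate.
    """
    return [b - a for a, b in zip(cumulative, cumulative[1:])]

print(per_interval_deltas([0, 100, 250, 250, 900]))  # [100, 150, 0, 650]
```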
The `legacy/` directory contains older Bash-based scripts that predate the current Python implementation. They are kept for historical reference but are no longer maintained or recommended.

The original Bash wrapper for launching a command and monitoring it. It runs the command in the background, captures the PID, and calls `process-metrics-collector.py` directly:
```shell
./legacy/execution-metrics-collector.sh {base_metrics_directory} {command} {args...}
```
This has been superseded by `execution-metrics-collector.py`, which provides the same functionality in a more portable and maintainable way. The sample series included in this repository was originally collected using this script:
```shell
~/projects/treecript/legacy/execution-metrics-collector.sh \
  ~/projects/treecript/Wetlab2Variations_metrics \
  python WfExS-backend.py -L workflow_examples/local_config.yaml \
  staged-workdir offline-exec 01a1db90-1508-4bad-beb7-7f7989838542
```
The original gnuplot-based visualization script. It reads the collected CSV files and generates `.pdf` charts using gnuplot (requires `apt install gnuplot`). It has been superseded by `plotGraph.py`, which generates richer charts without requiring gnuplot.
```shell
./legacy/plotGraph.sh {metrics_csv_files...}
```
An earlier helper script for plotting individual metric files. Also superseded by `plotGraph.py`.
These scripts are no longer actively maintained. For all new usage, prefer the Python equivalents.
Licensed under GNU GPL v3.
This repository is a fork and evolution of chamilad/process-metrics-collector.