treecript: Process Tree Metrics Transcriptor

Originally named Execution Process Metrics Collector

A set of Python programs to monitor, collect, and digest metrics of a given Linux process or command line, and its descendants. Initially developed for ELIXIR STEERS.




Repository Structure

treecript/
├── treecript/               # Core Python package — all program logic lives here
├── installation/            # Constraints and requirements files for reproducible installs
├── legacy/                  # Deprecated Bash scripts kept for historical reference
├── sample-series/           # Real metrics from a WfExS workflow execution, used in documentation examples
├── sample-charts/           # Pre-generated charts from the sample series, embedded in this README
├── sample-work-to-measure/  # Example scripts showing how to set up and run a measurement
├── sample_cpuinfo/          # Example /proc/cpuinfo files for testing the TDP finder programs
├── onboarding/              # Full worked example with metrics, charts and a step-by-step walkthrough
├── sample/                  # Legacy single-process sample from 2018 (pre-treecript era)
└── tests/                   # Unit tests
| Directory | Description |
| --- | --- |
| treecript/ | Core Python package — aggregator, collector, parser, plotter, TDP finder |
| installation/ | Per-version constraints files and requirements for reproducible installs |
| legacy/ | Deprecated Bash scripts superseded by the Python programs |
| sample-series/ | Real metrics collected from a WfExS workflow run, used throughout this README as examples |
| sample-charts/ | Pre-generated chart outputs (SVG/PDF/PNG) from the sample series |
| sample-work-to-measure/ | Ready-to-use scripts to download and run example workloads to measure |
| sample_cpuinfo/ | Example /proc/cpuinfo files (Intel and AMD) for testing cpuinfo-tdp-finder.py and modelname-tdp-finder.py without needing a real machine |
| onboarding/ | Self-contained worked example: a full metrics collection, chart generation and aggregation walkthrough for new users |
| sample/ | Legacy single-process sample from 2018, predating the current process-tree approach |
| tests/ | Unit tests for the core collector module |

Installation

Prerequisites

  • Linux OS (Ubuntu recommended)
  • Python 3.9 or newer
  • Git

Not sure which installation method to use?

| I want to... | Use |
| --- | --- |
| Keep things simple and already have Python installed | Option 1 — pip + venv |
| Already use conda or manage multiple projects/environments | Option 2 — Conda |
| Work on an HPC or shared cluster environment (e.g. BSC) | Option 2 — Conda |
| Work on a machine with a corporate or university firewall | Either — both have firewall notes in their respective sections |

Choosing a constraints file

The repository ships per-version constraints files under the installation/ directory to ensure a working set of dependencies. Pick the one that matches your setup:

| Situation | Constraints file to use |
| --- | --- |
| Native Linux, Python 3.9 | installation/constraints-3.9.txt |
| Native Linux, Python 3.10 | installation/constraints-3.10.txt |
| Native Linux, Python 3.11 | installation/constraints-3.11.txt |
| Native Linux, Python 3.12 | installation/constraints-3.12.txt |
| Ubuntu 22.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-22.04-wsl.txt |
| Ubuntu 24.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-24.04-wsl.txt |

WSL = Windows Subsystem for Linux — Ubuntu running inside Windows rather than directly on hardware. If you are running Ubuntu natively on your machine, use the plain constraints file. To check:

uname -r  # if the output contains "microsoft" or "WSL", you are on WSL

To check your Python version:

python3 --version
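On native Linux, the matching constraints filename can also be derived from the interpreter version instead of reading it off the table; a small convenience sketch (the PYVER and CONSTRAINTS variable names are ours, not part of treecript):

```shell
# Detect the running Python 3 minor version, e.g. "3.10"
PYVER=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')

# Map it to the matching native-Linux constraints file
CONSTRAINTS="installation/constraints-${PYVER}.txt"
echo "Using ${CONSTRAINTS}"
```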

Option 1: pip + virtual environment (venv)

Use this if you already have Python installed on your system and don't use conda. This is the lightest option — it creates an isolated Python environment using only tools that come built into Python, with no additional software required.

# 1. Create a virtual environment
python3 -m venv TREECRIPT

# 2. Activate it
source TREECRIPT/bin/activate

# 3. Upgrade pip and wheel
pip install --upgrade pip wheel

# 4. Download the constraints file for your Python version (adjust filename as needed)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt

# 5. Install treecript with constraints
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec

Network issues? If you are behind a corporate or university firewall (e.g. Fortiguard), add --no-check-certificate to the wget command.

To deactivate the environment:

deactivate

To reactivate later:

source TREECRIPT/bin/activate

Option 2: Conda environment

Use this if you already work with Anaconda or Miniconda, or if you prefer conda for managing environments across multiple projects. Conda handles both Python and system-level dependencies, which makes it particularly well suited for HPC or shared computing environments.

Installing Miniconda (if not already installed)

Miniconda is a minimal conda installer — it gives you the conda command and Python without bundling hundreds of extra packages like the full Anaconda distribution does.

# Download installer (add --no-check-certificate if behind a firewall)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

# Run the installer
bash miniconda.sh
# - Accept the license
# - Accept the default install location
# - Answer "yes" when asked to update your shell profile

# Apply changes to current session
source ~/.bashrc

# Verify
conda --version

Network issues? If repo.anaconda.com is blocked by your network, use Miniforge instead — it is functionally identical to Miniconda but downloads from GitHub and defaults to the conda-forge channel, which is a better fit for the scientific packages treecript needs:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh

Creating the treecript conda environment

# 1. Create a clean environment with Python 3.10
conda create -n treecript python=3.10 -y

# 2. Activate it
conda activate treecript

# 3. Download the constraints file (adjust filename for your Python version / OS)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt
# or for WSL Ubuntu 22.04:
# wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10_Ubuntu-22.04-wsl.txt

# 4. Install treecript and all dependencies in one shot
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec

To deactivate:

conda deactivate

To remove the environment entirely:

conda deactivate
conda remove -n treecript --all -y

Verifying the installation

Run this after either installation method to confirm all dependencies are working correctly:

python -c "
import psutil; print('psutil OK:', psutil.__version__)
import docker; print('docker OK:', docker.__version__)
import pandas; print('pandas OK:', pandas.__version__)
import networkx; print('networkx OK:', networkx.__version__)
import matplotlib; print('matplotlib OK:', matplotlib.__version__)
import adjustText; print('adjustText OK:', adjustText.__version__)
import treecript; print('treecript OK')
"

Quick Start

# 1. Collect metrics for a command
execution-metrics-collector.py ~/my_metrics my_command --arg1 --arg2

# 2. Plot time series charts
plotGraph.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_charts/

# 3. Find your CPU's TDP
tdp-finder.py ~/my_metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv

# 4. Aggregate and estimate energy consumption
metrics-aggregator.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_agg/ 28.0

Programs Reference

Collecting metrics

execution-metrics-collector.py runs a command and monitors it and all its child processes:

execution-metrics-collector.py {base_metrics_directory} {command} {args...}

Internally this launches the command, captures its PID, and calls process-metrics-collector.py with a sampling period of 1 second:

process-metrics-collector.py {pid} {base_metrics_directory} {sample_period}

Example:

execution-metrics-collector.py ~/metrics python myscript.py --input data.txt
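The launch-in-background-and-capture-PID step the wrapper performs can be sketched in plain shell, which is handy if you want to attach the collector to a process you start yourself. In this sketch, sleep 2 stands in for a real workload and the collector invocation is shown as a comment so the snippet runs anywhere:

```shell
# Start the workload in the background (placeholder command)
sleep 2 &
WORKLOAD_PID=$!

# This is the PID the wrapper hands to the collector, i.e.:
#   process-metrics-collector.py "$WORKLOAD_PID" ~/metrics 1
echo "monitoring PID ${WORKLOAD_PID}"

# Wait for the workload to finish before digesting the metrics
wait "$WORKLOAD_PID"
```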

Plotting time series charts

plotGraph.py generates line charts for each monitored process, comparing time series of CPU, memory, I/O and other metrics:

plotGraph.py {metrics_directory} {output_directory}

Example:

plotGraph.py ~/metrics/2025_01_01-00_00-12345/ ~/charts/

Finding CPU TDP

Three programs are available depending on what information you have:

tdp-finder.py — from a metrics directory

tdp-finder.py {metrics_directory} {csv_files...}

Example:

tdp-finder.py ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Use -q for quiet mode (outputs only the TDP value, useful for scripting):

tdp-finder.py -q ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv
# Output: 28.0
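Quiet mode pairs naturally with shell command substitution, so the detected TDP can flow straight into the aggregator instead of being hard-coded. A sketch (the real tool calls are kept in comments, with a stand-in value, so the snippet runs without an install):

```shell
METRICS_DIR=~/my_metrics/2025_01_01-00_00-12345/

# With treecript installed, detect the TDP once:
#   TDP=$(tdp-finder.py -q "$METRICS_DIR" cpu-spec-dataset_Josua/dataset/*.csv)
TDP=28.0  # stand-in for the detected value

# ...then reuse it wherever a TDP argument is expected:
#   metrics-aggregator.py "$METRICS_DIR" ~/my_agg/ "$TDP"
echo "using TDP: ${TDP} W"
```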

cpuinfo-tdp-finder.py — from /proc/cpuinfo

Does not require a metrics directory:

cpuinfo-tdp-finder.py /proc/cpuinfo cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Or from a saved copy of /proc/cpuinfo — the sample_cpuinfo/ directory contains example files for Intel and AMD processors you can use for testing:

cpuinfo-tdp-finder.py sample_cpuinfo/cpuinfo-amd.txt cpu-spec-dataset_Josua/dataset/*.csv

modelname-tdp-finder.py — from a processor model string

modelname-tdp-finder.py "11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz" cpu-spec-dataset_Josua/dataset/*.csv
modelname-tdp-finder.py "AMD EPYC 7742 64-Core Processor" cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Digesting metrics

metrics-aggregator.py digests the collected time series and estimates energy consumption per process subtree. It requires the CPU TDP value in Watts.

metrics-aggregator.py {metrics_directory} {output_directory} {TDP_watts} [command_filter]

The optional command_filter argument filters results to show only processes whose command matches the string (e.g. "docker run" to focus on Docker steps).

Example using the included sample series:

metrics-aggregator.py sample-series/Wetlab2Variations_metrics/2025_05_20-02_19-14001/ dest_directory 28.0 "docker run"

The output directory will contain:

  • A table of energy consumption per task (stdout)
  • graph.pdf / graph.svg — process call graph as a tree
  • spiral-graph.pdf / spiral-graph.svg — process call graph as a spiral
  • consumptions.pdf / consumptions.svg — barplot of task energy and duration
  • timeline.pdf / timeline.svg — lollipop chart of task start, duration, and end
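The per-subtree energy figures depend directly on the supplied TDP. As a first-order illustration of the idea only (not necessarily the aggregator's exact formula), scaling TDP by average CPU utilization and duration gives:

```python
def estimate_energy_wh(tdp_watts: float, avg_cpu_percent: float, duration_s: float) -> float:
    """First-order energy estimate: power draw proportional to CPU utilization."""
    avg_power_w = tdp_watts * (avg_cpu_percent / 100.0)
    # W x s gives joules; dividing by 3600 converts to watt-hours
    return avg_power_w * duration_s / 3600.0

# A subtree averaging 150% CPU (1.5 cores) for 10 minutes on a 28 W CPU:
print(estimate_energy_wh(28.0, 150.0, 600.0))  # → 7.0 (Wh)
```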

(Sample charts in sample-charts/: process call graph as a tree and as a spiral, task consumption and duration barplots, and the task executions lollipop chart.)


CPU Dataset Setup

The TDP programs require one or more CPU specification datasets to look up processor TDP values. Three sources are supported:

Recommended — JosuaCarl fork (better column names):

git clone https://github.com/JosuaCarl/cpu-spec-dataset cpu-spec-dataset_Josua

Alternative — original felixsteinke repo:

git clone https://github.com/felixsteinke/cpu-spec-dataset

CPUBenchmark scrape (good coverage for AMD server CPUs):

python -m treecript.tdp_sources cpumark_table.csv

You can pass multiple sources to the TDP programs and they will be tried in order:

tdp-finder.py ~/metrics/dir/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Output Files Reference

Each execution-metrics-collector.py run creates a subdirectory named after the start timestamp and PID. It contains:

| File | Description |
| --- | --- |
| reference_pid.txt | PID of the root process being monitored |
| sampling-rate-seconds.txt | Sampling rate in seconds (usually 1) |
| pids.txt | Table of all spawned processes with timestamps and parent PIDs |
| agg_metrics.tsv | Time series of aggregated metrics across all processes |
| metrics-{pid}_{create_time}.csv | Per-process time series metrics |
| command-{pid}_{create_time}.txt | Linearized command line for each process |
| command-{pid}_{create_time}.json | JSON representation of the command line |
| cpu_details.json | Physical CPU information from /proc/cpuinfo |
| core_affinity.json | Processor-to-core-to-CPU mapping derived from /proc/cpuinfo |

Per-process metrics (metrics-{pid}_{create_time}.csv)

| Column | Description |
| --- | --- |
| Time | Sample timestamp |
| PID | Process ID |
| Virt | Virtual memory size (matches top VIRT) |
| Res | Resident set size — non-swapped physical memory (matches top RES) |
| CPU | CPU utilization as a percentage (can exceed 100% for multithreaded processes) |
| Memory | RSS memory as a percentage of total physical system memory |
| TCP connections | Number of open TCP connections |
| Thread Count | Number of threads (non-cumulative) |
| User | Time spent in user mode (seconds) |
| System | Time spent in kernel mode (seconds) |
| Children_User | User time of child processes (always 0 on Windows/macOS) |
| Children_System | System time of child processes (always 0 on Windows/macOS) |
| IO | Time waiting for blocking I/O (Linux only) |
| uss | Unique Set Size — memory freed if this process terminated now |
| swap | Memory swapped out to disk |
| processor_num | Number of unique CPU processors used |
| core_num | Number of unique CPU cores used |
| cpu_num | Number of unique physical CPUs used |
| processor_ids | IDs of CPU processors used (space-separated) |
| core_ids | IDs of CPU cores used (space-separated) |
| cpu_ids | IDs of physical CPUs used (space-separated) |
| process_status | Process status string (e.g. sleeping, running) |
| read_count | Cumulative number of read syscalls |
| write_count | Cumulative number of write syscalls |
| read_bytes | Bytes physically read from disk (cumulative) |
| write_bytes | Bytes physically written to disk (cumulative) |
| read_chars | Bytes passed to read syscalls (cumulative, Linux only) |
| write_chars | Bytes passed to write syscalls (cumulative, Linux only) |
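These per-process files are plain CSV, so they are easy to post-process yourself. A sketch using pandas (already a treecript dependency) on an inline stand-in for a real file, covering only a few of the columns above:

```python
import io

import pandas as pd

# Inline stand-in for a metrics-{pid}_{create_time}.csv file.
sample = io.StringIO(
    "Time,PID,Virt,Res,CPU,Memory\n"
    "2025-01-01 00:00:00,12345,1048576,524288,95.0,1.2\n"
    "2025-01-01 00:00:01,12345,1048576,786432,180.0,1.8\n"
    "2025-01-01 00:00:02,12345,1048576,655360,120.0,1.5\n"
)

df = pd.read_csv(sample, parse_dates=["Time"])
print("mean CPU %:", round(df["CPU"].mean(), 1))  # can exceed 100 for multithreaded work
print("peak RSS (bytes):", int(df["Res"].max()))
```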

Aggregated metrics (agg_metrics.tsv)

Each row is a 1-second sample across all monitored processes combined:

| Column | Description |
| --- | --- |
| Timestamp | Sample time |
| Number of PIDs | Processes monitored at that moment |
| Threads | Total thread count |
| Processors | Number of distinct CPU processors in use |
| Cores | Number of distinct CPU cores in use |
| Physical CPUs | Number of distinct physical CPUs in use |
| CPU IDs | IDs of physical CPUs (space-separated) |
| User memory | Total user memory across all processes |
| Swap memory | Total swap memory across all processes |
| Read ops | Total read operations |
| Write ops | Total write operations |
| Read bytes | Bytes physically read |
| Write bytes | Bytes physically written |
| Read chars | Bytes passed to read syscalls |
| Write chars | Bytes passed to write syscalls |
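agg_metrics.tsv is tab-separated and loads the same way; a sketch with an inline stand-in covering a few of the columns above:

```python
import io

import pandas as pd

# Inline stand-in for agg_metrics.tsv (one row per 1-second sample).
sample = io.StringIO(
    "Timestamp\tNumber of PIDs\tThreads\tRead bytes\tWrite bytes\n"
    "2025-01-01 00:00:00\t3\t12\t0\t0\n"
    "2025-01-01 00:00:01\t5\t20\t4096\t1024\n"
    "2025-01-01 00:00:02\t4\t18\t8192\t2048\n"
)

agg = pd.read_csv(sample, sep="\t", parse_dates=["Timestamp"])
print("peak concurrent processes:", int(agg["Number of PIDs"].max()))  # → 5
print("peak thread count:", int(agg["Threads"].max()))  # → 20
```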

Legacy

The legacy/ directory contains older Bash-based scripts that predate the current Python implementation. They are kept for historical reference but are no longer maintained or recommended.

execution-metrics-collector.sh

The original Bash wrapper for launching a command and monitoring it. It runs the command in the background, captures the PID, and calls process-metrics-collector.py directly:

./legacy/execution-metrics-collector.sh {base_metrics_directory} {command} {args...}

This has been superseded by execution-metrics-collector.py, which provides the same functionality in a more portable and maintainable way. The sample series included in this repository was originally collected using this script:

~/projects/treecript/legacy/execution-metrics-collector.sh \
  ~/projects/treecript/Wetlab2Variations_metrics \
  python WfExS-backend.py -L workflow_examples/local_config.yaml \
  staged-workdir offline-exec 01a1db90-1508-4bad-beb7-7f7989838542

plotGraph.sh

The original gnuplot-based visualization script. It reads the collected CSV files and generates .pdf charts using gnuplot (requires apt install gnuplot). It has been superseded by plotGraph.py, which generates richer charts without requiring gnuplot.

./legacy/plotGraph.sh {metrics_csv_files...}

plot-metrics.sh

An earlier helper script for plotting individual metric files. Also superseded by plotGraph.py.

These scripts are no longer actively maintained. For all new usage, prefer the Python equivalents.


License

Licensed under GNU GPL v3.

This repository is a fork and evolution of chamilad/process-metrics-collector.
