treecript: Process Tree Metrics Transcriptor

Originally named Execution Process Metrics Collector

A set of Python programs to monitor, collect, and digest metrics of a given Linux process or command line, and its descendants. Initially developed for ELIXIR STEERS.




Repository Structure

treecript/
├── treecript/               # Core Python package — all program logic lives here
├── installation/            # Constraints and requirements files for reproducible installs
├── legacy/                  # Deprecated Bash scripts kept for historical reference
├── sample-series/           # Real metrics from a WfExS workflow execution, used in documentation examples
├── sample-charts/           # Pre-generated charts from the sample series, embedded in this README
├── sample-work-to-measure/  # Example scripts showing how to set up and run a measurement
├── sample_cpuinfo/          # Example /proc/cpuinfo files for testing the TDP finder programs
├── onboarding/              # Full worked example with metrics, charts and a step-by-step walkthrough
├── sample/                  # Legacy single-process sample from 2018 (pre-treecript era)
└── tests/                   # Unit tests
| Directory | Description |
| --- | --- |
| treecript/ | Core Python package — aggregator, collector, parser, plotter, TDP finder |
| installation/ | Per-version constraints files and requirements for reproducible installs |
| legacy/ | Deprecated Bash scripts superseded by the Python programs |
| sample-series/ | Real metrics collected from a WfExS workflow run, used throughout this README as examples |
| sample-charts/ | Pre-generated chart outputs (SVG/PDF/PNG) from the sample series |
| sample-work-to-measure/ | Ready-to-use scripts to download and run example workloads to measure |
| sample_cpuinfo/ | Example /proc/cpuinfo files (Intel and AMD) for testing cpuinfo-tdp-finder.py and modelname-tdp-finder.py without needing a real machine |
| onboarding/ | Self-contained worked example: a full metrics collection, chart generation and aggregation walkthrough for new users |
| sample/ | Legacy single-process sample from 2018, predating the current process-tree approach |
| tests/ | Unit tests for the core collector module |

Installation

Prerequisites

  • Linux OS (Ubuntu recommended)
  • Python 3.9 or newer
  • Git

Not sure which installation method to use?

| I want to... | Use |
| --- | --- |
| Keep things simple and already have Python installed | Option 1 — pip + venv |
| Already use conda or manage multiple projects/environments | Option 2 — Conda |
| Work on an HPC or shared cluster environment (e.g. BSC) | Option 2 — Conda |
| Work on a machine with a corporate or university firewall | Either — both have firewall notes in their respective sections |

Choosing a constraints file

The repository ships per-version constraints files under the installation/ directory to ensure a working set of dependencies. Pick the one that matches your setup:

| Situation | Constraints file to use |
| --- | --- |
| Native Linux, Python 3.9 | installation/constraints-3.9.txt |
| Native Linux, Python 3.10 | installation/constraints-3.10.txt |
| Native Linux, Python 3.11 | installation/constraints-3.11.txt |
| Native Linux, Python 3.12 | installation/constraints-3.12.txt |
| Ubuntu 22.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-22.04-wsl.txt |
| Ubuntu 24.04 on WSL (Windows) | installation/constraints-3.10_Ubuntu-24.04-wsl.txt |

WSL = Windows Subsystem for Linux — Ubuntu running inside Windows rather than directly on hardware. If you are running Ubuntu natively on your machine, use the plain constraints file. To check:

uname -r  # if the output contains "microsoft" or "WSL", you are on WSL

To check your Python version:

python3 --version
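On native Linux, the matching constraints filename can also be derived from the interpreter version instead of reading it off the table; a small convenience sketch (the PYVER and CONSTRAINTS variable names are ours, not part of treecript):

```shell
# Detect the running Python 3 minor version, e.g. "3.10"
PYVER=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')

# Map it to the matching native-Linux constraints file
CONSTRAINTS="installation/constraints-${PYVER}.txt"
echo "Using ${CONSTRAINTS}"
```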

Option 1: pip + virtual environment (venv)

Use this if you already have Python installed on your system and don't use conda. This is the lightest option — it creates an isolated Python environment using only tools that come built into Python, with no additional software required.

# 1. Create a virtual environment
python3 -m venv TREECRIPT

# 2. Activate it
source TREECRIPT/bin/activate

# 3. Upgrade pip and wheel
pip install --upgrade pip wheel

# 4. Download the constraints file for your Python version (adjust filename as needed)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt

# 5. Install treecript with constraints
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec

Network issues? If you are behind a corporate or university firewall (e.g. Fortiguard), add --no-check-certificate to the wget command.

To deactivate the environment:

deactivate

To reactivate later:

source TREECRIPT/bin/activate

Option 2: Conda environment

Use this if you already work with Anaconda or Miniconda, or if you prefer conda for managing environments across multiple projects. Conda handles both Python and system-level dependencies, which makes it particularly well suited for HPC or shared computing environments.

Installing Miniconda (if not already installed)

Miniconda is a minimal conda installer — it gives you the conda command and Python without bundling hundreds of extra packages like the full Anaconda distribution does.

# Download installer (add --no-check-certificate if behind a firewall)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

# Run the installer
bash miniconda.sh
# - Accept the license
# - Accept the default install location
# - Answer "yes" when asked to update your shell profile

# Apply changes to current session
source ~/.bashrc

# Verify
conda --version

Network issues? If repo.anaconda.com is blocked by your network, use Miniforge instead — it is functionally identical to Miniconda but downloads from GitHub and defaults to the conda-forge channel, which is a better fit for the scientific packages treecript needs:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh

Creating the treecript conda environment

# 1. Create a clean environment with Python 3.10
conda create -n treecript python=3.10 -y

# 2. Activate it
conda activate treecript

# 3. Download the constraints file (adjust filename for your Python version / OS)
wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10.txt
# or for WSL Ubuntu 22.04:
# wget https://raw.githubusercontent.com/inab/treecript/exec/installation/constraints-3.10_Ubuntu-22.04-wsl.txt

# 4. Install treecript and all dependencies in one shot
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec

To deactivate:

conda deactivate

To remove the environment entirely:

conda deactivate
conda remove -n treecript --all -y

Verifying the installation

Run this after either installation method to confirm all dependencies are working correctly:

python -c "
import psutil; print('psutil OK:', psutil.__version__)
import docker; print('docker OK:', docker.__version__)
import pandas; print('pandas OK:', pandas.__version__)
import networkx; print('networkx OK:', networkx.__version__)
import matplotlib; print('matplotlib OK:', matplotlib.__version__)
import adjustText; print('adjustText OK:', adjustText.__version__)
import treecript; print('treecript OK')
"

Quick Start

# 1. Collect metrics for a command
execution-metrics-collector.py ~/my_metrics my_command --arg1 --arg2

# 2. Plot time series charts
plotGraph.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_charts/

# 3. Find your CPU's TDP
tdp-finder.py ~/my_metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv

# 4. Aggregate and estimate energy consumption
metrics-aggregator.py ~/my_metrics/2025_01_01-00_00-12345/ ~/my_agg/ 28.0

Programs Reference

Collecting metrics

execution-metrics-collector.py runs a command and monitors it and all its child processes:

execution-metrics-collector.py {base_metrics_directory} {command} {args...}

Internally this launches the command, captures its PID, and calls process-metrics-collector.py with a sampling period of 1 second:

process-metrics-collector.py {pid} {base_metrics_directory} {sample_period}

Example:

execution-metrics-collector.py ~/metrics python myscript.py --input data.txt
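The launch-in-background-and-capture-PID step the wrapper performs can be sketched in plain shell, which is handy if you want to attach the collector to a process you start yourself. In this sketch, sleep 2 stands in for a real workload and the collector invocation is shown as a comment so the snippet runs anywhere:

```shell
# Start the workload in the background (placeholder command)
sleep 2 &
WORKLOAD_PID=$!

# This is the PID the wrapper hands to the collector, i.e.:
#   process-metrics-collector.py "$WORKLOAD_PID" ~/metrics 1
echo "monitoring PID ${WORKLOAD_PID}"

# Wait for the workload to finish before digesting the metrics
wait "$WORKLOAD_PID"
```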

Plotting time series charts

plotGraph.py generates line charts for each monitored process, comparing time series of CPU, memory, I/O and other metrics:

plotGraph.py {metrics_directory} {output_directory}

Example:

plotGraph.py ~/metrics/2025_01_01-00_00-12345/ ~/charts/

Finding CPU TDP

Three programs are available depending on what information you have:

tdp-finder.py — from a metrics directory

tdp-finder.py {metrics_directory} {csv_files...}

Example:

tdp-finder.py ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Use -q for quiet mode (outputs only the TDP value, useful for scripting):

tdp-finder.py -q ~/metrics/2025_01_01-00_00-12345/ cpu-spec-dataset_Josua/dataset/*.csv
# Output: 28.0
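Quiet mode pairs naturally with shell command substitution, so the detected TDP can flow straight into the aggregator instead of being hard-coded. A sketch (the real tool calls are kept in comments, with a stand-in value, so the snippet runs without an install):

```shell
METRICS_DIR=~/my_metrics/2025_01_01-00_00-12345/

# With treecript installed, detect the TDP once:
#   TDP=$(tdp-finder.py -q "$METRICS_DIR" cpu-spec-dataset_Josua/dataset/*.csv)
TDP=28.0  # stand-in for the detected value

# ...then reuse it wherever a TDP argument is expected:
#   metrics-aggregator.py "$METRICS_DIR" ~/my_agg/ "$TDP"
echo "using TDP: ${TDP} W"
```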

cpuinfo-tdp-finder.py — from /proc/cpuinfo

Does not require a metrics directory:

cpuinfo-tdp-finder.py /proc/cpuinfo cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Or from a saved copy of /proc/cpuinfo — the sample_cpuinfo/ directory contains example files for Intel and AMD processors you can use for testing:

cpuinfo-tdp-finder.py sample_cpuinfo/cpuinfo-amd.txt cpu-spec-dataset_Josua/dataset/*.csv

modelname-tdp-finder.py — from a processor model string

modelname-tdp-finder.py "11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz" cpu-spec-dataset_Josua/dataset/*.csv
modelname-tdp-finder.py "AMD EPYC 7742 64-Core Processor" cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Digesting metrics

metrics-aggregator.py digests the collected time series and estimates energy consumption per process subtree. It requires the CPU TDP value in Watts.

metrics-aggregator.py {metrics_directory} {output_directory} {TDP_watts} [command_filter]

The optional command_filter argument filters results to show only processes whose command matches the string (e.g. "docker run" to focus on Docker steps).

Example using the included sample series:

metrics-aggregator.py sample-series/Wetlab2Variations_metrics/2025_05_20-02_19-14001/ dest_directory 28.0 "docker run"

The output directory will contain:

  • A table of energy consumption per task (stdout)
  • graph.pdf / graph.svg — process call graph as a tree
  • spiral-graph.pdf / spiral-graph.svg — process call graph as a spiral
  • consumptions.pdf / consumptions.svg — barplot of task energy and duration
  • timeline.pdf / timeline.svg — lollipop chart of task start, duration, and end
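The per-subtree energy figures depend directly on the supplied TDP. As a first-order illustration of the idea only (not necessarily the aggregator's exact formula), scaling TDP by average CPU utilization and duration gives:

```python
def estimate_energy_wh(tdp_watts: float, avg_cpu_percent: float, duration_s: float) -> float:
    """First-order energy estimate: power draw proportional to CPU utilization."""
    avg_power_w = tdp_watts * (avg_cpu_percent / 100.0)
    # W x s gives joules; dividing by 3600 converts to watt-hours
    return avg_power_w * duration_s / 3600.0

# A subtree averaging 150% CPU (1.5 cores) for 10 minutes on a 28 W CPU:
print(estimate_energy_wh(28.0, 150.0, 600.0))  # → 7.0 (Wh)
```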

(Sample charts in sample-charts/: process call graph as a tree and as a spiral, task consumption and duration barplots, and the task executions lollipop chart.)


CPU Dataset Setup

The TDP programs require one or more CPU specification datasets to look up processor TDP values. Three sources are supported:

Recommended — JosuaCarl fork (better column names):

git clone https://github.com/JosuaCarl/cpu-spec-dataset cpu-spec-dataset_Josua

Alternative — original felixsteinke repo:

git clone https://github.com/felixsteinke/cpu-spec-dataset

CPUBenchmark scrape (good coverage for AMD server CPUs):

python -m treecript.tdp_sources cpumark_table.csv

You can pass multiple sources to the TDP programs and they will be tried in order:

tdp-finder.py ~/metrics/dir/ cpu-spec-dataset_Josua/dataset/*.csv cpumark_table.csv

Output Files Reference

Each execution-metrics-collector.py run creates a subdirectory named after the start timestamp and PID. It contains:

| File | Description |
| --- | --- |
| reference_pid.txt | PID of the root process being monitored |
| sampling-rate-seconds.txt | Sampling rate in seconds (usually 1) |
| pids.txt | Table of all spawned processes with timestamps and parent PIDs |
| agg_metrics.tsv | Time series of aggregated metrics across all processes |
| metrics-{pid}_{create_time}.csv | Per-process time series metrics |
| command-{pid}_{create_time}.txt | Linearized command line for each process |
| command-{pid}_{create_time}.json | JSON representation of the command line |
| cpu_details.json | Physical CPU information from /proc/cpuinfo |
| core_affinity.json | Processor-to-core-to-CPU mapping derived from /proc/cpuinfo |

Per-process metrics (metrics-{pid}_{create_time}.csv)

| Column | Description |
| --- | --- |
| Time | Sample timestamp |
| PID | Process ID |
| Virt | Virtual memory size (matches top VIRT) |
| Res | Resident set size — non-swapped physical memory (matches top RES) |
| CPU | CPU utilization as a percentage (can exceed 100% for multithreaded processes) |
| Memory | RSS memory as a percentage of total physical system memory |
| TCP connections | Number of open TCP connections |
| Thread Count | Number of threads (non-cumulative) |
| User | Time spent in user mode (seconds) |
| System | Time spent in kernel mode (seconds) |
| Children_User | User time of child processes (always 0 on Windows/macOS) |
| Children_System | System time of child processes (always 0 on Windows/macOS) |
| IO | Time waiting for blocking I/O (Linux only) |
| uss | Unique Set Size — memory freed if this process terminated now |
| swap | Memory swapped out to disk |
| processor_num | Number of unique CPU processors used |
| core_num | Number of unique CPU cores used |
| cpu_num | Number of unique physical CPUs used |
| processor_ids | IDs of CPU processors used (space-separated) |
| core_ids | IDs of CPU cores used (space-separated) |
| cpu_ids | IDs of physical CPUs used (space-separated) |
| process_status | Process status string (e.g. sleeping, running) |
| read_count | Cumulative number of read syscalls |
| write_count | Cumulative number of write syscalls |
| read_bytes | Bytes physically read from disk (cumulative) |
| write_bytes | Bytes physically written to disk (cumulative) |
| read_chars | Bytes passed to read syscalls (cumulative, Linux only) |
| write_chars | Bytes passed to write syscalls (cumulative, Linux only) |
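These per-process files are plain CSV, so they are easy to post-process yourself. A sketch using pandas (already a treecript dependency) on an inline stand-in for a real file, covering only a few of the columns above:

```python
import io

import pandas as pd

# Inline stand-in for a metrics-{pid}_{create_time}.csv file.
sample = io.StringIO(
    "Time,PID,Virt,Res,CPU,Memory\n"
    "2025-01-01 00:00:00,12345,1048576,524288,95.0,1.2\n"
    "2025-01-01 00:00:01,12345,1048576,786432,180.0,1.8\n"
    "2025-01-01 00:00:02,12345,1048576,655360,120.0,1.5\n"
)

df = pd.read_csv(sample, parse_dates=["Time"])
print("mean CPU %:", round(df["CPU"].mean(), 1))  # can exceed 100 for multithreaded work
print("peak RSS (bytes):", int(df["Res"].max()))
```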

Aggregated metrics (agg_metrics.tsv)

Each row is a 1-second sample across all monitored processes combined:

| Column | Description |
| --- | --- |
| Timestamp | Sample time |
| Number of PIDs | Processes monitored at that moment |
| Threads | Total thread count |
| Processors | Number of distinct CPU processors in use |
| Cores | Number of distinct CPU cores in use |
| Physical CPUs | Number of distinct physical CPUs in use |
| CPU IDs | IDs of physical CPUs (space-separated) |
| User memory | Total user memory across all processes |
| Swap memory | Total swap memory across all processes |
| Read ops | Total read operations |
| Write ops | Total write operations |
| Read bytes | Bytes physically read |
| Write bytes | Bytes physically written |
| Read chars | Bytes passed to read syscalls |
| Write chars | Bytes passed to write syscalls |
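agg_metrics.tsv is tab-separated and loads the same way; a sketch with an inline stand-in covering a few of the columns above:

```python
import io

import pandas as pd

# Inline stand-in for agg_metrics.tsv (one row per 1-second sample).
sample = io.StringIO(
    "Timestamp\tNumber of PIDs\tThreads\tRead bytes\tWrite bytes\n"
    "2025-01-01 00:00:00\t3\t12\t0\t0\n"
    "2025-01-01 00:00:01\t5\t20\t4096\t1024\n"
    "2025-01-01 00:00:02\t4\t18\t8192\t2048\n"
)

agg = pd.read_csv(sample, sep="\t", parse_dates=["Timestamp"])
print("peak concurrent processes:", int(agg["Number of PIDs"].max()))  # → 5
print("peak thread count:", int(agg["Threads"].max()))  # → 20
```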

Legacy

The legacy/ directory contains older Bash-based scripts that predate the current Python implementation. They are kept for historical reference but are no longer maintained or recommended.

execution-metrics-collector.sh

The original Bash wrapper for launching a command and monitoring it. It runs the command in the background, captures the PID, and calls process-metrics-collector.py directly:

./legacy/execution-metrics-collector.sh {base_metrics_directory} {command} {args...}

This has been superseded by execution-metrics-collector.py, which provides the same functionality in a more portable and maintainable way. The sample series included in this repository was originally collected using this script:

~/projects/treecript/legacy/execution-metrics-collector.sh \
  ~/projects/treecript/Wetlab2Variations_metrics \
  python WfExS-backend.py -L workflow_examples/local_config.yaml \
  staged-workdir offline-exec 01a1db90-1508-4bad-beb7-7f7989838542

plotGraph.sh

The original gnuplot-based visualization script. It reads the collected CSV files and generates .pdf charts using gnuplot (requires apt install gnuplot). It has been superseded by plotGraph.py, which generates richer charts without requiring gnuplot.

./legacy/plotGraph.sh {metrics_csv_files...}

plot-metrics.sh

An earlier helper script for plotting individual metric files. Also superseded by plotGraph.py.

These scripts are no longer actively maintained. For all new usage, prefer the Python equivalents.


License

Licensed under GNU GPL v3.

This repository is a fork and evolution of chamilad/process-metrics-collector.
