perf: replace regex findall with str.count in parser advance() by matheusvir · Pull Request #629 · theskumar/python-dotenv

matheusvir · 2026-03-12T01:31:50Z

What was done

Replaced re.findall with str.count() in the advance() method of the parser to eliminate unnecessary list allocations.

Previously, for each token read, re.findall created a Python list with all newline matches found, only to get its length.
With str.count(), counting is done directly in C, without allocating any intermediate data structure.
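A minimal sketch of the before/after change. `PositionBefore` and `PositionAfter` are hypothetical stand-ins for the parser's position tracker; the class and field names are assumptions, not python-dotenv's actual code. The pattern is the plain `\n` described in this PR.

```python
import re

# Pattern as described in this PR; the real parser's pattern may differ.
_newline = re.compile(r"\n")


class PositionBefore:
    """Hypothetical stand-in for the parser's position tracker."""

    def __init__(self) -> None:
        self.chars = 0
        self.line = 1

    def advance(self, string: str) -> None:
        self.chars += len(string)
        # findall builds a list of matched substrings only to take its length.
        self.line += len(_newline.findall(string))


class PositionAfter(PositionBefore):
    def advance(self, string: str) -> None:
        self.chars += len(string)
        # str.count scans in C and returns an int; no intermediate list.
        self.line += string.count("\n")


before, after = PositionBefore(), PositionAfter()
for token in ["A=1\n", 'B="x\ny"\n', "# comment\n"]:
    before.advance(token)
    after.advance(token)
print((before.chars, before.line), (after.chars, after.line))  # (22, 5) (22, 5)
```

Only the line-counting expression changes; character tracking is identical in both versions.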

No new test files were added. The advance() method is exercised on every parse call, so the entire tests/test_parser.py suite (parametrized over ~30 input cases) provides coverage. All existing tests pass with no regressions.
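The suite exercises `advance()` only indirectly, so a direct equivalence check on the counting itself is cheap to add. The sketch below compares strategies over inputs with LF, CRLF and bare CR line endings. Whether the bare-CR case matters depends on the pattern the original code used: if it is an alternation such as `(\r\n|\n|\r)` rather than the plain `\n` quoted in this PR (an assumption to verify against `parser.py`), then `str.count("\n")` alone undercounts bare `\r`, while the combined count `count_all_styles` (a hypothetical helper) stays identical to the regex.

```python
import re

# Assumed alternative pattern; verify against the actual regex in parser.py.
_any_newline = re.compile(r"(\r\n|\n|\r)")


def count_regex(text: str) -> int:
    return len(_any_newline.findall(text))


def count_lf_only(text: str) -> int:
    return text.count("\n")


def count_all_styles(text: str) -> int:
    # "\r\n" holds one "\r" and one "\n"; subtract it once so it counts as one break.
    return text.count("\n") + text.count("\r") - text.count("\r\n")


cases = ["", "A=1", "A=1\n", "A=1\r\nB=2\r\n", "A=1\rB=2\r", 'A="x\r\n\ny"\n', "\r\r\n\n"]

for case in cases:
    assert count_all_styles(case) == count_regex(case), repr(case)

# LF-only counting diverges as soon as a bare "\r" appears.
mismatches = [c for c in cases if count_lf_only(c) != count_regex(c)]
print(mismatches)  # ['A=1\rB=2\r', '\r\r\n\n']
```

If the original pattern really is the plain `\n`, all three functions agree on LF and CRLF input and the extra terms are unnecessary.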

Performance

All benchmarks were executed inside Docker containers to pin the runtime environment (Python interpreter and library versions). Containers share the host kernel, so this reduces, but does not eliminate, variance from CPU scheduling and OS caching.

Methodology

Input: a .env file with 24,999 variables, parsed in full on each run.
Baseline: 39 valid runs after outlier filtering.
Optimized: 32 valid runs after outlier filtering.
Timing: time.perf_counter_ns() with GC disabled during measurement.
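A sketch of a timing loop matching this methodology. The outlier rule is an assumption: the PR does not state which filter was applied, so Tukey's 1.5x IQR fence is used here purely for illustration, and `time_runs`, `drop_outliers_iqr` and `summarize` are hypothetical helpers. Note that `gc.disable()` only turns off the cyclic collector; objects are still freed immediately by reference counting.

```python
import gc
import statistics
import time
from typing import Callable, List


def time_runs(parse: Callable[[], object], runs: int) -> List[float]:
    """Time `parse` `runs` times; return durations in milliseconds."""
    durations = []
    for _ in range(runs):
        gc.collect()  # start each run from a freshly collected heap
        gc.disable()  # cyclic GC off only while the clock is running
        try:
            start = time.perf_counter_ns()
            parse()
            end = time.perf_counter_ns()
        finally:
            gc.enable()
        durations.append((end - start) / 1e6)
    return durations


def drop_outliers_iqr(values: List[float], k: float = 1.5) -> List[float]:
    """Assumed rule (Tukey fences); the PR does not state which filter it used."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def summarize(values: List[float]) -> dict:
    kept = drop_outliers_iqr(values)
    return {
        "mean_ms": statistics.mean(kept),
        "stdev_ms": statistics.stdev(kept),
        "runs": len(kept),
    }
```

To time python-dotenv itself, the workload could be something like `lambda: dotenv_values(stream=io.StringIO(env_text))`, where `env_text` is a generated .env body with one `KEY=value` line per variable (a hypothetical name).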

Rationale

The advance() method is called for every token during parsing. In the original implementation, each call to re.findall(r'\n', ...) builds a new Python list, fills it with one string per matched newline, and then discards it, all just to count newlines. str.count('\n') scans the string in C and returns an integer without building a list or going through the regex engine, so it is cheaper per call, and the savings add up over the many tokens in a large file.

Results

Variant	Mean (ms)	Std dev (ms)	Runs
Baseline	11,403.40	1,227.85	39
Optimized	8,528.51	132.97	32
Improvement (mean time)	25.21%
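The headline figures follow directly from the table; this worked check recomputes the relative change in mean time and the ratio of standard deviations from the reported values.

```python
baseline_mean, baseline_sd = 11_403.40, 1_227.85
optimized_mean, optimized_sd = 8_528.51, 132.97

# Relative reduction in mean parse time, in percent.
improvement = (baseline_mean - optimized_mean) / baseline_mean * 100
# How many times smaller the optimized run-to-run spread is.
sd_ratio = baseline_sd / optimized_sd
print(f"{improvement:.2f}% lower mean, {sd_ratio:.1f}x lower std dev")  # 25.21% lower mean, 9.2x lower std dev
```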

Analysis

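The change reduces mean parse time by 25.21% (from 11,403.40 ms to 8,528.51 ms); the difference between means, about 2,875 ms, is more than twice the baseline standard deviation. More notable is the roughly 9x reduction in standard deviation (from 1,227.85 ms to 132.97 ms), which suggests the per-call list allocations in the original code also contributed to timing spikes. Because the cyclic garbage collector was disabled during measurement, GC pauses are an unlikely explanation; reduced allocator and memory-management overhead is more plausible, though the benchmark does not isolate the cause.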
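The reported summary statistics are enough to sanity-check the significance claim. The sketch below computes Welch's t statistic (unequal variances) with Welch-Satterthwaite degrees of freedom; this test is chosen here for illustration, since the PR does not say which test was used, and it treats the filtered runs as independent samples.

```python
import math


def welch_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


t, df = welch_t(11_403.40, 1_227.85, 39, 8_528.51, 132.97, 32)
print(f"t = {t:.1f}, df = {df:.0f}")  # t = 14.5, df = 39
```

A t of about 14.5 on roughly 39 degrees of freedom is far beyond any conventional significance threshold, which supports the claim that the speedup is not run-to-run noise.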

The change is minimal in scope (a single-line replacement), with no behavioral difference and no impact on correctness.

Reproducing the benchmark

The full benchmark infrastructure is available in the research repository at matheusvir/eda-oss-performance.

Relevant files:

Dockerfile: setup/python-dotenv/Dockerfile
Baseline script: experiments/python-dotenv/str_count_parser_test/baseline_pythondotenv_str-count-newline-advance.py
Experiment script: experiments/python-dotenv/str_count_parser_test/experiment_pythondotenv_str-count-newline-advance.py
Result merger: experiments/python-dotenv/str_count_parser_test/merge_results.py
Runner script: experiments/python-dotenv/str_count_parser_test/run.sh

To run inside Docker:

# From the root of eda-oss-performance
docker build -t dotenv-perf ./setup/python-dotenv/

# Run baseline
docker run --rm -e EXPERIMENT=str_count_parser_test -e VARIANT=baseline dotenv-perf

# Run optimized
docker run --rm -e EXPERIMENT=str_count_parser_test -e VARIANT=optimized dotenv-perf

Results are written to results/python-dotenv/result_python-dotenv_str-count-newline-advance.json.

This is a targeted, low-risk improvement with measurable impact on large .env file parsing.

Relates to #504.

Co-authored-by: Matheus Virgolino <matheus.virgolino.abilio.da.silva@ccc.ufcg.edu.br>
Co-authored-by: Manoel Netto <manoel.da.nobrega.eustaqueo.netto@ccc.ufcg.edu.br>
Co-authored-by: Pedro <pedroalmeida1896@gmail.com>
Co-authored-by: Lucaslg7 <lucasmoizinholg7@gmail.com>
Co-authored-by: RailtonDantas <railtondantas.code@gmail.com>
Co-authored-by: João Pereira <joao.pereira.de.oliveira@ccc.ufcg.edu.br>

matheusvir force-pushed the optimization/str-count-newline-advance branch from 306c140 to 2c29354 on March 12, 2026 01:38

matheusvir changed the title from "perf(python-dotenv): replace regex findall with str.count in parser advance()" to "perf: replace regex findall with str.count in parser advance()" on Mar 12, 2026

matheusvir wants to merge 1 commit into theskumar:main from matheusvir:optimization/str-count-newline-advance