Fix flaky autests for timeout, sigusr2, and thread_config #13012

Merged
bryancall merged 5 commits into apache:master from bryancall:fix/flaky-autests on Mar 25, 2026

@bryancall
Contributor

@bryancall bryancall commented Mar 23, 2026

Problem

The tls_conn_timeout and thread_config AuTests fail intermittently under parallel ASAN runs. The ssl-delay-server helper dies when a client disconnects during the handshake delay (SIGPIPE) or when accept() is interrupted (EINTR), which fails the StillRunningAfter check. The thread_config test cannot find the correct traffic_server process under ASAN because the process CWD differs from the expected ts_path.

Changes

  • Handle SIGPIPE in ssl-delay-server -- ignore SIGPIPE to prevent the helper from dying when a client disconnects during the TLS handshake delay.
  • Retry accept() on EINTR -- under heavy parallel load, accept() can fail with EINTR; retry instead of treating it as a fatal error.
  • Fix accept() error check -- use < 0 instead of <= 0, since accept() returns -1 on error and fd 0 is a valid descriptor when stdin is closed.
  • Add cmdline matching for ASAN in check_threads.py -- fall back to matching ts_path in process command line arguments when the CWD doesn't match, which happens under ASAN.
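
The SIGPIPE change relies on standard POSIX behavior: once SIGPIPE is ignored, writing to a socket whose peer has gone away fails with EPIPE instead of terminating the process. A minimal Python sketch of that behavior, assuming a Unix-like system; it uses a local socketpair() as a stand-in for the TLS client and server connection, and the function name send_to_closed_peer is hypothetical. CPython already ignores SIGPIPE at startup, so the explicit signal.signal() call only mirrors what the C++ helper now does.

```python
import signal
import socket


def send_to_closed_peer() -> str:
    """Write to a disconnected peer and report how the write failed (illustrative)."""
    # Mirror the ssl-delay-server fix: with SIGPIPE ignored, a write to a
    # broken connection returns an error instead of killing the process.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    server_end, client_end = socket.socketpair()
    client_end.close()  # simulate the client disconnecting during the delay
    try:
        server_end.sendall(b"late handshake bytes")
    except (BrokenPipeError, ConnectionResetError) as exc:
        # EPIPE surfaces as BrokenPipeError; the process keeps running.
        return type(exc).__name__
    finally:
        server_end.close()
    return "sent"
```

On Linux the call typically reports BrokenPipeError, and the interpreter keeps running afterwards, which is the property the StillRunningAfter check depends on.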

Testing

  • Run AuTests in ASAN configuration
  • No production code paths changed (test-only PR)
  • All 15 CI platforms green

The ssl-delay-server test helper could die unexpectedly in two ways: when a client disconnects during the handshake delay, SIGPIPE from the broken connection kills the process; and under heavy parallel load, accept() can fail with EINTR. Ignore SIGPIPE and retry accept() on EINTR to keep the server alive for the StillRunningAfter check.
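
The accept() fixes (retry on EINTR, and treat only a negative return as failure) can be sketched as a small loop. Python's own socket.accept() already retries EINTR internally (PEP 475), so this sketch models the C++ control flow with a hypothetical accept_fn that returns (fd, errno) pairs, the way accept(2) and errno behave in C/C++, rather than calling a real socket. The names accept_with_retry and accept_fn are illustrative, not taken from ssl-delay-server.cc.

```python
import errno
import os
from typing import Callable, Tuple

# Hypothetical C-style accept(): returns (fd, err); fd is -1 on failure and
# err then holds the errno value, mirroring accept(2) plus errno.
AcceptFn = Callable[[], Tuple[int, int]]


def accept_with_retry(accept_fn: AcceptFn) -> int:
    """Return a connected descriptor, retrying calls interrupted by a signal."""
    while True:
        fd, err = accept_fn()
        # Only a negative value signals failure: fd 0 is a legitimate
        # descriptor when stdin has been closed, so `fd <= 0` would be wrong.
        if fd >= 0:
            return fd
        if err == errno.EINTR:
            # Interrupted before a connection arrived; not fatal, try again.
            continue
        # Any other error is treated as fatal.
        raise OSError(err, os.strerror(err))
```

For example, an accept_fn that fails twice with EINTR and then yields descriptor 0 makes accept_with_retry return 0 instead of aborting.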

Test 1's Default process had Ready = When.FileExists(diags.log), but by the time Default starts, rotate_diags_log has already moved diags.log to diags.log_old. This creates a deadlock: Default waits for diags.log to exist, but only SIGUSR2 (sent by Default) would cause TS to recreate it. The StartBefore chain already guarantees the correct ordering (ts → rotate → Default), so the Ready condition is unnecessary and harmful.

Under ASAN, the ATS process CWD may differ from the expected ts_path. Fall back to matching ts_path in the process command-line arguments so the test can find the correct traffic_server process.
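
The ASAN fallback amounts to a two-step match: a process belongs to the test if its CWD equals ts_path or, failing that, if one of its command-line arguments contains ts_path. The sketch below is hypothetical: check_threads.py itself is not shown here, so ProcInfo, matches_ts_path and find_traffic_server are illustrative names, and the substring test on each argument is an assumption. In practice the cwd and cmdline values could come from psutil's Process.cwd() and Process.cmdline().

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ProcInfo:
    """Hypothetical snapshot of a running process."""
    pid: int
    name: str
    cwd: str
    cmdline: List[str] = field(default_factory=list)


def matches_ts_path(proc: ProcInfo, ts_path: str) -> bool:
    # Primary check: the process was started inside this test's ts_path.
    if proc.cwd == ts_path:
        return True
    # ASAN fallback (assumed substring rule): the CWD can differ, so look
    # for ts_path in any command-line argument instead.
    return any(ts_path in arg for arg in proc.cmdline)


def find_traffic_server(procs: Iterable[ProcInfo], ts_path: str) -> Optional[ProcInfo]:
    # Return the first traffic_server process that belongs to ts_path.
    for proc in procs:
        if proc.name == "traffic_server" and matches_ts_path(proc, ts_path):
            return proc
    return None
```

Substring matching is permissive: a ts_path of /tmp/sb/ts would also match an argument under /tmp/sb/ts2, so a stricter comparison (for example on whole path components) may be preferable if sibling sandboxes share a prefix.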
@bryancall bryancall self-assigned this Mar 23, 2026
@bryancall bryancall added AuTest Tests ASan Address Sanitizer labels Mar 23, 2026
@bryancall bryancall added this to the 11.0.0 milestone Mar 23, 2026
@bryancall bryancall requested a review from Copilot March 23, 2026 17:44
Contributor

Copilot AI left a comment


Pull request overview

Test-only PR to stabilize three flaky AuTests under parallel ASAN runs by hardening helper behavior and improving AuTest process sequencing and matching.

Changes:

  • Update ssl-delay-server helper to ignore SIGPIPE and retry accept() on EINTR.
  • Make thread_config’s thread-count helper identify the correct traffic_server process more reliably under ASAN by matching via CWD or command line.
  • Adjust sigusr2 test process ordering to remove a deadlocking Ready condition and clarify the intended startup chain.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • tests/gold_tests/timeout/ssl-delay-server.cc -- Improves helper robustness against SIGPIPE and EINTR during accept.
  • tests/gold_tests/thread_config/check_threads.py -- Improves ATS process identification under ASAN by broadening the matching criteria.
  • tests/gold_tests/logging/sigusr2.test.py -- Removes deadlocking Ready gating and documents intended process ordering for SIGUSR2 log rotation.

@bryancall bryancall requested a review from bneradt March 23, 2026 22:39
accept() returns -1 on error, but fd 0 is a valid descriptor (e.g. if stdin is closed). The <= 0 check would incorrectly treat a valid connection as a failure.
@bryancall bryancall requested a review from bneradt March 25, 2026 18:51
@bryancall bryancall merged commit ff31470 into apache:master Mar 25, 2026
15 checks passed

Labels

ASan Address Sanitizer AuTest Tests

3 participants