Skip to content
/ server Public

MDEV-33070 Thread pool starvation at oversubscribe#4804

Open
varundeepsaini wants to merge 1 commit intoMariaDB:mainfrom
varundeepsaini:MDEV-33070-thread-pool-starvation
Open

MDEV-33070 Thread pool starvation at oversubscribe#4804
varundeepsaini wants to merge 1 commit intoMariaDB:mainfrom
varundeepsaini:MDEV-33070-thread-pool-starvation

Conversation

@varundeepsaini
Copy link
Contributor

@varundeepsaini varundeepsaini commented Mar 14, 2026

Summary

  • allow a thread group to reach the oversubscribe threshold before treating it as oversubscribed
  • add an MTR regression test for MDEV-33070 covering queued work at the threshold

Testing

  • Added a regression test

Allow one more worker at the oversubscribe threshold so a group
only becomes oversubscribed after it exceeds the configured
limit.

Add a regression test that reproduces the starvation case in the
generic thread pool and verifies queued work still drains.

Signed-off-by: Varun Deep Saini <varun.23bcs10048@ms.sst.scaler.com>
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Mar 16, 2026
Copy link
Member

@gkodinov gkodinov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

--source include/have_pool_of_threads.inc

let $extra_port=`SELECT @@port + 1`;
let $restart_parameters=--extra-port=$extra_port;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you add this to the opt file and avoid the restart?

@@ -0,0 +1,112 @@
--source include/have_pool_of_threads.inc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we customarily start with a heading stating the MDEV.

SET @@global.thread_pool_max_threads=@_thread_pool_max_threads,
@@global.thread_pool_size=@_thread_pool_size,
@@global.thread_pool_oversubscribe=@_thread_pool_oversubscribe,
@@global.thread_pool_stall_limit=@_thread_pool_stall_limit;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we customarily end with a footer of the type "End of ... tests".

@_thread_pool_oversubscribe,
@_thread_pool_stall_limit;

SET @@global.thread_pool_max_threads=3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also add these to the .opt file.

@@ -0,0 +1 @@
--thread-handling=pool-of-threads
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add --loose-thread-pool-mode=generic. So that is also tested on Windows (with non-default thread-pool-mode)

@@ -0,0 +1,112 @@
--source include/have_pool_of_threads.inc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this test needs a good explanation.

To be honest, I don't quite understand, why we need thread_pool_max_threads=3, and 6 connections to prove that patch is working.

Would it work if we were more economic perhaps?
Say thread_pool_max_threads=2, oversubscribe=0, thread_pool_dedicated_listener=ON,
so we have one dedicated listener (so we can still enqueue new requests, into the queue), a single worker (making for 2 threads overall). And check that after killing a running query the queue is still drained by that single worker (i.e single worker won't switch to wait, because "too many threads" were active). Will this work and prove that the patch is correct? (I see that it is correct, but test case is not very obvious).

In any case, write the comments of how test case showing that the fix is working.

@@ -0,0 +1 @@
--thread-handling=pool-of-threads
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add --loose-thread-pool-mode=generic. We want that to also be tested on Windows (in its non-default thread-pool-mode).

Copy link
Member

@vaintroub vaintroub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on test case. The one-liner code fix is ok, but test needs some improvement (and explanation)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

3 participants