Skip to content

[JUJU-4504] Implement timeout-based retry method for txns#69

Open
nvinuesa wants to merge 1 commit intojuju:masterfrom
nvinuesa:fix-txn-retries
Open

[JUJU-4504] Implement timeout-based retry method for txns#69
nvinuesa wants to merge 1 commit intojuju:masterfrom
nvinuesa:fix-txn-retries

Conversation

@nvinuesa
Copy link
Copy Markdown
Member

@nvinuesa nvinuesa commented Sep 1, 2023

This patch is part of the set of patches aiming to fix https://bugs.launchpad.net/juju/+bug/2031631.

The idea here is to implement a transaction retry mechanism similar to the one being used in the official mongodb driver, i.e. a 120 seconds timeout for transactions to finish or waiting while being retried.

The context passed to the mgo Run method is necessary in order to cancel the actual request that's being executed by mgo either after timeout, either because the user (juju) cancels the passed context.

Tests have been fixed and some have been deleted since they are no longer relevant (we don't retry for a fixed number of times, instead we retry until timeout).

For some more context, here is the high-level picture of how we handle the retry (from juju to mgo via txn) by @hpidcock :

1) timer starts in juju/txn
2) juju/txn passes cancellation context/chan to juju/mgo/sstxn runner.Run
3) juju/mgo retries until cancellation or it gets a non-writeConflict error
4) the juju/txn layer does its retry, instead of for i := 0; i < tr.nrRetries; i++ { it just goes until it is timed out.

Note: The original PR on juju is juju/juju#16159 and on juju/mgo is juju/mgo#27.

This patch is part of the set of patchs aiming to fix
https://bugs.launchpad.net/juju/+bug/2031631.

The idea here is to implement a transaction retry mechanism similar to
the one being used in the official mongodb driver, i.e. a 120 seconds
timeout for transactions to finish or waiting while being retried.

The context passed to the mgo Run method is necessary in order to cancel
the actual request that's being executed by mgo either after timeout,
either because the user (juju) cancels the passed context.

Tests have been fixed and some have been deleted since they are no
longer relevant (we don't retry for a fixed number of times, instead we
retry until timeout).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant