Skip to content
/ server Public

MDEV-37949: Implement innodb_log_archive, innodb_log_recovery_start, innodb_log_recovery_target, …#4405

Open
dr-m wants to merge 17 commits into11.4from
MDEV-37949
Open

MDEV-37949: Implement innodb_log_archive, innodb_log_recovery_start, innodb_log_recovery_target, …#4405
dr-m wants to merge 17 commits into11.4from
MDEV-37949

Conversation

@dr-m
Copy link
Contributor

@dr-m dr-m commented Oct 28, 2025

  • The Jira issue number for this PR is: MDEV-37949

Description

MDEV-37949: Implement innodb_log_archive

The InnoDB write-ahead log file in the old innodb_log_archive=OFF format is named ib_logfile0, pre-allocated to innodb_log_file_size and written as a ring buffer. This is good for write performance and space management, but unsuitable for arbitrary point-in-time recovery or for facilitating efficient incremental backup.

innodb_log_archive=ON: A new format where InnoDB will create and preallocate files ib_%016x.log, instead of writing a circular file ib_logfile0. Each file will be pre-allocated to innodb_log_file_size (between 4M and 4G; we impose a stricter upper limit of 4 GiB for innodb_log_archive=ON). Once a log fills up, we will create and pre-allocate another log file, to which log records will be written. Upon the completion of the first log checkpoint in a recently created log file, the old log file will be marked read-only, signaling that there will be no further writes to that file, and that the file may safely be moved to long-term storage.

The file name includes the log sequence number (LSN) at file offset 12288 (log_t::START_OFFSET). Limiting the file size to 4 GiB allows us to identify each checkpoint by storing a 32-bit big-endian offset into the optional FILE_MODIFY and the mandatory FILE_CHECKPOINT records, between 12288 and the end of the file.

The innodb_encrypt_log format is identified by storing the encryption information at the start of the log file. The first 32-bit value will be 1, which is an invalid checkpoint offset. Each innodb_log_archive=ON log must use the same encryption parameters. Changing innodb_encrypt_log or related parameters is only possible by setting innodb_log_archive=OFF and restarting the server, which will permanently lose the history of the archived log.

The maximum number of log checkpoints that the innodb_log_archive=ON file header can represent is limited to 12288/4=3072 when using innodb_encrypt_log=OFF. If we run out of slots in a log file, each subsequently completed checkpoint in that log file will overwrite the last slot in the checkpoint header, until we switch to the next log.

innodb_log_recovery_start: The checkpoint LSN to start recovery from. This will be useful when recovering from an archived log. This is useful for restoring an incremental backup (applying InnoDB log files that were copied since the previous restore).

innodb_log_recovery_target: The requested LSN to end recovery at. When this is set, all persistent InnoDB tables will be read-only, and no writes to the log are allowed. The intended purpose of this setting is to prepare an incremental backup, as well as to allow data retrieval as of a particular logical point of time.

Setting innodb_log_recovery_target>0 is much like setting innodb_read_only=ON, with the exception that the data files may be written to by crash recovery, and locking reads will conflict with any incomplete transactions as necessary, and all transaction isolation levels will work normally (not hard-wired to READ UNCOMMITTED).

The status variable innodb_lsn_archived will reflect the LSN since when a complete InnoDB log archive is available. Its initial value will be that of the new parameter innodb_log_archive_start. If that variable is 0 (the default), the innodb_lsn_archived will be recovered from the available log files. If innodb_log_archive=OFF, innodb_lsn_archived will be adjusted to the latest checkpoint every time a log checkpoint is executed. If innodb_log_archive=ON, the value
should not change.

Statements like

SET GLOBAL innodb_log_archive=!@@GLOBAL.innodb_log_archive;

will take effect as soon as possible, possibly after a log checkpoint has been completed. The log file will be renamed between ib_logfile0 and ib_%016x.log as appropriate.

When innodb_log_archive=ON, the setting SET GLOBAL innodb_log_file_size will affect subsequently created log files when the file that is being currently written is running out. If we are switching log files exactly at the same time, then a somewhat misleading error message "innodb_log_file_size change is already in progress" will be issued.

When innodb_read_only=ON, innodb_log_recovery_target will be set to the current LSN. This ensures that it suffices to check only one of these variables when blocking writes to persistent tables.

See the commit message for a more detailed description of the changed data structures and functions.

Release Notes

See the previous section.

How can this PR be tested?

mysql-test/mtr --parallel=auto --force --big-test --mysqld=--loose-innodb-log-archive --skip-test=mariabackup
mysql-test/mtr --parallel=auto --force --big-test --mysqld=--loose-innodb-log-archive --mysqld=--loose-innodb-log-recovery-start=12288 --skip-test=mariabackup
mysql-test/mtr --parallel=auto --force --big-test --mysqld=--loose-innodb-log-archive --mysqld=--loose-innodb-log-recovery-start=12288 --mysqld=--loose-inodb-log-file-mmap=OFF --skip-test=mariabackup
mysql-test/mtr --parallel=auto --force --big-test --mysqld=--loose-innodb-log-archive --mysqld=--loose-innodb-log-recovery-start=12288 --mysqld=--loose-inodb-log-file-mmap=ON --skip-test=mariabackup

The mariabackup suite must be skipped when setting innodb_log_archive=ON, because the mariadb-backup tool will only support the old innodb_log_archive=OFF format (copying from ib_logfile0).

Unfortunately, all --suite=encryption tests that use innodb_encrypt_log must be skipped when using innodb_log_archive. This is because the server would have to be reinitialized; we do not allow changing the format of an archived log on startup (such as adding or removing encryption). This combination is covered by the test innodb.log_file_size_online,encrypted.

no_checkpoint_prepare.inc: A new file, to prepare for subsequent inclusion of no_checkpoint_end.inc. We will invoke the server to parse the log and to determine the latest checkpoint.

A number of tests that would fail when the parameter innodb_log_recovery_start=12288 is present, which is forcing
recovery to start from the beginning of the history (the database creation), have been adjusted with
explicit --innodb-log-recovery-start=0 to override that:

  1. Some injected corruption may be "healed" by replaying the log from the beginning. Some tests expect an empty buffer pool after a restart, with no page I/O due to crash recovery.
  2. Any test that sets innodb_read_only=ON would fail with an error message that the setting prevents crash recovery, unless innodb_log_recovery_start=0.
  3. Any test that changes innodb_undo_tablespaces would fail in crash recovery, because crash recovery assumes that the undo tablespace ID that is available from the undo* files corresponds with the start of the log. This is an unforunate design bug which we cannot fix easily.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

This is a new feature, but for now based on the 11.4 branch so that any unrelated errors that may be found during testing can be fixed rather quickly. Merges to the main branch may be blocked for weeks at a time.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@dr-m dr-m self-assigned this Oct 28, 2025
@CLAassistant
Copy link

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

@dr-m dr-m force-pushed the MDEV-37949 branch 4 times, most recently from 4a9a384 to 9d14e2c Compare November 12, 2025 14:35
@dr-m dr-m force-pushed the MDEV-37949 branch 2 times, most recently from 1ab3880 to 254bfaa Compare January 8, 2026 15:13
@dr-m dr-m force-pushed the MDEV-37949 branch 2 times, most recently from b9728c2 to 091c2b3 Compare February 5, 2026 05:28
@dr-m dr-m changed the title MDEV-37949: Implement innodb_log_archive_file_size, innodb_log_archive_path, … MDEV-37949: Implement innodb_log_archive, innodb_log_recovery_start, innodb_log_recovery_target, … Feb 12, 2026
@dr-m dr-m force-pushed the MDEV-37949 branch 3 times, most recently from 1112487 to 998c03a Compare February 13, 2026 13:54
@dr-m dr-m force-pushed the MDEV-37949 branch 7 times, most recently from 4d2e8ad to 99959b5 Compare February 19, 2026 13:47
@dr-m dr-m marked this pull request as ready for review February 19, 2026 13:48
log_t::checkpoint_margin(): Properly wait for log file rotation
@dr-m dr-m requested a review from Thirunarayanan March 12, 2026 13:06
Skip log_checkpoint_low() if innodb_log_recovery_target is set
dr-m added 2 commits March 13, 2026 11:12
log_t::write_checkpoint(): Properly resume set_archive().
innodb_log_recovery_target fixes
dr-m added 2 commits March 13, 2026 13:28
Allow the last checkpoint slot to be overwritten after recovery
buf_flush_sync_for_checkpoint(): Prevent a shutdown hang
when innodb_log_recovery_target is set.
}
sql_print_error("[FATAL] InnoDB: Failed to create %s of %" PRIu64
" bytes", path.c_str(), resize_target);
abort();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check this crash :-

# 2026-03-15T18:38:27 [1183625] | #0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
# 2026-03-15T18:38:27 [1183625] | #1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
# 2026-03-15T18:38:27 [1183625] | #2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
# 2026-03-15T18:38:27 [1183625] | #3  0x0000721e7924526e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
# 2026-03-15T18:38:27 [1183625] | #4  0x0000721e792288ff in __GI_abort () at ./stdlib/abort.c:79
# 2026-03-15T18:38:27 [1183625] | #5  0x0000565800824e4f in log_t::archive_new_write (this=this@entry=0x56580204cd00 <log_sys>, buf=0x721e60c00200 "\003", buf@entry=0x721e60c00000 "\001\020\303", length=1536, length@entry=2048, offset=offset@entry=134217216) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1239
# 2026-03-15T18:38:27 [1183625] | #6  0x00005658008256f6 in log_t::write_buf<(log_t::resizing_and_latch)1> (this=0x56580204cd00 <log_sys>) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1672
# 2026-03-15T18:38:27 [1183625] | #7  log_writer () at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1819
# 2026-03-15T18:38:27 [1183625] | #8  0x00005658008227cb in log_write_up_to (lsn=lsn@entry=138401032, durable=durable@entry=true, callback=<optimized out>, callback@entry=0x721e786665f0) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1778
# 2026-03-15T18:38:27 [1183625] | #9  0x00005658009586ba in trx_flush_log_if_needed (lsn=138401032, trx=trx@entry=0x721e63202300) at /data/Server/MDEV-37949B/storage/innobase/trx/trx0trx.cc:1280
# 2026-03-15T18:38:27 [1183625] | #10 0x000056580095c381 in trx_t::commit_in_memory (mtr=0x721e78666680, this=0x721e63202300) at /data/Server/MDEV-37949B/storage/innobase/trx/trx0trx.cc:1486
# 2026-03-15T18:38:27 [1183625] | #11 trx_t::commit_persist (this=this@entry=0x721e63202300) at /data/Server/MDEV-37949B/storage/innobase/trx/trx0trx.cc:1601
# 2026-03-15T18:38:27 [1183625] | #12 0x000056580095c4bc in trx_t::commit (this=this@entry=0x721e63202300) at /data/Server/MDEV-37949B/storage/innobase/trx/trx0trx.cc:1610
# 2026-03-15T18:38:27 [1183625] | #13 0x00005658009fc06a in dict_stats_save (table=table@entry=0x6a1e44c4cb00, index_id=index_id@entry=0) at /data/Server/MDEV-37949B/storage/innobase/dict/dict0stats.cc:3046
# 2026-03-15T18:38:27 [1183625] | #14 0x000056580079ba0e in ha_innobase::statistics_init (this=this@entry=0x6a1e44cb0980, table=table@entry=0x6a1e44c4cb00, recalc=recalc@entry=true) at /data/Server/MDEV-37949B/storage/innobase/handler/ha_innodb.cc:5783
# 2026-03-15T18:38:27 [1183625] | #15 0x00005658007b7b00 in ha_innobase::reload_statistics (this=0x6a1e44cb0980) at /data/Server/MDEV-37949B/storage/innobase/handler/ha_innodb.cc:1541
# 2026-03-15T18:38:27 [1183625] | #16 0x000056580079bb13 in innodb_notify_tabledef_changed (handler=<optimized out>) at /data/Server/MDEV-37949B/storage/innobase/handler/ha_innodb.cc:1555
# 2026-03-15T18:38:27 [1183625] | #17 0x0000565800390d7a in notify_tabledef_changed (table_list=table_list@entry=0x6a1e4441b2e0) at /data/Server/MDEV-37949B/sql/sql_table.cc:7794
# 2026-03-15T18:38:27 [1183625] | #18 0x00005658003a4eec in mysql_inplace_alter_table (thd=thd@entry=0x6a1e44002c98, table_list=0x6a1e4441b2e0, table=table@entry=0x6a1e44c4c0b8, altered_table=altered_table@entry=0x721e786680e0, ha_alter_info=ha_alter_info@entry=0x721e78668030, target_mdl_request=target_mdl_request@entry=0x721e78668910, ddl_log_state=0x721e78667ff0, trigger_param=0x721e786684d0, alter_ctx=0x721e78669560, partial_alter=@0x721e78667f6e: false, start_alter_id=@0x721e78667f78: 0, if_exists=false) at /data/Server/MDEV-37949B/sql/sql_table.cc:8214
# 2026-03-15T18:38:27 [1183625] | #19 0x00005658003a8e82 in mysql_alter_table (thd=thd@entry=0x6a1e44002c98, new_db=new_db@entry=0x6a1e44007840, new_name=<optimized out>, create_info=create_info@entry=0x721e7866a400, table_list=<optimized out>, table_list@entry=0x6a1e4441b2e0, recreate_info=recreate_info@entry=0x721e7866a250, alter_info=<optimized out>, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>, if_exists=<optimized out>) at /data/Server/MDEV-37949B/sql/sql_table.cc:11621
# 2026-03-15T18:38:27 [1183625] | #20 0x0000565800419027 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x6a1e44002c98) at /data/Server/MDEV-37949B/sql/sql_alter.cc:701
# 2026-03-15T18:38:27 [1183625] | #21 0x00005658002e2735 in mysql_execute_command (thd=thd@entry=0x6a1e44002c98, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/MDEV-37949B/sql/sql_parse.cc:5916
# 2026-03-15T18:38:27 [1183625] | #22 0x00005658002e32c1 in mysql_parse (thd=thd@entry=0x6a1e44002c98, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x721e7866b400) at /data/Server/MDEV-37949B/sql/sql_parse.cc:7945
# 2026-03-15T18:38:27 [1183625] | #23 0x00005658002e49a4 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x6a1e44002c98, packet=packet@entry=0x6a1e4400c189 "ALTER TABLE test.extra_t2 DROP COLUMN col2 /* E_R checkDatabaseIntegrity QNO 7 CON_ID 8 */ ", packet_length=packet_length@entry=91, blocking=blocking@entry=true) at /data/Server/MDEV-37949B/sql/sql_parse.cc:1925
# 2026-03-15T18:38:27 [1183625] | #24 0x00005658002e5ef1 in do_command (thd=thd@entry=0x6a1e44002c98, blocking=blocking@entry=true) at /data/Server/MDEV-37949B/sql/sql_parse.cc:1433
# 2026-03-15T18:38:27 [1183625] | #25 0x0000565800411d05 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x565803fba5f8, put_in_cache=put_in_cache@entry=true) at /data/Server/MDEV-37949B/sql/sql_connect.cc:1497
# 2026-03-15T18:38:27 [1183625] | #26 0x0000565800411f11 in handle_one_connection (arg=0x565803fba5f8) at /data/Server/MDEV-37949B/sql/sql_connect.cc:1409
# 2026-03-15T18:38:27 [1183625] | #27 0x0000721e7929ca94 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:447
# 2026-03-15T18:38:27 [1183625] | #28 0x0000721e79329a34 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100

(rr) frame 5
#5  0x0000565800824e4f in log_t::archive_new_write (this=this@entry=0x56580204cd00 <log_sys>, buf=0x721e60c00200 "\003", buf@entry=0x721e60c00000 "\001", <incomplete sequence \303>, length=1536,
    length@entry=2048, offset=offset@entry=134217216) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1239
1239      abort();
(rr) info args
this = 0x56580204cd00 <log_sys>
buf = 0x721e60c00200 "\003"
length = 1536
offset = 134217216
(rr) info locals
__PRETTY_FUNCTION__ = "void log_t::archive_new_write(const byte*, size_t, lsn_t)"
path = "ib_00000000083fd000.log"
success = false
file = <optimized out>
(rr) x/64bx buf
0x721e60c00200: 0x03    0x00    0x3d    0x30    0x12    0x03    0x2e    0x38
0x721e60c00208: 0x03    0x02    0xa0    0x32    0x00    0x00    0x00    0x2e
0x721e60c00210: 0x34    0x03    0x2e    0x19    0x06    0xb3    0x2a    0x20
0x721e60c00218: 0x72    0x34    0x03    0x2e    0x43    0x02    0x34    0x03
0x721e60c00220: 0x2e    0x3f    0x03    0xa1    0x02    0xb2    0x13    0x56
0x721e60c00228: 0xc3    0x1e    0x04    0xff    0xd3    0x02    0x06    0x0b
0x721e60c00230: 0x34    0x03    0x2e    0x49    0x01    0xb7    0x00    0x00
0x721e60c00238: 0x00    0x00    0x2e    0x00    0x2c    0xd3    0x00    0x06
(rr) print length
$1 = 1536
(rr) print offset
$2 = 134217216
(rr) info registers
Missing register name
(rr) list
1234        os_file_close(file);
1235        IF_WIN(DeleteFile(path.c_str()), unlink(path.c_str()));
1236      }
1237      sql_print_error("[FATAL] InnoDB: Failed to create %s of %" PRIu64
1238                      " bytes", path.c_str(), resize_target);
1239      abort();
1240    }

RR trace is present on sdp :-
/data/results/1773602901/TBR-2397-MDEV-37949

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the first time you file this documented behaviour as a bug. Quoting the first comment in MDEV-37949:

What to do if space allocation fails

InnoDB recovery will read records from one or more log files. Only after the log-based recovery is completed, may InnoDB start writing log records, just like under normal operation.
It is safest to block any InnoDB writes when we fail to allocate space for a log file when innodb_log_archive=ON, and simplest to crash the server. This is very disruptive, but a situation that InnoDB would be guaranteed to recover from.
If we silently executed SET GLOBAL innodb_log_archive=OFF, that could ruin a "perfect streak" and prevent point-in-time recovery. I do not want to introduce a difficult-to-handle regression, such as the open bug MDEV-27803 which involves a failure to allocate space for the binlog.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have errno=EEXIST rather than ENOSPC here. That is, a log file by that name ib_00000000083fd000.log already existed. The server that was killed during workload had actually created such a file but not written anything to it:

(rr) when
Completed event: 1000230
(rr) backtrace
Thread 5 (Thread 1186009.1186110 (mariadbd)):
#0  0x0000000070000002 in syscall_traced ()
#1  0x00007ff6db96d018 in _raw_syscall () at /home/ubuntu/rr/src/preload/raw_syscall.S:120
#2  0x00007ff6db966909 in traced_raw_syscall (call=0x7ff6d83fffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:379
#3  0x00007ff6db96a0bc in sys_futex (call=<optimized out>) at /home/ubuntu/rr/src/preload/syscallbuf.c:2085
#4  syscall_hook_internal (call=call@entry=0x7ff6d83fffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4203
#5  0x00007ff6db96cda4 in syscall_hook (call=0x7ff6d83fffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4387
#6  syscall_hook (call=0x7ff6d83fffa0) at /home/ubuntu/rr/src/preload/syscallbuf.c:4371
#7  0x00007ff6db966323 in _syscall_hook_trampoline () at /home/ubuntu/rr/src/preload/syscall_hook.S:308
#8  0x00007ff6db96638d in __morestack () at /home/ubuntu/rr/src/preload/syscall_hook.S:443
#9  0x00007ff6db9663a9 in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () at /home/ubuntu/rr/src/preload/syscall_hook.S:462
#10 0x00007ff6db11c689 in fallocate64 (fd=fd@entry=0x5d, mode=mode@entry=0x0, offset=offset@entry=0x0, len=len@entry=0x80000000) at ../sysdeps/unix/sysv/linux/fallocate64.c:27
#11 0x00005b2346705af4 in os_file_set_size (name=0x77f6b803baf0 "ib_00000000083fd000.log", file=file@entry=0x5d, size=0x80000000, is_sparse=is_sparse@entry=0x0) at /data/Server/MDEV-37949B/storage/innobase/os/os0file.cc:1557
#12 0x00005b23466bcdd5 in log_t::archive_new_write (this=this@entry=0x5b2347ee4d00 <log_sys>, buf=0x77f6be40a800 "\275", buf@entry=0x77f6be400000 "?\200\006\032\300\024\203G-0", length=0x5aa00, length@entry=0x65200, offset=offset@entry=0x7ff5800) at /usr/include/c++/13/bits/basic_string.h:222
#13 0x00005b23466bd6f6 in log_t::write_buf<(log_t::resizing_and_latch)1> (this=0x5b2347ee4d00 <log_sys>) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1672
#14 log_writer () at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1819
#15 0x00005b23466ba7cb in log_write_up_to (lsn=0x8457923, lsn@entry=0x8426f3c, durable=durable@entry=0x1, callback=<optimized out>, callback@entry=0x0) at /data/Server/MDEV-37949B/storage/innobase/log/log0log.cc:1778
#16 0x00005b23468564dc in buf_page_t::flush (this=this@entry=0x77f6c0009c00, space=space@entry=0x77f6b801fac0) at /data/Server/MDEV-37949B/storage/innobase/buf/buf0flu.cc:855
#17 0x00005b2346858de3 in buf_do_flush_list_batch (max_n=max_n@entry=0x60, lsn=lsn@entry=0xffffffffffffffff) at /data/Server/MDEV-37949B/storage/innobase/buf/buf0flu.cc:1538
#18 0x00005b234685917b in buf_flush_list_holding_mutex (max_n=0x60, lsn=lsn@entry=0xffffffffffffffff) at /data/Server/MDEV-37949B/storage/innobase/buf/buf0flu.cc:1595
#19 0x00005b234685a36a in buf_flush_page_cleaner () at /data/Server/MDEV-37949B/storage/innobase/buf/buf0flu.cc:2875

The simulated fallocate(2) system call is failing several times with EINTR, and os_file_set_size() is retrying, all the time until the SIGKILL arrives. It looks like the file size would have remained at 0 bytes. I confirmed that with the following command:

tar tvJf /data/results/1773602901/TBR-2397-MDEV-37949/archive.tar.xz|grep ib_00000000083fd000.log
-rw-rw---- saahil/dev         0 2026-03-15 18:37 ./1/fbackup/data/ib_00000000083fd000.log
-rw-rw---- saahil/dev         0 2026-03-15 18:37 ./1/data/ib_00000000083fd000.log

We must allow the last log file to be overwritten, in case the server had previously been killed before the file had been extended to a valid size.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the proper way to recover from this is that when the last archive file is below the minimum size (4 MiB), on the first write that goes beyond the current archive file, we will extend the invalid file to the current innodb_log_file_size. I can reproduce this failure with the following test:

diff --git a/mysql-test/suite/innodb/t/log_archive.test b/mysql-test/suite/innodb/t/log_archive.test
index 3f1781e74ef..d36b8c30453 100644
--- a/mysql-test/suite/innodb/t/log_archive.test
+++ b/mysql-test/suite/innodb/t/log_archive.test
@@ -195,6 +195,9 @@ print FILE $FILE_MODIFY;
 close(FILE) or die "Unable to close $file\n";
 EOF
 
+--write_file $DATADIR/ib_ffffffffffc00000.log
+EOF
+
 --let $restart_parameters= --innodb-log-recovery-start=0 --innodb-log-buffer-size=5m
 --source include/start_mysqld.inc
 
2026-03-16 11:15:52 0 [Warning] InnoDB: ignoring .//ib_ffffffffffc00000.log
2026-03-16 11:15:52 0 [ERROR] [FATAL] InnoDB: Failed to create ib_ffffffffffc00000.log of 4194304 bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

84449df should cover this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that my test for this is crashing on FreeBSD. I will analyze and fix that tomorrow.


fil_flush_file_spaces();

if (UNIV_UNLIKELY(recv_sys.rpo != 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check this one at the start of function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document the complete lifecycle of recv_sys.rpo? When is it set, when is it cleared, and what operations are forbidden in this mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RPO is an abbreviation of "recovery point objective". The related parameter is read-only:

static MYSQL_SYSVAR_UINT64_T(log_recovery_target, recv_sys.rpo,
  PLUGIN_VAR_RQCMDARG | PLUGIN_VAR_READONLY,
  "recovery point objective (end LSN; 0=unlimited)",
  nullptr, nullptr, 0, 0, std::numeric_limits<uint64_t>::max(), 0);

The purpose of this parameter is to make the server read-only, with the only exception that the log will be applied to the data files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

5 participants