Failover bootstrap can report empty checkpointTs after MaxUint64 checkpoint recovery #4845

@3AceShowHand

What did you do?

Observed a CI failure in pull_cdc_mysql_integration_heavy.

I checked the archived test artifacts under tmp/tidb_cdc_test/fail_over_ddl_mix_with_syncpoint and traced the failure path through the logs and the related recovery/bootstrap code.

What did you expect to see?

After failover, the changefeed should resume from a valid checkpoint/start ts, bootstrap successfully, continue replicating downstream tables, and pass the final sync_diff check.

What did you see instead?

sync_diff fails at the end because the downstream is missing test.table_4, but that is only the final symptom. The earlier failure happens during changefeed failover/bootstrap:

  • the changefeed is reloaded with checkpointTs = 18446744073709551615 (MaxUint64)
  • that value is used as the table trigger dispatcher start ts and is narrowed to a signed int64, which becomes -1
  • at the same time, tidb_cdc.ddl_ts_v1 is missing on the downstream, so DDL-ts recovery falls back to 0
  • recovery then computes realStartTs = 0, creates the table trigger dispatcher with startTs=0, and sends a bootstrap response with checkpointTs=0
  • maintainer rejects the bootstrap with failed to init table trigger dispatcher: all bootstrap responses reported empty checkpointTs
  • replication stops, and sync_diff later reports missing downstream tables
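The narrowing step in the second bullet can be reproduced in isolation. The following is a minimal sketch, not TiCDC code: `narrowStartTs` is a hypothetical helper that only illustrates how Go converts a `uint64` holding MaxUint64 into a signed `int64`.

```go
package main

import (
	"fmt"
	"math"
)

// narrowStartTs mimics a uint64 timestamp being passed through a signed
// int64 field. The name is hypothetical; it is not a TiCDC function.
func narrowStartTs(checkpointTs uint64) int64 {
	// Go integer conversions keep the low 64 bits unchanged, so any value
	// above math.MaxInt64 is reinterpreted as a negative number.
	return int64(checkpointTs)
}

func main() {
	var checkpointTs uint64 = math.MaxUint64

	fmt.Println(checkpointTs)                // 18446744073709551615
	fmt.Println(narrowStartTs(checkpointTs)) // -1
}
```

This matches the `receiveStartTs="[-1]"` value in the dispatcher manager log: the all-ones bit pattern of MaxUint64 is -1 in two's complement.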

Relevant log snippets:

[changefeed.go:93] ["changefeed instance created"] [id=default/test] [checkpointTs=18446744073709551615] [state=normal]
[mysql_writer_for_ddl_ts.go:356] ["ddl ts table is not found"] [error="Error 1146 (42S02): Table 'tidb_cdc.ddl_ts_v1' doesn't exist"]
[dispatcher_manager.go:408] ["get table recovery info for dispatchers"] [receiveStartTs="[-1]"] [realStartTs="[0]"]
[maintainer_controller_bootstrap.go:158] ["handle bootstrap response"] [checkpointTs=0]
[maintainer_controller_bootstrap.go:93] ["can not determine the startTs from the bootstrap response"] [error="[CDC:ErrChangefeedInitTableTriggerDispatcherFailed]failed to init table trigger dispatcher: all bootstrap responses reported empty checkpointTs"]
[main.go:110] ["failed to initialize diff process"] [error="from upstream: please make sure the filter is correct.: the target has no table to be compared. source-table is ``test`.`table_4``"]

This looks like a product bug in the recovery/bootstrap path rather than a branch-specific regression. The implicated code paths are in maintainer, downstreamadapter/dispatchermanager, and MySQL DDL-ts recovery.

Possible root cause:

checkpointTs = MaxUint64 is persisted or reused across failover recovery, then converted into -1 in the dispatcher path. When tidb_cdc.ddl_ts_v1 is absent, recovery falls back to 0, which makes the bootstrap checkpoint empty and causes the maintainer to fail initialization.

Versions of the cluster

Upstream TiDB cluster version (from the archived sync_diff log):

9.0.0-beta.2.pre-1579-ga5545f5815

Upstream TiKV version:

unknown from the archived CI artifacts

TiCDC version (best-effort from the archived changefeed metadata in the CI logs):

v8.5.4-nextgen.202510.5-184-ga509ee02

Labels

type/bug: The issue is confirmed as a bug.