What did you do?
Observed a CI failure in pull_cdc_mysql_integration_heavy: tests/integration_tests/fail_over_ddl_mix_with_syncpoint
I checked the archived test artifacts under tmp/tidb_cdc_test/fail_over_ddl_mix_with_syncpoint and traced the failure path through the logs and the related recovery/bootstrap code.
What did you expect to see?
After failover, the changefeed should resume from a valid checkpoint/start ts, bootstrap successfully, continue replicating downstream tables, and pass the final sync_diff check.
What did you see instead?
sync_diff fails at the end because the downstream is missing test.table_4, but that is only the final symptom. The earlier failure happens during changefeed failover/bootstrap:
- the changefeed is reloaded with checkpointTs = 18446744073709551615 (MaxUint64)
- that value is used as the table trigger dispatcher start ts and is narrowed to a signed int64, which becomes -1
- at the same time, tidb_cdc.ddl_ts_v1 is missing on the downstream, so DDL-ts recovery falls back to 0
- recovery then computes realStartTs = 0, creates the table trigger dispatcher with startTs=0, and sends a bootstrap response with checkpointTs=0
- maintainer rejects the bootstrap with: failed to init table trigger dispatcher: all bootstrap responses reported empty checkpointTs
- replication stops, and sync_diff later reports missing downstream tables
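The narrowing step in the chain above can be reproduced in isolation. The following is a minimal sketch, not TiCDC code: `narrowStartTs` is a hypothetical helper that only stands in for an unchecked uint64 to int64 conversion of a start ts, which is what the `receiveStartTs="[-1]"` log value suggests happened.

```go
package main

import (
	"fmt"
	"math"
)

// narrowStartTs mimics an unchecked uint64 -> int64 conversion of a start ts.
// Hypothetical helper for illustration only; it is not a TiCDC function.
func narrowStartTs(ts uint64) int64 {
	// Go keeps the bit pattern on integer conversion, so all-ones becomes -1.
	return int64(ts)
}

func main() {
	var checkpointTs uint64 = math.MaxUint64

	fmt.Println(checkpointTs)                // 18446744073709551615
	fmt.Println(narrowStartTs(checkpointTs)) // -1
}
```

Go does not report overflow on this kind of conversion at run time, so the sentinel value passes through silently unless the caller checks for it explicitly.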
Relevant log snippets:
[changefeed.go:93] ["changefeed instance created"] [id=default/test] [checkpointTs=18446744073709551615] [state=normal]
[mysql_writer_for_ddl_ts.go:356] ["ddl ts table is not found"] [error="Error 1146 (42S02): Table 'tidb_cdc.ddl_ts_v1' doesn't exist"]
[dispatcher_manager.go:408] ["get table recovery info for dispatchers"] [receiveStartTs="[-1]"] [realStartTs="[0]"]
[maintainer_controller_bootstrap.go:158] ["handle bootstrap response"] [checkpointTs=0]
[maintainer_controller_bootstrap.go:93] ["can not determine the startTs from the bootstrap response"] [error="[CDC:ErrChangefeedInitTableTriggerDispatcherFailed]failed to init table trigger dispatcher: all bootstrap responses reported empty checkpointTs"]
[main.go:110] ["failed to initialize diff process"] [error="from upstream: please make sure the filter is correct.: the target has no table to be compared. source-table is ``test`.`table_4``"]
This looks like a product bug in the recovery/bootstrap path rather than a branch-specific regression. The implicated code paths are in maintainer, downstreamadapter/dispatchermanager, and MySQL DDL-ts recovery.
Possible root cause:
checkpointTs = MaxUint64 is persisted or reused across failover recovery, then converted into -1 in the dispatcher path. When tidb_cdc.ddl_ts_v1 is absent, recovery falls back to 0, which makes the bootstrap checkpoint empty and causes the maintainer to fail initialization.
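To illustrate the two sentinel values involved in this hypothesis, here is a small sketch of a guard that would flag them before they reach the dispatcher path. `isUsableCheckpointTs` is a hypothetical name, and this is not a proposed patch: where such a check belongs in maintainer or dispatchermanager, and whether MaxUint64 is ever a legitimate value there, is not established by the logs.

```go
package main

import (
	"fmt"
	"math"
)

// isUsableCheckpointTs reports whether ts looks like a real checkpoint.
// Hypothetical guard for illustration; it is not part of TiCDC.
// Assumption: 0 means "empty" and MaxUint64 is a sentinel, as the logs suggest.
func isUsableCheckpointTs(ts uint64) bool {
	return ts != 0 && ts != math.MaxUint64
}

func main() {
	// The last value is an arbitrary placeholder, not taken from the CI logs.
	for _, ts := range []uint64{0, math.MaxUint64, 461234567890123456} {
		fmt.Printf("checkpointTs=%d usable=%t\n", ts, isUsableCheckpointTs(ts))
	}
}
```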
Versions of the cluster
Upstream TiDB cluster version (from the archived sync_diff log): 9.0.0-beta.2.pre-1579-ga5545f5815
Upstream TiKV version: unknown from the archived CI artifacts
TiCDC version (best-effort from the archived changefeed metadata in the CI logs): v8.5.4-nextgen.202510.5-184-ga509ee02