
[server] Allow enabling lakehouse on tables created before cluster-level lakehouse is enabled #2924

Open
luoyuxia wants to merge 4 commits into apache:main from luoyuxia:issue-2908-datalake-enabled

@luoyuxia
Contributor

Purpose

Linked issue: close #2908

This change allows lakehouse to be enabled on a table only after the cluster-level lakehouse binding is turned on, including tables created before that point, while remaining compatible with clusters and tables created under the previous configuration semantics.

Brief change log

  • add cluster config datalake.enabled and define legacy, pre-bind, and fully-enabled semantics
  • update dynamic lake catalog loading so that an explicit datalake.enabled=true requires datalake.format, while datalake.enabled=false skips creating the LakeCatalog; legacy clusters that remove datalake.format are still tolerated
  • persist table.datalake.format when creating tables and validate both CREATE TABLE and ALTER TABLE ... SET ('table.datalake.enabled'='true') against the cluster datalake state and format
  • reject enabling lakehouse for old tables that do not have a persisted table-level datalake format, and reject format mismatches after cluster binding is enabled
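A minimal sketch of the cluster-level semantics in the change log, assuming cluster options are available as a plain `Map<String, String>`. The class, enum, and method names (`ClusterLakeModeSketch`, `ClusterLakeMode`, `resolveMode`, `validateDynamicChange`) are hypothetical and do not mirror the actual `LakeCatalogDynamicLoader` code. The format-immutability rule in `validateDynamicChange` is an assumption based on the "format immutability in explicit mode" test description, not a confirmed implementation detail.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: ClusterLakeModeSketch, ClusterLakeMode, resolveMode and
// validateDynamicChange are illustrative names, not Fluss APIs.
public class ClusterLakeModeSketch {

    static final String ENABLED_KEY = "datalake.enabled";
    static final String FORMAT_KEY = "datalake.format";

    /** The cluster states described in the PR, plus "no lakehouse at all". */
    enum ClusterLakeMode {
        DISABLED,      // no usable lake format: no LakeCatalog, no lakehouse tables
        PRE_BIND_ONLY, // datalake.enabled=false with a format: bind format for new tables only
        FULLY_ENABLED  // datalake.enabled=true, or legacy mode with datalake.format set
    }

    /** Builds a config map from alternating key/value arguments. */
    static Map<String, String> conf(String... keyValues) {
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Parses an explicit boolean strictly instead of silently treating typos as false. */
    static boolean parseStrict(String value) {
        if ("true".equalsIgnoreCase(value)) {
            return true;
        }
        if ("false".equalsIgnoreCase(value)) {
            return false;
        }
        throw new IllegalArgumentException("Invalid value for '" + ENABLED_KEY + "': " + value);
    }

    /** Derives the effective mode and rejects an explicit `true` without a format. */
    static ClusterLakeMode resolveMode(Map<String, String> clusterConf) {
        String enabled = clusterConf.get(ENABLED_KEY);
        String format = clusterConf.get(FORMAT_KEY);
        if (enabled == null) {
            // Legacy semantics: datalake.format alone enables lakehouse tables.
            return format != null ? ClusterLakeMode.FULLY_ENABLED : ClusterLakeMode.DISABLED;
        }
        if (parseStrict(enabled)) {
            if (format == null) {
                throw new IllegalArgumentException(
                        "'" + ENABLED_KEY + "' is true but '" + FORMAT_KEY + "' is not configured");
            }
            return ClusterLakeMode.FULLY_ENABLED;
        }
        // datalake.enabled=false: the format (if any) is only pre-bound for new tables.
        return format != null ? ClusterLakeMode.PRE_BIND_ONLY : ClusterLakeMode.DISABLED;
    }

    /**
     * Assumed dynamic-update rule: once datalake.enabled is explicitly configured and a
     * format is bound, the format must not change. Legacy clusters (enabled unset) may
     * still change or remove datalake.format.
     */
    static void validateDynamicChange(Map<String, String> oldConf, Map<String, String> newConf) {
        resolveMode(newConf); // the new config must be valid on its own
        String oldFormat = oldConf.get(FORMAT_KEY);
        boolean explicit = oldConf.containsKey(ENABLED_KEY);
        if (explicit && oldFormat != null && !oldFormat.equals(newConf.get(FORMAT_KEY))) {
            throw new IllegalArgumentException(
                    "'" + FORMAT_KEY + "' cannot be changed once bound in explicit mode");
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveMode(conf(FORMAT_KEY, "paimon")));                       // FULLY_ENABLED (legacy)
        System.out.println(resolveMode(conf(ENABLED_KEY, "false", FORMAT_KEY, "paimon"))); // PRE_BIND_ONLY
        System.out.println(resolveMode(conf(ENABLED_KEY, "true", FORMAT_KEY, "paimon")));  // FULLY_ENABLED
        System.out.println(resolveMode(conf(ENABLED_KEY, "false")));                       // DISABLED
    }
}
```

Treating an unset `datalake.enabled` as legacy mode keeps existing clusters working unchanged, while an explicit `false` lets operators bind a format before any table is lake-enabled.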

Tests

  • ./mvnw -pl fluss-server,fluss-client -am -DskipITs -DfailIfNoTests=false -Dtest=DynamicConfigChangeTest,LakeEnableTableITCase test

API and Format

  • adds cluster config datalake.enabled
  • persists table.datalake.format for newly created lakehouse tables
  • does not change existing storage format, but tightens validation for enabling lakehouse on pre-existing tables
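A minimal sketch of the table-level rules, assuming a simplified view of the cluster state (whether lakehouse tables are allowed, and which format is bound). `TableLakeValidationSketch`, `ClusterLakeState`, `prepareCreate`, and `validateEnable` are hypothetical names, not the actual `TableDescriptorValidation` API; the case-insensitive format comparison is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: TableLakeValidationSketch, ClusterLakeState, prepareCreate and
// validateEnable are illustrative names, not Fluss APIs.
public class TableLakeValidationSketch {

    static final String TABLE_ENABLED_KEY = "table.datalake.enabled";
    static final String TABLE_FORMAT_KEY = "table.datalake.format";

    /** Simplified cluster view: are lakehouse tables allowed, and which format is bound (may be null). */
    static final class ClusterLakeState {
        final boolean lakeTablesAllowed;
        final String boundFormat;

        ClusterLakeState(boolean lakeTablesAllowed, String boundFormat) {
            this.lakeTablesAllowed = lakeTablesAllowed;
            this.boundFormat = boundFormat;
        }
    }

    /** CREATE TABLE: persist the cluster-bound format so the table can be enabled later. */
    static Map<String, String> prepareCreate(Map<String, String> tableProps, ClusterLakeState cluster) {
        Map<String, String> props = new HashMap<>(tableProps);
        if (cluster.boundFormat != null) {
            props.putIfAbsent(TABLE_FORMAT_KEY, cluster.boundFormat);
        }
        // Creating a table with lakehouse already enabled goes through the same checks.
        if (Boolean.parseBoolean(props.get(TABLE_ENABLED_KEY))) {
            validateEnable(props, cluster);
        }
        return props;
    }

    /** ALTER TABLE ... SET ('table.datalake.enabled' = 'true'): check cluster state and persisted format. */
    static void validateEnable(Map<String, String> persistedProps, ClusterLakeState cluster) {
        if (!cluster.lakeTablesAllowed) {
            throw new IllegalStateException("Lakehouse tables are not enabled on this cluster");
        }
        String tableFormat = persistedProps.get(TABLE_FORMAT_KEY);
        if (tableFormat == null) {
            // Tables created before any format was bound have nothing to validate against.
            throw new IllegalStateException("Table has no persisted '" + TABLE_FORMAT_KEY + "'");
        }
        if (!tableFormat.equalsIgnoreCase(cluster.boundFormat)) {
            throw new IllegalStateException(
                    "Table format '" + tableFormat + "' does not match cluster format '"
                            + cluster.boundFormat + "'");
        }
    }

    /** Returns "OK" or the rejection message, for demonstration. */
    static String tryEnable(Map<String, String> props, ClusterLakeState cluster) {
        try {
            validateEnable(props, cluster);
            return "OK";
        } catch (IllegalStateException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        ClusterLakeState preBind = new ClusterLakeState(false, "paimon");
        ClusterLakeState enabled = new ClusterLakeState(true, "paimon");

        // Created while the cluster only pre-binds the format: the format is persisted.
        Map<String, String> newTable = prepareCreate(new HashMap<>(), preBind);
        System.out.println(newTable.get(TABLE_FORMAT_KEY)); // paimon
        System.out.println(tryEnable(newTable, enabled));   // OK

        // Created before any format was bound: no persisted format, so enabling is rejected.
        System.out.println(tryEnable(new HashMap<>(), enabled));
    }
}
```

Persisting the format at creation time is what lets a table created during pre-bind be enabled later, while a table created before any format was bound has nothing to validate against and is rejected.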

Documentation

  • no


Copilot AI left a comment

Pull request overview

This PR introduces a new cluster-level switch (datalake.enabled) to decouple “format pre-binding” from “lakehouse tables enabled”, so that tables created before full lakehouse enablement can later be enabled safely (with persisted table-level format validation).

Changes:

  • Add datalake.enabled cluster config with legacy / pre-bind-only / fully-enabled semantics and updated dynamic validation.
  • Persist/validate table-level table.datalake.format on create/enable flows and reject mismatches or missing persisted formats.
  • Extend unit/integration tests for the new cluster/table validation behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| fluss-server/src/test/java/org/apache/fluss/server/DynamicConfigChangeTest.java | Adds tests for requiring format when datalake.enabled is set, pre-bind-only behavior, and format immutability in explicit mode. |
| fluss-server/src/main/java/org/apache/fluss/server/utils/TableDescriptorValidation.java | Adds create/alter table validation helpers for datalake enablement and format matching. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/MetadataManager.java | Hooks new ALTER TABLE datalake validation into table property alteration flow. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/LakeCatalogDynamicLoader.java | Updates dynamic loading/validation logic to incorporate datalake.enabled and “effective runtime mode”. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java | Applies system defaults using lake container state and validates create-table datalake config. |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | Adds datalake.enabled option and refines datalake.format description. |
| fluss-client/src/test/java/org/apache/fluss/client/table/LakeEnableTableITCase.java | Adds/updates IT coverage for pre-bind enablement, missing persisted format, and format mismatch rejection. |


luoyuxia force-pushed the issue-2908-datalake-enabled branch 2 times, most recently from 0a4efc3 to b4633f6 (March 25, 2026 06:41)
luoyuxia requested a review from Copilot (March 25, 2026 06:42)
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

luoyuxia force-pushed the issue-2908-datalake-enabled branch from b4633f6 to e06641b (March 25, 2026 07:08)
luoyuxia force-pushed the issue-2908-datalake-enabled branch 2 times, most recently from e9a8d09 to 9224cdc (March 25, 2026 07:35)
luoyuxia requested a review from Copilot (March 25, 2026 07:36)
Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

- If `datalake.enabled` is unset, Fluss preserves the legacy behavior: configuring `datalake.format` alone also enables lakehouse tables.
- If `datalake.enabled = false`, Fluss pre-binds the lake format for newly created tables but does not allow lakehouse tables yet.
- If `datalake.enabled = true`, Fluss fully enables lakehouse tables.
- If `datalake.enabled` is explicitly configured, `datalake.format` must also be configured.

Copilot AI Mar 25, 2026


The upgrade note says that if datalake.enabled is explicitly configured, datalake.format must also be configured. Elsewhere in the docs (and in the current validation logic), the requirement applies only when datalake.enabled is explicitly set to true. Please clarify this bullet to match the intended behavior (and, if datalake.enabled=false can be used without a format, note that the pre-bind semantics require setting the format).

Suggested change
- If `datalake.enabled` is explicitly configured, `datalake.format` must also be configured.
- If `datalake.enabled` is explicitly set to `true`, `datalake.format` must also be configured. When `datalake.enabled = false`, `datalake.format` is optional, but required if you want the pre-bind behavior described above.

| Option | Type | Default | Description |
|------------------|---------|---------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| datalake.enabled | Boolean | (None) | Whether the Fluss cluster is ready to create and manage lakehouse tables. If unset, Fluss keeps the legacy behavior where configuring `datalake.format` also enables lakehouse tables. If set to `false`, Fluss pre-binds the lake format for newly created tables but does not allow lakehouse tables yet. If set to `true`, Fluss fully enables lakehouse tables. When this option is explicitly configured to true, `datalake.format` must also be configured. |
| datalake.format | Enum | (None) | The datalake format used by of Fluss to be as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |

Copilot AI Mar 25, 2026


Grammar in the datalake.format description is incorrect (“used by of Fluss to be as lakehouse storage”). Please rephrase (e.g., “used by Fluss as lakehouse storage”) to match the wording used in ConfigOptions.DATALAKE_FORMAT.

Suggested change
| datalake.format | Enum | (None) | The datalake format used by of Fluss to be as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |
| datalake.format | Enum | (None) | The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |

+ "If unset, Fluss keeps the legacy behavior where configuring `datalake.format` "
+ "also enables lakehouse tables. If set to `false`, Fluss pre-binds the lake format "
+ "for newly created tables but does not allow lakehouse tables yet. If set to `true`, "
+ "Fluss fully enables lakehouse tables. When this option is explicitly configured, "

Copilot AI Mar 25, 2026


The datalake.enabled description says that when the option is "explicitly configured", datalake.format must also be configured, but the server-side validation/enforcement currently only requires datalake.format when datalake.enabled is explicitly set to true. Please align the wording with the actual semantics (e.g., specify “explicitly set to true”) to avoid misleading operators.

Suggested change
+ "Fluss fully enables lakehouse tables. When this option is explicitly configured, "
+ "Fluss fully enables lakehouse tables. When this option is explicitly set to `true`, "

Development

Successfully merging this pull request may close these issues.

Allow enabling lakehouse on tables created before cluster-level lakehouse is enabled

2 participants