[server] Allow enabling lakehouse on tables created before cluster-level lakehouse is enabled #2924
luoyuxia wants to merge 4 commits into apache:main
Pull request overview
This PR introduces a new cluster-level switch (datalake.enabled) to decouple “format pre-binding” from “lakehouse tables enabled”, so that tables created before full lakehouse enablement can later be enabled safely (with persisted table-level format validation).
Changes:
- Add `datalake.enabled` cluster config with legacy / pre-bind-only / fully-enabled semantics and updated dynamic validation.
- Persist/validate table-level `table.datalake.format` on create/enable flows and reject mismatches or missing persisted formats.
- Extend unit/integration tests for the new cluster/table validation behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| fluss-server/src/test/java/org/apache/fluss/server/DynamicConfigChangeTest.java | Adds tests for requiring format when datalake.enabled is set, pre-bind-only behavior, and format immutability in explicit mode. |
| fluss-server/src/main/java/org/apache/fluss/server/utils/TableDescriptorValidation.java | Adds create/alter table validation helpers for datalake enablement and format matching. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/MetadataManager.java | Hooks new ALTER TABLE datalake validation into table property alteration flow. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/LakeCatalogDynamicLoader.java | Updates dynamic loading/validation logic to incorporate datalake.enabled and “effective runtime mode”. |
| fluss-server/src/main/java/org/apache/fluss/server/coordinator/CoordinatorService.java | Applies system defaults using lake container state and validates create-table datalake config. |
| fluss-common/src/main/java/org/apache/fluss/config/ConfigOptions.java | Adds datalake.enabled option and refines datalake.format description. |
| fluss-client/src/test/java/org/apache/fluss/client/table/LakeEnableTableITCase.java | Adds/updates IT coverage for pre-bind enablement, missing persisted format, and format mismatch rejection. |
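The create/alter validation summarized in the table can be pictured with a minimal sketch. This is not the code from `TableDescriptorValidation`; the class name, method names, and exception types below are hypothetical, and the rule that a user-supplied table format is kept as-is on create is an assumption. Only the property key `table.datalake.format` and the rejection of missing or mismatched persisted formats come from the PR text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch; none of these names are taken from the Fluss code base. */
final class TableDatalakeValidationSketch {

    static final String TABLE_FORMAT = "table.datalake.format";

    /** On CREATE TABLE: persist the cluster format in the table properties (pre-binding). */
    static Map<String, String> withPreBoundFormat(
            Map<String, String> tableProps, Optional<String> clusterFormat) {
        Map<String, String> result = new HashMap<>(tableProps);
        // Assumption: an explicitly supplied table format is left untouched here.
        clusterFormat.ifPresent(format -> result.putIfAbsent(TABLE_FORMAT, format));
        return result;
    }

    /** On ALTER TABLE ... SET ('table.datalake.enabled'='true'). */
    static void validateEnable(
            Map<String, String> persistedProps,
            boolean clusterAllowsLakehouse,
            Optional<String> clusterFormat) {
        if (!clusterAllowsLakehouse) {
            throw new IllegalStateException("Cluster does not allow lakehouse tables yet.");
        }
        String tableFormat = persistedProps.get(TABLE_FORMAT);
        if (tableFormat == null) {
            // The table was created without a pre-bound format.
            throw new IllegalStateException("Table has no persisted " + TABLE_FORMAT + ".");
        }
        if (!clusterFormat.isPresent() || !tableFormat.equalsIgnoreCase(clusterFormat.get())) {
            throw new IllegalStateException(
                    "Table format '" + tableFormat + "' does not match the cluster format.");
        }
    }
}
```

Under these assumptions, a table created while the format was pre-bound passes `validateEnable` once the cluster allows lakehouse tables, while a table without a persisted format, or with a format that differs from the cluster's, is rejected.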
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
- If `datalake.enabled` is unset, Fluss preserves the legacy behavior: configuring `datalake.format` alone also enables lakehouse tables.
- If `datalake.enabled = false`, Fluss pre-binds the lake format for newly created tables but does not allow lakehouse tables yet.
- If `datalake.enabled = true`, Fluss fully enables lakehouse tables.
- If `datalake.enabled` is explicitly configured, `datalake.format` must also be configured.

The upgrade note says that if datalake.enabled is explicitly configured, datalake.format must also be configured. Elsewhere in the docs (and in the current validation logic) the requirement is only when datalake.enabled is explicitly set to true. Please clarify this bullet to match the intended behavior (and, if datalake.enabled=false can be used without a format, note that pre-bind semantics require setting the format).

Suggested change:
Before: - If `datalake.enabled` is explicitly configured, `datalake.format` must also be configured.
After: - If `datalake.enabled` is explicitly set to `true`, `datalake.format` must also be configured. When `datalake.enabled = false`, `datalake.format` is optional, but required if you want the pre-bind behavior described above.
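The bullets and the suggested correction above amount to a small decision table over two optional cluster settings. The following is a minimal sketch of that table, not the logic in `LakeCatalogDynamicLoader`; the class, enum, and method names are hypothetical, and treating `datalake.enabled = false` without a format as plain "disabled" is an assumption drawn from the suggested wording.

```java
import java.util.Optional;

/** Hypothetical sketch; names are illustrative and not taken from the Fluss code base. */
final class DatalakeModeSketch {

    enum Mode { DISABLED, LEGACY_ENABLED, PRE_BIND_ONLY, FULLY_ENABLED }

    /**
     * @param enabled value of datalake.enabled, empty if unset
     * @param format value of datalake.format, empty if unset
     */
    static Mode resolve(Optional<Boolean> enabled, Optional<String> format) {
        if (!enabled.isPresent()) {
            // Unset: legacy behavior, configuring the format alone enables lakehouse tables.
            return format.isPresent() ? Mode.LEGACY_ENABLED : Mode.DISABLED;
        }
        if (enabled.get()) {
            // Explicitly true: the format is mandatory.
            if (!format.isPresent()) {
                throw new IllegalArgumentException(
                        "datalake.format must be configured when datalake.enabled is true");
            }
            return Mode.FULLY_ENABLED;
        }
        // Explicitly false: pre-bind the format if one is given (assumption: otherwise disabled).
        return format.isPresent() ? Mode.PRE_BIND_ONLY : Mode.DISABLED;
    }
}
```

For example, `resolve(Optional.of(false), Optional.of("paimon"))` yields `PRE_BIND_ONLY` in this sketch, whereas `resolve(Optional.of(true), Optional.empty())` throws, which mirrors the requirement that only an explicit `true` demands a format.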
| Option | Type | Default | Description |
|---|---|---|---|
| datalake.enabled | Boolean | (None) | Whether the Fluss cluster is ready to create and manage lakehouse tables. If unset, Fluss keeps the legacy behavior where configuring `datalake.format` also enables lakehouse tables. If set to `false`, Fluss pre-binds the lake format for newly created tables but does not allow lakehouse tables yet. If set to `true`, Fluss fully enables lakehouse tables. When this option is explicitly configured to true, `datalake.format` must also be configured. |
| datalake.format | Enum | (None) | The datalake format used by of Fluss to be as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |

Grammar in the datalake.format description is incorrect (“used by of Fluss to be as lakehouse storage”). Please rephrase (e.g., “used by Fluss as lakehouse storage”) to match the wording used in ConfigOptions.DATALAKE_FORMAT.

Suggested change:
Before: | datalake.format | Enum | (None) | The datalake format used by of Fluss to be as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |
After: | datalake.format | Enum | (None) | The datalake format used by Fluss as lakehouse storage. Currently, supported formats are Paimon, Iceberg, and Lance. In the future, more kinds of data lake format will be supported, such as DeltaLake or Hudi. |
+ "If unset, Fluss keeps the legacy behavior where configuring `datalake.format` "
+ "also enables lakehouse tables. If set to `false`, Fluss pre-binds the lake format "
+ "for newly created tables but does not allow lakehouse tables yet. If set to `true`, "
+ "Fluss fully enables lakehouse tables. When this option is explicitly configured, "

The datalake.enabled description says that when the option is "explicitly configured", datalake.format must also be configured, but the server-side validation/enforcement currently only requires datalake.format when datalake.enabled is explicitly set to true. Please align the wording with the actual semantics (e.g., specify “explicitly set to true”) to avoid misleading operators.

Suggested change:
Before: + "Fluss fully enables lakehouse tables. When this option is explicitly configured, "
After: + "Fluss fully enables lakehouse tables. When this option is explicitly set to `true`, "
Purpose
Linked issue: close #2908
This change allows lakehouse to be enabled only after the cluster-level lakehouse binding is turned on, while keeping compatibility for clusters and tables created before the new configuration semantics.
Brief change log
- Add `datalake.enabled` and define `legacy`, `pre-bind`, and `fully-enabled` semantics
- `datalake.enabled=true` requires `datalake.format`, while `false` skips creating `LakeCatalog` and still tolerates legacy clusters that delete `datalake.format`
- Persist `table.datalake.format` when creating tables and validate both `CREATE TABLE` and `ALTER TABLE ... SET ('table.datalake.enabled'='true')` against the cluster datalake state and format
Tests
./mvnw -pl fluss-server,fluss-client -am -DskipITs -DfailIfNoTests=false -Dtest=DynamicConfigChangeTest,LakeEnableTableITCase test
API and Format
- `datalake.enabled`
- `table.datalake.format` for newly created lakehouse tables
Documentation