PHOENIX-7751 : [SyncTable Tool] Feature to validate table data using PhoenixSyncTable tool b/w source and target cluster #2379
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -199,4 +199,33 @@ public static long getMaxLookbackInMillis(Configuration conf) { | |
|
|
||
| /** Exposed for testing */ | ||
| public static final String SCANNER_OPENED_TRACE_INFO = "Scanner opened on server"; | ||
|
|
||
| /** | ||
| * The scan attribute to enable server-side chunk formation and checksum computation for | ||
| * PhoenixSyncTableTool. | ||
| */ | ||
| public static final String SYNC_TABLE_CHUNK_FORMATION = "_SyncTableChunkFormation"; | ||
|
|
||
| /** | ||
| * The scan attribute to provide the target chunk size in bytes for PhoenixSyncTableTool. | ||
| */ | ||
| public static final String SYNC_TABLE_CHUNK_SIZE_BYTES = "_SyncTableChunkSizeBytes"; | ||
|
|
||
| /** | ||
| * The scan attribute to provide the MessageDigest state for cross-region hash continuation in | ||
| * PhoenixSyncTableTool. | ||
| */ | ||
| public static final String SYNC_TABLE_CONTINUED_DIGEST_STATE = "_SyncTableContinuedDigestState"; | ||
|
Contributor
Can you add JavaDoc to each of the 3 constants individually, describing what the attribute is and what type of value it contains?
||
|
|
||
| /** | ||
| * PhoenixSyncTableTool chunk metadata cell qualifiers. These define the wire protocol between | ||
| * PhoenixSyncTableRegionScanner (server-side coprocessor) and PhoenixSyncTableMapper (client-side | ||
| * mapper). The coprocessor returns chunk metadata as HBase cells with these qualifiers, and the | ||
| * mapper parses them to extract chunk information. | ||
| */ | ||
| public static final byte[] SYNC_TABLE_START_KEY_QUALIFIER = Bytes.toBytes("START_KEY"); | ||
| public static final byte[] SYNC_TABLE_HASH_QUALIFIER = Bytes.toBytes("HASH"); | ||
| public static final byte[] SYNC_TABLE_ROW_COUNT_QUALIFIER = Bytes.toBytes("ROW_COUNT"); | ||
| public static final byte[] SYNC_TABLE_IS_PARTIAL_CHUNK_QUALIFIER = | ||
| Bytes.toBytes("IS_PARTIAL_CHUNK"); | ||
| } | ||
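The JavaDoc above describes a wire protocol in which the coprocessor returns chunk metadata as cells and the mapper parses them by qualifier. A minimal sketch of that parsing step follows, using only the JDK. The class `ChunkMetadataSketch`, the `ChunkInfo` type, and the value encodings (an 8-byte big-endian long for the row count, a single non-zero byte for "partial") are assumptions made for illustration and are not taken from this PR; a plain `Map` stands in for the HBase cells.

```java
import java.nio.ByteBuffer;
import java.util.Map;

/** Hypothetical stand-in for the mapper-side parsing; not code from this PR. */
final class ChunkMetadataSketch {

  // Qualifier names as they appear in the diff.
  static final String START_KEY = "START_KEY";
  static final String HASH = "HASH";
  static final String ROW_COUNT = "ROW_COUNT";
  static final String IS_PARTIAL_CHUNK = "IS_PARTIAL_CHUNK";

  /** Parsed chunk fields; the field types are assumptions. */
  static final class ChunkInfo {
    final byte[] startKey;
    final byte[] hash;
    final long rowCount;
    final boolean partial;

    ChunkInfo(byte[] startKey, byte[] hash, long rowCount, boolean partial) {
      this.startKey = startKey;
      this.hash = hash;
      this.rowCount = rowCount;
      this.partial = partial;
    }
  }

  /** Assumed row-count encoding: 8-byte big-endian long. */
  static byte[] encodeLong(long value) {
    return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
  }

  /** Builds a ChunkInfo from qualifier -> value pairs, failing fast on a missing cell. */
  static ChunkInfo parse(Map<String, byte[]> cells) {
    byte[] startKey = require(cells, START_KEY);
    byte[] hash = require(cells, HASH);
    long rowCount = ByteBuffer.wrap(require(cells, ROW_COUNT)).getLong();
    byte[] partialBytes = require(cells, IS_PARTIAL_CHUNK);
    // Assumed boolean encoding: exactly one byte, non-zero means true.
    boolean partial = partialBytes.length == 1 && partialBytes[0] != 0;
    return new ChunkInfo(startKey, hash, rowCount, partial);
  }

  private static byte[] require(Map<String, byte[]> cells, String qualifier) {
    byte[] value = cells.get(qualifier);
    if (value == null) {
      throw new IllegalStateException("Missing chunk metadata cell: " + qualifier);
    }
    return value;
  }
}
```

Failing on a missing qualifier, rather than defaulting it, keeps a protocol mismatch between server and client from being mistaken for an empty chunk.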
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one | ||
| * or more contributor license agreements. See the NOTICE file | ||
| * distributed with this work for additional information | ||
| * regarding copyright ownership. The ASF licenses this file | ||
| * to you under the Apache License, Version 2.0 (the | ||
| * "License"); you may not use this file except in compliance | ||
| * with the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
| package org.apache.phoenix.util; | ||
|
|
||
| import java.io.IOException; | ||
| import org.bouncycastle.crypto.digests.SHA256Digest; | ||
|
|
||
| /** | ||
| * Utility class for SHA-256 digest state serialization and deserialization. We are not using jdk | ||
| * bundled SHA, since their digest can't be serialized/deserialized which is needed for | ||
| * PhoenixSyncTableTool for cross-region hash continuation. | ||
| */ | ||
| public class SHA256DigestUtil { | ||
|
|
||
| /** | ||
| * Maximum allowed size for encoded SHA-256 digest state. BouncyCastle's SHA256Digest encoded | ||
| * state ranges from 53 to 113 bytes (52 base + 0-60 buffered words + 1 purpose byte). We allow up | ||
| * to 128 bytes as headroom. | ||
| */ | ||
| public static final int MAX_SHA256_DIGEST_STATE_SIZE = 128; | ||
|
|
||
| /** | ||
| * Encodes a SHA256Digest state to a byte array. | ||
| * @param digest The digest whose state should be encoded | ||
| * @return Byte array containing the raw BouncyCastle encoded state | ||
| */ | ||
| public static byte[] encodeDigestState(SHA256Digest digest) { | ||
| byte[] encoded = digest.getEncodedState(); | ||
| if (encoded.length > MAX_SHA256_DIGEST_STATE_SIZE) { | ||
| throw new IllegalArgumentException( | ||
| String.format("SHA256 encoded state too large: %d, expected <= %d", encoded.length, | ||
| MAX_SHA256_DIGEST_STATE_SIZE)); | ||
| } | ||
|
Contributor
This check makes no sense to me. Can you explain what exactly you are protecting against? Also, I'm not sure what the exception type implies.
||
| return encoded; | ||
| } | ||
haridsv marked this conversation as resolved.
|
||
|
|
||
| /** | ||
| * Decodes a SHA256Digest state from a byte array. | ||
| * @param encodedState Byte array containing BouncyCastle encoded digest state | ||
| * @return SHA256Digest restored to the saved state | ||
| * @throws IOException if state is invalid, corrupted | ||
| */ | ||
| public static SHA256Digest decodeDigestState(byte[] encodedState) throws IOException { | ||
| if (encodedState == null || encodedState.length == 0) { | ||
| throw new IllegalArgumentException( | ||
| "Invalid encoded digest state: encodedState is null or empty"); | ||
| } | ||
| if (encodedState.length > MAX_SHA256_DIGEST_STATE_SIZE) { | ||
| throw new IllegalArgumentException( | ||
| String.format("Invalid SHA256 state length: %d, expected <= %d", encodedState.length, | ||
| MAX_SHA256_DIGEST_STATE_SIZE)); | ||
| } | ||
| return new SHA256Digest(encodedState); | ||
| } | ||
|
|
||
| /** | ||
| * Decodes a digest state and finalizes it to produce the SHA-256 checksum. | ||
| * @param encodedState Serialized BouncyCastle digest state | ||
| * @return 32-byte SHA-256 hash | ||
| * @throws IOException if state decoding fails | ||
| */ | ||
| public static byte[] finalizeDigestToChecksum(byte[] encodedState) throws IOException { | ||
| SHA256Digest digest = decodeDigestState(encodedState); | ||
| return finalizeDigestToChecksum(digest); | ||
| } | ||
|
|
||
| /** | ||
| * Finalizes a SHA256Digest to produce the final checksum. | ||
| * @param digest The digest to finalize | ||
| * @return 32-byte SHA-256 hash | ||
| */ | ||
| public static byte[] finalizeDigestToChecksum(SHA256Digest digest) { | ||
| byte[] hash = new byte[digest.getDigestSize()]; | ||
| digest.doFinal(hash, 0); | ||
| return hash; | ||
| } | ||
| } | ||
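Cross-region hash continuation relies on SHA-256 being incremental: feeding the input in pieces gives the same checksum as hashing it in one call. The sketch below shows that property with the JDK's `MessageDigest`, inside a single process. It does not serialize the digest state, since the JDK API exposes no way to do so, which is the reason the class JavaDoc gives for using BouncyCastle. The class and method names are hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical illustration of incremental hashing; not code from this PR. */
final class IncrementalSha256Sketch {

  private static MessageDigest newSha256() {
    try {
      return MessageDigest.getInstance("SHA-256");
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  /** Hashes the whole input in one call. */
  static byte[] hashWhole(byte[] data) {
    return newSha256().digest(data);
  }

  /** Hashes two pieces in order with a single running digest. */
  static byte[] hashInParts(byte[] first, byte[] second) {
    MessageDigest digest = newSha256();
    digest.update(first);
    // In the PR's design, the running state would be encoded at a point like this
    // and restored by the scanner of the next region before hashing continues.
    digest.update(second);
    return digest.digest();
  }
}
```

Because the two results match, a chunk that spans a region boundary can be checksummed as one unit as long as the running state survives the hand-off.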
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1207,6 +1207,11 @@ public static boolean isIndexRebuild(Scan scan) { | |
| return scan.getAttribute((BaseScannerRegionObserverConstants.REBUILD_INDEXES)) != null; | ||
| } | ||
|
|
||
| public static boolean isSyncTableChunkFormationEnabled(Scan scan) { | ||
| return Arrays.equals( | ||
| scan.getAttribute(BaseScannerRegionObserverConstants.SYNC_TABLE_CHUNK_FORMATION), TRUE_BYTES); | ||
|
Contributor
I would also suggest renaming the attribute to indicate that it is a boolean.
||
| } | ||
|
|
||
| public static int getClientVersion(Scan scan) { | ||
| int clientVersion = UNKNOWN_CLIENT_VERSION; | ||
| byte[] clientVersionBytes = | ||
|
|
||
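The `Arrays.equals` check in `isSyncTableChunkFormationEnabled` treats a missing attribute as "disabled", because `Arrays.equals` returns false when exactly one argument is null. A small sketch of that behavior follows, with a plain `Map` standing in for the scan attributes. The class name is hypothetical, and the value of `TRUE_BYTES` here (a single byte `1`) is an assumption rather than a value confirmed by this diff.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical stand-in for a boolean scan-attribute check; not code from this PR. */
final class BooleanAttributeSketch {

  // Assumption: placeholder for the TRUE_BYTES constant referenced in the diff.
  static final byte[] TRUE_BYTES = new byte[] {1};

  /** Returns true only when the attribute is present and byte-for-byte equal to TRUE_BYTES. */
  static boolean isEnabled(Map<String, byte[]> attributes, String name) {
    // A missing attribute yields null, and Arrays.equals(null, nonNull) is false.
    return Arrays.equals(attributes.get(name), TRUE_BYTES);
  }
}
```

Any other value, including a longer array that merely starts with the true byte, also reads as disabled, so the flag cannot be switched on by accident.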
Should all of these instead be named SYNC_TOOL?

I have used SyncTableTool for the user-facing class and config. For the others, I have used SyncTable. Are you recommending moving all classes and configs to SyncTool instead of SyncTable, i.e. PhoenixSyncTableRegionScanner -> PhoenixSyncToolRegionScanner?
I felt SyncTable is more self-explanatory than SyncTool. Alternatively, we could change it to SyncTableTool everywhere.

I see. It's okay, not a big deal. We can stick with the same naming convention.