CLDSRV-863: Checksums for PutObject and UploadPart#6112
CLDSRV-863: Checksums for PutObject and UploadPart#6112leif-scality wants to merge 10 commits intodevelopment/9.4from
Conversation
Hello leif-scality,My role is to assist you with the merge of this Available options
Available commands
Status report is not available. |
Incorrect fix versionThe
Considering where you are trying to merge, I ignored possible hotfix versions and I expected to find:
Please check the |
❌ 1 Tests Failed:
View the full list of 1 ❄️ flaky test(s)
To view more test analytics, go to the Test Analytics Dashboard |
|
LGTM |
|
LGTM — well-structured checksum pipeline with good error handling and thorough tests. |
| return null; | ||
| } | ||
|
|
||
| if (this.trailerChecksumValue) { |
There was a problem hiding this comment.
validateChecksum() uses a truthy check to detect unexpected trailers on line 56. If setExpectedChecksum is called with an empty string value (e.g. from a malformed trailer like x-amz-checksum-crc32: with nothing after the colon), the check evaluates to false and the unexpected trailer goes undetected. Consider using this.trailerChecksumValue !== undefined for consistency with line 27. — Claude Code
|
| const dataStream = stripTrailingChecksumStream(dataStreamTmp, log, cbOnce); | ||
| return data.put( | ||
| cipherBundle, dataStream, size, objectContext, backendInfo, log, | ||
| cipherBundle, checksumedStream.stream, size, objectContext, backendInfo, log, |
There was a problem hiding this comment.
storeObject.js passes checksumedStream.stream (a ChecksumTransform) to data.put, but ChecksumTransform never has .headers set on it. In the old code, prepareStream always returned a stream with .headers = stream.headers. If any data backend or middleware reads .headers from the stream passed to data.put, this will be undefined.
I checked lib/data/wrapper.js and it does not reference .headers, but worth verifying against all backend implementations (sproxyd, azure, gcp, etc.) that none rely on it.
— Claude Code
There was a problem hiding this comment.
.headers: checked — no backend in lib/data/ or Arsenal reads .headers off the stream passed to data.put.
| if (checksumedStream.stream.writableFinished) { | ||
| return doValidate(); | ||
| } | ||
| checksumedStream.stream.once('finish', doValidate); |
There was a problem hiding this comment.
Potential race: if ChecksumTransform errors during _flush (e.g. crc64nvme digest fails), the error goes through errCb which calls cbOnce. But if writableFinished was false, the doValidate listener is also registered on 'finish'. Since _flush called callback(error), the stream will not emit 'finish' so the listener is harmless — but the 'finish' and error paths are independent, and both can potentially reach cbOnce. The jsutil.once(cb) guard prevents double-callback, but the upgraded onStreamError (line 88-96) could trigger batchDelete even though doValidate also triggers batchDelete on checksum failure. Consider whether both paths could fire in an edge case.
— Claude Code
There was a problem hiding this comment.
double-delete: not a risk — finish and error are mutually exclusive in the Node.js stream lifecycle, so both handlers can't fire for the same stream.
|
Well-structured PR that adds checksum validation for PutObject and UploadPart across all streaming modes. The ChecksumTransform, trailer parsing, and error mapping are clean. Two items to verify: |
|
LGTM |
a284c9f to
dade07b
Compare
|
LGTM |
a45ea10 to
65aa501
Compare
|
|
||
| for (const protocol of protocols) { | ||
| for (const algo of algos) { | ||
| itSkipIfAWS( |
There was a problem hiding this comment.
Test names using itSkipIfAWS() in makeScenarioTests start with should, but the dynamically generated tests in the for loops (e.g. line 472 below) don't. They should follow the same convention.
— Claude Code
|
LGTM — solid implementation of streaming checksum validation for PutObject and UploadPart. The stream pipeline design (V4Transform / TrailingChecksumTransform → ChecksumTransform), the once-guarded error callbacks, and the mutable onStreamError upgrade pattern in storeObject are all well done. The race condition fix in services.js (NoSuchKey guard for isCompleteMPUInProgress) is a good defensive addition. |
| for (const protocol of protocols) { | ||
| for (const algo of algos) { | ||
| itSkipIfAWS( | ||
| `${protocol.name} with wrong x-amz-checksum-${algo.name}: returns 400 BadDigest`, |
There was a problem hiding this comment.
Test names passed to itSkipIfAWS() (which wraps it()) should start with should per project convention.
— Claude Code
Review by Claude Code |
|
LGTM |
jonathan-gramain
left a comment
There was a problem hiding this comment.
Partial review (stopped at commit "emit trailer in TrailingChecksumTransform"), will continue tomorrow
| if (!this.streamClosed && this.readingTrailer && this.trailerBuffer.length === 0) { | ||
| // Nothing came after "0\r\n", don't fail. | ||
| // If the x-amz-trailer header was present then the trailer is required and ChecksumTransform will fail. | ||
| return callback(); | ||
| } else if (!this.streamClosed && this.readingTrailer && this.trailerBuffer.length !== 0) { | ||
| this.log.error('stream ended without trailer "\r\n"'); | ||
| return callback(incompleteBodyError); | ||
| } else if (!this.streamClosed && !this.readingTrailer) { |
There was a problem hiding this comment.
We could simplify the flow to avoid repeating conditions, e.g.:
if (streamClosed) {
// return ok
}
if (!readingTrailer) {
// log and return incomplete body error
}
if (trailerBuffer.length === 0) {
// no trailer: ok
}
// log and return incomplete body error
There was a problem hiding this comment.
It could be even cleaner with an FSM when state management becomes a bit more complex, by maintaining an explicit state variable and acting based on its value.
There was a problem hiding this comment.
reworked as FSM
| @@ -20,6 +24,10 @@ class TrailingChecksumTransform extends Transform { | |||
| this.bytesToDiscard = 0; // when trailing \r\n are present, we discard them but they can be in different chunks | |||
| this.bytesToRead = 0; // when a chunk is advertised, the size is put here and we forward all bytes | |||
| this.streamClosed = false; | |||
There was a problem hiding this comment.
It's not related to this PR, but how about renaming this variable with a more specific name like this.receivedLastChunk? I got confused as it can easily be mistaken for a nodejs stream state. But feel free to ignore.
There was a problem hiding this comment.
reworked as FSM
| * headers and trailing checksum trailer from the stream, forwarding only | ||
| * the raw object data. The trailer name and value are emitted via a | ||
| * 'trailer' event so that ChecksumTransform can validate the checksum. | ||
| */ |
There was a problem hiding this comment.
JSDoc comment has broken indentation — lines 208-210 are missing a leading space before the *.
```suggestion
- This class handles the chunked-upload body format used by
- STREAMING-UNSIGNED-PAYLOAD-TRAILER requests. It strips the chunk-size
- headers and trailing checksum trailer from the stream, forwarding only
- the raw object data. The trailer name and value are emitted via a
- 'trailer' event so that ChecksumTransform can validate the checksum.
| } | ||
|
|
||
| const chunkLenStr = this.buffer.toString('ascii', 0, this.bufferOffset).trim(); | ||
| // AWS does not do this check, it returns 500 if it is to large. |
There was a problem hiding this comment.
Typo: "to large" should be "too large".
```suggestion
// AWS does not do this check, it returns 500 if it is too large.
|
bd3d857 to
22c656c
Compare
|
LGTM |
jonathan-gramain
left a comment
There was a problem hiding this comment.
Done up to commit add ChecksumTransform to calculate and verify stream checksums
| if (!this.streamClosed && this.readingTrailer && this.trailerBuffer.length === 0) { | ||
| // Nothing came after "0\r\n", don't fail. | ||
| // If the x-amz-trailer header was present then the trailer is required and ChecksumTransform will fail. | ||
| return callback(); | ||
| } else if (!this.streamClosed && this.readingTrailer && this.trailerBuffer.length !== 0) { | ||
| this.log.error('stream ended without trailer "\r\n"'); | ||
| return callback(incompleteBodyError); | ||
| } else if (!this.streamClosed && !this.readingTrailer) { |
There was a problem hiding this comment.
It could be even cleaner with an FSM when state management becomes a bit more complex, by maintaining an explicit state variable and acting based on its value.
| if (xAmzChecksumCnt > 1) { | ||
| return { error: ChecksumError.MultipleChecksumTypes, details: { algorithms: checksumHeaders } }; | ||
| } | ||
|
|
There was a problem hiding this comment.
Nitpick suggestion for clarity:
- add
const checksumHeader = checksumHeaders[0]; - thereafter, check if a header is present via
if (checksumHeader)and use it directly, instead of checking the count and usingchecksumHeaders[0]
| } | ||
|
|
||
| _transform(chunk, encoding, callback) { | ||
| const input = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk); |
There was a problem hiding this comment.
You can probably assume that it's a Buffer, I don't think we would send string contents to this stream in any case.
|
|
||
| _transform(chunk, encoding, callback) { | ||
| const input = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk); | ||
| this.hash.update(input, encoding); |
There was a problem hiding this comment.
Since input is a buffer, encoding will be ignored, so better leave it out (if you wanted to support strings, then you would have to pass encoding to the Buffer.from call above, but as I said I don't think we would have to deal with strings).
| this.trailerChecksumValue = undefined; | ||
| } | ||
|
|
||
| setExpectedChecksum(name, value) { |
There was a problem hiding this comment.
I feel like whether the checksum comes from the checksum header or a trailer, they should be treated the same way when it comes to comparing with the computed checksum.
I understand that you defer validation of the checksum value itself to validateChecksum, but maybe it would be best to validate it directly here, then update this.expectedChecksum so the following logic of validateChecksum can be agnostic of where the checksum comes from. (If you really need to defer the error reporting to validateChecksum, you could set an error flag this.checksumError or similar and check it at the beginning of validateChecksum, but it may not be needed)
There was a problem hiding this comment.
The thing is that when we call setExpectedChecksum we don't know if _flush has been called and finished, so this.digest may not be ready. In dataStore we make sure the checksumed stream is finished before calling validateChecksum, so we know this.digest is ready.
Let me know if this responds your concern or if I missed something
There was a problem hiding this comment.
What I meant is that every sanity check done in the this.isTrailer case (https://github.com/scality/cloudserver/pull/6112/changes#diff-5ea0da6c64111c8a94b664848cc7c8e9201b530aa77f0a6e962a905af8055f9eR25) except the final digest check could be done within setExpectedChecksum, where the function would:
- on success, set
this.expectedChecksumto be validated byvalidateChecksumat the end, i.e. using the same code path than for non-trailer checksum - on error, either return a validation error immediately or mark the error in an attribute
this.checksumError(for example) to be checked and returned first thing byvalidateChecksum
jonathan-gramain
left a comment
There was a problem hiding this comment.
Will review the new FSM tomorrow
| stream.pipe(trailingChecksumTransform); | ||
| trailingChecksumTransform.headers = stream.headers; | ||
| return trailingChecksumTransform; | ||
| let stream = request; |
There was a problem hiding this comment.
I don't think you benefit from creating and overwriting this temporary variable, you could just use request or middleware streams like v4Transform directly where stream is used.
| const doValidate = () => { | ||
| const checksumErr = checksumedStream.stream.validateChecksum(); | ||
| if (checksumErr) { | ||
| log.debug('failed checksum validation stream', checksumErr); |
There was a problem hiding this comment.
| log.debug('failed checksum validation stream', checksumErr); | |
| log.debug('failed checksum validation stream', { error: checksumErr }); |
| }); | ||
| }); | ||
|
|
||
| it('should not delete stored data when checksum validation passes', done => { |
There was a problem hiding this comment.
The test is marked as "checksum validation passes" but it seems there is no checksum passed at all, is it meant to pass a valid x-amz-checksum-crc32 header matching the hello world payload for example?
| ]; | ||
|
|
||
| // Build the raw chunked body for STREAMING-UNSIGNED-PAYLOAD-TRAILER | ||
| function buildTrailerBody(algoName, wrongDigest) { |
There was a problem hiding this comment.
nitpick on the variable name wrongDigest: technically this function doesn't care if the digest is correct or not, so it could just be called digest (meaning it could also potentially be used to build correct bodies)
| // Constants for protocol scenario tests | ||
|
|
||
| const trailerContent = Buffer.from('trailer content'); // 15 bytes, hex 'f' | ||
| const trailerContentSha256 = '4x74k2oA6j6knXzpDwogNkS6E3MM49tPpJMjfD+ES68='; |
There was a problem hiding this comment.
| const trailerContentSha256 = '4x74k2oA6j6knXzpDwogNkS6E3MM49tPpJMjfD+ES68='; | |
| const trailerContentSha256 = crypto.createHash('sha256').update(trailerContent).digest('base64'); |
|
|
||
| // Create the 24 common protocol-scenario tests for a given URL factory. | ||
| // urlFn() is called lazily at test runtime so that uploadId is available. | ||
| function makeScenarioTests(urlFn) { |
There was a problem hiding this comment.
Could be worth adding some tests with larger synthetic bodies (e.g. 10MB of As), both with correct and wrong checksum, to check that streaming checksumming works.
| const byte = data[idx++]; | ||
| if (byte === CR && this.bufferOffset === 0) { | ||
| log.error('unexpected CR at start of chunk length'); | ||
| return errorInstances.InvalidArgument; |
There was a problem hiding this comment.
If you don't use customizeDescription you can use errors to have a new instance with the current stacktrace, if passing an error with a fresh stacktrace matters.
Otherwise you'd have a pre instanciated error without .stack trace. https://github.com/scality/Arsenal/blob/362f7908b6126709add96ee981d5235d96bab58c/lib/errors/index.ts#L196-L214
| } | ||
| const chunkLen = parseInt(chunkLenStr, 16); | ||
|
|
||
| if (chunkLen > maximumAllowedPartSize) { |
There was a problem hiding this comment.
Should we rather use maximumAllowedUploadSize ? It's for PutObject, even if it's the same value, the maximumAllowedPartSize is for MPU (with lower value for MPU_TESTING)
| }); | ||
| }); | ||
|
|
||
| describe('TrailingChecksumTransform trailer parsing and emitting', () => { |
There was a problem hiding this comment.
this can be inside the above top-level describe, to keep only 1 top-level describe
No description provided.