Skip to content

feat: identity verification 5.8#2599

Open
nan-li wants to merge 45 commits intomainfrom
feat/identity_verification_5.8
Open

feat: identity verification 5.8#2599
nan-li wants to merge 45 commits intomainfrom
feat/identity_verification_5.8

Conversation

@nan-li
Copy link
Copy Markdown
Contributor

@nan-li nan-li commented Mar 30, 2026

Description

One Line Summary

Add Identity Verification (JWT-based) support to the Android SDK, gating API operations behind server-issued JWTs when enabled.

Details

Motivation

Identity Verification allows app developers to authenticate users with JWTs before the SDK sends operations to the OneSignal backend. This prevents unauthorized API calls by requiring a valid JWT for all user-scoped operations when the feature is enabled server-side via the jwt_required remote param.

Scope

Public API additions:

  • OneSignal.login(externalId, jwtBearerToken) — accepts an optional JWT on login
  • OneSignal.updateUserJwt(externalId, token) — supply a fresh JWT when the current one expires
  • OneSignal.addUserJwtInvalidatedListener(listener) / removeUserJwtInvalidatedListener(listener) — get notified when a JWT is invalidated (e.g. 401 from backend) so the app can provide a new one
  • IUserJwtInvalidatedListener / UserJwtInvalidatedEvent — listener interface and event class

Internal changes:

  • JwtTokenStore — persistent store mapping externalId → JWT, backed by SharedPreferences
  • IdentityVerificationService — reacts to the jwt_required remote param arriving via config HYDRATE; purges anonymous operations when IV is enabled; wires JWT invalidation callbacks
  • OperationRepo — gates operation execution on valid JWT when IV is enabled (hasValidJwtIfRequired); handles FAIL_UNAUTHORIZED by invalidating the JWT, re-queuing operations, and notifying the developer; stamps externalId synchronously at enqueue time to avoid race conditions
  • LogoutHelper — when IV is enabled, disables the push subscription internally and suppresses the backend login operation (the device-scoped user is local-only)
  • ConfigModelStoreListener / ParamsBackendService — reads jwt_required from remote params, defaults to false when absent
  • HTTP layer — attaches Authorization: Bearer <jwt> header to requests when IV is enabled
  • In-App Messaging — uses alias-based fetch endpoint and includes JWT in IAM backend calls
  • Operation executors — propagate externalId to result operations and attach JWT to backend requests

What is NOT affected:

  • Apps without Identity Verification enabled behave exactly as before — jwt_required defaults to false and no JWT gating occurs
  • Notification display, open handling, push processing, and confirm deliveries are unchanged

Testing

Unit testing

WIP Adding more testing

  • OperationRepoTests — added tests for FAIL_UNAUTHORIZED handling (identified user fires handler + retries; anonymous user drops operations), and synchronous externalId stamping before async dispatch
  • LoginUserOperationExecutorTests — updated for identity verification parameters
  • LogoutHelperTests — covers IV-enabled logout flow
  • LoginHelperTests — covers IV-enabled login flow with JWT

Manual testing

Tested with the demo app on an Android emulator (Pixel 7, API 35):

  • Login with valid JWT → operations execute normally
  • Login with expired JWT → 401 triggers IUserJwtInvalidatedListener, call updateUserJwt → operations resume
  • Logout with IV enabled → push subscription disabled, new anonymous user created locally
  • App without IV enabled → SDK operates normally, no regression

Affected code checklist

  • Notifications
    • Display
    • Open
    • Push Processing
    • Confirm Deliveries
  • Outcomes
  • Sessions
  • In-App Messaging
  • REST API requests
  • Public API changes

Checklist

Overview

  • I have filled out all REQUIRED sections above
  • PR does one thing
  • Any Public API changes are explained in the PR details and conform to existing APIs

Testing

  • I have included test coverage for these changes, or explained why they are not needed
  • All automated tests pass, or I explained why that is not possible
  • I have personally tested this on my device, or explained why that is not possible

Final pass

  • Code is as readable as possible.
  • I have reviewed this PR myself, ensuring it meets each checklist item

@nan-li nan-li changed the title Feat/identity verification 5.8 Feat: identity verification 5.8 Mar 30, 2026
@nan-li nan-li changed the title Feat: identity verification 5.8 feat: identity verification 5.8 Mar 30, 2026
@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from 4509a8f to 457b745 Compare March 30, 2026 09:35
@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch 4 times, most recently from 113e96b to 00d3799 Compare March 30, 2026 23:05
@nan-li
Copy link
Copy Markdown
Contributor Author

nan-li commented Mar 31, 2026

@claude re-review

@claude
Copy link
Copy Markdown

claude bot commented Mar 31, 2026

Claude finished @nan-li's task in 3m 55s —— View job


Re-review: feat/identity_verification_5.8

  • Read current state of files related to previous review findings
  • Verify fixes for the 4 critical issues flagged
  • Look for any remaining or new issues
  • Post updated review

All four critical issues flagged in the previous review have been addressed. Here's a status breakdown:

Previously Critical — Now Fixed ✅

1. Runtime 401 not firing IUserJwtInvalidatedListener (OperationRepo.kt:292-297)

Fixed correctly. OperationRepo now holds a _jwtInvalidatedHandler: ((String) -> Unit)? that is set in IdentityVerificationService.start():

_operationRepo.setJwtInvalidatedHandler { externalId ->
    _userManager.fireJwtInvalidated(externalId)
}

And in the FAIL_UNAUTHORIZED branch, _jwtInvalidatedHandler?.invoke(externalId) is called after _jwtTokenStore.invalidateJwt(externalId). The queue correctly re-enqueues the operations, and the developer receives the callback to provide a fresh JWT. When updateUserJwt is called, forceExecuteOperations() wakes the queue — the full round trip is wired.

2. Logout race condition — UpdateSubscriptionOperation stamped with null externalId (OperationRepo.kt:165-168)

Fixed correctly. stampExternalId(operation) is now called synchronously in enqueue() before scope.launch { internalEnqueue(...) }. Since SubscriptionModelStoreListener.getUpdateOperation() fires synchronously when isDisabledInternally = true is set — and createAndSwitchToNewUser() has not yet run at that point — _identityModelStore.model.externalId is still the old user's ID when it's captured.

3. JwtTokenStore.ensureLoaded() throws uncaught JSONException (JwtTokenStore.kt:32-42)

Fixed correctly. The JSON parsing is wrapped in try { } catch (e: Exception) and isLoaded = true is placed unconditionally at the end of ensureLoaded(). Corrupted SharedPreferences data is now logged and discarded gracefully instead of causing an infinite throw loop.

4. useIdentityVerification stays null for non-IV apps, permanently blocking the queue (ParamsBackendService.kt:87)

Fixed correctly. The field now defaults to false when jwt_required is absent from the backend response:

useIdentityVerification = responseJson.safeBool("jwt_required") ?: false

Non-IV apps get false after HYDRATE, getNextOps() proceeds, and IdentityVerificationService.onModelReplaced() still calls forceExecuteOperations() to wake the queue.


Remaining Observations (non-blocking)

A. _jwtInvalidatedHandler is not @Volatile (OperationRepo.kt:47)

private var _jwtInvalidatedHandler: ((String) -> Unit)? = null

This field is written from IdentityVerificationService.start() (likely the main thread) and read from executeOperations() running on OSOperationRepoScope. In practice this is safe because start() runs before any operations execute, but the JMM doesn't guarantee visibility without @Volatile or a synchronisation point. Worth annotating for correctness. Fix this →

B. Expired JWT at logout time blocks the unsubscribe operation (LogoutHelper.kt)

When IV=true and a 401 has already invalidated the JWT before the user calls logout(), the UpdateSubscriptionOperation (to disable the push subscription) gets a null result from _jwtTokenStore.getJwt(externalId) and is permanently held by hasValidJwtIfRequired. The developer would need to call updateUserJwt with a fresh token to unblock it — but if they're logging out, they likely won't. The backend push subscription stays enabled for the logged-out user until the next cold start (when pruneToExternalIds cleans up the orphaned operation).

This is an edge case (JWT expired + user logs out before providing a refresh) and the overall security posture of the IV feature prevents worse outcomes, but it may be worth documenting or handling explicitly — for example, by skipping the JWT gate for the unsubscribe-on-logout operation. Not blocking the PR.


The implementation is solid. All critical issues are resolved and the approach is well-structured. The IdentityVerificationService / OperationRepo / JwtTokenStore responsibilities are cleanly separated.

@nan-li
Copy link
Copy Markdown
Contributor Author

nan-li commented Mar 31, 2026

@claude re-review

@claude
Copy link
Copy Markdown

claude bot commented Mar 31, 2026

Claude finished @nan-li's task in 2m 39s —— View job


Re-review: feat/identity_verification_5.8

  • Read current state of key files
  • Verify fixes for the 2 observations from previous review
  • Look for any remaining or new issues
  • Post updated review

Both remaining observations from the previous re-review have been addressed. The PR is clean.

Previously Non-blocking — Now Fixed ✅

A. _jwtInvalidatedHandler not @Volatile (OperationRepo.kt:47)

Fixed in commit 76dd958. The field is now correctly annotated:

@Volatile
private var _jwtInvalidatedHandler: ((String) -> Unit)? = null

B. Expired JWT at logout blocks the push subscription disable (UpdateSubscriptionOperation.kt:89)

Fixed in commit 130aa16. The first approach (skip in LogoutHelper) was tried and reverted; the final solution is cleaner — it adds Operation.requiresJwt: Boolean = true as an open property on the base class and overrides it in UpdateSubscriptionOperation:

// Operation.kt:69
open val requiresJwt: Boolean get() = true

// UpdateSubscriptionOperation.kt:89
override val requiresJwt: Boolean get() = false

OperationRepo.hasValidJwtIfRequired() short-circuits on !op.requiresJwt, so subscription updates always proceed regardless of JWT state. This is backend-aligned — the subscription endpoint doesn't enforce JWT auth — and the executor still passes the JWT opportunistically when one is available (val jwt = lastOperation.externalId?.let { _jwtTokenStore.getJwt(it) }).


Minor Observation (no action needed)

The requiresJwt = false behavior on UpdateSubscriptionOperation has no dedicated test in OperationRepoTests. The behavior is implicitly covered by LogoutHelperTests, but a unit test verifying that an operation with requiresJwt = false executes when hasValidJwtIfRequired would otherwise block could add confidence. Given the simplicity of the change, this is not blocking.


The implementation is solid. All six issues identified across both review passes are resolved. The architecture is clean: IdentityVerificationService / OperationRepo / JwtTokenStore responsibilities are well-separated, the requiresJwt flag gives per-operation control over JWT gating, and the developer-facing callback round-trip is fully wired. Good to merge.

@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from 972eace to f6e4227 Compare March 31, 2026 08:49
@nan-li nan-li requested a review from a team March 31, 2026 16:01
Comment thread OneSignalSDK/detekt/detekt-baseline-core.xml
<CurrentIssues>
<ID>ComplexCondition:InAppMessagesManager.kt$InAppMessagesManager$!message.isTriggerChanged &amp;&amp; isMessageDisplayed &amp;&amp; (isTriggerOnMessage || isNewTriggerAdded &amp;&amp; isOnlyDynamicTriggers)</ID>
<ID>ComplexMethod:TriggerController.kt$TriggerController$private fun evaluateTrigger(trigger: Trigger): Boolean</ID>
<ID>ComplexMethod:InAppMessagesManager.kt$InAppMessagesManager$private suspend fun fetchMessages(rywData: RywData)</ID>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should break it down if its a complex method.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I undid the detekt changes and then added them back in the last few commits with reasoning. I actually didn't end up making changes to in-app-messages in detekt.

Comment thread OneSignalSDK/onesignal/core/src/main/java/com/onesignal/user/UserModule.kt Outdated
* @param externalId The external ID of the user whose token is being updated.
* @param token The new JWT bearer token.
*/
fun updateUserJwt(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like i mentioned elsewhere, we need suspend methods for these

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch 3 times, most recently from ca64cdf to 258c8e6 Compare April 3, 2026 03:46
@nan-li nan-li requested a review from abdulraqeeb33 April 3, 2026 06:57
Copy link
Copy Markdown
Contributor

@abdulraqeeb33 abdulraqeeb33 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets try and do a bug bash on this today/tomorrow?

@nan-li
Copy link
Copy Markdown
Contributor Author

nan-li commented Apr 13, 2026

@claude review

@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from b445a01 to 17fc13b Compare April 14, 2026 00:55
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two bugs I flagged in the last pass (IAM rate-limit poisoning and JWT exposure in login() logs) have been fixed in the latest commit — thanks for addressing them. Given the security-sensitive nature of this feature (JWT-based identity gating across 86 files), a final human look before merge is warranted.

Extended reasoning...

Overview

This PR adds Identity Verification (JWT-based authentication gating) to the Android SDK, spanning 86 files including new public API, a new JwtTokenStore, OperationRepo changes, IAM fetch gating, and HttpClient JWT header injection.

Prior Review Rounds

  • Rounds 1–2 (late March): 6 bugs found and fixed.
  • Round 3 (2026-04-14): 4 new bugs filed as inline comments. The latest commit (82678a5 "Address Claude review bot feedback on IV PR") addresses all of them:
    • ✅ IAM rate-limit poisoning: Guard now checks both and , returning before is set.
    • ✅ JWT logging in / : Both now use .
    • ✅ FAIL_UNAUTHORIZED loop when IV=OFF: Gated on .
    • ✅ JWT in HttpClient logs: Authorization header filtered from logged request properties.
    • 403/401 conflation and stale ops at logout explicitly deferred by author (out of scope / by design).

Security Risks

The PR touches auth-critical paths: JWT storage/retrieval, JWT injection into HTTP headers, FAIL_UNAUTHORIZED error routing, and the public-facing / APIs. No new vulnerabilities found in this pass, but the breadth of the change means a human reviewer should give final sign-off.

Level of Scrutiny

This is production-facing security code that gates API calls behind developer-issued JWTs. The implementation looks correct across the key paths reviewed, but the sheer scope (86 files, new public API surface, threading concerns) means automated review alone is insufficient.

Other Factors

  • The LoginHelper pre-HYDRATE logic is sound: the null case falls through to IV=OFF behavior, and IdentityVerificationService cleans up on HYDRATE.
  • intentionally omits UNAUTHORIZED handling (requiresJwt=false by design; JWT is passed opportunistically).
  • Test infrastructure () correctly gates test suites from the IV hold path.

@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from 16f9650 to fbc53f2 Compare April 15, 2026 02:56
nan-li added 2 commits April 14, 2026 20:03
…ication nullable, OptionalHeaders.jwt

Foundational models and infrastructure for identity verification:

- Create JwtTokenStore: persistent Map<externalId, JWT> backed by SharedPreferences, supporting multi-user JWT storage with getJwt/putJwt/invalidateJwt/pruneToExternalIds
- Add var externalId to Operation base class so OperationRepo can stamp and gate operations per-user; remove redundant externalId from LoginUserOperation and TrackCustomEventOperation (same Model data-map key, no migration needed)
- Change ConfigModel.useIdentityVerification from Boolean to Boolean? (null = unknown, false = off, true = on) to eliminate race between operation processing and remote params
- Add jwt field to OptionalHeaders for passing Bearer tokens through HTTP layer
- Add PREFS_OS_JWT_TOKENS key to PreferenceOneSignalKeys

Made-with: Cursor
nan-li and others added 19 commits April 14, 2026 20:03
When the server omits jwt_required from the params response,
safeBool() returned null, leaving useIdentityVerification unset.
getNextOps() treated null as "not yet known" and returned null
permanently, silently blocking every SDK operation for the session.

Default to false when the field is absent so non-IV apps are
unaffected by the identity verification gating logic.

Made-with: Cursor
The field is written from the main thread in
IdentityVerificationService.start() and read on the OperationRepo
coroutine thread. @volatile guarantees cross-thread visibility.

Made-with: Cursor
When IV is enabled and the JWT has already been invalidated (e.g. by
a prior 401), the UpdateSubscriptionOperation to disable the push
subscription would be permanently blocked by hasValidJwtIfRequired.
Since the backend call would fail with 401 anyway, skip it entirely
and just switch to the new anonymous user.

Made-with: Cursor

Revert "Skip push subscription disable on logout when JWT is already expired"

This reverts commit 5ce284257b6e03b8c862745ff58fcf4293299577.

Exempt UpdateSubscriptionOperation from JWT gating

The subscription update endpoint does not require a JWT on the
backend. Add Operation.requiresJwt (default true) and override it
to false in UpdateSubscriptionOperation so these operations are not
blocked by hasValidJwtIfRequired when the JWT is expired or missing.

This fixes the edge case where logging out with an expired JWT would
permanently block the push subscription disable operation.

Made-with: Cursor
…ationRepo

Pass externalId alongside onesignalId in every operation constructor, eliminating
the fragile stampExternalId workaround and parentExternalId propagation from
OperationRepo. Every creation site (model store listeners, session listener,
executors, etc.) now explicitly provides the externalId at construction time.

Made-with: Cursor
Add suspend variant of updateUserJwt for API consistency with
login/loginSuspend. Also add waitForInit to the non-suspend
updateUserJwt so it safely handles calls before SDK initialization.
- Baseline 2 new UseCheckOrError entries for OneSignalImp.kt for updateUserJwt/updateUserJwtSuspend init guards that follow the existing login/logout pattern.
- Baseline ReturnCount for hasValidJwtIfRequired — the guard-clause style is clearer than collapsing into a single boolean expression for this JWT gating logic.
- Narrow JwtTokenStore catch from Exception to JSONException to satisfy detekt's TooGenericExceptionCaught rule, no baseline added.
…r params

All new _prefix constructor parameters (e.g. _jwtTokenStore, _identityModelStore,
_configModelStore) follow the existing codebase convention — 80+ identical entries
already exist in the baseline. These are DI-injected dependencies across
IdentityVerificationService, OperationRepo, JwtTokenStore, and various executors/listeners.
- Update 4 existing baseline entries (added jwt/_jwtTokenStore to signatures)
- Baseline 4 new LongParameterList entries for IV constructors: CreateSubscriptionOperation, UpdateSubscriptionOperation, RefreshUserOperationExecutor, and UpdateUserOperationExecutor crossed the constructor threshold (8) after adding externalId and _jwtTokenStore params. This follows existing patterns.
OperationRepo: on FAIL_UNAUTHORIZED, invalidate JWT and re-queue before
notifying the app; invoke setJwtInvalidatedHandler synchronously with
try/catch so listener failures do not hit executeOperations outer catch.
UserManager.fireJwtInvalidated performs the single async hop (SupervisorJob
+ Dispatchers.Default) with Otel-style launch try/catch and per-listener
isolation.

IdentityVerificationService: call forceExecuteOperations before
fireJwtInvalidated on HYDRATE.

Documentation: IOperationRepo handler runs on the op-repo thread and must
return quickly; public API documents background-thread delivery for
IUserJwtInvalidatedListener, addUserJwtInvalidatedListener, and
UserJwtInvalidatedEvent (LiveData postValue guidance).

Tests: FAIL_UNAUTHORIZED JWT handler coverage; no artificial delay after sync repo handler.
…2600)

Co-authored-by: AR Abdul Azeez <abdul@onesignal.com>
…s post-enqueue

OperationRepo: gate FAIL_UNAUTHORIZED re-queue on useIdentityVerification == true.
When IV is OFF, hasValidJwtIfRequired() always returns true so re-queued ops
were immediately eligible, creating a ~200ms infinite retry loop. Now IV-OFF
treats FAIL_UNAUTHORIZED as FAIL_NORETRY (drop + wake waiters).

OperationRepo: in removeOperationsWithoutExternalId(), clear local
existingOnesignalId on queued LoginUserOperations. When IV=ON arrives via
HYDRATE, anonymous CreateUserOperations are purged, orphaning the local ID
that LoginUserOperation.canStartExecute was waiting on translateIds to resolve.
Clearing it unblocks the operation and routes the executor through createUser().

LoginUserOperation: widen existingOnesignalId setter to internal.

Fix Operation.externalId KDoc to reflect that subclass constructors
set this field, not IOperationRepo at enqueue time.

Made-with: Cursor
LogoutHelper: add explicit null branch for unknown IV state (pre-HYDRATE).
Disables push subscription and suppresses backend op (like IV=ON) while also
enqueuing a LoginUserOperation so the anonymous user is created on the backend
if IV turns out to be OFF. If IV=ON, removeOperationsWithoutExternalId() purges
the anonymous op at HYDRATE. Known IV=ON and IV=OFF paths are unchanged.

HttpClient: filter Authorization header from requestProperties before passing
to logHTTPSent() so the full Bearer token is not written to debug logs.

OneSignalImp: redact token in updateUserJwt/updateUserJwtSuspend debug logs.
Made-with: Cursor
- Redact JWT from login/loginSuspend logs (show last 8 chars)
- Fix error message in updateUserJwtSuspend
- Add FAIL_UNAUTHORIZED to deleteSubscription and CustomEvent executors
- Drop TransferSubscriptionOperation when IV=ON
- Skip IAM fetch when JWT is invalidated (prevent rate-limit poisoning)
- Clear existingOnesignalId unconditionally in removeOperationsWithoutExternalId
- MagicNumber(8): log suffix length in takeLast(8) — extracting a
  named constant for a single-use log format arg adds noise without
  improving readability
- ReturnCount(transferSubscription): early-exit guard follows the same
  pattern as LoginUserFromSubscriptionOperationExecutor; restructuring
  to reduce returns would obscure the guard-clause intent
- NestedBlockDepth(executeOperations): pre-existing from IV branch,
  the when-block nesting is inherent to the dispatch logic
- UseCheckOrError: pre-existing non-suspend login/logout/accessor
  throws — consistent with existing codebase pattern
- Removed stale entry for renamed error message
@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from fbc53f2 to fb2353f Compare April 15, 2026 03:04
- LogoutHelper: remove suppressBackendOperation=true in pre-HYDRATE
  branch so CreateSubscriptionOperation is enqueued alongside the
  anonymous LoginUserOperation (fixes IV=OFF path)
- InAppMessagesManager: capture JWT once before guard check to
  eliminate TOCTOU between guard and backend call
- nit: LoginUserOperationExecutor: add error log when anonymous
  LoginUserOperation is dropped with no subscription op
- Move anonymous operation suppression from individual model store
  listeners into OperationRepo.enqueue/enqueueAndWait so all enqueue
  sites (SessionListener, TrackGooglePurchase, CustomEventController,
  etc.) are covered. LoginUserOperation is exempt.
- Remove redundant shouldSuppressForAnonymousUser() from
  IdentityModelStoreListener, PropertiesModelStoreListener, and
  SubscriptionModelStoreListener.
- Add LogoutHelper tests for IV=true and IV=null (pre-HYDRATE) branches.
@nan-li nan-li force-pushed the feat/identity_verification_5.8 branch from c5086c3 to 3f3dd09 Compare April 16, 2026 08:06
Comment on lines +433 to +456
operationRepo.forceExecuteOperations()
} else {
if (!isInitialized) {
throw IllegalStateException("Must call 'initWithContext' before 'updateUserJwt'")
}
jwtTokenStore.putJwt(externalId, token)
operationRepo.forceExecuteOperations()
}
}

override suspend fun updateUserJwtSuspend(
externalId: String,
token: String,
) = withContext(runtimeIoDispatcher) {
Logging.log(LogLevel.DEBUG, "updateUserJwtSuspend(externalId: $externalId, token: ...${token.takeLast(8)})")

suspendUntilInit(operationName = "updateUserJwt")

if (!isInitialized) {
throw IllegalStateException("Must call 'initWithContext' before 'updateUserJwt'")
}

jwtTokenStore.putJwt(externalId, token)
operationRepo.forceExecuteOperations()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Two issues in updateUserJwt()/updateUserJwtSuspend(): (1) forceExecuteOperations() wakes the operation queue with force=false, so after a 401-triggered re-queue the retry is delayed by the full ~5-second opRepoExecutionInterval batching window instead of executing immediately; (2) neither method signals the IAM subsystem to re-fetch in-app messages, leaving any IAM fetch that was skipped due to a missing JWT permanently stuck until the next session-start or push-subscription event.

Extended reasoning...

Bug 1 — forceExecuteOperations() uses force=false, causing a ~5-second delay after JWT renewal

When the operation queue receives a FAIL_UNAUTHORIZED result it re-enqueues the failing operation and sets the JWT to null. Control returns to processQueueForever(), which calls waitForNewOperationAndExecutionInterval() and enters the idle-wait state. When the app calls updateUserJwt() or updateUserJwtSuspend(), the implementation calls operationRepo.forceExecuteOperations() (OperationRepo.kt line 218). However forceExecuteOperations() wakes the main waiter with LoopWaiterMessage(false)force=false. Inside waitForNewOperationAndExecutionInterval() (lines 237–253), the while (\!wakeMessage.force) loop stays true and the function waits up to opRepoExecutionInterval (default 5 000 ms) before breaking out to execute. The retryWaiter.wake(LoopWaiterMessage(true)) call also in forceExecuteOperations() only shortens the delayBeforeNextExecution() retry-backoff wait, which is not the active state here.

Concrete proof: App calls login("alice", jwt1) → LoginUserOperation enqueued → backend returns HTTP 401 → executor sets jwtTokenStore.putJwt("alice", null) and returns FAIL_UNAUTHORIZED → op re-queued → processQueueForever loop calls waitForNewOperationAndExecutionInterval(). Now app calls updateUserJwt("alice", jwt2). forceExecuteOperations() fires waiter.wake(LoopWaiterMessage(false)). waitForNewOperationAndExecutionInterval receives the wake but wakeMessage.force == false, so it continues waiting for up to opRepoExecutionInterval (5 000 ms). The retry only fires after that full delay.

Fix: Change the waiter.wake call inside forceExecuteOperations() to LoopWaiterMessage(true), or add a dedicated immediateExecuteOperations() path that passes force=true to the main waiter.


Bug 2 — JWT renewal does not trigger an IAM re-fetch

InAppMessagesManager.fetchMessagesWhenConditionIsMet() is called from exactly four places: initial appId configuration, ConfigModel HYDRATE, push-subscription ID change, and explicit session-start. The new JWT guard added in this PR (InAppMessagesManager.kt lines 310–313) correctly avoids fetching IAMs when JWT is null, but there is no corresponding call to re-fetch once a valid JWT becomes available via updateUserJwt().

Concrete proof: App is configured with IV required. SDK starts, JWT is missing → fetchMessagesWhenConditionIsMet() short-circuits because jwtTokenStore.getJwt(externalId) == null. App later calls updateUserJwt("alice", validJwt). OneSignalImp.updateUserJwt() (lines 424–444) calls jwtTokenStore.putJwt() then operationRepo.forceExecuteOperations() — nothing else. InAppMessagesManager has no listener on JwtTokenStore and is never signalled. Result: in-app messages are never fetched for this session unless a push-subscription ID change or new session start coincidentally fires.

Fix: Call inAppMessagesManager.fetchMessagesWhenConditionIsMet() inside both updateUserJwt() and updateUserJwtSuspend() after putJwt(), or adopt the IJwtUpdateListener pattern already used in the companion PR #2613 and have InAppMessagesManager subscribe to JWT-renewal events.

Comment on lines 325 to 339
lastTimeFetchedIAMs = now
}

val (aliasLabel, aliasValue) =
IdentityConstants.resolveAlias(
_configModelStore.model.useIdentityVerification,
externalId,
_identityModelStore.model.onesignalId,
)
// lambda so that it is updated on each potential retry
val sessionDurationProvider = { _time.currentTimeMillis - _sessionService.startTime }
val newMessages = _backend.listInAppMessages(appId, subscriptionId, rywData, sessionDurationProvider)
val newMessages = _backend.listInAppMessages(appId, aliasLabel, aliasValue, subscriptionId, rywData, sessionDurationProvider, jwt)

if (newMessages != null) {
this.messages = newMessages as MutableList<InAppMessage>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In fetchMessages(), externalId and jwt are captured before fetchIAMMutex.withLock as a TOCTOU guard, but _identityModelStore.model.onesignalId is read live after the mutex suspension point at line 332. If a user switch occurs while suspended on the mutex, the IV=true path will fetch IAM messages for the old user (using stale externalId+jwt) while the new user is active. The fix is to snapshot onesignalId alongside externalId and jwt before the mutex, so the entire identity triple is consistent.

Extended reasoning...

The inconsistency

The PR carefully snapshots two identity fields before fetchIAMMutex.withLock to avoid TOCTOU issues — externalId at line 304 and jwt at line 307 (the comment on line 305 even explains the rationale). However, a third identity field is still read live: _identityModelStore.model.onesignalId is accessed inside resolveAlias() at line 332, which runs after the coroutine suspends on the mutex.

Why a user switch can happen during the mutex wait

fetchIAMMutex is a coroutine Mutex, not a blocking lock. When fetchIAMMutex.withLock { ... } cannot immediately acquire the lock, the coroutine suspends, yielding the thread. During that suspension window other coroutines can run — including a user login/logout that replaces _identityModelStore.model with a new user's identity model. When fetchMessages() resumes after the mutex, externalId and jwt are stale (user A), but _identityModelStore.model.onesignalId is live (user B).

Impact per IV mode

IV=true path: resolveAlias(true, externalId, onesignalId) returns ("external_id", externalId)onesignalId is ignored. The backend call uses user A's externalId as the alias and user A's jwt for authentication. This fetches and applies user A's in-app messages while user B is the active user — a cross-user data leak.

IV=false path: resolveAlias(false, externalId, onesignalId) returns ("onesignal_id", onesignalId)externalId is ignored. Here the live read of onesignalId is actually correct: it reflects the current user B, so the fetch targets the right user. This path is benign.

Addressing the refutation

A refutation argued that resolveAlias uses XOR logic, so externalId and onesignalId are never both used in the same URL, meaning there's no "mixed" snapshot. This is correct but beside the point. The problem is not that two fields are mixed for the same URL — it is that the entire fetch (URL + JWT) belongs to user A while user B is active. The XOR logic means we consistently use user A's identity for the request; the inconsistency is between which user is fetched vs. which user is displayed.

Concrete proof

  1. User A (externalId="alice") is logged in; fetchMessages() is called.
  2. Line 304: externalId = "alice". Line 307: jwt = alice_jwt. IV guards pass.
  3. fetchIAMMutex.withLock cannot acquire lock — coroutine suspends.
  4. Concurrent coroutine: user switches to user B (externalId="bob"); _identityModelStore.model is replaced.
  5. Mutex is released; fetchMessages() resumes.
  6. Line 332: resolveAlias(true, "alice", <bob's onesignalId>) → returns ("external_id", "alice").
  7. _backend.listInAppMessages(..., "external_id", "alice", ..., alice_jwt) is called — fetching alice's IAMs, applying them to bob's session.

Suggested fix

Snapshot onesignalId alongside the other two identity fields, before the mutex:

val externalId = _identityModelStore.model.externalId
val onesignalId = _identityModelStore.model.onesignalId  // snapshot here for consistent identity triple
val jwt = externalId?.let { _jwtTokenStore.getJwt(it) }

Then pass the captured onesignalId variable to resolveAlias() instead of the live model read.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment on lines 380 to 403
externalId: String,
jwtBearerToken: String?,
) {
Logging.log(LogLevel.DEBUG, "Calling deprecated login(externalId: $externalId, jwtBearerToken: $jwtBearerToken)")
Logging.log(LogLevel.DEBUG, "Calling deprecated login(externalId: $externalId, jwtBearerToken: ...${jwtBearerToken?.takeLast(8)})")

if (isBackgroundThreadingEnabled) {
waitForInit(operationName = "login")
suspendifyOnIO { loginHelper.login(externalId, jwtBearerToken) }
} else {
if (!isInitialized) {
throw IllegalStateException("Must call 'initWithContext' before 'login'")
}
}

val context = loginHelper.switchUser(externalId, jwtBearerToken) ?: return

if (isBackgroundThreadingEnabled) {
suspendifyOnIO { loginHelper.enqueueLogin(context) }
} else {
Thread {
runBlocking(runtimeIoDispatcher) {
loginHelper.login(externalId, jwtBearerToken)
loginHelper.enqueueLogin(context)
}
}.start()
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The refactored non-suspend login() now calls loginHelper.switchUser() synchronously on the caller's thread before spawning the background Thread — exposing jwtTokenStore.putJwt() and userSwitcher.createAndSwitchToNewUser() (both SharedPreferences writes) to the main/UI thread for apps with isBackgroundThreadingEnabled=false (Java, Unity, React Native). The fix is to move switchUser() inside the Thread { runBlocking(runtimeIoDispatcher) { ... } } block alongside enqueueLogin(), restoring the pre-PR behaviour where the entire login operation ran off the calling thread.

Extended reasoning...

What the bug is and how it manifests

Prior to this PR the non-suspend login(externalId, jwtBearerToken) override wrapped its entire body (then a single call to loginHelper.login()) inside Thread { runBlocking(runtimeIoDispatcher) { ... } }.start() when isBackgroundThreadingEnabled was false. This PR split that single helper call into two steps — switchUser() and enqueueLogin() — but only moved enqueueLogin() into the background Thread. switchUser() is now called synchronously at line 393, before the Thread is even constructed, meaning it executes on whatever thread called login().

The specific code path that triggers it

OneSignalImp.login(externalId, jwtBearerToken)          // caller's thread (UI thread in typical apps)
  └─ loginHelper.switchUser(externalId, jwtBearerToken)  // ← still on caller's thread
       ├─ jwtTokenStore.putJwt(externalId, jwtBearerToken)
       │     └─ _prefs.saveString(...)                   // SharedPreferences commit — DISK WRITE
       └─ userSwitcher.createAndSwitchToNewUser(...)
             ├─ identityModelStore.replace(...)           // SharedPreferences commit — DISK WRITE
             ├─ propertiesModelStore.replace(...)         // SharedPreferences commit — DISK WRITE
             └─ subscriptionModelStore.replace(...)       // SharedPreferences commit — DISK WRITE
  Thread { runBlocking(runtimeIoDispatcher) {
      loginHelper.enqueueLogin(context)                  // correctly off-thread
  }}.start()

Why existing code doesn't prevent it

The isBackgroundThreadingEnabled = false branch intentionally skips waitForInit() (which suspends on an IO dispatcher) and jumps straight to line 393. There is no coroutine context switch between the return of the if/else block and the switchUser() call, so no dispatcher redirect occurs. StrictMode will fire a DiskWriteViolation and potentially a DiskReadViolation (SharedPreferences lazy-loads on first access) on every call to this overload from the UI thread.

Impact

Any Java, Unity, or React Native consumer that calls the deprecated synchronous login() from the UI thread — the common pattern shown in OneSignal's own migration guides — will now trigger StrictMode violations that were absent before this PR. Under SharedPreferences contention (e.g., another thread flushing a large prefs file) the commit can block for tens of milliseconds, contributing to jank or, in edge cases, an ANR. This is a regression introduced solely by this PR's refactoring.

How to fix it

Move switchUser() inside the background Thread block so both steps run off the calling thread, matching the original pre-PR behaviour:

} else {
    Thread {
        runBlocking(runtimeIoDispatcher) {
            val context = loginHelper.switchUser(externalId, jwtBearerToken) ?: return@runBlocking
            loginHelper.enqueueLogin(context)
        }
    }.start()
}

Step-by-step proof

  1. A React Native app calls OneSignal.login("user42") on the JS/UI thread after SDK init.
  2. isBackgroundThreadingEnabled is false (default for non-Kotlin hosts).
  3. Execution reaches line 393: loginHelper.switchUser("user42", null) — still on UI thread.
  4. Inside switchUser: jwtTokenStore.putJwt("user42", null) calls SharedPreferences.edit().putString(...).commit() — synchronous disk write on UI thread → StrictMode DiskWriteViolation.
  5. userSwitcher.createAndSwitchToNewUser() performs three more SharedPreferences commits (identity, properties, subscriptions) — three additional violations.
  6. Only after all four disk writes complete does the code reach the Thread { ... }.start() line.
  7. In the old code, step 4–6 were entirely inside the spawned Thread, so the UI thread returned immediately after .start().

Comment on lines 178 to 180
verify(atLeast = 1) { mockUserSwitcher.createAndSwitchToNewUser() }
verify(atLeast = 1) { mockOperationRepo.enqueue(any()) }
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The thread-safety test 'logout is thread-safe with synchronized block' uses verify(atLeast = 1) with a relaxed UserSwitcher mock that never updates the identity model, so all 10 concurrent threads pass the early-exit guard and execute the full logout path — meaning the test never actually verifies the single-switch invariant that synchronized(lock) is meant to guarantee. Fix by configuring the mock to set externalId = null after the first createAndSwitchToNewUser() call and asserting exactly = 1.

Extended reasoning...

What the bug is and how it manifests

The test spawns 10 threads all calling logoutHelper.logout(). The intent is to verify that only the first thread executes the full logout path; subsequent threads should hit the early-exit guard (if (identityModelStore.model.externalId == null) return) and do nothing. However, because mockUserSwitcher is constructed as mockk<UserSwitcher>(relaxed = true), the call to createAndSwitchToNewUser() is a no-op — it never touches identityModelStore. As a result, identityModelStore.model.externalId remains "current-user" (non-null) for every thread from start to finish.

The specific code path that triggers it

LogoutHelper.logout() checks identityModelStore.model.externalId == null as its early-exit guard. In production, the real UserSwitcher.createAndSwitchToNewUser() switches the active user and clears the external ID, so only the first thread to acquire the lock proceeds; the rest exit early. In the test, the relaxed mock leaves the external ID untouched, so every one of the 10 threads independently acquires the lock, passes the guard, calls createAndSwitchToNewUser(), and enqueues a LoginUserOperation.

Why existing code doesn't prevent it

The assertions on lines 178–179 use verify(atLeast = 1):

verify(atLeast = 1) { mockUserSwitcher.createAndSwitchToNewUser() }
verify(atLeast = 1) { mockOperationRepo.enqueue(any()) }

This trivially passes whether the methods were called 1 time or 10 times. There is no upper-bound check, so the over-invocation (10 actual calls vs. 1 expected) is invisible to the test framework.

Impact

The test provides false confidence that the synchronized(lock) block enforces the single-switch invariant. A future refactor that accidentally removes the early-exit guard would still pass this test, because verify(atLeast = 1) is satisfied by any non-zero call count. The bug is in test code only; production behavior is unaffected.

How to fix it

Configure the mock to simulate the real side-effect: after createAndSwitchToNewUser() is called, clear the external ID so subsequent threads exit early.

every { mockUserSwitcher.createAndSwitchToNewUser() } answers {
    mockIdentityModelStore.model.externalId = null
}

Then tighten the assertions to exactly = 1:

verify(exactly = 1) { mockUserSwitcher.createAndSwitchToNewUser() }
verify(exactly = 1) { mockOperationRepo.enqueue(any()) }

Step-by-step proof

  1. Test sets up mockIdentityModelStore with externalId = "current-user".
  2. 10 threads are started; each calls logoutHelper.logout().
  3. Thread 1 acquires the lock, reads externalId = "current-user" (non-null), proceeds past the guard.
  4. Thread 1 calls mockUserSwitcher.createAndSwitchToNewUser() — relaxed mock, no-op, externalId still "current-user".
  5. Thread 1 enqueues a LoginUserOperation, releases the lock.
  6. Thread 2 acquires the lock, reads externalId = "current-user" (still non-null), proceeds past the guard — executes the full logout path again.
  7. Steps 5–6 repeat for all 10 threads.
  8. verify(atLeast = 1) passes because createAndSwitchToNewUser and enqueue were each called 10 times.

*/
var useIdentityVerification: Boolean
get() = getBooleanProperty(::useIdentityVerification.name) { false }
var useIdentityVerification: Boolean?
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we consider maintaining an Enum state here as opposed to a boolean that signifies three states?

con.setRequestProperty("OneSignal-Session-Duration", headers.sessionDuration.toString())
}

if (headers?.jwt != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we try using the kotlin version of null check here?


val ops = getNextOps(executeBucket)
Logging.debug("processQueueForever:ops:\n$ops")
val queueSnapshot = synchronized(queue) { queue.toList() }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please elaborate on what we are doing here?

}
ExecutionResult.FAIL_UNAUTHORIZED, // TODO: Need to provide callback for app to reset JWT. For now, fail with no retry.
ExecutionResult.FAIL_UNAUTHORIZED -> {
val identityVerificationEnabled = _configModelStore.model.useIdentityVerification == true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this as a convenience method in the configModelStore so that its easy enough to refer to it. you could event create a Kotlin extension for it. also it could be testable

}

private fun dispatchJwtInvalidatedToApp(externalId: String) {
_jwtInvalidatedHandler?.let { handler ->
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if these handlers are not setup in time, we could silently skip this. should we queue up the invocations and invoke them when the handler is actually set?

*/
private fun shouldSuppressAnonymousOp(op: Operation): Boolean {
if (op is LoginUserOperation) return false
return _configModelStore.model.useIdentityVerification == true && op.externalId == null
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is useIdentityVerification set to be volatile? that way has the accurate value in every read?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants