perf: optimize telemetry initialization and reduce startup overhead #3620

Open
eablack wants to merge 13 commits into main from eb/overhaul-analytics-collection

Conversation

@eablack eablack commented Mar 26, 2026

Summary

This PR significantly improves CLI performance by optimizing telemetry collection and reducing startup overhead. Testing shows 83% faster startup (2.7s → 0.43s for ./bin/run version). The changes eliminate blocking operations through a background worker process, implement lazy initialization, and refactor the telemetry codebase into modular, well-tested components with full TypeScript type safety.

Type of Change

Patch Updates (patch semver update)

  • perf: Performance optimization
  • refactor: Refactoring telemetry code into modular architecture
  • test: Added 32 comprehensive unit tests
  • feat: Added Sentry error reporting via finally hook

Performance Improvements

Before: ~0.85s
After initial optimizations: ~0.608s
After worker process: ~0.43s

Key Optimizations:

  1. Removed upfront instrumentation initialization - Deferred OpenTelemetry setup until telemetry is actually sent
  2. Implemented lazy Sentry initialization - Only initialize when errors need to be reported
  3. Removed empty instrumentation registration - Eliminated unnecessary require-in-the-middle overhead
  4. Background worker process - Spawn detached process for non-blocking telemetry collection
  5. Cached auth token retrieval - Avoid repeated config/netrc reads
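
Items 1, 2, and 5 above share one idea: do the expensive work at most once, and only when something first asks for it. A minimal sketch of that pattern follows. The `lazy` helper and the two getters are hypothetical names for illustration, not the code in this PR, and the initializers are stand-ins for the real OpenTelemetry setup and config/netrc reads.

```typescript
// Hypothetical helper: run an expensive initializer at most once, on first use.
function lazy<T>(init: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) {
      cached = { value: init() };
    }
    return cached.value;
  };
}

// Stand-in for deferred SDK setup; nothing runs until the getter is called.
let sdkInitCount = 0;
const getTelemetrySdk = lazy(() => {
  sdkInitCount += 1; // the real CLI would set up OpenTelemetry here
  return { name: "telemetry-sdk" };
});

// Stand-in for the cached auth token lookup.
let tokenReadCount = 0;
const getAuthToken = lazy(() => {
  tokenReadCount += 1; // the real CLI would read config/netrc here
  return "example-token";
});
```

Commands that never send telemetry never call the getters, so they pay none of the initialization cost.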

Code Quality Improvements

Modular Architecture

Split monolithic global-telemetry.ts (357 lines) into focused modules:

  • telemetry-utils.ts - Shared utilities, types, and helpers
  • honeycomb-client.ts - OpenTelemetry/Honeycomb integration
  • sentry-client.ts - Sentry error reporting
  • global-telemetry.ts - Thin orchestrator (now 123 lines)
  • worker-client.ts - Background worker process management

Refactored bin/run.js from 116 lines to 34 lines (70% reduction).

Test Coverage

Added 32 passing tests across 4 test files:

  • test/unit/analytics-telemetry/telemetry-utils.unit.test.ts (12 tests)
  • test/unit/analytics-telemetry/honeycomb-client.unit.test.ts (6 tests)
  • test/unit/analytics-telemetry/sentry-client.unit.test.ts (5 tests)
  • test/unit/analytics-telemetry/global-telemetry.unit.test.ts (9 tests)

New Features

Sentry Error Reporting via Finally Hook

  • Added oclif finally hook to report command errors to Sentry
  • Filters out 4xx client errors (user errors, not bugs)
  • Filters out command_not_found errors (user typos)
  • Only reports 5xx server errors and internal CLI exceptions
  • Sends errors to both Honeycomb and Sentry
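
The filtering rules above could be sketched roughly as below. The `CommandError` shape and the message regex are assumptions made for illustration; the PR keeps its own `isUserError` in `src/hooks/finally/sentry.ts`, and its actual checks may differ.

```typescript
// Hypothetical error shape; the real CLIError type in the PR may differ.
interface CommandError {
  message: string;
  statusCode?: number; // HTTP status, when the error came from an API call
  oclif?: { exit?: number }; // exit code attached by oclif, when present
}

// Returns true for errors caused by user input rather than by CLI bugs.
function isUserError(err: CommandError): boolean {
  // 4xx responses: bad request, auth failure, not found, validation error.
  if (err.statusCode !== undefined && err.statusCode >= 400 && err.statusCode < 500) {
    return true;
  }
  // Command not found: exit code 127, or a matching message (pattern assumed).
  if (err.oclif?.exit === 127) {
    return true;
  }
  return /command .* not found/i.test(err.message);
}

// Only non-user errors (5xx, internal exceptions) are worth reporting.
function shouldReportToSentry(err: CommandError): boolean {
  return !isUserError(err);
}
```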

Telemetry Worker Process

  • Created telemetry-worker.ts, which runs as a detached background process
  • Main CLI exits immediately without waiting for telemetry HTTP requests
  • Worker process inherits stderr for debug visibility
  • Properly serializes/deserializes Error objects across process boundary
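
For reviewers unfamiliar with the pattern, here is a minimal sketch of spawning such a worker with Node's `child_process.spawn`. The function names, the `workerPath` parameter, and the use of `process.execPath` are assumptions for illustration; the options mirror what this PR describes (detached, payload over stdin, stderr inherited, `windowsHide`, `unref`).

```typescript
import { spawn } from "node:child_process";
import type { SpawnOptions } from "node:child_process";

// Detached child, stdin piped for the payload, stdout ignored,
// stderr inherited so debug output from the worker stays visible.
function workerSpawnOptions(): SpawnOptions {
  return {
    detached: true,
    stdio: ["pipe", "ignore", "inherit"],
    windowsHide: true, // avoid a console window flash on Windows
  };
}

// workerPath is a hypothetical path to the compiled worker entry point.
function spawnTelemetryWorker(workerPath: string, payload: string): void {
  try {
    const child = spawn(process.execPath, [workerPath], workerSpawnOptions());
    child.on("error", () => {}); // telemetry failures must never reach the user
    child.stdin?.on("error", () => {});
    child.stdin?.end(payload); // write the serialized telemetry, then close stdin
    child.unref(); // let the parent exit without waiting for the worker
  } catch {
    // Fail silently.
  }
}
```

The parent only pays for the spawn and a single write; the HTTP requests to Honeycomb and Sentry happen in the child after the CLI has returned control to the user.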

Enhanced Debug Output

  • Added DEBUG=analytics-telemetry support throughout
  • Shows full payload data before sending to Honeycomb/Sentry
  • Logs all major telemetry operations (initialization, span creation, flushing)

File Structure

New Organization (src/lib/analytics-telemetry/):

  • telemetry-utils.ts - Utilities and type definitions
  • honeycomb-client.ts - OpenTelemetry implementation
  • sentry-client.ts - Sentry integration
  • global-telemetry.ts - Public API orchestrator
  • worker-client.ts - Worker process spawning
  • telemetry-worker.ts - Worker process entry point

Modified Files:

  • bin/run.js - Simplified to 34 lines; telemetry setup and signal handlers moved to worker-client.ts
  • src/hooks/finally/sentry.ts - Error reporting hook with filtering
  • src/hooks/prerun/analytics.ts - Fixed duplicate telemetry setup
  • src/hooks/postrun/performance_analytics.ts - Added telemetry check
  • src/hooks/init/performance_analytics.ts - Added telemetry check
  • src/hooks/command_not_found/performance_analytics.ts - Added telemetry check
  • package.json - Added finally hook registration

Test Files:

  • test/unit/analytics-telemetry/*.unit.test.ts - 4 test files with 32 tests

What Gets Reported to Sentry

✅ Reported (actual bugs):

  • 5xx server errors (API/backend failures)
  • Internal CLI exceptions (uncaught errors, null pointers, type errors)
  • Network errors (timeouts, connection failures)
  • File system errors (ENOENT, permission errors)
  • Signal interruptions (SIGINT/SIGTERM)

❌ Filtered out (user errors):

  • 4xx HTTP errors (bad request, auth failures, not found, validation errors)
  • Command not found errors

Testing

Notes:

  • Windows telemetry remains disabled by default (can enable with ENABLE_WINDOWS_TELEMETRY=true)
  • All telemetry operations fail silently to ensure user experience is never degraded
  • Worker process runs detached and unref'd so it doesn't block CLI exit
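
As a rough sketch of the first two notes: the platform gate and the fail-silently behavior might look like the following. The signature of `isTelemetryEnabled` shown here is an assumption (the real one in `telemetry-utils.ts` may check more conditions), and `failSilently` is a hypothetical wrapper name.

```typescript
// Assumed signature; parameters default to the current process so the
// function can also be called with explicit values in tests.
function isTelemetryEnabled(
  platform: string = process.platform,
  env: Record<string, string | undefined> = process.env,
): boolean {
  if (platform === "win32") {
    // Windows is opt-in only.
    return env.ENABLE_WINDOWS_TELEMETRY === "true";
  }
  return true;
}

// Hypothetical wrapper: run a telemetry step and swallow any failure.
async function failSilently(step: () => void | Promise<void>): Promise<void> {
  try {
    await step();
  } catch {
    // Intentionally ignored so telemetry problems never reach the user.
  }
}
```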

Steps:

  1. Performance testing: ./bin/run version (10 runs) - verified 83% improvement
  2. Debug output: DEBUG=analytics-telemetry ./bin/run version - verified telemetry payload logging
  3. Error filtering: DEBUG=analytics-telemetry ./bin/run apps:info fake-app - verified 4xx errors filtered correctly
  4. Unit tests: All 32 unit tests passing
  5. Worker process: Verified background telemetry collection with manual testing
  6. Error serialization: Verified Error object properties preserved across process boundary
  7. Integration: Tested with various commands to ensure no regressions
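
The timing in step 1 was done by hand. For anyone who wants to repeat it, a small script along these lines would work; it is a sketch, not part of this PR, and both function names are made up for illustration.

```typescript
import { spawnSync } from "node:child_process";
import { performance } from "node:perf_hooks";

// Runs a command `runs` times and returns the mean wall-clock time in seconds.
function meanRuntimeSeconds(command: string, args: string[], runs: number): number {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    spawnSync(command, args, { stdio: "ignore" });
    totalMs += performance.now() - start;
  }
  return totalMs / runs / 1000;
}

// Fractional speedup between two timings; 0.25 means 25% faster.
function improvement(beforeSeconds: number, afterSeconds: number): number {
  return (beforeSeconds - afterSeconds) / beforeSeconds;
}
```

For example, `meanRuntimeSeconds("./bin/run", ["version"], 10)` gives a 10-run average, and `improvement(0.770, 0.608)` is about 0.21, which matches the upper end of the 17-21% range quoted in the first commit message.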

Screenshots (if applicable)

N/A - Performance improvement, no UI changes

Related Issues

GUS work item: https://gus.lightning.force.com/lightning/r/ADM_Work__c/a07EE00002X1ZIBYA3/view

@eablack eablack requested a review from a team as a code owner March 26, 2026 20:10
@eablack eablack temporarily deployed to AcceptanceTests March 26, 2026 20:49 — with GitHub Actions Inactive

@michaelmalave michaelmalave left a comment

Just one comment on these overall welcome changes! Nice.

eablack added 12 commits March 27, 2026 10:47
This commit significantly improves CLI startup performance by optimizing
how OpenTelemetry and Sentry telemetry is initialized and sent.

Key improvements:
- Remove upfront initializeInstrumentation() call to eliminate 28,000+
  require-in-the-middle hook registrations on every command
- Implement lazy initialization of OpenTelemetry during exit handler
- Separate Sentry initialization to only load when errors occur
- Cache auth token to avoid recreating Config/APIClient instances
- Remove empty instrumentation registration that added overhead
- Consolidate platform checks into isTelemetryEnabled() utility
- Fix duplicate setupTelemetry() calls that overwrote timing data
- Batch file existence checks in analytics using Promise.all
- Remove blocking await in beforeExit handler
- Add analytics-telemetry debug scope for troubleshooting

Performance results:
- 17-21% faster startup time (0.770s -> 0.608s)
- Zero require-in-the-middle overhead (28,003 lines -> 0)
- HTTP calls still made successfully to Honeycomb/Sentry
- All telemetry data identical to production build

The telemetry functionality remains fully intact - we've only optimized
when and how the infrastructure is initialized.

- Created finally hook that sends command errors to Sentry and Honeycomb
- Filters out 4xx client errors (user errors) to reduce noise
- Only reports 5xx server errors and internal CLI exceptions
- Restored missing sentryClient variable in global_telemetry.ts

Command not found errors are user typos, not bugs. Filter them out by:
- Matching the error message pattern
- Checking for exit code 127 (standard "command not found" code)

Implemented background worker process for telemetry to eliminate blocking:
- Created telemetry_worker.ts that handles all OpenTelemetry/Sentry initialization
- Spawn a detached worker process and pass the telemetry data to it via stdin
- Main CLI exits immediately without waiting for HTTP requests
- Worker inherits stderr for DEBUG=analytics-telemetry visibility

Performance improvement: ~25% faster (0.608s → 0.45s)

Also enhanced debug output:
- Added payload logging for Honeycomb and Sentry
- Shows full telemetry data structure before sending
- Helps verify data and troubleshoot issues

Organized telemetry code into its own directory to avoid confusion
with src/commands/telemetry:
- Moved global_telemetry.ts to lib/analytics-telemetry/global-telemetry.ts
- Moved telemetry_worker.ts to lib/analytics-telemetry/telemetry-worker.ts
- Renamed files to use hyphens instead of underscores
- Updated all import paths across hooks, bin/run.js, and tests
- Fixed process.exit linter rule for telemetry-worker.ts

Split monolithic global-telemetry.ts (357 lines) into focused modules:

- telemetry-utils.ts: Shared utilities, types, and helpers
- honeycomb-client.ts: OpenTelemetry/Honeycomb integration
- sentry-client.ts: Sentry error reporting
- global-telemetry.ts: Thin orchestrator (123 lines)
- worker-client.ts: Background worker process management

Refactored bin/run.js from 116 to 34 lines by extracting telemetry
setup and signal handlers into worker-client.ts.

Added comprehensive test coverage (32 tests):
- test/unit/analytics-telemetry/telemetry-utils.unit.test.ts (12 tests)
- test/unit/analytics-telemetry/honeycomb-client.unit.test.ts (6 tests)
- test/unit/analytics-telemetry/sentry-client.unit.test.ts (5 tests)
- test/unit/analytics-telemetry/global-telemetry.unit.test.ts (9 tests)

Benefits:
- Single responsibility per module
- Easier to test in isolation
- Easier to maintain and understand
- Public API unchanged (backward compatible)

The tests have been replaced with new modular tests in test/unit/analytics-telemetry/

Improvements:
- Extended CLIError interface with all error properties (code, statusCode, http, oclif)
- Added TelemetryData union type for Telemetry | CLIError
- Added TelemetryGlobal interface for global.cliTelemetry
- Added TelemetryOptions interface for hook options
- Used Config type from @oclif/core/interfaces instead of 'any'
- Replaced all '(data as any)' casts with proper CLIError type
- Replaced all '(global as any)' casts with proper global typing
- Added proper declare global block in worker-client.ts
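
To illustrate the kind of typing described in this list, here is a small sketch. The field sets are assumptions apart from the property names listed above (code, statusCode, http, oclif), and it uses an intersection cast on `globalThis` in place of the `declare global` block so that the snippet stands alone.

```typescript
// Illustrative shapes only; the real interfaces live in telemetry-utils.ts.
interface Telemetry {
  command: string;
  exitCode: number;
}

interface CLIError extends Error {
  code?: string;
  statusCode?: number;
  http?: { statusCode?: number };
  oclif?: { exit?: number };
}

type TelemetryData = Telemetry | CLIError;

interface TelemetryGlobal {
  cliTelemetry?: Telemetry;
}

// Typed view of the global object, replacing `(global as any)`.
const telemetryGlobal = globalThis as typeof globalThis & TelemetryGlobal;

// Type guard that replaces `(data as any)` casts when handling TelemetryData.
function isCLIError(data: TelemetryData): data is CLIError {
  return data instanceof Error;
}

function summarize(data: TelemetryData): string {
  return isCLIError(data)
    ? `error: ${data.message} (status ${data.statusCode ?? "n/a"})`
    : `command: ${data.command} (exit ${data.exitCode})`;
}
```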

All 32 tests passing with improved type safety.

The changes to analytics.ts (reformatting and Promise.all optimization)
were unintentional and not relevant to this telemetry refactoring PR.

Reverted to keep the PR focused on telemetry-specific improvements.

Add windowsHide: true option to spawn() calls to prevent console windows
from appearing on Windows when telemetry is explicitly enabled.

This ensures a better user experience for Windows users who enable
telemetry with ENABLE_WINDOWS_TELEMETRY=true.

Changes:
- Added windowsHide: true to worker-client.ts spawn options
- Added windowsHide: true to finally hook spawn options
- Prevents console window flash on Windows
- No impact on Unix/macOS behavior

Remove all re-exports from global-telemetry.ts and have consumers import
utilities directly from telemetry-utils.ts where appropriate.

Changes:
- Removed re-exports of getProcessor, initializeInstrumentation
- Removed re-exports of ensureSentryInitialized
- Removed re-exports of computeDuration, isTelemetryEnabled
- Removed re-exports of types (CLIError, Telemetry, TelemetryGlobal)

Updated imports:
- bin/run.js: Import computeDuration from telemetry-utils
- All hooks: Import isTelemetryEnabled from telemetry-utils
- Hooks still import orchestrator functions from global-telemetry
  (setupTelemetry, reportCmdNotFound)

Updated tests:
- Removed duplicate computeDuration and isTelemetryEnabled tests from
  global-telemetry.unit.test.ts (already tested in telemetry-utils.unit.test.ts)
- Removed unused sinon imports
- Test file now only tests orchestrator functions

Windows compatibility:
- Added windowsHide: true to sentry.ts spawn call to prevent console
  windows on Windows

Benefits:
- Clearer API surface - global-telemetry only exports what it owns
- No unnecessary indirection for utility functions
- Makes it obvious which functions are orchestrators vs utilities
- Better test organization - tests colocated with implementations
- Reduces coupling between modules

The public API of global-telemetry now consists only of:
- setupTelemetry()
- reportCmdNotFound()
- sendTelemetry()
@eablack eablack force-pushed the eb/overhaul-analytics-collection branch from c2739b3 to 5edc9c9 Compare March 27, 2026 17:47
Move serializeTelemetryData and spawnTelemetryWorker functions from
worker-client.ts and sentry.ts to telemetry-utils.ts to eliminate
duplication. Keep isUserError in sentry.ts since it's only used there
for Sentry-specific error filtering.
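
Some background on why a dedicated serializer is needed at all: `JSON.stringify(new Error("x"))` returns `"{}"`, because `message` and `stack` are non-enumerable and `name` is inherited from the prototype. A minimal sketch of a round trip that preserves them follows. The function names and wire format are hypothetical and are not the PR's `serializeTelemetryData`.

```typescript
interface SerializedError {
  isError: true;
  name: string;
  message: string;
  stack: string | undefined;
  extra: Record<string, unknown>; // enumerable own properties, e.g. statusCode
}

// Copy the fields JSON.stringify would otherwise drop, plus any extra
// enumerable properties attached to the error.
function serializeError(err: Error): string {
  const extra: Record<string, unknown> = {};
  for (const key of Object.keys(err)) {
    extra[key] = (err as unknown as Record<string, unknown>)[key];
  }
  const out: SerializedError = {
    isError: true,
    name: err.name,
    message: err.message,
    stack: err.stack,
    extra,
  };
  return JSON.stringify(out);
}

// Rebuild an Error on the worker side with the original name, stack, and extras.
function deserializeError(json: string): Error {
  const data = JSON.parse(json) as SerializedError;
  const err = new Error(data.message);
  err.name = data.name;
  if (data.stack !== undefined) {
    err.stack = data.stack;
  }
  return Object.assign(err, data.extra);
}
```

The restored object is a plain `Error` with the original `name` copied over, so `instanceof TypeError` checks will not survive the round trip in this sketch.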

@michaelmalave michaelmalave left a comment

Updates and overall implementation look good. Nice.
