Health check errors silently dropped when channel buffer full

Disclosed: 2026-04-07 16:48:23 By misop00p To aws_vdp
Vulnerability Details
**Component:** pkg/plugin/plugin.go:153-156, pkg/plugin/plugin_v2.go:156-158

**Affected Version:** aws-encryption-provider @ 4341c70 (all versions)

**Found by:** Source audit

**TLP:** TLP:Amber

---

## Summary

When KMS operations fail, the error is sent to a buffered channel (`healthCheckErrc`, size 100) via a non-blocking send. When the buffer is full, errors are silently dropped. Under sustained KMS failure, the health check goroutine's error state becomes stale, and `/healthz` may report healthy when KMS is actually down.

---

## Vulnerable Code

```go
// pkg/plugin/plugin.go:152-156 (also plugin_v2.go:155-159)
if err != nil {
	select {
	case p.healthCheck.healthCheckErrc <- err:
	default:
		// ERROR SILENTLY DROPPED
	}
}
```

The channel is created with a fixed buffer:

```go
// pkg/plugin/shared_health_check.go:34
healthCheckErrc: make(chan error, errcBufSize), // errcBufSize = 100
```

## Root Cause

The non-blocking send pattern is intentional: it prevents Encrypt/Decrypt from blocking when the error channel is full. However, the silent drop means the `SharedHealthCheck.Start()` goroutine (which reads from this channel and calls `RecordErr`) may miss recent errors. If errors arrive faster than the goroutine consumes them (which happens under sustained failure), health state becomes stale.

## Suggested Fix

Replace channel-based error propagation with direct `RecordErr` calls (already used in the `Health()` method), or handle the full-channel case explicitly instead of dropping the error, for example by logging it or recording it directly:

```go
select {
case p.healthCheck.healthCheckErrc <- err:
default:
	p.healthCheck.RecordErr(err) // update state directly when channel full
}
```

## Platform

All platforms. Source audit finding.
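The drop behavior described under Root Cause can be reproduced in isolation. The following is a minimal sketch, not code from the provider: `trySend` and `countDropped` are hypothetical helpers, and the only detail taken from the report is the buffer size of 100. It models the worst case in which nothing is draining the channel.

```go
package main

import "fmt"

// errcBufSize mirrors the buffer size quoted in the report.
const errcBufSize = 100

// trySend is a hypothetical helper that performs the same kind of
// non-blocking send as the reported code and reports whether the
// error was enqueued.
func trySend(errc chan<- error, err error) bool {
	select {
	case errc <- err:
		return true
	default:
		return false // buffer full: the error is discarded
	}
}

// countDropped sends n errors while no consumer is reading and
// returns how many of them were dropped.
func countDropped(n int) int {
	errc := make(chan error, errcBufSize)
	dropped := 0
	for i := 0; i < n; i++ {
		if !trySend(errc, fmt.Errorf("kms failure %d", i)) {
			dropped++
		}
	}
	return dropped
}

func main() {
	fmt.Println(countDropped(250)) // prints 150
}
```

With no reader, the first 100 errors fill the buffer and every later one takes the `default` branch, so the errors that get discarded are the newest ones.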
## Impact

- Under sustained KMS failure (once 100 errors are queued), new errors are dropped
- The health check timestamp (`lastTs`) isn't updated from dropped errors
- `/healthz` reports stale status, potentially masking ongoing outages
- Kubernetes may not detect that the provider is non-functional
- The existing `TestHealthManyRequests` test verifies non-blocking behavior but doesn't verify state correctness after drops
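A state-correctness check of the kind the last point asks for could look roughly like the sketch below. It uses a simplified, hypothetical model rather than the provider's real types: `healthState`, `report`, `lastRecorded`, and `simulate` are invented for illustration, and only the names `RecordErr` and `lastTs` and the buffer size come from the report. The `fallback` flag switches between the reported behavior (silent drop) and the suggested direct `RecordErr` call.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// healthState is a hypothetical stand-in for the shared health check.
type healthState struct {
	mu      sync.Mutex
	lastErr error
	lastTs  time.Time
	errc    chan error
}

func newHealthState(bufSize int) *healthState {
	return &healthState{errc: make(chan error, bufSize)}
}

// RecordErr stores the most recent error and the time it was seen.
func (h *healthState) RecordErr(err error) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.lastErr = err
	h.lastTs = time.Now()
}

// report enqueues err without blocking. If the buffer is full, it
// either drops the error (fallback == false, the reported behavior)
// or records it directly (fallback == true, the suggested fix).
func (h *healthState) report(err error, fallback bool) {
	select {
	case h.errc <- err:
	default:
		if fallback {
			h.RecordErr(err)
		}
	}
}

// lastRecorded returns the error currently held in the health state.
func (h *healthState) lastRecorded() error {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.lastErr
}

// simulate reports n errors with no consumer running and returns
// the error the health state holds afterwards.
func simulate(n int, fallback bool) error {
	h := newHealthState(100)
	for i := 0; i < n; i++ {
		h.report(fmt.Errorf("kms failure %d", i), fallback)
	}
	return h.lastRecorded()
}

func main() {
	fmt.Println(simulate(150, false)) // prints <nil>
	fmt.Println(simulate(150, true))  // prints kms failure 149
}
```

In this model the health state stays empty when drops are silent, and reflects the newest failure when the full-channel branch records directly. One design point to weigh, assuming a consumer goroutine is also draining the channel: older queued errors may be recorded after a newer direct record, so a real test would likely assert on the freshness of `lastTs` rather than on the exact error value.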
Report Stats
  • Report ID: 3620761
  • State: Closed
  • Substate: informative