Health check errors silently dropped when channel buffer full
Vulnerability Details
**Component:** pkg/plugin/plugin.go:153-156, pkg/plugin/plugin_v2.go:156-158
**Affected Version:** aws-encryption-provider @ 4341c70 (all versions)
**Found by:** Source audit
**TLP:** TLP:Amber
---
## Summary
When KMS operations fail, the error is sent to a buffered channel (`healthCheckErrc`, size 100) via a non-blocking send. When the buffer is full, errors are silently dropped. Under sustained KMS failure, the health check goroutine's error state becomes stale, and `/healthz` may report healthy when KMS is actually down.
---
## Vulnerable Code
```go
// pkg/plugin/plugin.go:152-156 (also plugin_v2.go:155-159)
if err != nil {
	select {
	case p.healthCheck.healthCheckErrc <- err:
	default:
		// Buffer full: the error is silently dropped.
	}
}
```
The channel is created with a fixed buffer:
```go
// pkg/plugin/shared_health_check.go:34
healthCheckErrc: make(chan error, errcBufSize), // errcBufSize = 100
```
## Root Cause
The non-blocking send pattern is intentional — it prevents Encrypt/Decrypt from blocking when the error channel is full. However, the silent drop means the `SharedHealthCheck.Start()` goroutine (which reads from this channel and calls `RecordErr`) may miss recent errors. If errors arrive faster than the goroutine consumes them (which happens under sustained failure), health state becomes stale.
## Suggested Fix
Either replace channel-based error propagation with direct `RecordErr` calls (already used in the `Health()` method), or keep the channel and fall back to `RecordErr` when it is full, optionally logging the drop:
```go
select {
case p.healthCheck.healthCheckErrc <- err:
default:
	p.healthCheck.RecordErr(err) // update state directly when channel full
}
```
## Platform
All platforms. Source audit finding.
## Impact
- Under sustained KMS failure (>100 errors queued), new errors are dropped
- Health check timestamp (`lastTs`) isn't updated from dropped errors
- `/healthz` reports stale status, potentially masking ongoing outages
- Kubernetes may not detect the provider is non-functional
- The existing `TestHealthManyRequests` test verifies non-blocking behavior but doesn't verify state correctness after drops
## Report Stats
- Report ID: 3620761
- State: Closed
- Substate: informative