Today, we experienced intermittent service interruptions across VitalSource systems. During this incident, users and API integrations encountered slow performance, error messages, and intermittent access issues.
The cause was significant request queuing in our core API infrastructure. This queuing increased latency on some of our critical API endpoints, and that latency then affected several of our end-user-facing applications as well as the APIs that power our partner applications. Our engineering team resolved the queuing by adjusting database configurations and Kubernetes cluster settings. These changes added capacity, and we were able to return to normal operations.
All systems are now operating normally. Our engineering team's highest priority is to understand exactly why we saw severe request queuing, and we will continue to monitor performance closely.