TTBR Climbing on Prod01 Due to Spike in Writes

Incident Report for InfluxDB Cloud

Resolved

This incident has been resolved.
Posted Oct 29, 2021 - 22:52 UTC

Update

We are continuing to monitor for any further issues.
Posted Oct 29, 2021 - 21:38 UTC

Monitoring

A fix has been implemented and we are monitoring the results.
Posted Oct 29, 2021 - 21:36 UTC

Update

TTBR (time to become readable) for a single partition in the region has been growing uncontrollably and is currently over 40 minutes. As a result, when a user runs a query, recently written data bound for that partition will not yet be available. This may look like data loss, but the data is safe in Kafka and will be written.
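The sketch below is a toy model, not how InfluxDB Cloud actually computes TTBR. It assumes one partition is drained at a fixed, hypothetical capacity and that TTBR is roughly the queued backlog divided by that capacity. It shows why the delay keeps growing while writes exceed capacity, why the data is delayed rather than lost (unconsumed points simply stay queued), and why bringing the write rate back to normal does not immediately fix things: the backlog only drains at the difference between capacity and the new inflow.

```python
# Toy model of one partition's write backlog. The rates are illustrative
# assumptions (points per second), not measured InfluxDB Cloud values.

def simulate_ttbr(arrivals, capacity):
    """Return an approximate TTBR (in seconds) after each one-second tick.

    arrivals: points written to the partition in each second
    capacity: points per second the consumer can make queryable
    """
    backlog = 0.0
    ttbr = []
    for arrived in arrivals:
        # Consume up to `capacity` points; the remainder stays queued.
        backlog = max(0.0, backlog + arrived - capacity)
        # Rough delay for the newest point: queued points / drain rate.
        ttbr.append(backlog / capacity)
    return ttbr


# 60 s of a spike at 1.5x capacity, then 60 s at a "normal" 0.9x capacity.
history = simulate_ttbr([150] * 60 + [90] * 60, capacity=100)
print(f"TTBR at end of spike: {history[59]:.1f} s")
print(f"TTBR after rate returns to normal: {history[-1]:.1f} s")
```

With these assumed numbers, a one-minute spike at 1.5x capacity leaves about 30 s of TTBR, and a further minute at 0.9x capacity only brings it down to about 24 s.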

We suspect the cause is a user whose write rate limits were increased and who is writing to a single series. We have reduced their rate limits back to normal, but this did not resolve the problem as expected.
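One plausible reason a single series can overload a single partition is that points are routed to partitions by hashing their series key. This report does not describe InfluxDB Cloud's actual partitioning, so the scheme, the partition count, and the `partition_for` helper here are hypothetical. Under that assumption, many distinct series spread across partitions, but every point from one series maps to the same partition, so extra write volume for that series all lands in one place.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(series_key: str) -> int:
    """Hypothetical scheme: hash the series key to choose a partition."""
    # crc32 is deterministic across runs, unlike Python's built-in hash().
    return zlib.crc32(series_key.encode("utf-8")) % NUM_PARTITIONS


# 1000 distinct series are spread over several partitions...
spread = Counter(partition_for(f"cpu,host=server{i:03d}") for i in range(1000))
# ...but 1000 points from one series all map to the same partition.
hot = Counter(partition_for("cpu,host=server001") for _ in range(1000))

print("points per partition (many series):", sorted(spread.values()))
print("points per partition (one series):", dict(hot))
```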
Posted Oct 29, 2021 - 21:18 UTC

Investigating

We have discovered TTBR climbing on Prod01 due to a spike in writes. We are currently investigating the issue.
Posted Oct 29, 2021 - 21:10 UTC
This incident affected: Cloud Serverless: Azure, W. Europe (API Queries).