A guy from support added me in a ticket about an issue with a customer. Their API calls were failing. Some issue with incorrect API keys.

I was out of office at the time and the only other guy that could help was fired last week.

Until I came back, dozen messages arrived in my mail and chat:

  • “The main issue is db replication”
  • “This is a small data inconsistency issue”
  • “We have seen this again in the past three times”
  • “Just delete and recreate the replicas in other regions”
  • “Customer escalated”
  • “Please do it asap”

Well, I will certainly do NOT delete a set of production dbs, just because someone says so. Especially in this case which their setup was completely unknown to me.

I asked for evidence. Logs, dashboards, RCAs of the issues. Nothing.

Started checking myself. A dev guy helped me reproduce it and check logs from DB.

The error was something like: “Duplicate key xxxxxx in yyyyyy when executing query: INSERT INTO db ….”

Hmmm. This was from the replica. An INSERT query in the replica.

Check again. Oh, we have read AND write replicas.

Across regions.

Meh.