Tellybug off air last night

The Tellybug service went off air at about 18:45 last night and was brought back up at about 03:45 this morning, an outage of roughly nine hours. We’re sorry this happened, and want to apologise to everyone who couldn’t use Tellybug last night.

The underlying cause of the problem was a lightning strike on the Amazon Web Services (AWS) data centre in Ireland where our servers live. This took out one of the availability zones (AZs) in the data centre, and with it our database server.

Nothing works without the database, so ours is configured in “Multi-AZ” mode, which should automagically fail over to another AZ if the primary AZ goes *pooof*. We should therefore have had an outage of about three minutes while the failover happened, and all would have been well.

Unfortunately, for reasons that AWS are still investigating, our database didn’t fail over successfully. That meant we were offline until Amazon could bring the AZ and the database back up.

In summary: we went down because lightning struck, and Amazon’s failover software, which is designed to handle exactly this kind of failure, itself failed.

We’ll keep looking for ways to make our service more reliable, and we’re sorry again for the inconvenience.
