I don’t know how many of you use Facebook. I use it for sending messages and keeping up with what my friends and family are up to – but I don’t actually spend that much time on it. You probably know that they had some problems last week, though.
Last Monday evening, the whole of Facebook, Messenger, Instagram and WhatsApp went down – worldwide – for six hours! And they were having more problems on Friday. Not as big this time, but people were struggling with Instagram and Messenger for a couple of hours before it got sorted.
Now, Facebook is a huge operation. They’ve got tens of thousands of miles of cable connecting computers and data centres all over the world, and more users than I can even begin to imagine! It’s not surprising that they sometimes have issues (honestly, I have enough problems with my printer!). What was surprising was just how big the problem was and how long it took them to fix it.
Interestingly, they’ve been very open and honest about what went wrong (unusual for the big tech companies) and they’ve written a detailed article explaining what tripped them up – you can read it here if you like, but it does get quite technical in places.
Without going into all the details, it was an unfortunate mixture of human error and a programming glitch that caused the original problem. But it took such a long time to fix because of all the security measures Facebook have in place.
With the whole network down and no access to the servers from the internet, engineers had to be sent in person to the main data centres to fix the problem. But you can’t just walk into somewhere like that (they didn’t describe how you do get in – for obvious reasons! – but I get the impression it wasn’t quick). And even once they’d physically got into the building, there were layers and layers of protection on the servers themselves to stop anyone tampering with them.
The upshot of which was a six-hour outage.
They haven’t explained exactly what the issue was on Friday, but it sounds to me as though they were putting right various problems triggered by the Monday crisis.
There are some interesting questions to come out of this whole palaver though.
When security measures are so strong that they stop you from fixing your own system, have they gone too far? Facebook think not. In that article I was telling you about earlier, they say “I believe a tradeoff like this is worth it – greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this.”
Also, are we all just a bit too dependent on Facebook and the like for our own good? There was so much wailing and gnashing of teeth over what was, after all, just six hours of downtime. Twitter nearly crashed too, as huge numbers of people switched over from Facebook that evening!
If it happens again, could I maybe suggest reading a book? Or watching the telly? Or if you need to get in touch with someone, picking up the phone?