
Now we know exactly what caused the downfall of Facebook, Instagram and WhatsApp

It has been more than 24 hours since the internet faced another apocalypse. For approximately six hours, Facebook, Instagram and WhatsApp were completely inaccessible worldwide. Their outage also caused problems on platforms such as Twitter and Telegram, because of the huge number of users looking for another way to communicate. After the disaster, Facebook has come out to explain exactly what caused the failure in its network.

According to Facebook, the disaster occurred during a routine maintenance session on the “backbone” of its network. Engineers entered a seemingly harmless configuration command that unexpectedly cut all connections on the backbone, which in turn disconnected the data centers that the company has spread across different parts of the world. However, this was not the only problem.

Facebook has a system to verify that this type of configuration change does not cause failures, but it did not work correctly. “Our systems are designed to audit commands like these to avoid errors like this, but an error in the audit tool did not stop the command correctly,” they mention. Once the backbone was disconnected, the next to fall was the Border Gateway Protocol (BGP), which we have already covered at Hipertextual.

Facebook's BGP, the great victim

What is BGP? Basically, it is a protocol that announces the existence of a network to the rest of the internet. If BGP doesn't work, the internet can't find you. This is why, for several hours, there was no trace of Facebook: it simply disappeared. The company's engineers point out that when the DNS servers cannot establish a connection with the data centers (because the command had already taken the backbone down), they disable their BGP advertisements.
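The behaviour described above can be pictured with a small toy model. This is only a sketch of the idea, not real BGP and not Facebook's code: the class names, the `health_check` method and the `192.0.2.0/24` prefix (an address range reserved for documentation) are all assumptions made for illustration.

```python
# Toy model of "announce" and "withdraw". It is NOT real BGP: every name here
# is hypothetical and only illustrates the behaviour described in the article.

class ToyInternet:
    """Tracks which address prefixes have been announced to the world."""

    def __init__(self):
        self.routes = {}  # prefix -> name of the network announcing it

    def announce(self, prefix, network):
        self.routes[prefix] = network

    def withdraw(self, prefix):
        self.routes.pop(prefix, None)

    def can_reach(self, prefix):
        # Without an announcement, nobody knows where to send traffic.
        return prefix in self.routes


class ToyDnsServer:
    """A DNS server that stops announcing itself when it looks unhealthy."""

    def __init__(self, prefix, internet):
        self.prefix = prefix
        self.internet = internet
        self.running = True  # the server process itself never stops

    def health_check(self, backbone_up):
        # If the data centers are unreachable, withdraw the announcement.
        if backbone_up:
            self.internet.announce(self.prefix, "example-network")
        else:
            self.internet.withdraw(self.prefix)


internet = ToyInternet()
dns = ToyDnsServer("192.0.2.0/24", internet)

dns.health_check(backbone_up=True)
reachable_before = internet.can_reach("192.0.2.0/24")

dns.health_check(backbone_up=False)  # the backbone goes down
reachable_after = internet.can_reach("192.0.2.0/24")

print(reachable_before, reachable_after, dns.running)
```

In this model the server keeps running after the backbone goes down, but nothing on the outside can find it any more, because its announcement has been withdrawn.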

Once BGP can no longer do its job, DNS suffers the same fate. DNS is the system that lets you reach a website by its domain name (facebook.com, for example) instead of typing its IP address. Would you be able to memorize the numerical addresses of all the websites you visit daily? That is why DNS was created: to translate easily recognizable names into IP addresses.
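To make the name-to-address translation concrete, here is a minimal sketch using Python's standard `socket` module. The `resolve` helper is a hypothetical name chosen for this example, and the addresses it returns depend on the resolver and network of the machine running it.

```python
import socket


def resolve(name):
    """Return the IP addresses the local resolver reports for a host name.

    An empty list means the name could not be resolved, which is what users
    effectively experienced for facebook.com during the outage.
    """
    try:
        results = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve("facebook.com")` on a working connection should return one or more numerical addresses. When the servers responsible for a domain cannot be reached, the lookup fails and the function returns an empty list.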

“The end result was that our DNS servers became unreachable even though they were still operational. This made it impossible for the rest of the internet to find our servers,” they add. Facebook also confirms that it had to send engineers to fix the problem in person, because with its entire network down it was not possible to access the configuration remotely.

A learning experience

You may also have noticed that Facebook, Instagram and WhatsApp recovered slowly after the network problem was solved. Why? The company explains it itself: “We knew that getting our services back on in one go could cause a new round of crashes due to increased traffic. Individual data centers were reporting drops in power use in the tens of megawatts range, and suddenly reversing such a drop in power consumption could put everything, from electrical systems to caches, at risk.”

Facebook concludes its report by saying that the experience is a valuable source of lessons that should help it avoid a repeat in the future. “Every failure like this is an opportunity to learn and improve, and there is a lot to learn from it. After every problem, big or small, we go through an extensive review process to understand how we can make our systems more resilient. That process is already underway.”
