Russia recently took the world by surprise when it announced its intention to test the country’s Internet infrastructure against a ‘case of foreign aggression to disconnect the country from the rest of the Internet’. This is apparently in response to a law proposed in late 2018, which seeks to ensure the nation’s Internet infrastructure is able to function with only domestic resources available. There is no firm confirmation of what precise shape this experiment will take, or indeed if it will merely be a paper exercise. However, if the media reports are to be believed, an experiment of this scale will require the participation of major Internet service providers within Russia.
While testing for contingencies is common in our industry, the scale of this potential experiment would have to be massive, and it warrants additional thought about its potential impact, including on entities outside of Russia. Cyberwar is not a new concept and Russia is far from being the only state to think about its potential consequences. Putting aside the issue of whether such an attack might be carried out by terrorists, cybercriminals or state actors, our current social and economic dependence on the Internet makes the impact of such an event worth analysing. And the key question remains: how can such a test be conducted while limiting risks?
The Internet is designed to be resilient: when one route is blocked, traffic follows another path. Its sheer complexity and scale mean that at any given time, most systems have bugs present that can lead to the failure of a sub-system. These small failures happen continuously and are usually flagged by monitoring systems and then fixed. In the meantime, a backup ensures that service continues, with the odd group of service users facing downtime for a limited duration.
Backups, however, can fail too. An example from aviation is LOT flight 16, which was forced to make an emergency landing at Warsaw airport when its landing gear failed to deploy via both the primary and backup system. As investigations later showed, the primary system failed because a hydraulic line was severed – a malfunction that was already known to the crew while flying. However, the backup system failed because nobody recognized that the circuit breaker for that system had also been pulled. While the aircraft was damaged beyond repair, all 231 people on board, including the 11 crew members, remarkably escaped without injury. This incident shows how a small oversight, in combination with another failure, can lead to disaster.
Most Internet service providers and data centres test their backup systems regularly. But these tests are usually carried out one at a time, and on different systems. The sheer scale of this experiment puts it in uncharted territory. What will happen if multiple failures occur at the same time? Only the test itself can tell us that.
Risk of a snowball effect/the ‘thundering herd’ problem
Deliberately failing certain services or, as reports suggest, re-routing and restricting traffic across multiple different networks and service providers at the same time, could dramatically increase the likelihood of latent failures surfacing. It’s possible that these little failures line up and snowball into bigger failures. One can argue that the test might push systems towards failure rather than check resilience.
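To get a rough feel for why scale matters here, consider a back-of-the-envelope sketch. It assumes, purely for illustration, that every backup system has the same small chance of hiding an undetected fault and that those faults are independent of each other. Neither assumption is guaranteed to hold on real networks, and the 1% figure is a made-up number, not a measurement.

```python
def chance_of_latent_failure(p_fault: float, systems: int) -> float:
    """Probability that at least one of `systems` backups hides a fault.

    Assumes independent, equally likely faults (an illustrative
    simplification, not a model of any real network).
    """
    if not 0.0 <= p_fault <= 1.0:
        raise ValueError("p_fault must be between 0 and 1")
    if systems < 0:
        raise ValueError("systems must not be negative")
    # P(at least one fault) = 1 - P(no faults at all)
    return 1.0 - (1.0 - p_fault) ** systems


P_FAULT = 0.01  # hypothetical: 1% of backups have an unnoticed problem

for n in (1, 10, 100, 500):
    chance = chance_of_latent_failure(P_FAULT, n)
    print(f"{n:>4} systems exercised at once: {chance:.1%} chance of a hidden fault")
```

Under these assumptions, a fault that is unlikely to show up in any single test becomes likely once enough systems are exercised at the same time.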
We also have to consider how operations will be restored to normal. The “thundering herd” problem is a common failure mode, where a system that is just coming back online is essentially DDoS-ed by its own users, who all want to log in at the same time. So even if the test itself appears successful, it can still go wrong at the very last moment.
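One widely used way to soften a thundering herd is to have clients wait a random, exponentially growing amount of time before reconnecting, often called exponential backoff with jitter. The sketch below is a minimal illustration of that idea, not a description of how any particular operator handles recovery. The function names, the number of clients and the timing parameters are all hypothetical.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                    rng=random) -> float:
    """Full-jitter backoff: a random wait between 0 and min(cap, base * 2**attempt)."""
    if attempt < 0:
        raise ValueError("attempt must not be negative")
    # Clamp the exponent so very large attempt counts cannot overflow.
    ceiling = min(cap, base * 2 ** min(attempt, 30))
    return rng.uniform(0.0, ceiling)


def peak_arrivals(delays, window: float = 1.0) -> int:
    """Largest number of clients arriving within any single `window`-second bucket."""
    buckets = {}
    for delay in delays:
        key = int(delay // window)
        buckets[key] = buckets.get(key, 0) + 1
    return max(buckets.values())


rng = random.Random(42)  # fixed seed so the sketch is repeatable
CLIENTS = 10_000         # hypothetical number of users waiting to log in

no_jitter = [0.0] * CLIENTS  # everyone retries the moment service returns
with_jitter = [reconnect_delay(attempt=6, rng=rng) for _ in range(CLIENTS)]

print("peak logins per second without jitter:", peak_arrivals(no_jitter))
print("peak logins per second with jitter:   ", peak_arrivals(with_jitter))
```

Spreading the same number of logins over roughly a minute, instead of a single second, gives a recovering system far more room to breathe. The trade-off is that some users wait a little longer to get back in.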
Impact on the rest of the Internet
This experiment might disrupt services to over a hundred million users. The actual impact depends largely on the exact implementation and duration. For example, think about how many emails cross in and out of Russia per minute. How big does a server queue have to be to hold on to all those messages that cannot be delivered immediately? And what happens once networks are back online and all those delayed messages start flowing again?
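The queue question can be made concrete with some simple arithmetic. The sketch below estimates how large a backlog grows during an outage and how long it takes to drain afterwards. Every number in it (message rate, outage length, average message size, delivery capacity) is a placeholder chosen for illustration, since the real figures are not known.

```python
def backlog(messages_per_minute: float, outage_minutes: float,
            avg_message_kib: float):
    """Return (queued messages, storage needed in GiB) after an outage."""
    messages = messages_per_minute * outage_minutes
    gib = messages * avg_message_kib / (1024 * 1024)  # KiB -> GiB
    return messages, gib


def drain_minutes(backlog_messages: float, arrival_per_minute: float,
                  delivery_per_minute: float) -> float:
    """Minutes needed to clear the backlog while new mail keeps arriving."""
    spare_capacity = delivery_per_minute - arrival_per_minute
    if spare_capacity <= 0:
        return float("inf")  # the queue never empties
    return backlog_messages / spare_capacity


# All values below are hypothetical placeholders, not measurements.
RATE = 100_000        # messages per minute crossing the border
OUTAGE = 6 * 60       # a six-hour test, in minutes
SIZE_KIB = 75         # average message size in KiB
CAPACITY = 150_000    # messages per minute the servers can deliver

queued, storage_gib = backlog(RATE, OUTAGE, SIZE_KIB)
print(f"queued messages: {queued:,.0f}")
print(f"storage needed:  {storage_gib:,.0f} GiB")
print(f"time to drain:   {drain_minutes(queued, RATE, CAPACITY):,.0f} minutes")
```

The point of the exercise is the shape of the result rather than the numbers themselves: unless delivery capacity comfortably exceeds the normal arrival rate, clearing the backlog can take longer than the outage that caused it.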
In recent times, we have seen the Internet shut down in different parts of the world for varying durations, most recently in Zimbabwe. These events do leave traces in our Internet measurement systems, and we will only have a clearer picture of what actually took place after the experiment. If actually carried out on live networks, this experiment will not just test the Russian networks on the Internet. This will be a test for the Internet as a whole, both technically and at the governance level. Let us all hope that the Internet’s much-touted resilience holds strong and that even a test on a scale this extensive will not cause any widespread and prolonged outages.
The RIPE NCC recently published the Russia Country Report, analysing the Internet in Russia, as seen from its measurement tools. It focuses primarily on trends in Internet growth over the past five years in terms of Internet number resource holdings, routing, connectivity, IPv6 readiness and other useful metrics for Internet development.
You can check out the full report here.