This spring, services from heavy hitters like Google and Facebook seemed glitchy or inaccessible for people worldwide for more than an hour. But it wasn’t a hack, or even a glitch at any one organization. It was the latest mishap to stem from design weaknesses in the “Border Gateway Protocol,” the internet’s foundational, universal routing system. Now, after years of slow progress implementing improvements and safeguards, a coalition of internet infrastructure partners is finally turning a corner in its fight to make BGP more secure.
Today the group known as Mutually Agreed Norms for Routing Security is announcing a task force specifically dedicated to helping “content delivery networks” and other cloud services adopt the filters and cryptographic checks needed to harden BGP. In some ways the step is incremental, given that MANRS has already formed task forces for network operators and what are known as “internet exchange points,” the physical hardware infrastructure where internet service providers and CDNs hand off data to each other’s networks. But extending that effort to the cloud represents tangible progress that has been elusive until now.
“With nearly 600 total participants in MANRS so far, we believe the enthusiasm and hard work of the CDN and cloud providers will encourage other network operators around the globe to improve routing security for us all,” says Aftab Siddiqui, the MANRS project lead and a senior manager of internet technology at the Internet Society.
BGP is often likened to a GPS navigation service for the internet, enabling infrastructure players to swiftly and automatically determine routes for sending and receiving data across the complex digital topography. And like your favorite GPS mapping tool, BGP has quirks and flaws that don’t usually cause problems, but can occasionally land you in major bridge traffic. This happens when entities like internet service providers “advertise a bad route,” sending data on a haphazard, ill-advised journey across the internet and often into oblivion. That’s when web services start to seem like they’re down. And the risks from this BGP insecurity don’t end with service disruptions—the weaknesses can also be exploited intentionally by bad actors to reroute data over networks they control for interception. This practice is known as “BGP hijacking” and has been used by hackers around the world, including China, for espionage and data theft.
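To make the mechanics concrete, here is a minimal Python sketch, under simplified assumptions, of why one bad advertisement can pull traffic off course: routers forward packets along the most specific matching prefix, so a bogus, more specific announcement wins over the legitimate one. The prefixes and labels are hypothetical, and this is an illustration of the routing logic, not a real BGP implementation.

```python
# Illustrative sketch of why a bad or malicious BGP advertisement redirects
# traffic: forwarding follows the longest (most specific) matching prefix.
from ipaddress import ip_address, ip_network

# Hypothetical routing table: prefix -> who carries traffic for it
routes = {
    ip_network("198.51.100.0/22"): "legitimate network",
}

def best_route(destination):
    dest = ip_address(destination)
    matches = [p for p in routes if dest in p]
    # Longest-prefix match: the most specific route carries the traffic.
    return routes[max(matches, key=lambda p: p.prefixlen)] if matches else None

print(best_route("198.51.100.10"))  # -> "legitimate network"

# A hijacker (or a misconfigured ISP) advertises a more specific /24 inside
# the same space, and traffic to that address silently shifts to them.
routes[ip_network("198.51.100.0/24")] = "hijacker or misconfigured ISP"
print(best_route("198.51.100.10"))  # -> "hijacker or misconfigured ISP"
```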
A handful of prominent CDNs have already been vocal about implementing BGP best practices and safeguards in their own systems and promoting them to others. After the so-called route leak in April, for example, Cloudflare launched a tool called “Is BGP Safe Yet?” to give regular web users insight into whether their internet service provider has implemented cryptographic route checks and filters. And on Wednesday, Google published an update on its efforts with MANRS to overhaul its own BGP infrastructure and convince industry contacts to do the same.
Organizations like Google and Cloudflare are increasingly motivated to back this change for the overall health of the internet, but also because BGP route leaks that result in outages reflect poorly on them regardless of where the issue actually originates. Those sorts of major organizations are key to driving adoption of these types of voluntary, cooperative technical changes, because they have relationships with infrastructure providers around the world.
“I spent 20 years in financial services doing cybersecurity for big banks, but a little over two years ago I joined Google, because you start to see that the societal dependence on this infrastructure is so great,” says Royal Hansen, vice president of security engineering for Google Cloud. “My leverage was going to be so much bigger at a Google than it would ever be in one enterprise.”
One of the main BGP safeguards MANRS promotes is RPKI, or “Resource Public Key Infrastructure,” a public database of routes that have been cryptographically signed to attest to their validity. RPKI adopters publish the routes they offer and check the database to confirm others’ routes, but the system can only eliminate route leaks and outages through universal adoption. If many ISPs or other organizations aren’t using it, providers will still need to accept unsigned, meaning unvalidated, routes.
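As a rough illustration of the check RPKI enables, here is a short Python sketch of route-origin validation against a hypothetical list of ROAs (Route Origin Authorizations). Real validators such as Routinator or rpki-client fetch and cryptographically verify these records from RPKI repositories; this only shows the decision logic a provider applies before accepting an advertised route.

```python
# Minimal sketch of RPKI route-origin validation, assuming a simplified,
# hypothetical list of ROAs. Each ROA says: this origin AS may announce
# this prefix, down to a maximum prefix length.
from ipaddress import ip_network

ROAS = [
    {"prefix": ip_network("203.0.113.0/24"), "max_length": 24, "origin_as": 64500},
]

def validate(prefix_str, origin_as):
    prefix = ip_network(prefix_str)
    covering = [r for r in ROAS if prefix.subnet_of(r["prefix"])]
    if not covering:
        return "not-found"   # no ROA covers this prefix; typically still accepted
    for roa in covering:
        if roa["origin_as"] == origin_as and prefix.prefixlen <= roa["max_length"]:
            return "valid"
    return "invalid"         # covered by a ROA, but origin or length doesn't match

print(validate("203.0.113.0/24", 64500))   # valid
print(validate("203.0.113.0/25", 64500))   # invalid: more specific than allowed
print(validate("203.0.113.0/24", 64501))   # invalid: wrong origin AS
print(validate("198.51.100.0/24", 64500))  # not-found: no covering ROA
```

The catch the article describes shows up in the "not-found" case: as long as many networks have no ROAs at all, providers generally can't afford to drop unsigned routes, so a leak or hijack of those prefixes still gets through.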