15–16 Oct 2016
The Fairmont Dallas
US/Central timezone

Anycast Latency: How Many Sites Are Enough?

16 Oct 2016, 11:00
30m
Gold (The Fairmont Dallas)

1717 N Akard St Dallas, TX 75201 USA
Standard Presentation (Public Workshop: Anycast)

Speaker

Mr John Heidemann (USC/Information Sciences Institute)

Description

Today service and content providers often use IP anycast to replicate instances of their services, improving reliability and lowering latency for their users. In IP anycast, a single logical address associated with a service is announced from multiple physical locations (*anycast sites*). Anycast then uses BGP routing to divide the Internet into different *catchments*, each associating some users with a nearby anycast site. Ideally users are associated with the topologically closest site, but BGP routing is based on limited information and influenced by policies, making catchment assignment more chaotic and sometimes matching users with more distant sites. Prior studies have shown the surprising complexity in the latency and association between anycast services and their users, but have not explored root causes or characterized the relationship between anycast deployment and service quality. This talk will report on our recent study of real-world anycast deployments and how deployment approach affects latency between users and the anycast service [Schmidt16a]. Our targets are four Root DNS services (C-, F-, K-, and L-Root); their deployments ranged from 8 to 140 anycast sites, which we measure using more than 7,500 RIPE Atlas probes (Vantage Points, or VPs) around the world. (These sizes were as of measurement in 2015 and 2016 and have since grown.) We evaluate the effect of different anycast configurations on the latency of DNS queries, and we believe our results generalize to the use of IP anycast for other applications such as CDNs.
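
To make the measurement concrete, here is a minimal sketch (not the authors' tooling) of how a single vantage point can both identify which anycast site answers it and measure the round-trip latency, using the widely supported CHAOS-class "id.server"/"hostname.bind" convention. The dnspython usage is real; the target address for C-Root and the helper name are illustrative assumptions.

```python
# Sketch: identify the anycast site serving this host and measure its RTT.
# Assumes dnspython is installed (pip install dnspython).
import time
import dns.message
import dns.query

C_ROOT = "192.33.4.12"  # c.root-servers.net (illustrative target)

def probe_site(server_ip, timeout=2.0):
    # A CHAOS-class TXT query for "id.server" returns a per-site identifier,
    # revealing which anycast catchment this vantage point falls into.
    query = dns.message.make_query("id.server", "TXT", rdclass="CH")
    start = time.monotonic()
    response = dns.query.udp(query, server_ip, timeout=timeout)
    rtt_ms = (time.monotonic() - start) * 1000.0
    site = (response.answer[0][0].strings[0].decode()
            if response.answer else "unknown")
    return site, rtt_ms

if __name__ == "__main__":
    site, rtt = probe_site(C_ROOT)
    print(f"answered by site {site!r} in {rtt:.1f} ms")
```

The study itself runs the equivalent query from thousands of RIPE Atlas probes, so catchments and latency are observed from many networks at once.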

Summary

This talk will evaluate anycast latency. An anycast service uses
multiple sites to provide high availability, capacity and redundancy,
with BGP routing associating users with nearby anycast sites. Routing
defines the catchment of the users that each site serves. Although
prior work has informally studied how users associate with anycast
services, in this paper we examine the key question of how many anycast
sites are needed to provide good latency, and the worst-case latencies
that specific deployments see. To answer this question, we must first
define the optimal performance that is possible, then explore how
routing, specific anycast policies, and site location affect
performance. We develop a new method capable of determining optimal
performance and use it to study four real-world anycast services
operated by different organizations: C-, F-, K-, and L-Root, each part
of the Root DNS service. We measure their performance from more than
7,500 worldwide vantage points (VPs) in RIPE Atlas. (Given the VPs'
uneven geographic distribution, we evaluate and control for potential
bias.)
The key results of our study are that a few sites can provide
performance nearly as good as many, and that geographic location and
good connectivity have a far stronger effect on latency than having many
nodes. We show how often users see the closest anycast site, and how
strongly routing policy affects site selection.
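
As a rough illustration of the kind of analysis sketched above, the snippet below computes, for hypothetical per-VP data, the observed latency (to the site routing actually chose), the optimal latency (to the lowest-latency site), and a simple region-balanced median as one crude way to control for the uneven geographic distribution of VPs. The data layout and the reweighting scheme are assumptions for illustration, not the method used in the paper.

```python
# Sketch: compare observed vs. optimal latency per VP and apply a crude
# region-balanced summary. Data layout is hypothetical:
#   vps = {vp_id: {"region": str,
#                  "serving_site": str,
#                  "rtt_ms": {site_id: rtt_in_ms}}}
from collections import defaultdict
from statistics import median

def summarize(vps):
    observed, optimal = [], []
    by_region = defaultdict(list)
    for data in vps.values():
        obs = data["rtt_ms"][data["serving_site"]]  # latency to the site BGP chose
        opt = min(data["rtt_ms"].values())          # latency to the closest site
        observed.append(obs)
        optimal.append(opt)
        by_region[data["region"]].append(obs)
    # Weight regions equally so densely probed regions do not dominate.
    region_medians = [median(rtts) for rtts in by_region.values()]
    return {
        "median_observed_ms": median(observed),
        "median_optimal_ms": median(optimal),
        "region_balanced_median_ms": median(region_medians),
    }

if __name__ == "__main__":
    example = {
        "vp1": {"region": "EU", "serving_site": "ams",
                "rtt_ms": {"ams": 12.0, "lax": 150.0}},
        "vp2": {"region": "NA", "serving_site": "lax",
                "rtt_ms": {"ams": 140.0, "lax": 25.0}},
        "vp3": {"region": "NA", "serving_site": "ams",
                "rtt_ms": {"ams": 130.0, "lax": 30.0}},
    }
    print(summarize(example))
```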

This presentation will show the results obtained from, to the best of
our knowledge, the first systematic study of the effects of IP anycast
on service latency. To answer our main research question of how many
anycast instances are enough to get good latency, we addressed the
following more specific questions:

Does anycast give good absolute performance? We show that each of the
Root Letters we study provides very good median performance, with half
of RIPE VPs seeing latency of 40ms or less, and only 10% of vantage
points seeing latencies of 150ms or higher. We also show that in
practice, median latency of these four roots is quite close even though
C-Root has far fewer locations than the other letters we study.

Do users get the closest anycast instance? We show that latency is
close to optimal for C-Root because most VPs are routed to their closest
C-Root site. For deployments with more anycast sites it becomes harder
to match all VPs to their closest anycast site: more than half of VPs
are routed to sites that are not the closest, although the latency
penalty is usually small (around 15 to 24ms).
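
Continuing the illustrative data layout above, a short hypothetical sketch of how one might quantify this effect: the fraction of VPs whose serving site is also their lowest-latency site, and the median penalty for the rest. This is not the paper's analysis code.

```python
# Sketch: how often does routing pick the lowest-latency site, and what is
# the median latency penalty when it does not? Uses the same hypothetical
# per-VP data layout as the earlier sketch.
from statistics import median

def closest_site_stats(vps):
    penalties = []
    at_closest = 0
    for data in vps.values():
        best_site = min(data["rtt_ms"], key=data["rtt_ms"].get)
        if data["serving_site"] == best_site:
            at_closest += 1
        else:
            penalties.append(data["rtt_ms"][data["serving_site"]]
                             - data["rtt_ms"][best_site])
    return {
        "fraction_at_closest": at_closest / len(vps),
        "median_penalty_ms": median(penalties) if penalties else 0.0,
    }
```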

How much does the location of each anycast instance affect the latency
it provides to users? We show how the incremental addition of new
instances reduces median latency to users. We also demonstrate the
importance of the location of additional instances.
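
One way to picture this incremental analysis is a greedy selection: repeatedly add the candidate site that most reduces the median latency, assuming every VP reaches its lowest-latency deployed site. The sketch below is ours for illustration; the ideal-routing assumption does not hold in practice, where catchments are determined by BGP.

```python
# Sketch: greedily order candidate sites by how much each addition lowers the
# median latency, assuming every VP reaches its lowest-latency deployed site
# (an idealization; real catchments are set by BGP routing, not latency).
from statistics import median

def greedy_site_order(vps, sites):
    chosen, order = [], []
    remaining = set(sites)
    while remaining:
        def median_with(candidate):
            active = chosen + [candidate]
            return median(min(data["rtt_ms"][s] for s in active)
                          for data in vps.values())
        best = min(remaining, key=median_with)
        order.append((best, median_with(best)))
        chosen.append(best)
        remaining.remove(best)
    return order  # [(site, median latency after adding it), ...]
```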

How much do local routing policies affect performance? Finally, we
examine how routing policies affect latency. We observe K-Root at two
times: first when about half of its sites used local routing policies,
and later when only one site had that policy. We see that local routing
policies do increase latency to VPs, but somewhat surprisingly, relaxing
routing policies (with more global nodes) does not completely eliminate
this overhead. Observations of K-Root (before and after its policy
change) and
F-Root suggest that manual investigation of routing may be required to
identify suboptimal routing.
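
A minimal sketch, under assumed data, of the before/after comparison described here: pair the VPs present in both measurement snapshots of the same service (e.g. K-Root before and after its policy change) and summarize how their observed latency shifted.

```python
# Sketch: quantify a routing-policy change by pairing VPs that appear in two
# measurement snapshots and summarizing the shift in their observed latency.
# Data layout is hypothetical: {vp_id: observed_rtt_ms} per snapshot.
from statistics import median

def policy_change_effect(before, after):
    common = before.keys() & after.keys()
    return {
        "vps_compared": len(common),
        "median_before_ms": median(before[vp] for vp in common),
        "median_after_ms": median(after[vp] for vp in common),
        "median_change_ms": median(after[vp] - before[vp] for vp in common),
    }
```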

Acknowledgments. We thank the RIPE Atlas team for helping with the
measurement setup, and the root service operators, particularly C, F,
K, and L Roots, for their comments on this work. We thank Benno
Overeinder (NLnet Labs), Cristian Hesselman (SIDN Labs), Duane Wessels
(Verisign), Geoff Huston (APNIC), George Michaelson (APNIC), Jaap
Akkerhuis (NLnet Labs), Paul Vixie (Farsight), and Ray Bellis (ISC) for
their valuable technical feedback.

Talk duration: 30 minutes

Primary authors

Jan Harm Kuipers (U. Twente)
Mr John Heidemann (USC/Information Sciences Institute)
Dr Ricardo Schmidt (University of Twente)

Presentation materials