This is a technical report that covers three things:
- Evaluation of using TCP to measure DNS client latencies to auth servers
- Use of DNS/TCP RTT to engineer anycast and fix problems such as anycast polarization. We use it to improve latency between Google and SIDN's anycast AS, reducing Google's latency to SIDN from 110 ms to 20 ms.
- Presentation of anteater, a real-time monitoring system that evaluates auth servers and clients and notifies SIDN's operators of latency problems.
All of this uses only passive DNS data, so no extra active measurements are required, and it measures latency from real clients.
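As a minimal sketch of the passive measurement idea: a server that accepts a TCP connection sends a SYN-ACK and later receives the client's final ACK, and the time between those two events approximates one round trip to that client. The code below assumes handshake timestamps have already been extracted from server-side traffic. The `Handshake` record, its field names, and the per-client median are illustrative assumptions, not the report's actual pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Handshake:
    """One observed TCP handshake (hypothetical record layout)."""
    client: str          # client (resolver) IP address
    synack_sent: float   # time the server sent its SYN-ACK, in seconds
    ack_received: float  # time the client's final ACK arrived, in seconds


def handshake_rtt_ms(h: Handshake) -> float:
    """RTT estimate: delay between sending SYN-ACK and receiving the ACK."""
    return (h.ack_received - h.synack_sent) * 1000.0


def median_rtt_by_client(handshakes):
    """Group RTT samples by client and return the median per client in ms."""
    samples = defaultdict(list)
    for h in handshakes:
        rtt = handshake_rtt_ms(h)
        if rtt >= 0:  # drop samples with inconsistent timestamps
            samples[h.client].append(rtt)
    return {client: median(values) for client, values in samples.items()}
```

A median is used here only because it is robust to occasional retransmission outliers; other summary statistics would work as well.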
Below we show the paper abstract, and include the original paper.
An earlier version of this paper can be found at: https://ant.isi.edu/bib/Moura20a.html
DNS latency is a concern for many service operators: CDNs exist to reduce
service latency to end-users, but must rely on global DNS for reachability
and load-balancing. Today, DNS latency is monitored by active probing
from distributed platforms like RIPE Atlas, or centralized Verfploeter.
While Atlas coverage is wide, its 10k sites see only a
fraction of the Internet, and Verfploeter coverage in IPv6 is limited.
In this paper we show that passive observation of TCP handshakes can
measure live DNS latency, continuously, providing good coverage of
current service clients. Estimating RTT from TCP is an old idea,
but applying this approach to DNS has never before been considered.
We show that there is sufficient TCP DNS traffic today to provide good
operational coverage (particularly of IPv6), and very good temporal
coverage (better than existing approaches), enabling near-real-time
evaluation of DNS latency. We quantify coverage and show that estimates
of DNS latency from TCP are consistent with UDP latency. Our approach
finds previously unknown, real problems: DNS polarization is a
new problem where a hypergiant sends global traffic to one anycast
site rather than taking advantage of the global anycast deployment.
Correcting polarization in Google DNS cut its latency from 100 ms
to 10 ms; correcting polarization from Microsoft cut Azure latency
from 90 ms to 20 ms. Finally, real-time use of our approach for
a European country-level domain has helped detect and correct a BGP
routing misconfiguration that detoured European traffic to Australia.
We incorporated our approach into ENTRADA, our open source data warehouse
for DNS. Our monitoring tool (anteater) has been operational for the
last 9 months on this country-level top-level domain.
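
The polarization symptom described in the abstract (one client network sending nearly all of its queries to a single anycast site while seeing high latency) can be illustrated with a simple heuristic. This is a sketch under stated assumptions, not the detection method used in the paper: the function names, the input layout, and both thresholds are hypothetical, and the AS numbers in the usage note come from the range reserved for documentation.

```python
from collections import Counter


def dominant_site_share(queries):
    """queries: iterable of (client_as, anycast_site) pairs.

    Returns {client_as: (busiest_site, share_of_queries_at_that_site)}.
    """
    per_as = {}
    for asn, site in queries:
        per_as.setdefault(asn, Counter())[site] += 1
    result = {}
    for asn, counts in per_as.items():
        site, hits = counts.most_common(1)[0]
        result[asn] = (site, hits / sum(counts.values()))
    return result


def possibly_polarized(queries, median_rtt_ms,
                       share_threshold=0.9, rtt_threshold_ms=50.0):
    """Flag ASes that use almost only one site AND have a high median RTT.

    Both thresholds are illustrative assumptions and would need tuning.
    """
    flagged = []
    for asn, (_site, share) in dominant_site_share(queries).items():
        rtt = median_rtt_ms.get(asn, 0.0)  # unknown RTT is never flagged
        if share >= share_threshold and rtt >= rtt_threshold_ms:
            flagged.append(asn)
    return sorted(flagged)
```

For example, an AS with 95 queries at one site and 5 at another, and a 100 ms median RTT, would be flagged, while an AS splitting its queries evenly between two sites would not. A flagged AS is only a candidate for closer inspection, since a single nearby site can legitimately serve a regional network.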