DNS-OARC went to Austin, TX for its 31st Workshop!
DNS-OARC is a non-profit, membership organization that seeks to improve the security, stability, and understanding of the Internet's DNS infrastructure. Part of these aims are achieved through workshops.
DNS-OARC Workshops are open to OARC members and to all other parties interested in DNS operations and research, with attendees from NANOG 77 and ARIN 44 particularly welcome this time around, as OARC 31 takes place in the same venue, right after NANOG 77 and in parallel with ARIN 44.
Annual Workshop Patronage opportunities for 2019 are available. Details at: https://www.dns-oarc.net/workshop/patronage-opportunities
Sponsorship opportunities for OARC 31 are available. Details at: https://www.dns-oarc.net/workshop/sponsorship-opportunities
Video:
Jabber: xmpp:dns-operations@conference.dns-oarc.net
Twitter hashtag: #OARC31
Sponsors: We have various sponsor opportunities for OARC workshops.
If your organization is interested in sponsoring OARC workshops, please e-mail sponsor@dns-oarc.net for more information.
Introduction to DNS-OARC and welcome for first-time attendees from President, Keith Mitchell.
At Facebook, we had been leveraging the tinydns server since 2012, when we started owning our DNS load balancer. Tinydns is simple, stable, resource-efficient, pragmatic, and opinionated. As much as those are its strengths, they limited our ability to evolve and modify our DNS stack: the engineers who felt comfortable and confident supporting and changing the code and deploying changes could be counted on one hand in a mitten.
In early 2018, we started exploring an alternative solution. After building a PoC in February 2018, we began rolling out our software (FBDNS) to production in early April 2018 and reached full deployment across 100% of our “b” nameservers a month later.
FBDNS has been running on 100% of our production infrastructure for the last year. Among other things, it has enabled us to easily add support for DNS over TCP and DoT, better instrument the internals of our software, leverage unit tests to prevent regressions and validate business logic, and increase the number of engineers able to contribute to and support our nameserver software.
This talk will cover the journey from project inception and the decisions that were made, going from a proof of concept to feature parity with tinydns, then incrementally deploying to production until we finally built enough trust to run FBDNS on 100% of our infrastructure. We will also cover some of the mechanisms we used to guarantee the accuracy of FBDNS, as well as some of the issues we encountered along the way.
This talk will provide an overall introduction to Alibaba Cloud DNS, the largest DNS service provider in Asia. It will cover the status of our services, technical architecture, challenges, and roadmap. In particular, it will include our operational experience with security, IPv6, VPC DNS, and plans for DNSSEC and DNS Flag Day.
Alibaba is a new DNS-OARC member. We think the audience will be interested in the experience and challenges we have as a large cloud DNS operator, and hopefully we can identify some technical issues that OARC members would be interested in working on together in the future.
We are creating a system that will allow domain owners to protect their domain name resources by making record changes available for the domain owners and other interested parties to verify. We are calling this service "DNS Transparency" and we want to work with all companies and stakeholders to enable a more transparent naming infrastructure for the future!
This talk would cover the current status of the project and be a chance to get feedback from the community about the system and what features would be useful to OARC.
Proposed speakers: Allison Mankin and Han Zhang
Salesforce
amankin@salesforce.com, hzhang@salesforce.com
Abstract:
A large enterprise greatly valuing customer trust may decide to deploy DNSSEC on its domains. However, thanks to the camel that is DNS [0], there are many obstacles on the way there, including obstacles not specific to DNSSEC, and obstacles that are heightened because of deploying DNSSEC. This is especially the case if that enterprise has large zones with frequently changing records. In this presentation, we share some experiences and lessons from deploying DNSSEC on our very large live production zones, hoping these help other organizations when they take this road.
Challenges during Preparation:
Our preparation included comparing services and features of multiple managed DNS providers (a topic we presented in [1]). We also conducted DNSSEC functional and performance testing against those providers, simulating workloads of our production zones. We have a requirement to use multiple providers for resilience, and our preparation led us to the insight that multi-provider DNSSEC models are needed [2].
Because our enterprise constantly creates, modifies and deletes domains and records, using DNS in fairly expansive ways, one of the greatest challenges during preparation was to find vendors to handle monster and mega-monster-sized dynamic zones, where monster refers to O(1M), and mega-monster to O(10M). These large zones also have 5-6 figures of changes per day. We learned how vendors manage signed versions of zones with these properties. We will share some information about tradeoffs.
Throughout this talk, we will not identify vendors by name.
Challenges during Deployment:
Because of the feature gaps, our DNSSEC deployment included migrating live zones to new vendors, which had to be done without causing any downtime for the customers and internal applications depending on these DNS data. We learned that the move phase and the DNSSEC-enabling phase needed to be well-separated.
Specifically, the challenges of seamlessly migrating a zone include: i) ensuring that no assumptions in applications will be impacted by a vendor switch (and being ready to roll back test migrations and wait for fixes); ii) migrating a zone to active-active provisioning while being prepared to handle some inconsistencies, because vendors vary in how they bootstrap a zone, so there will be a period in which the new zone does not receive updates; iii) being prepared to handle issues around delegation and sub-zones: for example, if you have delegated a sub-zone at a vendor but not the parent zone, and the parent zone is being updated there in preparation for a move, the vendor may answer for it even though the NS for that parent zone hasn’t changed. We will present both versions of this and their implications for the DS-publishing stage of deployment; iv) being prepared for surprises by being able to roll back quickly; v) in contrast, being aware of the impact of the 48-hour COM TTL, which we will report on (it wasn’t as high-impact as feared). Besides zone migration, we will also talk about other challenges, such as digging out non-standard and dynamic records (such as GSLB).
Avoiding Hazards after Deployment:
Our testing made us aware that very large signed zones can create enormous surges of re-signing traffic, which would not necessarily be handled well by XFR. This isn’t a new observation, but it had a large impact on us, and we worked with stack vendors to increase the skew of re-signing to avoid a crisis further down the road. This is one example, and the talk may present some other hazards where the camel and the trip could be blocked.
We are monitoring our DNSSEC zones for multiple hazards going forward. Because DNSSEC introduces more data for XFRs, the first thing we monitor is whether zones are synced among the multiple name servers (at the multiple providers) and/or show significant propagation lags. Zone propagation lags have improved for us because of the extensive review and the migrations with which we prepared for DNSSEC. Another thing we monitor is the DNSSEC correctness of the zones, which we do by using dnsviz [3] to produce data for a dashboard and to trigger pages when any errors are found. The DNSSEC change is not complete, so we will not cover customer experience testing and responses.
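The zone-sync check described above can be sketched as a small function over observed SOA serials. The function and server names below are illustrative, not the monitoring system from the talk; a real monitor would fetch serials over the network and compare them with RFC 1982 serial-number arithmetic rather than plain integers.

```python
# Sketch of the zone-sync check: given the SOA serial observed on each
# authoritative name server, flag servers that trail the newest serial.
# Names and thresholds are illustrative; real code would use RFC 1982
# serial arithmetic instead of plain integer comparison.

def find_lagging_servers(serials, max_lag=2):
    """serials: dict mapping server name -> observed SOA serial (int).
    Returns servers whose serial trails the newest by more than max_lag."""
    newest = max(serials.values())
    return sorted(s for s, n in serials.items() if newest - n > max_lag)

# Example: two providers, one replica stuck on an old serial.
observed = {
    "ns1.provider-a.example": 2019101509,
    "ns2.provider-a.example": 2019101509,
    "ns1.provider-b.example": 2019101502,  # propagation lag
}
print(find_lagging_servers(observed))  # -> ['ns1.provider-b.example']
```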
Overall, the road to DNSSEC had surprises but was worth the journey. This talk offers our experiences as a first pass at best practices for large enterprises. We will conclude with thoughts on the cadence of DNSSEC changes.
References:
[1] DNSSEC for a Complex Enterprise Network: https://indico.dns-oarc.net/event/28/contributions/523/
[2] Multi-Signer DNSSEC Models: https://indico.dns-oarc.net/event/31/contributions/683/ (PDF: https://indico.dns-oarc.net/event/31/contributions/683/attachments/667/1096/multi-signer.pdf)
[3] dnsviz: http://dnsviz.net
A talk about trying to implement DoH and DoT at a large ISP. Discussion may include:
DENIC wants to be ready for future business models and additionally wants to improve the operational excellence of its processes. Because of that, DENIC decided to speed up zone propagation so that a domain is served at a nameserver location within a few minutes or less of its registration. This should be a big enhancement of the user experience for registrars and domain owners. To reach this goal, we developed a completely renewed signing cluster with the following requirements:
To fulfil these requirements, we created a signing cluster based on Kubernetes, dynamic DNS updates, and Knot DNS as the signing software.
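To make the dynamic-update building block concrete, here is a hand-rolled sketch of the wire format of an RFC 2136 UPDATE message header plus its Zone section. This is purely illustrative and not DENIC's actual tooling; a real deployment would use a DNS library and TSIG-signed updates.

```python
# Minimal sketch of an RFC 2136 dynamic-update message in wire format:
# a DNS header with Opcode 5 (UPDATE) and ZOCOUNT=1, followed by the
# Zone section (zone name, type SOA=6, class IN=1). Illustrative only.
import struct

def encode_name(name):
    """Encode a dotted name as DNS wire-format labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"

def update_header(msg_id, zone):
    flags = 5 << 11                      # QR=0, Opcode=5 (UPDATE)
    header = struct.pack("!HHHHHH", msg_id, flags,
                         1, 0, 0, 0)     # ZOCOUNT=1, PR/UP/ADCOUNT=0
    zone_section = encode_name(zone) + struct.pack("!HH", 6, 1)  # SOA, IN
    return header + zone_section

msg = update_header(0x1234, "example.org")
opcode = (struct.unpack("!H", msg[2:4])[0] >> 11) & 0xF
print(opcode)  # -> 5
```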
During development there was a need to discuss several core questions again, such as:
… and many more.
During our journey to this fast cluster we had to master a lot of challenges, and we found out again how great our DNS community is and what we can achieve together if we share information and work together. With this presentation we give these interesting experiences back to the community.
Lunch Break (lunch is provided)
Essentially all Internet communication relies on the Domain Name System (DNS), which first maps a human-readable Internet destination or service to an IP address before two endpoints establish a connection to exchange data.
Today, most DNS queries and responses are transmitted in cleartext, making them vulnerable to eavesdroppers and traffic analysis.
Past work has demonstrated that DNS queries can reveal everything from browsing activity to user activity in a smart home.
To mitigate some of these privacy risks, two new protocols have been proposed: DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). Rather than sending queries and responses as cleartext, these protocols establish encrypted tunnels between clients and resolvers.
This fundamental architectural change has implications for the performance of DNS, as well as for content delivery.
We measure the effect of DoH and DoT on name resolution performance and content delivery.
We find that although DoH and DoT response times can be higher than for conventional DNS (Do53), DoT can perform better than both protocols in terms of page load times, and DoH can at best perform indistinguishably from Do53.
However, when network conditions degrade, webpages load quickest with Do53, with a median of almost 0.5 seconds faster compared to DoH.
Furthermore, in a substantial amount of cases, a webpage may not load at all with DoH, while it loads successfully with DoT and Do53.
Our in-depth analysis reveals various opportunities to readily improve DNS performance, for example through opportunistic partial responses and wire format caching.
When using a publicly available DNS-over-HTTPS (DoH) server, some clients may suffer poor performance when the authoritative DNS server is located far from the DoH server. For example, a publicly available DoH server provided by a Content Delivery Network (CDN) should be able to resolve names hosted by that CDN with good performance, but might take longer to resolve names provided by other CDNs, or might provide suboptimal results if that CDN is using DNS-based load balancing and returns different address records depending on where the DNS query originated.
In this talk, we will propose a new HTTP header intended to lessen these issues by allowing the web server to indicate to the client which DoH server can best resolve its addresses. The proposal defines an HTTP header field that enables web host operators to inform user agents of the preferred DoH servers to use for subsequent DNS lookups for the host's domain. This presentation is based on an IETF Internet-Draft presented at IETF 105.
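As a sketch of the mechanism, a client could pick up the advertised DoH server from a response header like this. The header field name "DoH-Preference" below is a placeholder chosen for illustration; it is not necessarily the field name used in the actual Internet-Draft.

```python
# Illustrative sketch: a web server advertises its preferred DoH
# resolver in a response header, and the user agent parses it for use
# in subsequent lookups for that host's domain. The header name
# "DoH-Preference" is a placeholder, not the draft's field name.

def parse_doh_preference(headers):
    """Return the advertised DoH URI template, or None if absent."""
    value = headers.get("DoH-Preference")
    if value and value.startswith("https://"):
        return value.strip()
    return None

response_headers = {
    "Content-Type": "text/html",
    "DoH-Preference": "https://doh.cdn.example/dns-query{?dns}",
}
print(parse_doh_preference(response_headers))
```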
DNS over TLS (DoT) has been gaining attention, primarily as a means of communication between stub resolver and recursive resolver. There have also been discussions and experiments involving the use of DoT to communicate with authoritative nameservers (Authoritative DNS over TLS or “ADoT”), including communication between recursive and authoritative resolvers with a focus on the lower levels of the DNS hierarchy.
In this presentation, we will discuss operational concerns that need to be addressed prior to ADoT’s deployment at scale by DNS operators in order to maintain the stability and resilience of the global DNS. This presentation will also discuss suggested next steps to advance the operator community’s understanding of ADoT’s operational impact.
DNS packets were designed, per the original standard, to travel through the Internet unencrypted. However, recent studies show that adversaries are actively exploiting this design vulnerability to compromise Internet users' security and privacy. To mitigate such threats, several protocols have been proposed to encrypt DNS queries between DNS clients and recursive servers, which we jointly term DNS-over-Encryption. In particular, two prominent protocols, DNS-over-TLS and DNS-over-HTTPS, have been standardized by the IETF and are gaining strong support from industry.
Despite this “top-down” effort, little has been done to understand the operational status of DNS-over-Encryption from the view of Internet users. In this work, we aim to perform a comprehensive, end-to-end measurement study of DNS-over-Encryption. We seek answers to questions including: 1) How many providers are offering DNS-over-Encryption services? 2) What does their performance look like for users distributed globally? 3) What is the current real-world usage of DNS-over-Encryption?
The study is made possible by our extensive collection of data, including Internet-wide scanning, user-end measurement, and passive monitoring logs. To begin with, we launch periodic Internet-wide scans to discover all service providers of DNS-over-Encryption and verify their SSL certificates. To evaluate client-side usability, we measure the reachability and performance of popular DNS-over-Encryption servers using 122K vantage points globally. Finally, we use large-scale passive datasets (NetFlow and Passive DNS) to measure the usage of the new protocols.
So far, we have gained several unique insights into this “early” view of the DNS-over-Encryption ecosystem. In general, the service quality of DNS-over-Encryption is satisfying in terms of accessibility and latency. For DNS clients, DNS-over-Encryption queries are less likely to be disrupted than traditional DNS queries, and the extra overhead is minor with reused connections. On the other hand, we also discover several configuration issues in how the services are operated. As an example, we find that 25% of DNS-over-TLS service providers use invalid SSL certificates. Furthermore, compared to traditional DNS, DNS-over-Encryption is used by far fewer users, but we have witnessed a growing trend.
Our study is by far the first large-scale analysis of DNS-over-Encryption, and we believe it will provide guidance for pushing adoption and improving the DNS-over-Encryption ecosystem. Our data is also available on our project website. This work has been accepted at IMC’19.
PLEASE NOTE at the speaker's request, this talk will not be webcast or video recorded.
Different implementations of DNS resolvers take different approaches to the authoritative server selection problem, i.e. resolvers are faced with the question "Which authoritative server should I ask now?". This presentation introduces a new tool for testing server selection strategies implemented in DNS resolvers.
The tool runs DNS resolvers inside a simulated environment containing various authoritative server configurations (good and lame delegations, invalid signatures, …) and also network parameters (latency, packet loss, …), and gathers statistical data about communication between the resolver and authoritative servers.
Using this tool, we can expose multiple versions of different implementations to various scenarios and get a better understanding of their performance, and thus make informed decisions when refactoring, rewriting, or configuring the server-selection part of DNS resolvers.
We will also present our findings about various resolver implementations.
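To illustrate what such a simulation compares, here is a toy selection strategy: pick the authoritative server with the lowest smoothed RTT, but occasionally probe another server so a recovered one can be noticed. The parameters and the strategy itself are illustrative, not any particular resolver's implementation.

```python
# Toy server-selection strategy under simulated network conditions:
# exploit the lowest smoothed RTT estimate, explore at random with a
# small probability. All parameters are illustrative.
import random

def run_selection(true_rtts, queries=1000, explore=0.05, alpha=0.3, seed=1):
    rng = random.Random(seed)
    est = {s: 0.0 for s in true_rtts}   # optimistic initial RTT estimates
    counts = {s: 0 for s in true_rtts}
    for _ in range(queries):
        if rng.random() < explore:
            server = rng.choice(list(true_rtts))   # exploration probe
        else:
            server = min(est, key=est.get)         # exploit best estimate
        rtt = true_rtts[server] * rng.uniform(0.8, 1.2)  # noisy sample
        est[server] = (1 - alpha) * est[server] + alpha * rtt
        counts[server] += 1
    return counts

# A fast server (10 ms) should attract most queries over a slow one (80 ms).
counts = run_selection({"ns1": 10.0, "ns2": 80.0})
print(counts)
```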
The DNS Flag Day is an initiative of DNS vendors (both open-source and proprietary) and DNS operators. Its aim is to make the Domain Name System (DNS) protocol more reliable, secure, and resilient while gradually removing workarounds for broken DNS behavior. Sometimes it takes a coordinated group effort to remove support for a broken behavior; if only one DNS server package implemented new rules on its own, users could simply use different software that still permitted the unsupported behavior.
For DNS Flag Day 2020, the idea is the same: make the Internet a better place through a coordinated effort across participating DNS implementors, vendors, and operators. This time, however, the target might seem not directly related to DNS: IP fragmentation. The truth is that DNS is one of the few prominent users of IP fragmentation. When DNS messages are transferred between a DNS server and a DNS client over UDP, they can exceed the Maximum Transmission Unit (MTU) on any part of the path between the two endpoints. The MTU might vary between any two interconnects; while the standard MTU of Ethernet is 1500 bytes, the unit size is effectively reduced by encapsulation into different protocols (the most basic example being a VPN). When the MTU is exceeded, the IP packet gets fragmented (split into multiple parts) and reassembled.
DNS Flag Day 2020 is an effort to fix the IP fragmentation in DNS by making small, albeit important, changes. First, the default maximum EDNS Buffer Size will be changed to a value that would prevent IP fragmentation. The recommended value is going to be slightly smaller than the minimum IPv6 fragment size, around 1220-1232 bytes. The second change stems from the first one; when the DNS response won’t fit into a UDP packet, the default behavior of DNS is to fall back to TCP. That means that either you MUST make sure all your DNS responses fit into a 1232-byte maximum packet size, or both the DNS client and the DNS server MUST be able to communicate via TCP.
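The 1232-byte figure can be derived from the IPv6 minimum MTU, and the buffer size itself is advertised in the EDNS0 OPT pseudo-record defined in RFC 6891. A minimal sketch of both:

```python
# Derive the 1232-byte EDNS buffer size from the IPv6 minimum MTU and
# build a bare EDNS0 OPT pseudo-RR (RFC 6891) advertising it: root
# owner name, type 41, UDP payload size carried in the class field.
import struct

IPV6_MIN_MTU = 1280
IPV6_HEADER = 40
UDP_HEADER = 8
EDNS_BUF = IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER   # = 1232

def opt_record(udp_payload_size):
    """EDNS0 OPT pseudo-RR with no options: name, type, class (payload
    size), extended RCODE/flags (as TTL), RDLENGTH=0."""
    return (b"\x00"                                   # root domain name
            + struct.pack("!HHIH", 41, udp_payload_size, 0, 0))

opt = opt_record(EDNS_BUF)
print(EDNS_BUF)                          # -> 1232
print(struct.unpack("!H", opt[3:5])[0])  # -> 1232
```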
EDNS Client Subnet, when finally published in 2016 (RFC 7871), attempted to meet the need of some content delivery networks to know where queries originated. The RFC clearly outlines privacy concerns, but how does that look in practice?
More than 20 million Internet properties use Cloudflare's network, and approximately 18% of the top 10k websites on the Internet use at least one Cloudflare product. In terms of DNS queries, we consistently serve approximately 2 million DNS queries per second. That's around 170 billion queries per day, and 5 trillion queries a month. In this talk, we’ll dive into the ECS landscape from Cloudflare’s perspective: where is ECS data coming from, who is sending it, and what does it contain?
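For reference, this is roughly what an ECS option looks like on the wire per RFC 7871: EDNS option code 8, an address family, source and scope prefix lengths, and the client address truncated to the bytes the prefix needs. The prefix below is illustrative (a documentation range).

```python
# Build an EDNS Client Subnet option (RFC 7871) for an IPv4 prefix.
# Option code 8; family 1 = IPv4; only whole bytes needed for the
# source prefix length are included in the address field.
import struct

def ecs_option(prefix, source_prefix_len):
    addr = bytes(int(o) for o in prefix.split("."))
    nbytes = (source_prefix_len + 7) // 8            # truncate address
    payload = struct.pack("!HBB", 1, source_prefix_len, 0) + addr[:nbytes]
    return struct.pack("!HH", 8, len(payload)) + payload  # code 8 = ECS

opt = ecs_option("198.51.100.0", 24)
print(opt.hex())
```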
In this talk, we focus on a Distributed Denial of Service (DDoS) attack known as Slow Drip, also referred to as the Random Subdomain or Water Torture attack. Studying data obtained via passive DNS collectors, we used machine learning to investigate the Slow Drip attack. First, we built a statistical classifier to identify these attack events. Then, using unsupervised learning, we were able to group the events and investigate the malware that was used to create them. We discuss newly discovered features of Slow Drip and compare them to past work. Using these new features, we can characterize the malware and describe its scope.
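One intuition behind classifying Slow Drip traffic is that the attack's randomly generated subdomain labels have much higher character entropy than organic query names. The toy feature extractor below is illustrative only, not the statistical classifier used in the talk.

```python
# Toy Slow Drip feature: Shannon entropy (bits/char) of the first label
# of a query name. Random attack labels score far higher than organic
# names like "www". Illustrative only.
import math
from collections import Counter

def label_entropy(qname):
    """Shannon entropy of the leftmost label of a query name."""
    label = qname.split(".")[0].lower()
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(round(label_entropy("www.example.com"), 2))            # low entropy
print(round(label_entropy("xq7f9z2kd1vb8r.example.com"), 2)) # high entropy
```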
This presentation looks at the previous analysis of the use of aggressive NSEC caching (Petr Spacek, OARC, March 2018) and compares the results obtained in those tests to the results obtained in a large-scale study of NSEC caching behaviour in the Internet. The results of this second study point to some issues with the interaction between NSEC caching and DNS load balancer behaviour.
DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand for several reasons: DNS is a distributed service, DNS resolution is security-sensitive, and resolvers require multiple types of information as they traverse the DNS hierarchy. These complications mean there are multiple, frequently interacting places TTLs can be specified. This paper provides the first careful evaluation of how these factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations in TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise, reducing median latency from 183 ms to 28.7 ms for one country-code TLD.
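The caching trade-off being quantified can be illustrated with a toy single-resolver cache model: with Poisson query arrivals, the cache hit rate for a record rises quickly with TTL. All numbers below are illustrative, not the paper's results.

```python
# Toy cache model: queries for one record arrive as a Poisson process;
# a hit occurs when the record is still within its TTL, otherwise the
# resolver re-fetches and caches it. Illustrative numbers only.
import random

def cache_hit_rate(ttl, mean_interarrival, queries=20000, seed=7):
    rng = random.Random(seed)
    now, expires, hits = 0.0, -1.0, 0
    for _ in range(queries):
        now += rng.expovariate(1.0 / mean_interarrival)
        if now < expires:
            hits += 1            # answered from cache
        else:
            expires = now + ttl  # cache miss: fetch and store
    return hits / queries

# One query per minute on average, at three representative TTLs.
for ttl in (30, 300, 3600):
    print(ttl, round(cache_hit_rate(ttl, mean_interarrival=60.0), 3))
```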
The RIPE Atlas active measurement network consists of approximately 10,000 probes, each of which regularly sends queries to the DNS root server system. Through analysis of this data we show how root server system performance has changed, or in some cases remained the same, over the past 7-8 years. In particular we use the data to derive metrics for latency, availability, and levels of interception. Additionally, we also use the data to understand the relationship between a varying subset of Atlas probes and the subset of the root server system that those probes are able to observe.
During the summer of 2018, a USC student performed extensive trend analysis of domain names in the 2017 and 2018 DITL datasets for B-Root. The student broke down each incoming request name by length, TLD components, language, etc., and plotted the results with respect to time. The results showed a surprising number of buried trends within the DITL datasets, such as what percentage of root traffic comes from the Chrome web browser, which languages are reflected in DNS traffic, and hidden repetitious signals discovered in requests based on the number of contained labels. This presentation will be the first public presentation of the resulting study.
The importance of DNSSEC is increasing day by day. Meanwhile, penetration of DNSSEC-signed zones is still low. One of the reasons for such low penetration is the difficulty of detecting DNSSEC failures, especially on the end-user side, including ISPs’ customer support.
We have been studying detection of DNSSEC failures on the authoritative DNS server side (TLD level) and have found one possible indicator among DNSSEC-related queries. The candidate indicator is DNSKEY queries, which increase to several times their usual level when a DNSSEC failure happens. We still have unresolved research questions, such as differences between public and other resolvers, TTL effects during failures, and effective (quasi-)realtime detection methods on the TLD servers’ side, but we would like to share our experiences and get feedback from attendees to improve our research.
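The DNSKEY-spike indicator might be sketched as a simple baseline comparison: flag a zone when its DNSKEY query rate jumps to several times its trailing average. The window and threshold below are illustrative, not the study's parameters.

```python
# Sketch of the indicator: compare the latest hour's DNSKEY query
# count against the mean of a trailing baseline window. Window size
# and threshold factor are illustrative.

def dnskey_spike(hourly_counts, window=24, factor=4.0):
    """hourly_counts: DNSKEY queries per hour, oldest first.
    Returns True if the latest hour exceeds factor x baseline mean."""
    baseline = hourly_counts[-window - 1:-1]
    mean = sum(baseline) / len(baseline)
    return hourly_counts[-1] > factor * mean

normal = [100] * 24
print(dnskey_spike(normal + [150]))   # -> False (ordinary variation)
print(dnskey_spike(normal + [900]))   # -> True  (failure-like spike)
```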
The Domain Name System (DNS) Security Extensions (DNSSEC) introduced additional DNS records (NSEC or NSEC3 records) into negative DNS responses; these records can prove there is no translation for a queried domain name. We introduce a novel technique to estimate the size of a DNS zone by analyzing the NSEC3 records returned by only a small number of DNS queries. We survey the prevalence of the different variants of DNSSEC negative responses deployed across a large set of DNSSEC-signed zones in the wild, and identify over 50% as applicable to our measurement technique. Of the applicable zones, we show that 99% are composed of fewer than 40 names.
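The estimation idea rests on NSEC3 owner names being SHA-1 hashes that are roughly uniform over the 2^160 hash space, so a zone with N names has an expected gap of about 2^160/N between consecutive hashes, and each negative response exposes one gap. Below is a simulation of that estimator; it is illustrative, and a real measurement must also correct for the fact that random queries land in larger gaps more often (size bias), which this sketch ignores by sampling gaps uniformly.

```python
# Simulate NSEC3 zone-size estimation: generate a zone of N uniform
# hashes, sample a few gaps between consecutive hashes, and estimate
# N as (hash space) / (mean observed gap). Illustrative; ignores the
# size bias a random query would introduce.
import random

SPACE = 2 ** 160   # SHA-1 hash space used by NSEC3

def estimate_zone_size(gaps):
    """Estimate name count from observed NSEC3 gap widths."""
    mean_gap = sum(gaps) / len(gaps)
    return SPACE / mean_gap

rng = random.Random(42)
hashes = sorted(rng.randrange(SPACE) for _ in range(5000))  # "zone" of 5000
gaps = []
for _ in range(30):                    # 30 sampled gaps ~ 30 queries
    i = rng.randrange(len(hashes) - 1)
    gaps.append(hashes[i + 1] - hashes[i])
print(round(estimate_zone_size(gaps)))  # should be near 5000
```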
Lunch Break (lunch is provided)
Anyone wishing to participate in the PGP signing session should email an ASCII export of their public key, as an email attachment, to pgpsign@dns-oarc.net by 09:00 CDT on Friday.
Anyone needing assistance with generating and exporting a key, or with understanding how a key signing works, can view the slides at https://mpounsett.github.io/key_signing_party/ for assistance.
The recent discussions over encrypted DNS deployment and "applications doing DNS" suffer a lot from circular arguments and talking past each other. In my opinion, this is due to a lack of agreement on the fundamental nature of today's DNS. So - what's the DNS? When you look for a definition, you cannot really find a clear, widely shared one, but most people would say that it is a distributed database. But is it really? When you look at its current uses and properties, the DNS looks more like a direction system, i.e. something that gives different directions to different people trying to reach the same service from different places.
The talk will demonstrate that this is the current nature of the DNS, explore its properties and suggest a more useful topic for discussion: are we happy with this model and if so, can we agree on it and make it work properly in the future?
Late-breaking short talks
In this presentation I would like to quickly introduce two new features of ENTRADA:
- serverless processing
- round-trip time analysis for quality-of-service monitoring
Quick overview of a DANE/DNSSEC survey that explores DNSSEC adoption and the use of DNSSEC to publish DANE TLSA records for MTA-to-MTA SMTP.
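For context, a DANE TLSA record for MTA-to-MTA SMTP commonly takes the form "3 1 1 <SHA-256 of the server's SubjectPublicKeyInfo>". Below is a sketch of generating such RDATA; the key bytes are a stand-in, not a real certificate, and the parameter defaults are just the common DANE-EE(3)/SPKI(1)/SHA-256(1) choice.

```python
# Sketch: build TLSA record data ("usage selector matching-type digest")
# from DER-encoded SubjectPublicKeyInfo bytes. The input bytes below are
# a placeholder, not a real key.
import hashlib

def tlsa_rdata(spki_der, usage=3, selector=1, matching=1):
    """Default parameters: DANE-EE (3), SPKI (1), SHA-256 (1)."""
    digest = hashlib.sha256(spki_der).hexdigest()
    return f"{usage} {selector} {matching} {digest}"

fake_spki = b"\x30\x82\x01\x22example-public-key-bytes"  # placeholder DER
print(tlsa_rdata(fake_spki))
```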
Surprising behaviour when chained forwarding resolvers retry queries that fail: a survey of DNS domains that gave failed lookups a "second chance" created an unexpected traffic storm.
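The storm can be understood as retry amplification along the forwarding chain: if each of d chained resolvers independently retries a failing query r times, the authoritative side can see on the order of r^d attempts for a single client lookup. A toy model of that arithmetic (the numbers are illustrative, not from the talk):

```python
# Toy retry-amplification model: each hop in a chain of forwarders
# retries a failing query r times, so attempts multiply per hop.
# Illustrative numbers only.

def queries_at_authoritative(retries_per_hop, depth):
    """Worst-case attempts reaching the end of a chain of `depth` hops."""
    return retries_per_hop ** depth

for depth in (1, 2, 3):
    print(depth, queries_at_authoritative(3, depth))
# one client query, 3 retries per hop -> 3, 9, 27 downstream attempts
```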
RFC 8085 UDP Usage Guidelines specifies that an application SHOULD NOT send UDP datagrams that result in IP packets that exceed the MTU along the path to the destination.
However, DNSSEC requires large UDP payload sizes, which leads to IP fragmentation. draft-fujiwara-dnsop-avoid-fragmentation proposes to avoid IP fragmentation in DNS.
My flight is around 1pm, so I can present in the morning; if not, Ralf Weber has offered to speak to my slides. I expect I will leave for the airport around 11am, so if there is space in the agenda I would be happy to fill in, even on demand. Otherwise, I can do an updated version next time I'm at OARC.
https://docs.google.com/presentation/d/1mVvl06LLfi-UH78zxCZuZWUcrTOR07PBsI5lbv6LpWI/edit#slide=id.p
What to expect when upgrading OpenDNSSEC from 1.4.x to 2.1.x.
OARC Inc. Reports, Annual General Meeting and Board Elections