4–5 Feb 2021
UTC timezone
Webinar doors will open at 15:45 UTC. The first session will start promptly at 16:00 UTC.

Not So Popular: A Sample of Domain Names for Typical Web Sites

5 Feb 2021, 17:45
15m
Standard Presentation Online Workshop OARC 34 Day 2

Speaker

Paul Hoffman (ICANN)

Description

Many measurements of the DNS are based on collections of "most popular" web sites, such as the Alexa list. Using these lists skews the results of analysis because those sites are often well-managed and available from many locations (such as through CDNs). A different tactic is to look for collections of more typical web sites. For this study, data is collected from Wikipedia around the world to create a large list of domain names used in web URLs on Wikipedia pages. That data is used to analyze how many of those domain names are protected by DNSSEC, as well as how many have IPv6 addresses.

This presentation explains an efficient method to collect the data from Wikipedia worldwide. Although the resulting data is more typical of names from the web than "most popular" lists are, it cannot be considered "typical" of the web for many reasons, which are also discussed.
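
As a rough illustration of the list-building step, the sketch below reduces a collection of web URLs to a de-duplicated list of host names. It is not the method from the presentation: the function name `hostnames_from_urls` is hypothetical, the input is assumed to be plain URL strings that have already been extracted from Wikipedia pages, and the filtering rules (keep only `http`/`https`, drop IP-address literals and single-label hosts) are assumptions made for the example.

```python
import ipaddress
from urllib.parse import urlsplit


def _is_ip_literal(host):
    """Return True if host is an IPv4 or IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def hostnames_from_urls(urls):
    """Return the sorted, de-duplicated host names found in web URLs.

    Hypothetical helper: the filtering rules are illustrative assumptions.
    """
    names = set()
    for url in urls:
        try:
            parts = urlsplit(url.strip())
            host = parts.hostname  # already lowercased by urllib
        except ValueError:
            continue  # malformed URL, e.g. an unterminated IPv6 literal
        if parts.scheme not in ("http", "https") or not host:
            continue  # only web URLs are of interest here
        host = host.rstrip(".")  # treat "example.org." like "example.org"
        if _is_ip_literal(host) or "." not in host:
            continue  # skip address literals and single-label hosts
        names.add(host)
    return sorted(names)
```

Sorting the result only makes the output reproducible between runs; a set would serve equally well if ordering does not matter.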

Summary

This presentation describes a new database of domain names that represents typical domain names used on the web better than "most popular" lists do. The data is derived from URLs on Wikipedia pages from around the world. Measurements of DNSSEC signing and IPv6 addresses are given. There is also a discussion of problems with the database.
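
One way to picture the IPv6 measurement is as a simple tally over the name list. The sketch below is an assumption-laden illustration, not the measurement used in the presentation: `has_ipv6` and `ipv6_share` are hypothetical names, and the default lookup goes through the operating system's resolver via `socket.getaddrinfo`, which can be influenced by local configuration such as a hosts file and so is only an approximation of a direct AAAA query. DNSSEC status is not covered, since checking it needs a DNS library beyond the standard library.

```python
import socket


def has_ipv6(name, lookup=socket.getaddrinfo):
    """Return True if the lookup yields at least one IPv6 address for name.

    The lookup parameter is an assumption added so that the function can be
    exercised without network access.
    """
    try:
        return len(lookup(name, None, socket.AF_INET6)) > 0
    except socket.gaierror:
        return False  # no IPv6 address, or the name did not resolve


def ipv6_share(names, lookup=socket.getaddrinfo):
    """Return the fraction of names (0.0 to 1.0) that have an IPv6 address."""
    names = list(names)
    if not names:
        return 0.0
    return sum(has_ipv6(n, lookup) for n in names) / len(names)
```

Note that this tally cannot distinguish a name with no IPv6 address from one that fails to resolve at all; a real measurement would likely want to report those cases separately.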

Primary author

Paul Hoffman (ICANN)
