Cloudflare operates multiple DNS services in more than 100 data centers around the globe. At this scale, troubleshooting with unstructured logs or packet captures is impractical because of their storage and computational costs. In the first part of this talk, we'll go over our current data analytics architecture and how we arrived at it after a few false starts.
This covers the logging infrastructure that securely transports structured logs from all data centers to an OLAP system built on ClickHouse, an open-source column-oriented DBMS. ClickHouse lets us and our customers explore the dataset in real time to gain operational insights. Thanks to the many optimizations built into ClickHouse, we can retain the data for a long time, which allows us to put events in perspective and examine historical trends.
In the second part of the talk, we will demonstrate some of the system's capabilities at two levels: at a granular level, tracking down problems with a particular record; and at a high level, experimenting with the dataset to show how resolver affinity can be tracked.
This talk will go over the design choices and architecture of a DNS analytics system that ingests almost 100 billion events daily and can scan billions of rows per second with modest hardware requirements. It will show real-world examples of how the system can be used to explore DNS traffic.