29–30 Nov 2021
UTC timezone
OARC 36 Day 1 - begins 14:00 UTC Today 29 November

Slack’s DNSSEC Rollout: Third Time’s the Outage

29 Nov 2021, 15:40
25m
Standard Presentation Online Workshop OARC 36 Day 1

Speaker

Rafael de Elvira (Senior Software Engineer @ Slack)

Description

On September 30th, 2021, Slack had an outage that impacted less than 1% of our online user base and lasted for 24 hours. The outage resulted from our attempt to enable DNSSEC, which ultimately led to a series of unfortunate events.

In this talk we'll cover our DNSSEC rollout to all of Slack's critical domains and the three failed attempts to enable DNSSEC on slack.com. We'll do a deep dive into our third attempt (the September 30th outage), covering what was done during the outage, why we did it, and ultimately the root cause, which was a bug in the DNSSEC implementation of our cloud provider's authoritative DNS server.

Primary author

Rafael de Elvira (Senior Software Engineer @ Slack)
