Speaker
    Rafael de Elvira (Senior Software Engineer @ Slack)
Description
On September 30th, 2021, Slack had an outage that impacted less than 1% of our online user base and lasted for 24 hours. The outage resulted from our attempt to enable DNSSEC, which ultimately led to a series of unfortunate events.
In this talk we'll cover our DNSSEC rollout to all Slack critical domains and the three failed attempts to enable DNSSEC on slack.com. We'll do a deep dive into our third attempt (the Sept 30th outage), covering what was done during the outage, why we did it, and ultimately the root cause of the outage: a bug in the DNSSEC implementation on our cloud provider's authoritative DNS server.
Primary author
    Rafael de Elvira (Senior Software Engineer @ Slack)