At Abbot, we’re building SlackOps for Customer Success teams. We interact with a lot of external systems. Those external systems define their API and tell us what to expect, but if we’re not careful, that could just change out from under us and cause a bunch of problems. Even if we limit ourselves to our own components, there are all sorts of undefined behaviors that could happen if component X does something component Y doesn’t expect. There are so many “impossible” scenarios that it seems obvious that someday one of them will suddenly become possible.
So, diligent engineers that we are, we do our best to be prepared for the unexpected. We check for those edge cases and try to fail fast and early. Even better, we want our system to be able to quickly identify when one of these “impossible” things happens and notify us so we can fix the issue with minimal impact to our customers. Fortunately, .NET 7 adds a new type to help us out here: UnreachableException.
Enums and Switches
Here’s a concrete example. I’ve written more “switch” statements over enum options than I could count. But in C#, enums are “open” (for now), which means any integer can be cast into any enum type. It might not match one of the defined options in my enum, but it’s still valid behavior. So of course, every one of those switch statements has a “default” case which throws if an unexpected value occurs. In .NET 7, we now throw an UnreachableException. For example, here’s some code that provides a friendly status message for a Slack conversation we’re monitoring, depending on its current state:
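The original listing is not reproduced here, so what follows is only a minimal sketch of the pattern. The `ConversationState` enum, its members, the `ConversationStatus` class, and the message strings are all hypothetical stand-ins; only `UnreachableException` itself comes from .NET 7.

```csharp
using System;
using System.Diagnostics;

// Hypothetical enum: the real conversation states are not shown in this post.
public enum ConversationState
{
    New,
    NeedsResponse,
    Waiting,
    Closed,
}

public static class ConversationStatus
{
    // Returns a friendly status message for a conversation state.
    public static string Describe(ConversationState state)
    {
        switch (state)
        {
            case ConversationState.New:
                return "This conversation is new.";
            case ConversationState.NeedsResponse:
                return "This conversation needs a response.";
            case ConversationState.Waiting:
                return "This conversation is waiting on the customer.";
            case ConversationState.Closed:
                return "This conversation is closed.";
            default:
                // Enums are open, so a value such as (ConversationState)42
                // compiles fine and ends up here at runtime.
                throw new UnreachableException();
        }
    }
}
```

Calling `ConversationStatus.Describe((ConversationState)42)` compiles without complaint and throws at runtime, which is exactly the "impossible" case the default branch is there to catch.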
There at the bottom, you can see the “default” case where we’re throwing an UnreachableException. There’s no need for a message since we’ll get a stack trace with the exception and we’ll know exactly why it’s happening based on the location in the code (assuming we can track that down; more on that later).
Null Checking
Another common pattern is null checking. When we’re dealing with all those external systems, we have a number of scenarios where the API of an external service is clear that a value should never be null, but we’re not confident enough to mark it as non-nullable (we use nullable reference types heavily). For example, this code processes incoming events from Slack and stores them in our own data structure:
At the bottom, we’re assigning the “envelope.TeamId” to “SlackEvent.TeamId”. Well, “envelope” is an object deserialized from Slack’s payload, and not all messages provide a “TeamId”, so we defined it as “string?” to ensure we had null checks in place. But at this point in our code, we know it’s a message that should have the TeamId property set. The “SlackEvent.TeamId” property is non-nullable, because we absolutely know we need a TeamId by the time we’ve created it. We could just write “TeamId = envelope.TeamId!” to bypass the null checking entirely. In all expected circumstances, that would work fine. The value “shouldn’t ever be null” so we’re good, right? Well, we’re not quite so trusting. That’s where the “Require” extension method comes into play:
By using “[CallerArgumentExpression]” on the “expression” parameter, the C# compiler automatically fills that parameter in with a string representation of the expression that was passed as the “o” parameter. So in “envelope.TeamId.Require()”, if the TeamId is null, the exception message will be “The expression ‘envelope.TeamId’ should not be null.”
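A minimal sketch of such a method, reconstructed from the description above rather than copied from our codebase. The parameter names `o` and `expression` and the message text follow the description; the `where T : class` constraint, the `RequireExtensions` class name, and the `Envelope`, `SlackEvent`, and `SlackEventFactory` types are assumptions made for illustration. This version only covers reference types; nullable value types would need a separate overload.

```csharp
#nullable enable
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class RequireExtensions
{
    // Returns the value if it is non-null; otherwise throws UnreachableException.
    // The compiler fills in 'expression' with the source text of the receiver.
    public static T Require<T>(
        this T? o,
        [CallerArgumentExpression(nameof(o))] string? expression = null)
        where T : class
    {
        if (o is null)
        {
            throw new UnreachableException(
                $"The expression '{expression}' should not be null.");
        }

        return o;
    }
}

// Hypothetical stand-ins for the deserialized Slack payload and our own event type.
public sealed record Envelope(string? TeamId);
public sealed record SlackEvent(string TeamId);

public static class SlackEventFactory
{
    public static SlackEvent FromEnvelope(Envelope envelope)
    {
        // Instead of 'envelope.TeamId!', fail loudly if the "impossible" happens.
        return new SlackEvent(TeamId: envelope.TeamId.Require());
    }
}
```

Compared with the `!` operator, this costs one method call and turns a possible `NullReferenceException` somewhere downstream into an immediate exception that names the offending expression.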
Normally, I’m not a big fan of defining extension methods so broadly. And I’m also not normally a fan of extension methods that can take ‘null’ for their ‘this’ parameter (after all, regular methods can’t). But this one is just so handy that it seems worth it.
Other Unexpected Behaviors
There are myriad other unexpected behaviors in our system, as with any system. For example, when we display a dialog to a Slack user, Slack validates that the fields we marked as required are actually filled in. When a user wants to create a HubSpot Ticket from a Slack conversation, the “Subject” field is required:
When Slack informs us of a successful form submission, we can assume that required fields are indeed present. But still… suppose the value is missing for some reason? Better to just check (this is actually from a slightly different example, but the gist is the same):
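The original snippet is not reproduced here. As a rough sketch of the idea, assume the submitted form values arrive as a dictionary keyed by field name. The `TicketForm` class, the `"subject"` key, and the message text are hypothetical and do not reflect Slack's actual payload shape.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class TicketForm
{
    // Hypothetical shape: submitted values keyed by field name.
    public static string GetSubject(IReadOnlyDictionary<string, string?> values)
    {
        // Slack should already have enforced this, so a missing or blank
        // value means something we believed impossible has happened.
        if (!values.TryGetValue("subject", out var subject)
            || string.IsNullOrWhiteSpace(subject))
        {
            throw new UnreachableException(
                "Slack accepted the submission without the required 'subject' field.");
        }

        return subject;
    }
}
```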
Sounding the Alarms
Ok, so we’ve been putting UnreachableException throughout our code to detect these cases where something that “shouldn’t ever happen” actually happened. Great. Now we need to detect and alert on those situations so we can find them and track them down. I’ll skip to the end first and show you what the result looks like:
This alert gets posted to a Slack channel whenever one of a set of “very bad” exceptions is thrown in production. That allows us to react and fix the problem quickly, before too many customers are affected.
We use Azure Application Insights to collect our logs and exceptions. Any time an exception bubbles up to the top of a request stack, it gets logged to App Insights. Then, we have an Azure Alert set up to check for exceptions at a regular interval:
We’re checking for several kinds of “that shouldn’t happen” exceptions, but you can see “System.Diagnostics.UnreachableException” tucked away in there. The full query is something like this:
The alert is configured to post to an HTTP Triggered Skill we created using Abbot’s ChatOps platform, which then posts the message you saw above to Slack. Of course, using Abbot to monitor Abbot is cool and works most of the time, but doesn’t really help us if Abbot is down! So, we also configured the Azure Alert to send an email out to our engineering team.
When we want to investigate the exceptions, we can go to App Insights and run the same query the alert runs:
Digging into those results we can see a stack trace (the line numbers are cut off, but they are there!) and, just as critically, we have a GitCommitId value that tells us the commit from which the app was built. We use Nerdbank.GitVersioning to embed the commit hash in our assembly, and then right as the app starts, we start a logger scope with the commit hash so every message we log has that property attached:
Conclusion
Unexpected behavior is the shadow that lurks in every production application. It’s scary! Fortunately, .NET provides some tools to help bring light to those areas, and Azure’s monitoring platform gives us a way to sound the alarm whenever one of those “That Shouldn’t Happen” events actually happens. In the past, I’ve just kinda mashed in whatever exception type seemed relevant to the situation (ArgumentException? InvalidOperationException?), but with UnreachableException, we have a type designed exactly for things that should never happen.