When you have lots of pipelines feeding into various data systems and a business hungry for accurate, on-time data, even small disruptions in service can lead to unhappy consumers. At some point, you will need to implement a notification system that sends emails, texts, phone calls, and other types of alerts when things go wrong. When data is critical, there are a few different approaches for triggering notifications.
Event Driven Notifications
Almost everyone starts here. You identify a few key jobs that are critical to the business. You set up emails that get sent based on the success or failure of those jobs. As you scale up the number of jobs, this approach results in notification storms every time something goes wrong (or right, depending on your configuration). The emails fill up your inbox, which makes this way of triggering notifications less palatable.
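As a rough illustration of the event-driven approach, here is a minimal sketch in Python. The names `JobEvent` and `notifications_for` and the `notify_on_success` flag are hypothetical, made up for this example, and the function only builds the message text rather than sending any email.

```python
from dataclasses import dataclass


@dataclass
class JobEvent:
    """Hypothetical record of one finished job run."""
    job_name: str
    succeeded: bool


def notifications_for(events, critical_jobs, notify_on_success=False):
    """Return one message per qualifying event for the critical jobs."""
    messages = []
    for event in events:
        # Only jobs flagged as critical to the business produce alerts.
        if event.job_name not in critical_jobs:
            continue
        # Success notifications are optional, depending on configuration.
        if event.succeeded and not notify_on_success:
            continue
        status = "succeeded" if event.succeeded else "failed"
        messages.append(f"Job {event.job_name} {status}")
    return messages
```

Because every qualifying event yields its own message, the number of messages grows with the number of jobs, which is where the notification storms come from.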
Data Driven Notifications
You are tired of event-driven notifications. You disable the less critical alerts. You also create more error-resistant and self-healing data pipes. Any individual failure doesn’t impact the business significantly and doesn’t require any hands-on fixing to get things flowing again. This also means that, to determine if there’s anything wrong with processing, you need to look into the data at various stages and get statistics on freshness, quantity, and rate of errors over time. For most companies, alerts triggered by this type of analysis are sufficient. But, at a certain scale and volume, this way of handling notifications can also become overwhelming and trigger false alerts.
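A data-driven check on freshness, quantity, and error rate might look something like the sketch below. The `StageStats` fields and the threshold parameters are assumptions chosen for illustration. How you collect these statistics depends entirely on your own systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class StageStats:
    """Hypothetical statistics gathered for one pipeline stage."""
    stage: str
    latest_record_at: datetime
    row_count: int
    error_count: int


def check_stage(stats, now, max_staleness, min_rows, max_error_rate):
    """Return a list of problems found in the stage's data (empty if healthy)."""
    problems = []
    # Freshness: how long ago did the newest record arrive?
    if now - stats.latest_record_at > max_staleness:
        problems.append(f"{stats.stage}: data is stale")
    # Quantity: did we receive at least the expected number of rows?
    if stats.row_count < min_rows:
        problems.append(f"{stats.stage}: only {stats.row_count} rows")
    # Error rate: failed records as a share of everything processed.
    total = stats.row_count + stats.error_count
    error_rate = stats.error_count / total if total else 0.0
    if error_rate > max_error_rate:
        problems.append(f"{stats.stage}: error rate {error_rate:.1%}")
    return problems
```

An alert would fire only when `check_stage` returns a non-empty list, so an individual job failure that the pipeline recovers from on its own produces no notification.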
Anomaly Driven Notifications
This strategy for triggering notifications is highly sophisticated. When you start down this path, you will start with something simple, like a trigger that fires when a statistic changes by more than a certain percentage relative to prior history. Eventually, this will lead into machine learning models designed to detect anomalies under different circumstances. It is a significant investment to implement this strategy, but, if your data processing is at the appropriate scale, it will be worth the effort.
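The simple starting point, a trigger based on percentage change from prior history, could be sketched as follows. The function name `is_anomalous`, the use of the mean as the baseline, and the treatment of empty or zero baselines are all assumptions made for this example. A real system would likely also account for seasonality and trends.

```python
from statistics import mean


def is_anomalous(history, current, max_pct_change):
    """Flag `current` if it deviates from the historical mean by more than
    `max_pct_change` percent."""
    # With no history there is nothing to compare against (assumed: no alert).
    if not history:
        return False
    baseline = mean(history)
    # Avoid dividing by zero; any non-zero value is a change from zero.
    if baseline == 0:
        return current != 0
    pct_change = abs(current - baseline) / abs(baseline) * 100
    return pct_change > max_pct_change
```

For example, with daily row counts of 100, 110, and 90 as history and a 25 percent threshold, a new count of 150 would be flagged, while a count of 105 would not.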
Regardless of the level of sophistication in your notification trigger design, you will most likely have a mix of the strategies above. Whatever the mix, the crucial goal is to create an environment where the on-call teams responsible for your data can preserve their sanity and consumers can continue to get the quality and continuity of service that they deserve.