As your business grows and your data matures, your happiness with your data handling will go through various peaks and valleys. The valleys, or times of data indigestion, will occur during three major stages of growth.
Your team can’t see any data! Just connect to the production database. What could go wrong?
It feels great when you’re getting your first reports. As time progresses, you are happy that you have access to quite a bit of data, but you sometimes feel that you need more information to properly analyze it. You start adding information to the production database that is not necessary for your application, but it makes the data easier to analyze. You create staging and temporary tables. You grant access to more people and departments. Somewhere along the way…
Data indigestion kicks in when you feel like the production application performance and space requirements have been impacted by making changes for reporting and analysis.
Production systems begin to look messy when you create semi-organized tables to facilitate reporting. The line between what is required for your application and what is required for reporting can blur. If you continue down this path, you will create a monolithic system that will seem impossible to escape.
It feels great when you have your first data warehouse setup complete. You have moved off of that tiny application database and onto a system where you can store real amounts of data! You are able to collect all that data that you didn’t have space to collect before. You can finally serve up data to all of the analysts, data scientists, and developers who need more information. You give warehouse access to the Finance, Marketing, Sales and Operations teams. Your teams have their own sandboxes. Somewhere along the way…
Data indigestion kicks in when you feel like there is a single point of failure for your company’s analytics and reporting.
You start to ask yourself questions. Why can we only have 3 months of history? Why do we have so much business logic in so many different places? Procedures? SQL? Code? Reports? Who started a query that is hogging all the resources on the cluster? Why doesn’t our nightly processing finish at night? Why does everything go down when there is maintenance?
It feels great when you finally don’t have a single point of failure. You’ve got microservices running everything. You’re on cloud resources and you have multiple reporting systems people can consume data from. You’re using multiple database technologies on the backend to track and prepare your data. Your developers have combined multiple open source technologies into a custom data processing framework. Somewhere along the way…
Data indigestion kicks in when you feel like you have more microservices than you can manage.
You ask yourself questions, again. Why do we have ten times as many repositories as we have developers? Why does it take us days to figure out which repository to make changes in when we need to fix something? Does anyone know how everything works? Our most senior person working on this custom microservice spaghetti was only hired six months ago. She has to manage a five-year-old framework designed by an architect who left three years ago. How can we hire enough people to maintain this system?
It feels great when you have a partner to help your company through these bouts of indigestion. You have a scalable solution customized to your company’s needs, regardless of which stage it is currently at.
Nom Nom Data – Making Data Digestible