How to Become Data-Driven Without Breaking the Bank

Shira Sarid · Published in Varada · 5 min read · Mar 18, 2021

A Cautionary Tale About the So-Called ‘Cost Savings’ of Managed Data Analytics Services


It’s 2021, and every enterprise is now “data-driven.” We’re a decade into the data revolution, and the most agile and tech-literate companies reap the compounding results of their years-long investments in data to sprint ahead. Those who’ve not seized the data opportunity are getting left behind as the competition only grows more intense.

Today, organizations possess vast amounts of data. Enterprise data consumers can mine that data and uncover valuable customer and prospect engagement insights. They can identify features to differentiate and improve their products, detect and analyze security threats, and strengthen global supply chains. And they can shave wasteful and value-neutral expenses, forge new business strategies, and recruit top talent.

No organization wants to get left behind or become obsolete because it didn’t seize the opportunity to use the vast data resources at its disposal to make smart business decisions.

It’s Easier than Ever to Dive Into Your Data

Enterprises spend massive amounts of resources to close their data gaps and get short-term results. However, maintaining that pace and making data work for you reliably and cost-effectively day after day is an entirely different matter.

The good news is that it’s now quite easy for companies to launch a data analytics infrastructure, which was not the case a short time ago. The ubiquity and ease of use of cloud storage and compute resources, along with commonly available best practices, allow any company to jump head-first into data without worrying about operations. Companies can focus first on building data analyst teams and demonstrating the business value of data.

The highly competitive big data arena offers many cloud-based data analytics providers that can quickly deliver data-driven insights and accelerate the speed of business. In theory, by outsourcing data-crunching workloads to managed data analytics service providers (e.g., Snowflake or Amazon Redshift), organizations can save themselves the cost of supporting such demanding operations with their internal infrastructure. In other words, outsourcing data analytics to third parties offers the oh-so-appealing promise of lower DevOps costs.

Beware the Hidden Costs

In reality, outsourced services are a great option for getting started, but they come with hidden costs that escalate over time, particularly as the number of analytics projects within the organization increases. Here’s why: As you expand the use of data analytics across the organization (which is a good and desirable thing), more and more business units submit query requests for their own purposes. As your use of managed solutions scales out, the costs scale up accordingly.

So, most organizations put a cap on the spending. That puts the onus on internal teams to manage the organization’s use of the managed analytics provider. For the sake of simplicity, let’s refer to this internal managing entity as the DataOps team.

DataOps as a Cost Center

The DataOps team now has responsibility for managing the overall data analytics budget, prioritizing query requests, and figuring out ways to make the data analytics budget stretch further. The DataOps team faces several unhealthy quandaries:

  • Optimization takes time. The DataOps team spends time poring over queries, improving robustness and looking for opportunities to optimize them (i.e., speed up, improve resource efficiency) to save money and improve data consumers’ user experience. Starting with the slowest or most budget-consuming queries (depending on the business need), the DataOps team works query by query, throwing its bag of tricks at optimizing each one by hand. The team may try changing the join order, manually adding indexes, creating materialized tables to speed up common predicates…there’s a well-known playbook for making queries go faster (a minimal sketch of one such trick follows this list). Not only does manual optimization consume enormous amounts of staff time, but this manipulation for efficiency’s sake can also force compromises on what insights are possible. For example, your queries may run on a subset of the data instead of all of it.
  • Backlogs are costly. As more users work through more data every day, the backlog of optimizations grows. Meanwhile, DataOps isn’t going back and cleaning up the old optimizations, so the system costs you’ve been managing to push down slowly creep back up until everything comes to a halt and the team needs to do a big reset on the indexes, caches, and materializations. As the complexity of operations and the corresponding backlog grow, the fog of war severely limits the team’s ability to assess the impact of its actions and focus on the most impactful ones.
  • Insufficient visibility hobbles the DataOps team. With managed analytics providers, the DataOps team has very little visibility into how the provider prioritizes workloads. Generally, all you know is that if you pay more, you can make queries go faster. At best, most analytics providers supply query statistics that tell you how much data was scanned or how much time was spent on a given query, making it easier to focus on specific queries. What your team really needs is visibility at the workload level. By grouping queries into the business actions they support, DataOps can start to identify which workloads need priority and which are slow relative to business needs, rather than to the needs of an individual user request.
  • Burnout burns up your ROI. User frustration and DataOps team burnout can stymie your best-laid plans to capitalize on big data and build a data-driven culture.
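
To make the first bullet concrete, here is a minimal sketch of the “materialize a common predicate” trick. It uses Python’s built-in sqlite3 so it runs anywhere; the events table, its columns, and the EMEA filter are invented for illustration, not taken from any particular engine.

```python
import sqlite3

# Self-contained sketch of manually accelerating a hot predicate.
# Table, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, region TEXT, revenue REAL);
    INSERT INTO events VALUES (1, 'EMEA', 10.0), (2, 'APAC', 7.5), (3, 'EMEA', 3.2);
""")

# Trick 1: manually add an index for the frequent filter (region = 'EMEA').
conn.execute("CREATE INDEX idx_events_region ON events (region)")

# Trick 2: materialize the hot subset so dashboards stop rescanning the
# full table. (Warehouse engines offer CREATE TABLE AS with the same shape.)
conn.execute("""
    CREATE TABLE events_emea AS
    SELECT user_id, revenue FROM events WHERE region = 'EMEA'
""")

# Downstream queries now read the small materialized table instead.
total = conn.execute("SELECT SUM(revenue) FROM events_emea").fetchone()[0]
print(total)  # 13.2
```

Note the compromise the bullet warns about: every query routed to events_emea sees only the EMEA subset, and the table silently drifts out of date as new events arrive, which is exactly the cleanup debt the second bullet describes.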

All of these DataOps dilemmas create rising operational expenses, so much so, in fact, that the “cost savings” of zero DevOps are ultimately negated by the rising cost of DataOps.

Don’t Let It Happen: Focus on What Matters and Automate the Rest

Fortunately, there’s a better way to handle the onslaught of user demand and control costs as your organization transforms into a data-driven business. Your DataOps teams need the right level of visibility and control, with enough automation to handle the basic needs of your entire user base. Seek these features in your data management solution.

Workload-level visibility gives DataOps an open view, unobstructed by proprietary barriers, into how the system is performing both as a whole and in the parts most relevant to business needs. A workload might be the set of production queries that access a specific data set, or all the queries run by the product development team. The DataOps team now has a broader field of view into how analytics are being used across the organization. Instead of tackling queries from slowest to next slowest, DataOps can decide where to focus resources based on business priorities. Furthermore, strong transparency at the workload level enables IT executives to better allocate resources to DataOps, plan ahead, and understand the return on investment.
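
As an illustration of the difference this makes, here is a hedged sketch of workload-level aggregation; the log records, workload tags, and numbers are all invented for the example, and a real system would pull them from the engine’s query-history interface.

```python
from collections import defaultdict

# Group per-query log records by the business workload they serve,
# then aggregate. All records and tags below are hypothetical.
query_log = [
    {"workload": "prod-dashboards", "seconds": 4.0,  "gb_scanned": 12.0},
    {"workload": "prod-dashboards", "seconds": 6.0,  "gb_scanned": 20.0},
    {"workload": "product-dev",     "seconds": 95.0, "gb_scanned": 3.0},
]

totals = defaultdict(lambda: {"queries": 0, "seconds": 0.0, "gb_scanned": 0.0})
for q in query_log:
    t = totals[q["workload"]]
    t["queries"] += 1
    t["seconds"] += q["seconds"]
    t["gb_scanned"] += q["gb_scanned"]

# Rank by total data scanned, not single-query latency: the slowest
# individual query (product-dev, 95 s) is not the biggest cost driver.
for workload, t in sorted(totals.items(), key=lambda kv: -kv[1]["gb_scanned"]):
    print(workload, t)
```

The point of the sketch is the ranking at the end: query-level statistics would send DataOps after the 95-second query first, while workload-level totals show where scan costs, and therefore spend, actually accumulate.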

The DataOps team can also reduce the overall cost of managing an analytics system by employing automation. For instance, DataOps teams should be able to tell their query management system which workloads are more important. Based on this information, the query management system should automatically and dynamically create appropriate indexes, refine which queries to cache, and even materialize tables with the right column sets, including pre-joining dimensions.
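
A back-of-the-envelope sketch of what such priority-driven automation could decide is below; the priorities, scan statistics, scoring rule, and threshold are all assumptions for illustration, not any vendor’s actual API.

```python
# Decide which hot column sets to accelerate, weighting raw scan frequency
# by the workload priority that DataOps declared. Everything here is a
# hypothetical stand-in for a real query management system's inputs.
priorities = {"prod-dashboards": 3, "product-dev": 1}  # set by DataOps

hot_column_sets = [
    {"workload": "prod-dashboards", "columns": ("region", "revenue"), "scans_per_day": 900},
    {"workload": "product-dev",     "columns": ("user_id",),          "scans_per_day": 40},
]

def acceleration_score(candidate):
    # Business priority times scan frequency: a cheap proxy for expected payoff.
    return priorities.get(candidate["workload"], 0) * candidate["scans_per_day"]

BUDGET_THRESHOLD = 500  # arbitrary illustrative cutoff

for candidate in sorted(hot_column_sets, key=acceleration_score, reverse=True):
    if acceleration_score(candidate) > BUDGET_THRESHOLD:
        # A real system would create the index or materialized table here
        # and retire stale accelerations to stay within budget.
        print("accelerate", candidate["workload"], candidate["columns"])
```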

If you want to avoid trading DevOps savings for DataOps costs as you transform your organization into a data-driven business, make sure your DataOps team is equipped with a data management solution that offers workload-level visibility, automation, and control over performance and cost.
