Best Practices

Build or Buy? 4 Reasons NOT to Build Your Own Reverse ETL | Census

Allie Beazell
Allie Beazell December 07, 2022

Allie Beazell is director of developer marketing @ Census. She loves getting to connect with data practitioners about their day-to-day work, helping technical folks communicate their expertise through writing, and bringing people together to learn from each other. Los Angeles Metropolitan Area, California, United States

Welcome to part two of our series on evaluating reverse ETL tools. This article was updated December 7, 2022 to better capture the current state of reverse ETL tooling. You can read the updated part one here and grab the new-and-improved companion guide, The definitive guide for evaluating reverse ETL tools.

In this installment, you’ll learn why reverse ETL is so challenging for many teams to build. We’ll breakdown:

  • The cost to build 🏗️
  • The cost to maintain 🔧
  • The opportunity cost 💸
  • Why reverse ETL isn’t just another data pipeline🚰

We’re assuming you already have some savvy data and engineering talent on staff. You may be asking yourself: Why can’t these existing resources within my company just build reverse ETL for our team themselves?

We often see companies wrestle with this. On the one hand, you can spin up an MVP of a reverse ETL tool yourself without too much trouble using custom python scripts and open-source options. But the labor isn’t in the initial development cost, it’s in the ongoing time and resources maintaining that tool will cost you. A DIY build will lock your engineers right back in API purgatory and ensure that your data team is always busy, but not necessarily productive.

On the flip side, when you use a pre-built tool like Census, you don’t have to spend time worrying about the ins and outs of each API your team uses (and you can reuse the data modeling and logic work you’ve already built for tools like dbt and Airflow). Even better, you can go from request to deployment within the same day. Sounds great, right?

But in case you’re not convinced, let’s take a look at some numbers around the incurred cost of the classic build vs buy decision (since we all love numbers, right?) Here are three reasons you don't want to build your own reverse ETL solution.  

1. Cost to build 🏗️

If you choose to build reverse ETL yourself, consider the labor cost of whoever builds and maintains your data integrations. This could be a data engineer, a data analyst, or an integrations specialist. For this exercise, we’ll assume that the task would fall on a data engineer.

To start, let’s take the average annual salary of a data engineer in San Francisco: $157,494 (but keep in mind that a good data engineer’s annual salary quickly rises above $200k).

For ease of calculations, we’ll assume your engineers are in that $200k range. Let’s do some back-of-the-napkin math to get a ballpark estimate of how much it would cost your organization to build a bespoke reverse ETL tool (not including the potentially infinite number of maintenance hours on that tool after it ships – we’ll get to that in a second):

Let’s say you’re starting out with three CRM connectors: Salesforce, Marketo, and Zendesk. And let’s say, for the sake of napkin science, it takes a (proficient) engineer about 2 weeks to build a custom DIY connector (or ~$8k per engineer per connector). That’s just for spinning one connector up. All three would run you about $24k and 6 weeks in initial engineering time alone. And that’s the cheap part. The expensive part is maintaining it and hoping that, if it fails, someone notices within a short period of time and fixes it before your business teams spend months working off stale data.

Compare that ☝ to a reverse ETL tool where it takes anyone 15 minutes to set up an initial sync with no upfront cost. With most reputable reverse ETL tools, you can get started for free with a yearly subscription that includes unlimited connectors available for less than 10K per year. Now that napkin math just makes sense.

2. Cost to maintain 🔧

We probably don’t have to tell you this, but the cost of DIY software doesn’t stop when the integration goes live. Instead, it becomes exponentially more expensive to deal with as it becomes more out of date. And with new tools and API updates always around the corner, new team members have to take the time to make heads or tails of it.

This cost is much harder to estimate with a high-level formula. People will inevitably leave your company or move teams, APIs will change, new features will be needed, and the organization will grow. Add in the added pressure to maintain these pipelines quickly since they’re transporting business-critical data. If these updates are relegated to the back burner, there are real-world, real-time data impacts like new trials not receiving welcome emails.

As your integrations and organization become more complex, you’ll end up with a dangerous, interdependent web of connectors to maintain and debug.

So how does a pre-built, dedicated reverse ETL tool eliminate this risk and stress? There are four main ways:

1. Increased uptime: Your business depends on customer data. If your integrations aren’t performing as needed or they’re not consistently built using up-to-date best practices, you’ll lose time and money quickly. For example, if you’re not automatically de-duping your ad platform audiences or if your sync is running up your API bills by syncing unnecessary fields.

2. Data quality assurance: Trust in data is an integral part of building a modern, data-driven organization. If your tools aren’t solving this (or, at the very least, not making it worse), you’re going to come out net negative. For example, if your sales team consistently gets out-of-date or incorrect contact information for prospects, they’ll hesitate to leverage the data at their disposal and fail to perform as well as they could.

3. Scalability: The cost of building and maintaining custom features for your tooling will quickly add up. A quality pre-built reverse ETL tool will come out of the box with free core features to get you up and running (and running some more) from day one. Plus, a quality pre-built reverse ETL tool will scale arbitrarily to meet your data volumes. For example, our free plan includes 50 destination fields, unlimited destination connections, unlimited SQL, dbt, and Looker models, incremental syncing, and more from day one.

4. API Quotas: Do you know how close you are to hitting the API quota of yourSalesforce, Braze, insert-name-of-costly-SaaS API? Do you want to be thinking about that every day? These are concerns that you will adopt when you trigger a python script on a cron job. Put the * in the wrong place, or forget to use a batch API, and you might end up exceeding your API quotas by orders of magnitude.

If you take down all these considerations and decide that it’s worth the ongoing cost to build your reverse ETL tooling in-house, more power to you. But, if you’re like us, you’d rather spend that time and resources building your core product.

3. Opportunity cost 💸

What could your talent build if all the busy work and distractions of maintenance were taken off their plate? Ultimately, the real cost of building a reverse ETL solution in-house and dedicating engineering effort to maintaining your own connectors isn’t the dollar amount of the time spent coding the project, it’s the cost of the work they’re not doing instead.

Not only that, it’s the cost of the work your analysts could do with the right tooling, but are blocked otherwise. If data professionals are constantly busy with maintenance, they’re less likely to innovate. After all, you want your team working on adding value using the pipelines, not building reverse ETL pipelines. Tools like Census don’t just let data engineers do better work. They give your analysts an avenue of greater business impact by building data pipelines themselves with just SQL vs the complex skills required to instrument Airflow or custom scripts.

On the data engineering side, practitioners want to spend less time writing integrations and more time unlocking value in data through models and applications. But they can’t do that if all their time is spent troubleshooting a complicated integration they inherited from another engineer. So, by investing in reverse ETL, you cut your troubleshooting time, invest in your data, and, as a result, invest in your product. After all, an uphill battle doesn’t scale for anyone.

4. Reverse ETL is not just another pipeline 🚰

Now that we’ve weighed the costs of building an in-house reverse ETL tool, the final point we want to make is that there is significant complexity in building a reverse ETL tool that goes beyond just building pipelines. It’s easy to think that reverse ETL is “just another data pipeline” but here are a few reasons why that’s not true:

  • Reverse ETL is designed for a different use case. Data being synced to downstream operational tools has a different purpose than data landing in your data warehouse; it’s a completely different use case. With reverse ETL you’re not just putting a data tool in place for the data team, you’re bridging the organizational gap between that data team and your BizOps teams. Operational data is used to power workflows and drive decision-making, and your reverse ETL tool needs to be carefully designed with these end-use cases and users in mind. This affects many requirements such as data quality, alerting, and usability for business users.
  • Writing data is a different animal than reading it. Reverse ETL writes business-critical data into hundreds of destination databases at scale, all with their own nuances. Read APIs on most SaaS tools tend to be relatively straightforward, whereas write APIs are much more complex (e.g., each tool handles deletes differently). TL;DR: It’s a harder problem to solve.
  • Reverse ETL requires a new data governance strategy. Reverse ETL tools interact with any application they’re connected to and must be well-governed to ensure there’s a tight feedback loop throughout the organization. Good data stewardship is a uniquely important virtue for reverse ETL tools, making it an overall riskier endeavor.

We know that’s a lot to think about when deciding if a managed reverse ETL solution is for you. So here’s a recap:

  • The cost of building your own tool is often more than it makes sense for many companies to take on. Our best case back of the napkin estimates put it at almost $24K and six weeks to build only three connectors, and that doesn’t come close to factoring in the maintenance costs.
  • The cost of maintaining a bespoke reverse ETL tool often balloons well beyond initial estimates and never really stops. Outsourcing your reverse ETL tool to a vendor like Census lets you do higher leverage work with that time and gain some peace of mind.
  • The opportunity cost of having your talented data engineers focus on building and maintaining a DIY tool (and the wasted potential of your analysts) is more than enough to offset the annual cost of a dedicated reverse ETL tool.
  • Reverse ETL is not just another pipeline and there is significant complexity in trying to build a tool with the functionality to satisfy business intelligence use cases and reliably meets technical requirements like dealing with write APIs.

Hopefully, by now you’re in the camp of folks who know they want to invest in a reverse ETL tool from a dedicated, expert vendor like Census. If you’re ready to start the journey toward operational analytics (AKA doing more with your data (and time)), there are some key considerations to keep in mind as you search for the best reverse ETL tool.

Ready to learn more? Check out part 3.1 of our series on evaluating reverse ETL tools: 4 key considerations for finding the best reverse ETL tool. Or, if you want to skip ahead, grab The definitive guide to evaluating reverse ETL tools.

Related articles

Best Practices
Retail Media Networks & Cloud Data Warehousing: An Introduction
Retail Media Networks & Cloud Data Warehousing: An Introduction

Retail Media Networks (RMNs) are gaining significant traction in today's ever-evolving digital landscape. At the heart of this transformation is the use of Cloud Data Warehouses, revolutionizing the way businesses store, analyze, and utilize data. This guide aims to provide an in-depth understanding of Retail Media Networks, the role of Cloud Data Warehouses, and how to leverage technologies like Snowflake and Census Embedded to build robust RMNs atop your Data Warehouse or Data Lake.

Best Practices
CDP vs MDM, What's the difference?
CDP vs MDM, What's the difference?

In the modern business world, data is the new gold. It can give businesses the edge they need to outperform the competition. However, managing and making sense of vast amounts of data can be a daunting task. Two key technologies that businesses use to manage their data effectively are Customer Data Platforms (CDPs) and Master Data Management (MDMs). These two technologies, though different, can work together to help businesses make the most of their data. In fact, integrating a Cloud Data Warehouse can make your MDM or CDP the best of both worlds.

Best Practices
What is Reference Data?
What is Reference Data?

Data management is a broad discipline that includes a variety of types of data. Within this spectrum, the concept of Reference Data holds a significant place. This article will delve into the intricacies of reference data, its role in master data management, and its importance in maintaining data consistency and quality.