
Final OA Book Club wrap-up: Author AMA highlights & last lessons on data engineering fundamentals

Zachary Pollatsek
January 12, 2023

Zachary is a data scientist with a strong track record of finding business insights through data science and machine learning. He previously worked at Traeger Pellet Grills as a test engineer and product design engineer, and recently decided to pursue his passion for data science through the Flatiron School. He's based in Salt Lake City, Utah.

Welcome to the third and final volume of the OA Book Club catch-up series on The Fundamentals of Data Engineering by Joe Reis and Matt Housley! If you missed the first or second volume, you can find those here (Volume 1) and here (Volume 2).

To recap: Part one focused on the decision frameworks covered in the first section of reading for our group. Part two covered more intricate details from the data engineering lifecycle, including storage. This third installment discusses the Ask Me Anything session with one of the authors, Joe Reis, and wraps up our discussion on a fantastic read. 

As our book club meetings conclude, I’m still astounded by the sheer amount of information covered throughout the book. For anyone who has followed this catch-up series or just recently stumbled upon it, the main takeaway from this book should be: Focus on the abstract rather than concrete tools and technologies.

How can we interpret this theme? Business needs must be the driving factor when selecting each component of the data engineering lifecycle. While the authors went into extensive detail on the technologies regarding each step of the lifecycle, they continuously returned to the idea that the business and its data needs should drive the selection of specific technologies. 

Moving on to the Ask Me Anything session with Joe Reis, here are the three key points I took from his answers 👇

  1. Companies interested in using data/having data teams must build a culture of open communication and accepting mistakes.
  2. Data teams must find a way to drive growth for their organizations rather than simply becoming a cost sink.
  3. Analytics that recommend future actions, rather than simply describing the past, will become more mainstream within organizations. 

As for the final points regarding the book, I’ve consolidated our discussion takeaways into three main points as well:

  1. Read this book if you’re new to the data world or looking for a refresher as a veteran data practitioner.
  2. As an organization, determine exactly what you need to learn from your data prior to jumping in headfirst.
  3. Focus on the abstract data needs of your organization prior to selecting relevant technologies or tools. 

Below, I’ll dive a little deeper into these key takeaways and expand a bit on why the OA Book Club was so valuable. 🤓

Three key takeaways from the AMA with Joe Reis for data folks of all levels

To help us all get the most out of the reading, author Joe Reis joined us for an AMA during this last section of the discussion. He shared a ton of great insight around data engineering best practices, but three points in particular stood out to me: 

  1. Focus on building a culture of open communication and accepting mistakes 
  2. Drive business growth with your data team (and share your wins) 
  3. Prescriptive analytics are on the verge of dethroning descriptive analytics

Here’s a bit more detail about each of these three points. 👇

1. Build a culture of open communication and accepting mistakes

Joe discussed the importance of fostering a culture within businesses that values open communication and accepting mistakes. 👐

He explained that he has seen numerous examples of data teams being punished or ostracized for mistakes or errors that occurred somewhere along the data engineering lifecycle. That kind of punishment can lead to an “us vs. them” mentality between data teams and the rest of the organization, spurring distrust throughout a company and hindering forward progress on data projects. 

Open communication between stakeholders and data teams is necessary to align everyone on company goals for analysis and reports. The ability to quickly accept mistakes, learn from them, and move on separates healthy company cultures from less-collaborative ones. 

When Joe discussed fostering this type of culture, my mind immediately went to Ray Dalio’s notion of “radical transparency.” Radical transparency ensures that issues and mistakes are surfaced and tackled immediately rather than swept under the rug. 

2. Drive business growth with your data team (and share your wins) 

Moving on to the second takeaway from Joe’s AMA: Data teams should strive to promote growth for the organization (revenue, improved analytics, etc.) rather than becoming another cost sink. 🌱

When there’s little to no visibility into the day-to-day work of data teams, it’s easy for management to overlook that work and see only the costs of building data infrastructure. Data teams should instead promote their own successes and convey specific financial metrics (such as ROI) to management to showcase the value they create. 

Beyond communicating your wins to leadership, data teams also need to understand the costs associated with different parts of the data engineering lifecycle so they can identify where to trim expenses. This combination of communicating your wins and advocating for cost savings makes it easier to demonstrate the value of the data team and its work. 
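To make the “communicate your value” point concrete, here’s a minimal back-of-the-envelope sketch of the kind of ROI math a data team might share with leadership. All of the figures and line items below are illustrative assumptions, not numbers from the book or the AMA.

```python
def roi(value_delivered: float, total_cost: float) -> float:
    """Return ROI as a fraction: (value delivered - cost) / cost."""
    return (value_delivered - total_cost) / total_cost

# Hypothetical monthly figures for a churn-reduction project (USD)
warehouse_spend = 4_000      # warehouse compute
tooling_spend = 1_500        # orchestration + reverse ETL tooling
headcount_share = 12_000     # share of team salaries spent on the project
retained_revenue = 45_000    # revenue retained by reducing churn

total_cost = warehouse_spend + tooling_spend + headcount_share
print(f"Project ROI: {roi(retained_revenue, total_cost):.0%}")  # -> "Project ROI: 157%"
```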

3. Prescriptive analytics are on the verge of dethroning descriptive analytics

Perhaps the most exciting portion of the AMA with Joe was his breakdown of the ongoing shift from descriptive analytics to prescriptive analytics. 

Today, the majority of analytics in business are descriptive; in other words, these analytics describe something that has already happened. An example of this could be a report that displays customer churn rate over the past 5 years. 

Prescriptive analytics, on the other hand, prescribe actions to take in the future; these types of analytics are still in their infancy as they rely on machine learning and artificial intelligence. Prescriptive analytics aim to answer questions like, “Which marketing campaign should we run next quarter to maximize revenue for a new product launch?” 🤔
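To make the contrast tangible, here’s a minimal sketch in Python. The data, column names, and the toy decision rule are all hypothetical; a real prescriptive system would rely on a trained ML model rather than a simple threshold.

```python
import pandas as pd

# Hypothetical churn data; values are made up for illustration.
history = pd.DataFrame({
    "year": [2019, 2020, 2021, 2022, 2023],
    "customers_start": [1000, 1100, 1250, 1400, 1500],
    "customers_churned": [120, 121, 150, 154, 180],
})

# Descriptive analytics: report what already happened (churn rate by year).
history["churn_rate"] = history["customers_churned"] / history["customers_start"]
print(history[["year", "churn_rate"]])

# Prescriptive analytics: recommend what to do next. A trivial rule stands in
# for the ML model a real system would use to score candidate campaigns.
latest = history.iloc[-1]
if latest["churn_rate"] > history["churn_rate"].mean():
    print("Recommendation: prioritize a retention campaign next quarter")
else:
    print("Recommendation: prioritize the new-product launch campaign")
```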

That question is just one example of the future of analytics, and Joe says these prescriptive approaches are still very early, as ML and AI technologies are rapidly evolving. However, while we wait for ML and AI tech to catch up to our data dreams, we can begin to shift our analytics strategies from descriptive to prescriptive using tools like reverse ETL to power downstream marketing and sales platforms. 
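Here’s a rough sketch of what that reverse ETL hand-off could look like: read a model’s recommendation out of the warehouse and push it into a downstream marketing tool. The table name, API endpoint, and token are hypothetical placeholders, not Census’s actual API or any specific vendor’s.

```python
import requests

def fetch_recommendations(conn):
    """Read prescriptive model output (next-best campaign per customer) from the warehouse."""
    # Assumes a DB-API-style connection; the table and columns are hypothetical.
    cursor = conn.cursor()
    cursor.execute(
        "SELECT customer_id, recommended_campaign FROM analytics.next_best_campaign"
    )
    return cursor.fetchall()

def push_to_marketing_tool(rows, api_token):
    """Send each recommendation to a (hypothetical) marketing platform API."""
    for customer_id, campaign in rows:
        requests.post(
            "https://api.example-marketing-tool.com/v1/audiences",
            headers={"Authorization": f"Bearer {api_token}"},
            json={"customer_id": customer_id, "campaign": campaign},
            timeout=10,
        )
```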

Three final learnings from the Fundamentals of Data Engineering: Everyone should read this book & other takeaways

In case you haven’t gathered, there’s a ton of useful information in this book for data professionals of all seasons and stages (even if you’re brand new to the field or adjacent to data engineering). Now that we’ve wrapped up the reading, here are my top takeaways: 👀

1. Seasoned veteran or data newbie? You should read this book

This book has absolutely surpassed my expectations. I was excited to read it with the OA book club, but after finishing it, I view the book as a must-read for anyone remotely connected to the data space. When I say anyone, I truly mean anyone; with data becoming the currency of the future for businesses, everyone will benefit from reading this book (even non-data engineers). 

The Fundamentals of Data Engineering dives into every aspect of the data engineering lifecycle with amazing insights from the authors. As a relatively new member of the data world, this book has given me a rock-solid foundation on the intricacies of data generation, storage, transformation, and so on.

2. Discuss business needs/wants prior to building out your data infrastructure

This second point goes hand in hand with the main overarching theme throughout the book: Focus on what value/information you want to gain prior to selecting any specific components of your infrastructure. 🤝

Members from across your organization, both technical and non-technical, should help determine each component of the infrastructure. Stakeholders and data engineers must decide how the business can benefit from its data, and in what ways it will be used, prior to building out the infrastructure. 

This idea connects well to a key point from our first discussion: Make reversible decisions whenever possible. Reversible decisions let your organization move quickly without fear of lasting consequences. If your company favors reversible decisions, it will be a piece of cake to adjust your data pipeline as business needs morph or pivot. 

3. Keep it abstract; specific tools and technologies come second

Last but not least, I’ll wrap up our final installment of the catch-up series with the main theme of the book: Abstract business needs drive which tools and technologies are selected. 🏎️

In today’s work environment, it’s easy to get roped into the newest hot tech without considering what features of this technology truly augment your current data stack. When building out a new data stack, or adding a new component to your existing stack, always determine what value you need for your business prior to selecting anything. Countless companies have fallen into the trap of selecting technologies that don’t fit their business model. Keep it abstract.

Wrapping up: Joe’s words of wisdom & how to join the OA Book Club

Thanks, everyone, for coming along on this book club recap journey with me! I have learned so much from this book, and I plan to reread a few chapters to ensure I didn’t miss anything. Hearing Joe discuss building a healthy culture, driving growth with data, and the rise of prescriptive analytics during his AMA was fantastic. The main takeaways from Fundamentals of Data Engineering as a whole are: 

  • Wherever you are in your data journey, read this book.
  • Discuss your business needs prior to building anything.
  • Keep it abstract. 

If you’re interested in joining the OA Book Club, head here. Parker Rogers, data community advocate at Census, leads a discussion every two weeks for about an hour and will be launching the next iteration of the book club soon. It’s incredible, and I can’t emphasize enough just how much I’ve gained from my time in the club after just one book. I hope to see you in our next Book Club call! 📚
