What is the difference between Entity Resolution & Identity Resolution?

Daisy McLogan February 17, 2024

I'm a Customer Data Architect at Census, and I help our customers implement best practices for cleaning, transforming, and activating their data.

The digital era has ushered in a new wave of opportunities and challenges for businesses, particularly in the B2B tech and retail sectors. Amidst the vast oceans of data generated daily, the imperative of master data management (MDM) comes to the fore, serving as the backbone for organizing, categorizing, and linking data across the enterprise. 

Within this framework, the concepts of Entity Resolution and Identity Resolution have emerged as critical components. These methodologies are not just pivotal in navigating the complexities of data management but are paramount for businesses aiming to deliver personalized customer experiences, optimize their marketing strategies, and ensure compliance with ever-evolving data regulations. By accurately identifying, linking, and managing data about customers and entities across multiple sources, companies can unlock the full potential of their data assets, paving the way for innovative marketing solutions and operational excellence.

Understanding Entity Resolution: The Basics

Entity Resolution, often referred to as data matching or record linkage and commonly implemented with fuzzy matching, is an essential data management technique. It identifies and links data records, from a single source or across multiple sources, that pertain to the same real-world entity. Essentially, it's about pinpointing when two or more records, despite being described differently, represent the same 'entity' in the real world.

For instance, consider two customer profiles in a database. One profile lists the customer's name as 'John Doe', and the other as 'Doe, John'. Despite the variation in how the name is represented, entity resolution recognizes that both profiles belong to the same individual.
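The name example above can be sketched in a few lines of Python. This is a minimal illustration only, assuming that a single comma means "last, first" and that case and punctuation can be ignored; the helper names `normalize_name` and `same_entity` are hypothetical and not taken from any particular tool.

```python
import re


def normalize_name(raw: str) -> str:
    """Return a canonical 'first last' key for a personal name."""
    raw = raw.strip()
    # Assumption: exactly one comma means the name is written 'Last, First'.
    if raw.count(",") == 1:
        last, first = (part.strip() for part in raw.split(","))
        raw = f"{first} {last}"
    # Drop punctuation, lowercase, and collapse repeated whitespace.
    cleaned = re.sub(r"[^\w\s]", "", raw).lower()
    return " ".join(cleaned.split())


def same_entity(a: str, b: str) -> bool:
    """Treat two name strings as the same person if their keys are equal."""
    return normalize_name(a) == normalize_name(b)


if __name__ == "__main__":
    print(same_entity("John Doe", "Doe, John"))
```

Real systems compare many more attributes than the name (email, address, phone), since two different people can share a name.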

Entity resolution is not limited to people. It can also be applied to organizations, products, or any 'noun' that a business cares about. In a broader sense, entity resolution is about creating a holistic and unified view of each entity, facilitating more accurate and meaningful data analysis.

Diving Deeper: Entity Resolution Use Cases

The application of entity resolution extends across various sectors, from B2B tech and retail to healthcare, finance, and more. Here are a few examples:

  • Customer Record Unification: In retail, entity resolution helps create a unified view of each customer, or Customer 360, by linking their interactions across multiple touchpoints. This allows retailers to tailor their marketing strategies according to each customer's preferences and behaviors, enhancing customer retention and driving revenue growth.

  • Product Record Unification: In the B2B tech sector, entity resolution enables businesses to gain a comprehensive understanding of each product's performance. By linking all data related to a particular product, businesses can optimize their product offerings and make informed decisions about future product development.

  • Account Record Unification: In the financial sector, entity resolution facilitates the linking of multiple accounts that a customer may have. This provides a comprehensive view of each account's performance, enabling financial institutions to offer improved services and optimize their account offerings.

Decoding Identity Resolution

Identity Resolution is a subtype of entity resolution that focuses on individual users. It links and consolidates a user's actions and attributes across various touchpoints and systems to create a unified view of an individual customer or user.

This process is essential for businesses aiming to provide a personalized customer experience. By understanding a customer's interactions across different channels, businesses can tailor their marketing efforts to each customer's preferences, thereby boosting customer satisfaction and loyalty.

Identity Resolution in Action: Key Use Cases

The application of identity resolution is far-reaching, spanning various industries and business functions. Here are a few examples:

  • Enhancing Omnichannel Marketing: With identity resolution, marketers can identify users who interacted on one device and engage them via other channels and on different devices. For instance, if a user leaves items in an e-commerce cart on their desktop, a marketer can send a push notification to their mobile device, encouraging them to complete their purchase.

  • Improving Product Recommendations: Businesses can gain a complete picture of every product a user has interacted with, thereby improving onsite product recommendations and personalizing product offerings.

  • Uplifting Analytics: By joining user sessions and actions together, businesses can easily analyze cross-channel and cross-device user behavior, leading to deeper insights into customer journey performance.
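One common way to join sessions across devices is to treat every identifier (device ID, email, and so on) as a node and merge any identifiers that appear together on the same event. The sketch below uses a small union-find structure for that. The event shape (`ids` and `action` keys) and the function name `stitch_identities` are assumptions made for illustration, not a description of how any specific product works.

```python
from collections import defaultdict


def stitch_identities(events):
    """Group event actions into profiles; events sharing any identifier merge."""
    parent = {}

    def find(x):
        # Locate the representative of x, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    def identifiers(event):
        # Keep only identifiers that are actually present on the event.
        return [(kind, value) for kind, value in event["ids"].items() if value]

    # Pass 1: identifiers seen on the same event belong to the same person.
    for event in events:
        ids = identifiers(event)
        for other in ids[1:]:
            union(ids[0], other)

    # Pass 2: assign each event's action to the profile of its identifiers.
    profiles = defaultdict(list)
    for event in events:
        ids = identifiers(event)
        if ids:
            profiles[find(ids[0])].append(event["action"])
    return list(profiles.values())


if __name__ == "__main__":
    events = [
        {"ids": {"device": "desktop-1", "email": None}, "action": "add_to_cart"},
        {"ids": {"device": "desktop-1", "email": "a@example.com"}, "action": "login"},
        {"ids": {"device": "phone-9", "email": "a@example.com"}, "action": "open_push"},
        {"ids": {"device": "tablet-3", "email": None}, "action": "browse"},
    ]
    for profile in stitch_identities(events):
        print(profile)
```

In this example the desktop cart event and the mobile push event end up in the same profile because a login ties the desktop device to an email address that later appears on the phone.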

The Interplay Between Entity Resolution and Identity Resolution

While entity resolution and identity resolution might seem distinct, they are closely intertwined. Identity resolution is a specific type of entity resolution in which the target entity is an individual user.

The fundamental approaches for both are similar. Both involve the process of deduplication (removing duplicate records), record linkage (identifying which records relate to the same entity), and canonicalization (unifying and consolidating data from the linked records).
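The three steps can be sketched with nothing but the Python standard library. This is a simplified illustration, not a production approach: it assumes records are dictionaries with `name` and `email` fields, uses `difflib.SequenceMatcher` as a stand-in for a real fuzzy matcher, picks an arbitrary threshold of 0.85, and clusters greedily in a single pass (so it never merges two existing clusters). The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: dict, b: dict, threshold: float) -> bool:
    """Two records match on an identical email or a sufficiently similar name."""
    email_a, email_b = a["email"].strip().lower(), b["email"].strip().lower()
    if email_a and email_a == email_b:
        return True
    return similarity(a["name"], b["name"]) >= threshold


def resolve(records: list, threshold: float = 0.85) -> list:
    # 1. Deduplication: drop records identical after trimming and lowercasing.
    seen, unique = set(), []
    for rec in records:
        key = (rec["name"].strip().lower(), rec["email"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 2. Record linkage: add each record to the first cluster it matches.
    clusters = []
    for rec in unique:
        for cluster in clusters:
            if any(is_match(rec, other, threshold) for other in cluster):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])

    # 3. Canonicalization: build one "golden" record per cluster.
    golden = []
    for cluster in clusters:
        emails = [r["email"].strip().lower() for r in cluster if r["email"].strip()]
        golden.append({
            "name": max((r["name"].strip() for r in cluster), key=len),
            "email": emails[0] if emails else "",
            "source_count": len(cluster),
        })
    return golden


if __name__ == "__main__":
    records = [
        {"name": "Acme Corp", "email": "info@acme.com"},
        {"name": "acme corp ", "email": "INFO@acme.com"},
        {"name": "Acme Corporation", "email": "info@acme.com"},
        {"name": "Globex", "email": "hello@globex.com"},
    ]
    for row in resolve(records):
        print(row)
```

Choosing the longest name as the canonical value is only one possible survivorship rule; others include most recent, most frequent, or most trusted source.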

Existing Solutions on the Market

Several tools and solutions can assist with implementing entity resolution and identity resolution. These include:

  • AWS Entity Resolution: It offers a fully managed, scalable service ideal for AWS-centric enterprises, enhancing data quality with minimal setup. Best for teams that are all in on the AWS cloud and not looking for a highly customizable solution.

  • Census Entity Resolution: This solution is best for data teams and marketing teams looking to increase the quality of their customer data. It offers complex matching capabilities for dynamic operational environments and updates CRM, marketing platform, BI, and warehouse data with clean data.

  • Zingg: An open-source framework offering scalability and adaptability, suitable for enterprises with technical expertise seeking customizable solutions. Best for people familiar with Apache Spark.

  • Senzing: This AI-powered software delivers real-time analysis and insights for quick decision-making, requiring minimal customization.

  • Python RecordLinkage: This open-source package offers extensive customization for data scientists who are comfortable writing Python code. It is ideal for projects that need tailored deduplication and linking strategies and do not need to keep data up to date in near real time.

These solutions, when implemented correctly, can significantly enhance a business's data management capabilities, enabling them to gain deeper insights from their data and make more informed decisions.

Emphasizing the Importance of Entity and Identity Resolution

In the digital age, where data is the new oil, the importance of entity resolution and identity resolution cannot be overstated. They are vital for improving data quality, enhancing customer understanding, bolstering analytics, and ensuring regulatory compliance.

Businesses that ignore or poorly implement these processes risk negative customer experiences, security risks, unjust credit denials, flawed decision-making, and potential legal and regulatory compliance repercussions.

By understanding and leveraging entity resolution and identity resolution, businesses can gain a competitive edge, deliver superior customer experiences, drive revenue growth, and ensure regulatory compliance.

In conclusion, both entity resolution and identity resolution are pivotal for businesses to understand their customers better, enhance their marketing strategies, ensure data compliance, and improve overall business efficiency. As data continues to grow in volume and complexity, these processes will become even more critical for businesses to stay competitive in the digital age. Therefore, mastering these data management techniques is no longer a choice but a necessity for businesses in the B2B tech and retail sectors.
