Identity Graph Data: Best Identity Graph Datasets & Databases
What is Identity Graph Data?
Identity graph data is information connecting all the devices an individual uses. Identity data APIs are used by marketers and advertisers, e.g. in omnichannel marketing and cross-device targeting. Datarade helps you find the best cross-device graph data providers and datasets.
Recommended Identity Graph Data Products
DrivenIQ Consumer Identity Graph | Website Visitor Resolution | ResolveID
Factori_Identity Data ( Hashed email linked to unique Id with UID2.0)
Alesco Phone ID Database - Identity Graph Data with over 598 Million Phone Number, covers 94% of the US population - available for licensing!
Identity Graph Data USA by Datastream Group (MAIDs matched to PII)
100 Million US Hashed Emails (HEM) to Raw Email to Mobile Number to Address to Person Linkage - HEM Linkage - Hashed Email Linkage - Identity Graph
Ameribase Digital - Identity Graph Resolution and Linkage Dataset - 60 MM HEMs to IP Pairs
Reklaim | Consumer Identity Graph | USA & Canada | +100m |
TrueData Identity Graph (UID2.0, HEM, Maid, AAID)
Versium's Business REACH Digital, Enrichment (B2B), USA, GDPR and CCPA Compliance - Improve your match rates by 3x-5x, reach your targets online.
Throtle ID Graph - Deterministic Identity Data USA Consumers
The Ultimate Guide to Identity Graph Data 2023
What is Identity Graph Data?
Most consumers today use more than one device to stay connected to the internet. There are thousands of addressable media outlets, with each user using a range of devices.
Marketing was simpler in the days of television, print, and radio, when marketers had a fairly straightforward playground. With today’s omnichannel approach, however, businesses need to reach their consumers on all possible channels: blogs, social media, review websites, buying guides and the like.
Until very recently, managing online identity was all about matching customers’ online cookies and other online activities with CRM data. Today, the digital landscape is broader, so a consumer’s identity needs to be matched across devices, browsers, video game consoles, and mobile app SDKs in order to follow them wherever they go, regardless of the device they use.
This is where cross-device identity data comes into play. Essentially, this data helps businesses identify users across various devices with the help of IDs and cross-device graphs.
How does Identity affect consumer behavior?
Defined as the process through which customers research, select, buy, and use goods and services, consumer behavior has become a key area of focus for contemporary marketers. It covers the specific activities of consumers in a given marketplace and the core drivers of those actions. In online marketing, one way to track consumer behavior in online marketplaces is to create identities for potential customers. Identity affects consumer behavior in the sense that consumers’ habits and trends are used to uniquely identify them.
What is a deterministic Identity Graph?
Deterministic matching refers to the process of matching user profiles with a very high level of accuracy by means of key identifiers such as email, phone number or logged-in username. In this form of identity resolution, multiple device relationships are established by joining them on personally identifiable information (PII). Because each device is linked to a consumer through directly observed PII, the approach emphasizes accuracy and limits false positives. Marketers value deterministic matching for its ability to target only actual buyers.
Because deterministic data has a complex collection methodology, deterministic identity graph data is typically priced higher on a data marketplace than a comparable probabilistic identity graph. Deterministic matching is often preferred over probabilistic matching because it links consumer profiles on directly observed identifiers, whereas probabilistic matching relies on signals such as browser, IP address or device type, which are less certain. Both types of identity data can be useful depending on your business, but deterministic graphs are the preferred basis for people-based marketing, as a specific consumer’s wants and needs can be seen more clearly when you know with confidence that it is them accessing your site.
However, deterministic matches alone do not guarantee a high-quality identity dataset. Think about the times you may have signed up to a website with a fake email address. The best commercial identity datasets and graphs will use live deterministic match evidence as a base starting point which is then broken down and analyzed to provide accurate and real-time information. When it comes to privacy-compliance, identity datasets that use first-party data are always better because you know that the identity data has come directly from the site accessed by the user and that they have consented to the site’s use of their data.
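As a rough illustration of the deterministic approach described above, the following Python sketch links devices to CRM customers only when a login email matches exactly after normalization. The record layout and the names `crm_records`, `device_logins` and `deterministic_links` are assumptions made for this example, not the schema of any particular provider.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so the same address always compares equal."""
    return email.strip().lower()


def deterministic_links(crm_records, device_logins):
    """Link a device to a customer only on an exact email match (no guessing)."""
    customers_by_email = {
        normalize_email(record["email"]): record["customer_id"]
        for record in crm_records
    }
    links = {}
    for login in device_logins:
        customer_id = customers_by_email.get(normalize_email(login["email"]))
        if customer_id is not None:
            links.setdefault(customer_id, set()).add(login["device_id"])
    return links


# Hypothetical sample data
crm_records = [
    {"customer_id": "C1", "email": "Ana@example.com"},
    {"customer_id": "C2", "email": "bo@example.com"},
]
device_logins = [
    {"device_id": "phone-1", "email": " ana@example.com "},
    {"device_id": "laptop-7", "email": "ANA@EXAMPLE.COM"},
    {"device_id": "tablet-3", "email": "unknown@example.com"},
]

links = deterministic_links(crm_records, device_logins)
print(links)
```

Devices whose login email is not in the CRM are simply left unlinked, which is what keeps false positives low in this style of matching.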
What is a probabilistic Identity graph?
While a deterministic identity graph presents highly accurate user data based on PII, a probabilistic identity graph identifies a user by means of non-personal signals such as IP address, device type, browser and the device’s operating system. Technically, probabilistic graphs infer device relationships by combining a knowledge base of linkage data with predictive algorithms. They enable implicit grouping of devices by means of fingerprinting signals such as IP address, screen resolution, OS, GPS location and Wi-Fi network. If marketers are looking for people who might buy or be interested in a specific product, then a probabilistic graph is their best bet.
So, when shopping for identity graph data, marketers ought to understand their business needs and choose between deterministic and probabilistic accordingly.
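To make the probabilistic idea more concrete, here is a minimal Python sketch that scores pairs of devices by the signals they share and keeps pairs above a threshold. The weights in `SIGNAL_WEIGHTS`, the threshold of 0.7 and the device fields are invented for illustration; real graphs typically rely on trained predictive models rather than fixed weights.

```python
# Hypothetical weights: how much each shared signal counts as evidence.
SIGNAL_WEIGHTS = {"ip": 0.5, "wifi": 0.3, "os": 0.1, "screen": 0.1}


def match_score(device_a, device_b):
    """Sum the weights of the signals on which both devices agree."""
    return sum(
        weight
        for signal, weight in SIGNAL_WEIGHTS.items()
        if device_a.get(signal) is not None
        and device_a.get(signal) == device_b.get(signal)
    )


def probable_pairs(devices, threshold=0.7):
    """Return (device, device, score) for every pair scoring at or above threshold."""
    pairs = []
    ids = sorted(devices)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = match_score(devices[a], devices[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical sample data
devices = {
    "phone-1": {"ip": "203.0.113.5", "wifi": "home-net", "os": "Android", "screen": "1080x2400"},
    "laptop-7": {"ip": "203.0.113.5", "wifi": "home-net", "os": "Windows", "screen": "1920x1080"},
    "tablet-3": {"ip": "198.51.100.9", "wifi": "cafe-net", "os": "Android", "screen": "1200x2000"},
}

pairs = probable_pairs(devices)
print(pairs)
```

Lowering the threshold links more devices at the cost of more false positives, which is the trade-off that separates probabilistic from deterministic graphs.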
Who uses ID Graph Data and for what use cases?
Essentially, ID graph data is used to recognize customers and users across the range of internet-connected devices that they use. This data is then used to obtain critical hidden insights such as user behavior, preferences and demographics.
Businesses across all major verticals rely on identity graphs during their data onboarding procedures. Identity graphs help companies fuel their marketing campaigns, whether they are a tour and travel company, an apparel brand, an ecommerce store, or a healthcare company.
Here are a few use cases of cross-device identity data:
Global frequency management
As important as it is to stay connected with your customers, it is equally important to maintain the right balance. Bombard them with too many marketing communications, and chances are high that they will unsubscribe from your database.
This is where cross-channel identity data helps: it provides details on all the users you have reached on mobile, search, email, video, or display. Most marketers use this data to ensure that they don’t end up sending the same message to the same user on different devices.
Device-identity data allows you to target a user with different ads at different stages of their consumer journey, regardless of the device they are using to connect online. This optimization is what separates award-winning digital marketers from the rest.
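The frequency management idea above can be sketched in a few lines of Python: impressions are counted per person rather than per device, using a device-to-person mapping taken from an identity graph. The mapping `device_to_person`, the campaign name and the cap value are hypothetical.

```python
from collections import Counter


def apply_frequency_cap(impressions, device_to_person, cap):
    """Keep an impression only while the person behind the device is under the cap."""
    seen = Counter()
    allowed = []
    for device_id, campaign in impressions:
        # Fall back to the device ID when the graph has no person for it.
        person = device_to_person.get(device_id, device_id)
        if seen[(person, campaign)] < cap:
            seen[(person, campaign)] += 1
            allowed.append((device_id, campaign))
    return allowed


# Hypothetical sample data: three devices resolve to the same person P1.
device_to_person = {"phone-1": "P1", "laptop-7": "P1", "tv-2": "P1"}
impressions = [
    ("phone-1", "spring-sale"),
    ("laptop-7", "spring-sale"),
    ("tv-2", "spring-sale"),
    ("tablet-9", "spring-sale"),
]

allowed = apply_frequency_cap(impressions, device_to_person, cap=2)
print(allowed)
```

Without the graph, each of the three linked devices would be capped separately and the same person would see the campaign three times.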
Customer journey modeling
Connecting user identity across platforms helps in understanding the mood of the target audience. From the first view to the last click, all of a user’s activities can be tracked through cross-device identity data.
By analyzing disparate databases and datasets, you can match or resolve the identity of the owner of multiple devices. This means that businesses can differentiate between user records with similar names, which could otherwise easily be confused or merged into the same record. Businesses often use consent-managed identity resolution services for master data management (MDM) and customer data integration (CDI).
Uploading offline customer data, such as physical sales records, to the online environment is called data onboarding. This information is then used to associate historical data records with digital identifiers to create accurate consumer profiles. The historical identity data that is uploaded in data onboarding is useful for good marketing campaigns because businesses have a more complete profile of their consumers and their interests.
CRM Data Enrichment and Customer Data Enrichment
CRM and customer data enrichment are used by digital businesses for customer personalization, which is essential in today’s online marketplace, where the companies that know their customers best are the most successful. For these methods to work, you need clean, up-to-date and enriched customer data that is reliable and usable. Enriched data contains more than just basic customer information such as phone number and email; it also includes more in-depth, consent-given identity data such as demographic and firmographic information.
When businesses have more accurate CRM and customer datasets, they can create more accurately tailored marketing campaigns to attract consumers and drive up profitability.
B2B Data Enrichment
B2B businesses need data enrichment to keep track of the competition and stay ahead of the curve. B2B data enrichment allows them to make informed business decisions, for example when it comes to lead scoring. Properly evaluated and ranked leads help businesses set clear sales and marketing goals. Data enrichment means you can have the most in-depth consumer profiles possible based on the consent-given identity data available.
B2B data enrichment is also key for personalization. You can easily understand other businesses’ needs and wants by using the data available and from there have a higher success rate with them.
The importance of compliance with privacy regulations is well-known by people and businesses worldwide. Data enrichment can make sure businesses are compliant with these regulations ensuring that their datasets are complete and correct, but most importantly, that they are legal.
The ‘Know Your Customer’ (KYC) process is used to verify and validate a customer’s identity in order to ensure a safer business relationship. When it comes to business, knowing for certain that your customer is a real person is essential and especially in the banking industry there are certain mandatory KYC compliance regulations that must be followed.
There are different KYC checks that can be carried out to check that your customer is a genuine person. Proof of identity (POI) and proof of address (POA) documents are the two main types of documents needed for KYC checks and these must be recent and valid.
KYC is used primarily by finance businesses to manage reputational and customer risk, especially when it comes to avoiding illegal activities such as fraud, money laundering and identity theft. It is also used to verify customers’ identities and make sure they are real people, so that transactions can be processed securely and smoothly.
The ‘Know Your Business’ (KYB) process is very similar to the KYC one. Both are used for identity verification, however where KYC is used to verify customer identity, KYB is used to identify companies and suppliers. Put simply, the difference is in the purpose of the identity verification check while the features and process of each are the same.
The KYB process is used to identify the legal representative of the person responsible for a business and to then verify their identity. In the same way as KYC checks, KYB checks are used to fight against crimes, such as money laundering and fraud, and in the financial sector they are often a mandatory legal compliance measure.
The number of compliance regulations in place worldwide means there is a huge demand for KYC and KYB data, and this demand is constantly increasing.
ABM (Account-based marketing)
In account-based marketing (ABM), marketing works with sales to identify high-value leads that are the best possible fit for the business’s product or service, so that marketers can then reach out to them. ABM offers a much more personalized approach than general, ‘blanket’ marketing and has been shown to yield more successful results in terms of generating revenue and increasing profits.
ABM is almost exclusively used by B2B businesses because of the specificity of the B2B sales cycle and the number of people involved in the decision-making process of B2B sales. Businesses can use ABM data to make well-informed decisions when reaching out to targets and thereby increase revenue by landing the most profitable deals.
Marketers use ideal customer profiles (ICPs) in ABM, which help them know clearly what kind of customer they are looking for and recognize such customers quickly when browsing through potential targets. Essentially, ABM works on the premise that pursuing a few high-quality prospects is more successful than contacting a high volume of leads, and there is evidence to support this: 91% of companies using ABM in 2020 increased their average deal size, and companies have seen up to a 208% increase in revenue. For many companies’ marketing departments, ABM is a go-to method of nurturing leads and securing conversions.
Which brands use Identity Data?
An identity graph is a database that stores all the key identifiers that correlate with specific customers. A given business can hold customer data in various systems, such as an e-commerce platform, a CRM, an email marketing tool or an ad platform. It is the job of an ID graph to analyze the information in these tools and stitch it into a single identifiable profile. In data marketplaces, identity data vendors collect secure identity data by aggregating consumers’ online activities.
Brands such as Netflix, Amazon Prime, HBO Max and DoorDash are known to be among the biggest consumers of historical or real-time identity graph data, which they use to tailor content to the user. Microsoft is particularly keen on using identity graph data to ensure that users attempting to access a platform have the correct permissions. This means they can effectively control site access and prevent people from accessing information to which they have no right.
What are examples of how Identity Graph Data is used in marketing?
Marketers invest in data acquisition by buying identity graph data from commercial identity datasets in order to enhance their marketing strategies. Identity graphs let marketers stitch customer identities together and create a ‘single view of the customer’: a more precise, up-to-date view of customer attributes and behaviors. Amazon and Netflix both use identity graphs to enhance their marketing strategies. At Amazon, identity data is used to build the customer insights needed to deliver the right type and level of personalization to each customer. It is from this data that the company, through big data analytics, has been able to carry out successful consumer-targeted marketing strategies.
What is Identity stitching?
Identity stitching is a key use case for identity data, in which multiple pieces of information about a person are pulled from a number of platforms and brought together to create a unique identity for that person. A modern-day consumer is likely to use multiple devices to access the internet every day. While consumers sometimes log in on these devices, and so identify themselves, in most cases they do not, making it hard to identify them from browser cookie data alone. Enhanced data analysis mechanisms are therefore used to bring all these consumer data points together into a unique profile for each individual, a process referred to as identity stitching. The data points may include login details such as email addresses and phone numbers, browser cookies and payment data.
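One common way to implement this kind of stitching is a union-find (disjoint-set) structure: any identifiers observed together in one event are merged into the same profile. The sketch below is a simplified illustration under that assumption, not a description of how any specific vendor builds its graph; the identifier strings and the `events` list are made up.

```python
def stitch(events):
    """Group identifiers that co-occur in an event into one profile (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for identifiers in events:
        first = identifiers[0]
        find(first)  # register identifiers that appear alone
        for other in identifiers[1:]:
            union(first, other)

    profiles = {}
    for identifier in list(parent):
        profiles.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(group) for group in profiles.values())


# Hypothetical sample data: each event lists identifiers seen together.
events = [
    ["cookie:abc", "email:ana@example.com"],  # login in a laptop browser
    ["maid:123", "email:ana@example.com"],    # login in a phone app
    ["cookie:xyz", "phone:+15550100"],        # a different person
]

profiles = stitch(events)
print(profiles)
```

The cookie and the mobile ID never appear in the same event, but both were seen with the same email, so they end up in one stitched profile.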
What is Identity linkage?
Identity linkage is the means of linking one person’s different user identities on business websites and social media and confirming that they all belong to the same person. There are various privacy-compliant means of doing this.
Identity linkage can be a very powerful tool for businesses as they can get a much clearer picture of different users’ interests and browsing habits than if they had separate, unanalyzed data. From here, businesses can run powerful marketing campaigns in order to increase their success and ultimately their revenue. Identity linkage can also be used by businesses to build customer loyalty by logging their habits and then predicting their future wants and catering to this. You can also use identity linkage data to determine whether a consumer has a preexisting relationship with your business or if they are a first time user which can be used for promotions and reward schemes.
Increasingly, customers are using one account (e.g. their Facebook or Google account) to sign into different platforms which makes identity linkage much simpler, especially with the rise in cookieless browsing.
What does a cookieless world mean for Identity Data?
Google announced in early 2020 that its Chrome browser would phase out third-party cookies by 2022, a deadline it later pushed back, and other browsers have been restricting them as well. This has a huge impact on the way that many online businesses work, as cookies have long been considered essential for initiatives such as identity linkage. However, don’t be fooled! Going ‘cookieless’ doesn’t mean that there will be no cookies. Instead, browsers are blocking third-party cookies, while first-party ones will remain embedded in websites. This means that website publishers will still have access to enough information to understand their customers and ensure they have the best online browsing experience tailored to them.
The shift to cookieless browsing has come from increasing concerns over data privacy, as users demand greater transparency and control over websites’ use of their data. As cookie usage declines, data mining companies are turning to ID graphs to obtain consumer data for marketing purposes, which has led to an increase in the cost of identity data.
What are typical Identity Graph Data attributes?
The attributes of device identity data are all those parameters which help in identifying a user and confirming their identity.
Some examples could include:
- Email or physical address
- Online cookies
- MAID Device IDs
- Phone numbers
- Account usernames
- IP addresses
… and essentially anything that can be linked back to the device and its identifier.
What is MAID to Hashed Email Data?
Mobile Advertising IDs (MAIDs) are digital IDs that are assigned to different mobile devices. These IDs will track your browsing and interests and are then used by websites to show you relevant ads based on these activities. Essentially, marketers use MAIDs to analyze different users’ interests and habits and then personalize advertising based on this. It’s easy to see how the use of MAIDs can be very powerful and effective when it comes to advertising. By matching a user’s mobile habits with their desktop, TV and offline habits, you get a much fuller picture of who they are and what they like. From there, you can tailor the most effective personalized advertising campaigns.
Hashed Email data
Email marketing has a strong ROI. On average, for every $1 spent on email marketing, businesses can expect a return of $42. It’s therefore not hard to see why the majority of businesses rely on email marketing campaigns as a way of targeting both existing and potential customers. A successful email marketing campaign, like all digital marketing campaigns, needs up-to-date and reliable consumer data to personalize emails and increase a business’s revenue. This is where email hashing comes in.
An email hashing algorithm converts each email address into a fixed-length hexadecimal string. Because the hash is one-way, the string cannot simply be reversed into the original email address, which helps protect users’ privacy, yet the same address always produces the same string, so browsing habits and interests can still be logged, analyzed and used for marketing. Put simply, email hashing means individual consumers’ online behavior can be recorded while their privacy is still protected. And the importance of email addresses is manifold: consider how many different accounts and websites you sign into with your email address. Marketers then have a huge capacity to see what you are browsing for, while your personal identity itself cannot be accessed. Additionally, hashed emails can track users’ activity across multiple devices, meaning businesses can access the most coherent and accurate consumer identity data.
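As a small illustration of email hashing, the Python sketch below normalizes an address (trimming whitespace and lowercasing) and then hashes it with SHA-256 from the standard `hashlib` module. The choice of SHA-256 and this particular normalization are assumptions for the example; the text above does not name a specific algorithm, and providers may use other hashes or normalization rules.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Two spellings of the same (hypothetical) address give the same hashed email.
hem_a = hash_email("Ana@Example.com ")
hem_b = hash_email("ana@example.com")

print(hem_a == hem_b)  # the hashes can be matched without sharing the raw address
print(len(hem_a))      # SHA-256 always yields 64 hexadecimal characters
```

Normalizing before hashing matters: without it, the same address typed with different capitalization would produce unrelated hashes and the records would fail to match.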
Used together, the marketing potential of MAIDs and email hashing is huge. Marketers can then track and record the browsing habits and interests of consumers while still complying with privacy guidelines. Additionally, while MAIDs only work on mobile devices, using the two together means you can link different accounts using the same hashed email address across different devices and browsers. This means a lot to marketers because they can get the most accurate consumer profile for each individual user and personalize marketing according to this. And as we well know, personalized marketing is the most effective and profitable form of consumer outreach.
How is Identity Graph Data collected?
A typical digital consumer owns multiple connected devices. Tracking a user identity in such a scenario is a tough task.
There are two main approaches to cross-device tracking: deterministic and probabilistic.
In deterministic tracking, the identity of a user is tracked using personally identifiable information (PII). This information includes email addresses, Facebook IDs, and so on. However, tracking users this way requires the kind of setup that giants like Facebook, Google, and Apple possess.
Probabilistic tracking, as the name suggests, works on the basis of probability: it analyzes millions or billions of anonymous data points together to infer which devices belong to the same user. The signals often used to connect the dots include Wi-Fi networks, screen resolution, operating systems, and so on.
How to assess the quality of Identity Data?
Assessing the quality of cross-device identity data is fairly difficult. The best approach is to deal with a reputable data vendor who can assure you of the data quality.
Here are a few methods that you can adopt to assess the cross-device identity data quality:
Sample set for testing
Ask for a sample set for testing. This will help you test the waters and gauge whether the data shared with you is actually authentic and reliable.
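One simple check to run on a sample is the match rate: the share of your own known identifiers that also appear in the vendor’s sample. The Python sketch below assumes both sides are keyed on the same identifier (for example, hashed emails produced the same way); the function name `match_rate` and the placeholder values are hypothetical.

```python
def match_rate(own_keys, sample_keys):
    """Fraction of your own identifiers that also appear in the vendor sample."""
    own = set(own_keys)
    if not own:
        return 0.0
    return len(own & set(sample_keys)) / len(own)


# Hypothetical placeholder identifiers standing in for hashed emails.
own_hashed_emails = ["h1", "h2", "h3", "h4"]
vendor_sample = ["h2", "h4", "h9"]

rate = match_rate(own_hashed_emails, vendor_sample)
print(f"Match rate: {rate:.0%}")
```

A match rate says nothing about whether the linked devices are correct, so it is best read alongside a manual spot check of a few matched records.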
Ask for customer referrals
This is perhaps one of the best approaches. If the customers of a particular data vendor are happy, it is likely that the company provides quality identity data.
Research and understand data providers’ collection methods
You might also want to look at the methods a data vendor uses to collect cross-device identity data. Logical and appropriate data collection methods tend to yield reliable data.
How is Identity Graph Data typically priced?
The price of cross-device identity data is generally based on the data delivery method:
You can get the data either through cloud storage or through push and pull APIs.
You can also get it delivered in raw form packed in JSON and CSV files.
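When data arrives as raw CSV or JSON files, the first step is usually to load both into one common structure. The following Python sketch uses only the standard `csv`, `io` and `json` modules; the column names `person_id`, `maid` and `hashed_email` are invented for the example, since actual schemas vary by provider.

```python
import csv
import io
import json

# Hypothetical deliveries; in practice these would be files from the vendor.
csv_delivery = (
    "person_id,maid,hashed_email\n"
    "P1,maid-123,hem-aaa\n"
    "P2,maid-456,hem-bbb\n"
)
json_delivery = '[{"person_id": "P1", "maid": "maid-789", "hashed_email": "hem-aaa"}]'


def load_records(csv_text, json_text):
    """Parse both formats into one list of dictionaries."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    records.extend(json.loads(json_text))
    return records


def devices_by_person(records):
    """Group the mobile advertising IDs seen for each person ID."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["person_id"], set()).add(record["maid"])
    return grouped


grouped = devices_by_person(load_records(csv_delivery, json_delivery))
print(grouped)
```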
What are the common challenges when buying Identity Data?
When it comes to extracting identity graph data, there are some challenges that crop up:
Is data authenticated or not?
User profiles come in two forms: authenticated and unauthenticated. Data extracted from authenticated profiles is what you should be after.
This is because the data provided by an authenticated profile is genuine and reliable. Data from unauthenticated profiles, on the other hand, is temporary and consists mainly of device-specific IDs and cookies.
Matching of data
As cross-device identity data spans a number of devices, it is essential to ensure that a user on their laptop is matched to the same identity when they are using their tablet. A mismatch here defeats the purpose of buying cross-device data.
With the increase in ‘cookieless’ browsing, there is a potential for disrupting the effectiveness of identity linking. However, as we’ve seen, there are many means of identity linking which protect users’ personal data and information which can still be used to populate identity graphs.
What to ask your Identity Data providers?
Here are a few questions that you may want to ask your cross-device identity data provider:
- How do you extract the MAID device IDs?
- How do you test and evaluate your data?
- How are identity assets sourced? Are they licensed or owned?
- How do you ensure the quality of your data?
- Are both the offline and online identifiers linked?
Where can I buy Identity Graph Data?
Data providers and vendors listed on Datarade sell Identity Graph Data products and samples. Popular Identity Graph Data products and datasets available on our platform are DrivenIQ Consumer Identity Graph | Website Visitor Resolution | ResolveID by DrivenIQ, Factori_Identity Data ( Hashed email linked to unique Id with UID2.0) by Factori, and Alesco Phone ID Database - Identity Graph Data with over 598 Million Phone Number, covers 94% of the US population - available for licensing! by Alesco Data.
How can I get Identity Graph Data?
You can get Identity Graph Data via a range of delivery methods - the right one for you depends on your use case. For example, historical Identity Graph Data is usually available to download in bulk and delivered using an S3 bucket. On the other hand, if your use case is time-critical, you can buy real-time Identity Graph Data APIs, feeds and streams to download the most up-to-date intelligence.
What are similar data types to Identity Graph Data?
Identity Graph Data is similar to Address Data, Phone Number Data, Cross-Device Identity Data, MAID - Hashed Email Data, and Identity Linkage Data. These data categories are commonly used for Advertising and Identity Resolution.
What are the most common use cases for Identity Graph Data?
The top use cases for Identity Graph Data are Advertising, Identity Resolution, and Data Onboarding.