HubSpot Gotcha #1: Identity De-duplication

This post is an excerpt from my upcoming book, Mastering HubSpot. Sign up for the pre-launch list to receive an exclusive discount.

Most HubSpot users are affected by this gotcha without being aware of it. Even if you take no action after reading this, it’s important to know how identity de-duplication works.

Envision the following scenario, if you will. You e-mail a list of prospects to promote an exclusive webinar.

One of your prospects, Dwight Schrute, opens the email and thinks, “Wow! This is the most amazing webinar topic I’ve ever seen!” He clicks the link, fills out the web form, and registers for the event.

Then, it occurs to Dwight that his colleagues Jim and Pam would also love this webinar, so he reloads the landing page and registers them, too.

Not only did your champion Dwight reconvert, but now we have 2 new leads from Dunder Mifflin to nurture. Awesome, right?

Not so fast.

What actually happens

Behind the scenes, all 3 form submissions are merged into the original Dwight Schrute contact record! Because each of the form submissions happened within the same browser session, with the same cookies, we don’t get new contact records for Jim and Pam.

In fact, the very last form submission steals Dwight’s identity. Now Pam is the only Dunder Mifflin lead we have in HubSpot and all of Dwight’s prior activity is attributed to her.
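To make the mechanics concrete, here is a minimal toy model of cookie-keyed de-duplication. It is a sketch of the behavior described above, not HubSpot’s actual implementation: the `CookieKeyedCrm` class, the cookie value, and the example.com addresses are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    """A toy contact record: current properties plus an activity timeline."""
    email: str
    first_name: str
    activity: list = field(default_factory=list)


class CookieKeyedCrm:
    """Toy model: one contact per tracking cookie (NOT HubSpot's real code)."""

    def __init__(self):
        self.contacts_by_cookie = {}

    def submit_form(self, cookie, email, first_name):
        contact = self.contacts_by_cookie.get(cookie)
        if contact is None:
            # First submission seen for this cookie: create a record.
            contact = Contact(email=email, first_name=first_name)
            self.contacts_by_cookie[cookie] = contact
        else:
            # Same cookie: overwrite the properties on the existing record.
            contact.email = email
            contact.first_name = first_name
        contact.activity.append(f"form submission as {email}")
        return contact


crm = CookieKeyedCrm()
browser_cookie = "dwights-browser"  # hypothetical cookie value
for name, email in [
    ("Dwight", "dwight@example.com"),
    ("Jim", "jim@example.com"),
    ("Pam", "pam@example.com"),
]:
    crm.submit_form(browser_cookie, email, name)

record = crm.contacts_by_cookie[browser_cookie]
print(len(crm.contacts_by_cookie), record.first_name, len(record.activity))
```

Three submissions go in, but only one record comes out. It carries Pam’s name and email, with all three submissions on its timeline.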

Perhaps the worst of it is that you have no way of knowing when this happens. Brand new leads can evaporate and you wouldn’t even know it.


5 minutes ago this was Dwight’s profile. Now it belongs to Pam:

Pam steals Dwight's identity

Common causes

As you can imagine, other scenarios trigger this problem, too. The two I run into most often are:

The logic

HubSpot’s identity de-duplication logic works like this:

The rationale

The rationale behind the way HubSpot de-duplicates is based on a concept called sandbagging.

The idea is this: Dwight browses your website a few times. You’re tracking his movement with a cookie. Eventually, he downloads an ebook with his address because he doesn’t trust you yet.

At that moment, all of Dwight’s previously anonymous activity is associated with his new identity.

The ebook is legit, and two weeks later Schrute requests a live demo of your product from the sales team. This time, he uses

Now, HubSpot doesn’t want to create a new contact record just because Dwight’s email address changed. Doing so would mean that we wouldn’t associate any of the activity (visits, submissions, etc.) that occurred prior to requesting the demo.

Alternatively, by using the cookie to de-duplicate the identities, HubSpot can group all of Dwight’s historical activity under one record. In the example above, and in many cases, this behavior is exactly what we’d want.

How KISSmetrics does de-duplication

As an aside, KISSmetrics, an analytics app that focuses on tracking people, runs into the exact same problem but handles it differently. As soon as a form submission (or some other identifying event) occurs, KISSmetrics creates a new identity and begins attributing activity in the browser session to that new identity.

Best practices for preventing lead merging

First, tell salespeople and VARs to always use an incognito browser window or clear their cookies when submitting forms on behalf of prospects and customers. If they’ll listen, great! Unfortunately, this won’t work for people outside of your organization (e.g., the Dwight, Jim, Pam scenario), but I’ve found the biggest offenders are in-house.

Another option, which I would only advocate if this problem is killing you, would be to use your own forms (not HubSpot’s) and send the form submission data to HubSpot via the Forms API without passing the user token. There are major drawbacks to this workaround, mainly with respect to activity tracking, so I won’t detail exactly how to implement it. Visit the HubSpot API Google Group if you have questions about this.
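Without going into a full implementation, the core of the idea is simply that the submission leaves out the tracking cookie value. The sketch below only builds a request body and sends nothing. It assumes the JSON shape of HubSpot’s v3 form submission endpoint (a `fields` list plus a `context` object whose `hutk` key carries the user token), so verify that against the current API docs before relying on it. The `build_submission` helper, the page URL, and the field values are hypothetical.

```python
import json

# Assumed endpoint template; portal_id and form_guid are placeholders.
SUBMIT_URL = (
    "https://api.hsforms.com/submissions/v3/integration/submit/"
    "{portal_id}/{form_guid}"
)


def build_submission(fields, page_uri, hutk=None):
    """Build a JSON-serializable body for a form submission.

    Passing hutk=None leaves the tracking cookie value out entirely, so the
    submission carries no link to the submitter's browser.
    """
    context = {"pageUri": page_uri}
    if hutk is not None:
        context["hutk"] = hutk
    return {
        "fields": [{"name": name, "value": value} for name, value in fields.items()],
        "context": context,
    }


body = build_submission(
    {"email": "jim@example.com", "firstname": "Jim"},
    page_uri="https://example.com/webinar",
)
print(json.dumps(body, indent=2))
```

Because no token is sent, nothing ties the submission to a browser session. That is also exactly why you lose the visitor’s page-view history.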

The way forward

HubSpot opted to solve for the sandbagger problem over the lead merging problem. Personally, if I had to choose between the two evils, I’d have done the reverse (a la KISSmetrics).

The reality is, however, that in the vast majority of cases, all form submissions are done by a single person under a single email address and you don’t have to worry about lead evaporation or sandbagging.

My proposed solution, which I’ve discussed with the product manager at HubSpot, is to give portals a checkbox that forces de-duplication based on email address instead of cookie.
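Here is a rough sketch of what that switch would trade off. The feature is hypothetical, so the `count_records` function, the `dedupe_by` setting, and the sample cookies and addresses are all invented for illustration.

```python
def count_records(submissions, dedupe_by):
    """Count contact records after grouping (cookie, email) submissions."""
    if dedupe_by not in ("cookie", "email"):
        raise ValueError("dedupe_by must be 'cookie' or 'email'")
    key_index = 0 if dedupe_by == "cookie" else 1
    records = {}
    for submission in submissions:
        # Submissions sharing the chosen key land on the same record.
        records.setdefault(submission[key_index], []).append(submission)
    return len(records)


# One browser, three different people (the webinar scenario).
colleagues = [
    ("cookie-1", "dwight@example.com"),
    ("cookie-1", "jim@example.com"),
    ("cookie-1", "pam@example.com"),
]
# One browser, one person using two addresses (the sandbagger scenario).
sandbagger = [
    ("cookie-2", "dwight.personal@example.com"),
    ("cookie-2", "dwight@example.com"),
]

for mode in ("cookie", "email"):
    print(mode, count_records(colleagues, mode), count_records(sandbagger, mode))
```

Keyed by cookie, the three colleagues collapse into one record and the sandbagger stays whole. Keyed by email, the colleagues get three records but the sandbagger is split in two. Neither setting wins both cases, which is why I’d want it to be a choice.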

From this HubSpot Ideas thread, it looks like they might implement this suggestion at the form level, which I’m also cool with!

Another terrific option would be a quick report of all contact records that have been aliased/de-duplicated, with a side-by-side diff of the former and current data and a big button labeled SPLIT that lets you untangle the contact records if it makes sense to do so.
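The diff half of that report is easy to picture. As a minimal sketch, assume you could get a list of property snapshots per contact record, one per form submission. That data source is an assumption on my part, and the record IDs, property names, and values below are made up.

```python
def changed_fields(former, current):
    """Return {field: (former_value, current_value)} for fields that differ."""
    return {
        key: (former.get(key), current.get(key))
        for key in sorted(set(former) | set(current))
        if former.get(key) != current.get(key)
    }


def merge_report(history_by_record):
    """List records whose email changed between first and latest snapshot."""
    report = []
    for record_id, snapshots in history_by_record.items():
        former, current = snapshots[0], snapshots[-1]
        if former.get("email") != current.get("email"):
            report.append((record_id, changed_fields(former, current)))
    return report


history = {
    "record-1": [
        {"email": "dwight@example.com", "firstname": "Dwight"},
        {"email": "pam@example.com", "firstname": "Pam"},
    ],
    "record-2": [
        {"email": "michael@example.com", "firstname": "Michael"},
        {"email": "michael@example.com", "firstname": "Mike"},
    ],
}

for record_id, diff in merge_report(history):
    print(record_id)
    for name, (before, after) in diff.items():
        print(f"  {name}: {before} -> {after}")
```

Only the record whose email address changed is flagged. A changed first name alone, like Michael becoming Mike, is not treated as a possible merge.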

Be the first to hear when Mastering HubSpot is released!

Sign up for exclusive content, a discount, and progress reports.