
Simple Demographics Often Identify People Uniquely

December 18, 2024

Think Your Data is Anonymous? Think Again. Here’s Why “De-Identified” Data Isn’t Really Anonymous.

In today’s world, companies and organizations share huge amounts of data, often believing it’s safe because they’ve “anonymized” it. Just strip out the names and addresses, right? Not so fast. One researcher, Latanya Sweeney, then at Carnegie Mellon University, showed just how easy it is to re-identify people, even with only a few demographic details.

Let’s dive into what her study found and what it means for all of us.

The Surprising Power of Simple Demographics

Sweeney’s research, aptly titled “Simple Demographics Often Identify People Uniquely,” is eye-opening. It looked at just three common data points: 5-digit ZIP code, gender, and date of birth. Her analysis of 1990 U.S. Census data found that 87% of the U.S. population likely had a combination of those three fields that no one else shared. That’s nearly 9 out of 10 people!

If you widen the area from ZIP code to the city or town level, about half (53%) of people are still likely to be uniquely identifiable from place, gender, and birth date. Even at the broader county level, roughly 1 in 5 (18%) could still be singled out.
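To make “uniquely identifiable” concrete, here is a minimal Python sketch of the kind of count behind those percentages. It is not Sweeney’s method or data: the records are invented for illustration, and `unique_fraction` is a hypothetical helper that simply measures what share of rows have a (ZIP, gender, birth date) combination nobody else shares.

```python
from collections import Counter

# Hypothetical toy records: (zip_code, gender, birth_date).
# These values are invented for illustration only.
records = [
    ("02138", "F", "1965-03-12"),
    ("02138", "F", "1965-03-12"),  # shares all three fields with the row above
    ("02138", "M", "1971-07-30"),
    ("02139", "F", "1980-11-02"),
    ("02139", "M", "1980-11-02"),
]


def unique_fraction(rows):
    """Return the fraction of rows whose field combination appears exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)


fraction = unique_fraction(records)
print(f"{fraction:.0%} of records are unique on (ZIP, gender, birth date)")
# prints: 60% of records are unique on (ZIP, gender, birth date)
```

In this toy table, three of the five rows stand alone, so anyone who knows those three facts about a person can pick out that person’s row.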

Why Should We Care?

Think about the types of data we’re constantly sharing, even in the healthcare world. It’s common practice for health data to be “de-identified” before being shared for research or other purposes. Typically, this means removing direct identifiers like your name or Social Security number. But what if your ZIP code, gender, and birth date are still in there?

Sweeney showed how easy it is to link these attributes to publicly available records. For example, she bought the voter registration list for Cambridge, Massachusetts for just $20. It contained names and addresses along with ZIP codes, birth dates, and gender. Because a supposedly “anonymous” health dataset still carried ZIP, gender, and birth date, cross-referencing the two on those shared fields made it simple to identify people and connect sensitive medical records to real names.
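The cross-referencing step can be sketched in a few lines of Python. This is an illustrative assumption about how such a linkage might be written, not a reconstruction of Sweeney’s actual procedure: the names, records, and the `link_records` helper are all hypothetical, and the join only reports a match when exactly one voter shares the three fields.

```python
from collections import defaultdict

# Hypothetical public voter list (names included). Invented for illustration.
voters = [
    {"name": "Alice Example", "zip": "02138", "gender": "F", "birth_date": "1965-03-12"},
    {"name": "Bob Example", "zip": "02138", "gender": "M", "birth_date": "1971-07-30"},
    {"name": "Carol Example", "zip": "02139", "gender": "F", "birth_date": "1980-11-02"},
    {"name": "Dana Example", "zip": "02139", "gender": "F", "birth_date": "1980-11-02"},
]

# Hypothetical "de-identified" health records (no names). Invented for illustration.
health = [
    {"zip": "02138", "gender": "F", "birth_date": "1965-03-12", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "F", "birth_date": "1980-11-02", "diagnosis": "diabetes"},
    {"zip": "02140", "gender": "M", "birth_date": "1990-01-01", "diagnosis": "migraine"},
]

QUASI_IDENTIFIERS = ("zip", "gender", "birth_date")


def quasi_key(row):
    """Build the join key from the fields both datasets share."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def link_records(voter_rows, health_rows):
    """Return (name, diagnosis) pairs where exactly one voter matches a health record."""
    names_by_key = defaultdict(list)
    for voter in voter_rows:
        names_by_key[quasi_key(voter)].append(voter["name"])

    matches = []
    for record in health_rows:
        candidates = names_by_key.get(quasi_key(record), [])
        if len(candidates) == 1:  # an unambiguous match re-identifies the record
            matches.append((candidates[0], record["diagnosis"]))
    return matches


print(link_records(voters, health))
# prints: [('Alice Example', 'asthma')]
```

Only the first health record is re-identified here. The second matches two voters, so it stays ambiguous, and the third has no match in the voter list. That is the whole point: the more people share a combination of fields, the less a join like this can reveal.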

What This Means for Privacy

What does this tell us? Stripping out names and phone numbers isn’t enough. If a few demographic details are left, your “anonymous” data might still be screaming your identity to anyone who knows how to listen.

This is a wake-up call for organizations handling data: it’s time to rethink how we anonymize information. Here are a few suggestions on how to do it better:

  1. Add Some Noise: Ever heard of differential privacy? It’s a technique where carefully calibrated randomness is added to the results computed from the data (counts, averages, and so on), so the output barely changes whether or not any one person’s record is included. It makes the results a little less precise, but it does a lot to protect people’s privacy.
  2. Be Less Specific: If you’re sharing data, ask yourself—do I really need to include the exact birth date? Maybe using just the year would be enough. Or maybe you can generalize ZIP codes into broader regions. Less precision can sometimes mean a lot more privacy.
  3. Limit What You Share: Sometimes, less is more. The more details you include, the greater the chance that those little bits of info can be pieced together to reveal someone’s identity. Be intentional about what data fields you actually need to share.
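As a rough illustration of the first two suggestions, the Python sketch below coarsens ZIP code and birth date, then adds noise to a count. Everything in it is an assumption made for the example: the records are invented, `generalize`, `smallest_group`, and `noisy_count` are hypothetical helpers, and the noise follows the textbook Laplace mechanism for a simple counting query rather than any particular production library.

```python
import random
from collections import Counter

# Hypothetical toy records: (zip_code, gender, birth_date). Invented for illustration.
records = [
    ("02138", "F", "1965-03-12"),
    ("02139", "F", "1965-09-01"),
    ("02138", "M", "1971-07-30"),
    ("02139", "M", "1971-01-15"),
]


def generalize(record):
    """Be less specific: keep a 3-digit ZIP prefix and only the birth year."""
    zip_code, gender, birth_date = record
    return (zip_code[:3] + "**", gender, birth_date[:4])


def smallest_group(rows):
    """Size of the smallest group of rows sharing the same field values."""
    return min(Counter(rows).values())


def noisy_count(true_count, epsilon, rng=random):
    """Add Laplace noise with scale 1/epsilon to a count.

    The difference of two exponential draws with rate epsilon follows a
    Laplace distribution with scale 1/epsilon. Smaller epsilon means more noise.
    """
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


print(smallest_group(records))                           # prints: 1
print(smallest_group([generalize(r) for r in records]))  # prints: 2
print(noisy_count(100, epsilon=0.5))                     # a value near 100, different each run
```

Before generalizing, every row is one of a kind. Afterward, each row shares its values with at least one other, at the cost of some precision. The noisy count makes a similar trade: any single answer is slightly off, but averages over many records remain useful.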

There’s no such thing as “perfectly anonymous” data. In the age of big data, a few small details can be all it takes to trace a record back to an individual. So next time you hear that a dataset is “de-identified” or “anonymous,” remember: it may not be as safe as it sounds.

Want to stay truly private? It’s all about knowing what’s being shared—and pushing for stronger privacy safeguards in every dataset.

Sources:

https://dataprivacylab.org/projects/identifiability/paper1.pdf

https://dataprivacylab.org/projects/identifiability/index.html

https://privacytools.seas.harvard.edu/publications/simple-demographics-often-identify-people-uniquely

