In today’s world, companies and organizations share huge amounts of data, often believing it’s safe because they’ve “anonymized” it. Just strip out the names and addresses, right? Not so fast. One researcher, Latanya Sweeney from Carnegie Mellon University, proved just how easy it is to re-identify people—even with only a few demographic details.
Let’s dive into what her study found and what it means for all of us.
Sweeney’s research, aptly titled “Simple Demographics Often Identify People Uniquely,” is eye-opening. Using just three common data points (5-digit ZIP code, gender, and date of birth), she showed that most people can be singled out. In fact, her analysis of 1990 U.S. Census data found that 87% of the U.S. population was likely to be uniquely identifiable from those three fields alone. That’s nearly 9 out of 10 people!
If you widen the area to the city or town level, about half of people (53%) are still likely to be uniquely identifiable from their place of residence, gender, and birth date. Even at the broader county level, about 18%, or nearly 1 in 5, could still be singled out.
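To make the idea of "uniquely identifiable" concrete, here is a minimal Python sketch of how such a percentage can be measured: count how many records have a combination of fields that no other record shares. The six records, the field names, and the `unique_fraction` helper are invented for illustration. This is not Sweeney's data or her exact method.

```python
from collections import Counter

# Toy records, invented for illustration: (zip_code, town, gender, birth_date)
records = [
    ("02138", "Cambridge", "F", "1961-03-14"),
    ("02138", "Cambridge", "F", "1961-03-14"),
    ("02139", "Cambridge", "F", "1961-03-14"),
    ("02139", "Cambridge", "M", "1975-07-02"),
    ("02140", "Cambridge", "M", "1975-07-02"),
    ("02144", "Somerville", "M", "1975-07-02"),
]

# Column position of each (hypothetical) field name
FIELDS = {"zip_code": 0, "town": 1, "gender": 2, "birth_date": 3}


def unique_fraction(rows, fields):
    """Share of rows whose combination of `fields` appears exactly once."""
    idx = [FIELDS[f] for f in fields]
    keys = [tuple(row[i] for i in idx) for row in rows]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Fine-grained geography: ZIP code + gender + birth date
by_zip = unique_fraction(records, ["zip_code", "gender", "birth_date"])
# Coarser geography: town + gender + birth date
by_town = unique_fraction(records, ["town", "gender", "birth_date"])

print(f"unique by ZIP:  {by_zip:.0%}")
print(f"unique by town: {by_town:.0%}")
```

In this toy set, 4 of the 6 rows are unique at the ZIP level but only 1 of 6 at the town level. It shows the same pattern as the study's figures: the coarser the geography, the fewer people stand alone.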
Think about the types of data we’re constantly sharing, even in the healthcare world. It’s common practice for health data to be “de-identified” before being shared for research or other purposes. Typically, this means removing direct identifiers like your name or social security number. But what if your ZIP code, gender, and birth date are still in there?
Sweeney showed how easy it is to link these attributes to publicly available records. For example, she bought a voter registration list for just $20 that contained names, addresses with ZIP codes, birth dates, and gender. When she cross-referenced it with a supposedly “anonymous” health dataset that still carried ZIP code, gender, and birth date next to each medical record, it became simple to identify people and connect sensitive medical records to real names.
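The cross-referencing step is essentially a join on the shared demographic fields. The sketch below shows the idea in Python with made-up data. The names, diagnoses, field names, and the `link` helper are all hypothetical, and it only claims a match when exactly one voter fits a health record.

```python
from collections import defaultdict

# Made-up "public" voter list: names plus demographic fields
voters = [
    {"name": "Alice Example", "zip": "02138", "gender": "F", "birth_date": "1961-03-14"},
    {"name": "Bob Sample", "zip": "02139", "gender": "M", "birth_date": "1975-07-02"},
    {"name": "Carol Placeholder", "zip": "02139", "gender": "F", "birth_date": "1980-11-30"},
    {"name": "Dana Placeholder", "zip": "02139", "gender": "F", "birth_date": "1980-11-30"},
]

# Made-up "de-identified" health data: no names, but the same three fields remain
health = [
    {"zip": "02138", "gender": "F", "birth_date": "1961-03-14", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "F", "birth_date": "1980-11-30", "diagnosis": "diabetes"},
    {"zip": "02141", "gender": "M", "birth_date": "1990-01-01", "diagnosis": "fracture"},
]

QUASI_IDS = ("zip", "gender", "birth_date")


def link(voters, health):
    """Return (name, diagnosis) pairs where exactly one voter matches a health row."""
    by_key = defaultdict(list)
    for v in voters:
        by_key[tuple(v[f] for f in QUASI_IDS)].append(v["name"])

    matches = []
    for h in health:
        names = by_key.get(tuple(h[f] for f in QUASI_IDS), [])
        if len(names) == 1:  # skip rows with no match or an ambiguous match
            matches.append((names[0], h["diagnosis"]))
    return matches


reidentified = link(voters, health)
print(reidentified)
```

Only the first health record is re-identified here. The second matches two voters with the same ZIP code, gender, and birth date, so it stays ambiguous, and the third matches nobody on the list. Sharing a combination with other people is what protects a record, and the study suggests most people do not.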
What does this tell us? Stripping out names and phone numbers isn’t enough. If a few demographic details are left, your “anonymous” data might still be screaming your identity to anyone who knows how to listen.
This is a wake-up call for organizations handling data: it’s time to rethink how we anonymize information. Here are a few suggestions on how to do it better:
There’s no such thing as “perfectly anonymous” data. In the age of big data, a few small details can be all it takes to trace a record back to an individual. So next time you hear that a dataset is “de-identified” or “anonymous,” remember: it may not be as safe as it sounds.
Want to stay truly private? It’s all about knowing what’s being shared—and pushing for stronger privacy safeguards in every dataset.
Sources:
https://dataprivacylab.org/projects/identifiability/paper1.pdf
https://dataprivacylab.org/projects/identifiability/index.html
https://privacytools.seas.harvard.edu/publications/simple-demographics-often-identify-people-uniquely