In today’s world, companies and organizations share huge amounts of data, often believing it’s safe because they’ve “anonymized” it. Just strip out the names and addresses, right? Not so fast. One researcher, Latanya Sweeney of Carnegie Mellon University, showed just how easy it is to re-identify people, even with only a few demographic details.
Let’s dive into what her study found and what it means for all of us.
Sweeney’s research, aptly titled “Simple Demographics Often Identify People Uniquely,” is eye-opening. Using just three common data points (ZIP code, gender, and date of birth), she showed that individuals can be pinpointed with startling precision. Her analysis of 1990 U.S. Census data estimated that 87% of the U.S. population is likely to be uniquely identified by those three fields alone. That’s nearly 9 out of 10 people!
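Why do three fields go so far? A rough back-of-the-envelope model helps build intuition. The Python sketch below is not Sweeney’s method. It assumes, hypothetically, that birth dates are spread evenly over 80 years, that gender takes two values, and that every combination is equally likely. Under those assumptions there are 365 × 80 × 2 = 58,400 possible (gender, birth date) pairs within a ZIP code, and the function `expected_unique_fraction` (a name chosen for this example) estimates the chance that a resident shares theirs with no one else in the same ZIP code.

```python
def expected_unique_fraction(population: int, combinations: int) -> float:
    """Probability that one resident's (gender, birth date) pair is shared
    by no one else in the same ZIP code.

    ASSUMPTION: every combination is equally likely and residents are
    independent. Real demographics are not uniform; this is illustrative.
    """
    # Each of the other (population - 1) residents avoids this person's
    # combination with probability (1 - 1/combinations).
    return (1 - 1 / combinations) ** (population - 1)


# Hypothetical assumptions: birth dates over 80 years, 2 gender values.
combos = 365 * 80 * 2  # 58,400 possible (gender, birth date) pairs

for zip_population in (1_000, 10_000, 50_000):
    share = expected_unique_fraction(zip_population, combos)
    print(f"{zip_population:>6} residents: {share:.0%} expected to be unique")
```

Under these assumptions the estimate falls from roughly 98% for a ZIP code of 1,000 residents to about 84% at 10,000 and 42% at 50,000. Real birth dates and ZIP populations are far from uniform, so treat the numbers only as an illustration of the trend: the more people share a location, the less gender and birth date narrow things down.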
If you widen the area from ZIP code to city or town, over half of people (about 53%) are still likely to be uniquely identified by their city or town, gender, and birth date. Even at the broader county level, nearly 1 in 5 (about 18%) could still be singled out.
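Measuring this on an actual table comes down to counting how many records have a combination of values that no other record shares. The sketch below uses a hypothetical helper, `unique_fraction`, and six invented records with a made-up `county` field. It is not the estimation procedure used in the study, just an illustration of how coarsening the location field lowers uniqueness.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose values on `fields` appear exactly once."""
    keys = [tuple(r[f] for f in fields) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


# Hypothetical toy records; all values are invented for illustration.
people = [
    {"zip": "00001", "county": "A", "gender": "F", "dob": "1980-03-14"},
    {"zip": "00002", "county": "A", "gender": "F", "dob": "1980-03-14"},
    {"zip": "00001", "county": "A", "gender": "M", "dob": "1975-11-02"},
    {"zip": "00002", "county": "A", "gender": "M", "dob": "1975-11-02"},
    {"zip": "00003", "county": "B", "gender": "M", "dob": "1990-06-30"},
    {"zip": "00003", "county": "B", "gender": "F", "dob": "1990-06-30"},
]

for fields in (("zip", "gender", "dob"), ("county", "gender", "dob")):
    print(fields, f"{unique_fraction(people, fields):.0%}")
```

On this toy data every record is unique on ZIP code, gender, and birth date (100%), but only two of the six (33%) stay unique once the ZIP code is replaced by the county.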
Think about the types of data we’re constantly sharing, even in the healthcare world. It’s common practice for health data to be “de-identified” before being shared for research or other purposes. Typically, this means removing direct identifiers like your name or Social Security number. But what if your ZIP code, gender, and birth date are still in there?
Sweeney showed how easily these attributes can be linked to publicly available records. For example, she bought the voter registration list for Cambridge, Massachusetts, for just $20; it contained names, addresses, birth dates, and gender. When she cross-referenced it with a supposedly “anonymous” health dataset whose only remaining demographic fields were ZIP code, gender, and birth date, it became simple to identify people and connect sensitive medical records to real names.
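The cross-referencing described above is essentially a join on the shared fields. The Python sketch below assumes a hypothetical `link` function and invented records; it performs a simple exact match on ZIP code, gender, and birth date and only attaches a name when exactly one voter matches. It illustrates the idea of a linkage attack rather than reproducing Sweeney’s actual procedure.

```python
from collections import defaultdict

# Fields present in both datasets (the "quasi-identifiers").
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def link(health_rows, voter_rows, keys=QUASI_IDENTIFIERS):
    """Attach names to 'anonymous' health rows when exactly one voter
    shares the same quasi-identifier values."""
    # Index voter names by their quasi-identifier tuple.
    voters_by_key = defaultdict(list)
    for v in voter_rows:
        voters_by_key[tuple(v[k] for k in keys)].append(v["name"])

    matches = []
    for h in health_rows:
        candidates = voters_by_key.get(tuple(h[k] for k in keys), [])
        if len(candidates) == 1:  # unique match: record is re-identified
            matches.append((candidates[0], h["diagnosis"]))
    return matches


# Hypothetical records; every name and value is invented.
health = [
    {"zip": "00001", "gender": "F", "dob": "1980-03-14", "diagnosis": "asthma"},
    {"zip": "00002", "gender": "M", "dob": "1975-11-02", "diagnosis": "diabetes"},
]
voters = [
    {"name": "Alice Example", "zip": "00001", "gender": "F", "dob": "1980-03-14"},
    {"name": "Bob Example", "zip": "00002", "gender": "M", "dob": "1975-11-02"},
    {"name": "Carl Example", "zip": "00002", "gender": "M", "dob": "1975-11-02"},
]

print(link(health, voters))
```

Here Alice’s diagnosis is re-identified because only one voter matches her three fields, while the second health record stays ambiguous because two voters share the same values. Nothing more sophisticated than an exact-match join on three columns is needed.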