t-closeness#

In Brief#

t-closeness aims to maintain the distribution of sensitive attributes in the anonymity by indistinguishability paradigm, ensuring that the distance between the two distributions (the original and the private ones) should be limited by a threshold t.

More in Detail#

t-closeness [1] was proposed to overcome one of the weakness of k-anonymity against the backgound knowledge attack. In a background knowledge attack, an attacker knows information useful to associate some quasi-identifiers with some sensitive attributes: as an example, in the medical domain, it is common knowledge that certain diseases are more frequent in a specific gender. Or, again, it is a well-known information that Asian people are less suscettible to heart diseases.

Indeed, suppose to publish the 3-anonymous dataset depicted in Table 4, and suppose that an attacker is searching for Umeko, a 20 years old Japanese, living in the 300112 area. The anonymity set containing Umeko is also l-diverse (see l-diversity. However, it is well-known that Japanese have an extremely low incidence of heart disease. Thus, the adversary can exploit this statistical information to infer, with a good probability, that Umeko suffers from a viral infection.

Table 4 A toy example representing a 3-anonymous dataset, obtained generalizing the date of birth and the ZIP code of patients. Here, each patient is included in an anonymity set of at least 3 individuals.#

Pseudo-id

Gender

Date of Birth

ZIP code

Disease

1000

M

[2000-2010]

30011*

Lyme Disease

1001

M

[2000-2010]

30011*

Viral Infection

1002

M

[2000-2010]

30011*

Stroke

1003

F

[2000-2010]

30011*

Stroke

1004

F

[2000-2010]

30011*

Stroke

1005

F

[2000-2010]

30011*

Stroke

1006

F

[2000-2010]

30011*

Viral Infection

In order to protect people in a dataset with the t-closeness mechanism, indeed, we need to guarantee that the distribution of a sensitive attribute in any equivalence class must be close to the distribution of the attribute in the overall dataset (i.e., the distance between the two distributions should be no more than a threshold t)

Bibliography#

1

Ninghui Li, Tiancheng Li, and Suresh Venkatasubramanian. T-closeness: privacy beyond k-anonymity and l-diversity. In 23rd International Conference on Data Engineering. 2007.

This entry was written by Francesca Pratesi.