\(\epsilon\)-Differential Privacy

Synonyms: \(\epsilon\)-indistinguishability.

In brief

\(\epsilon\)-Differential Privacy is the simplest form of Differential Privacy, where the single parameter \(\epsilon\) quantifies the strength of the privacy guarantee.

More in Detail

The Statistical Databases Context

The seminal \(\epsilon\)-differential privacy model lays down the basic notions of differential privacy by formalizing the intuitive requirement that the possible impact of any single individual on the output of a differentially private function must be limited (see Differential Privacy). The initial model focuses on the context of statistical databases: the private dataset is a table \(\mathcal{D}\) in which each individual contributes at most one record, the system answers a sequence of statistical queries over \(\mathcal{D}\) interactively (e.g., counts, sums, averages), and the \(\epsilon\)-differential privacy model aims to limit the information leaked about the private dataset.

Formalizing Differential Privacy

In a nutshell, the \(\epsilon\)-differential privacy model requires that the presence or absence of any single individual does not shift any output probability by more than a factor of \(e^\epsilon\). More precisely, a randomized function \(\mathtt{f}\) with range \(\mathcal{O}\) satisfies \(\epsilon\)-differential privacy if and only if, for all pairs of datasets (\(\mathcal{D}\), \(\mathcal{D}'\)) such that \(\mathcal{D}'\) is \(\mathcal{D}\) with one record more or one record less, and for all \(\mathcal{S} \subseteq \mathcal{O}\), it holds that: \(\mathtt{Pr} [ \mathtt{f} ( \mathcal{D} ) \in \mathcal{S} ] \leq e^\epsilon \times \mathtt{Pr} [ \mathtt{f} ( \mathcal{D}' ) \in \mathcal{S} ]\), where \(\epsilon > 0\) is the privacy parameter.
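To make the definition concrete, consider a standard worked bound, assuming the Laplace mechanism \(\mathtt{f}(\mathcal{D}) = q(\mathcal{D}) + \mathrm{Lap}(1/\epsilon)\) applied to a counting query \(q\), whose value changes by at most \(1\) between neighboring datasets. Writing the probabilities as densities at an output \(o\):

\[ \frac{\mathtt{Pr}[\mathtt{f}(\mathcal{D}) = o]}{\mathtt{Pr}[\mathtt{f}(\mathcal{D}') = o]} = \frac{\exp(-\epsilon\,|o - q(\mathcal{D})|)}{\exp(-\epsilon\,|o - q(\mathcal{D}')|)} \leq \exp(\epsilon\,|q(\mathcal{D}) - q(\mathcal{D}')|) \leq e^\epsilon. \]

Since the density ratio is bounded pointwise by \(e^\epsilon\), integrating over any \(\mathcal{S} \subseteq \mathcal{O}\) yields exactly the inequality above.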

Let us comment on the above definition. First, the function \(\mathtt{f}\) can be any arbitrary function, including the usual statistical functions (e.g., counts, sums) but not restricted to them. Second, the pairs of datasets whose output distributions must not differ too much are taken from the full space of possible datasets; they are not derived from the actual private dataset. Third, the impact of an individual is defined based on the presence (or absence) of their record in (or from) any possible dataset. Pairs of datasets that differ in the presence/absence of a single record are called neighboring datasets. Note that variants exist (e.g., considering that neighboring datasets differ in the value of a single record). Fourth, the value of \(\epsilon\) sets the tolerance of the model to the possible impacts of individuals on the output of \(\mathtt{f}\): the lower the \(\epsilon\), the more stringent the requirement. Common values range from \(\epsilon = 0.01\) to \(\epsilon = 10\).

Please see Achieving Differential Privacy for a synthesis of how common functions can be adapted in order to satisfy \(\epsilon\)-differential privacy.
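As a minimal sketch of such an adaptation, consider adding Laplace noise to a counting query. The helper below is a hypothetical illustration, assuming a sensitivity of 1 (a count changes by at most 1 between neighboring datasets), so noise of scale \(1/\epsilon\) suffices:

```python
import numpy as np

def laplace_count(dataset, predicate, epsilon):
    """Differentially private count via the Laplace mechanism.

    A counting query changes by at most 1 between neighboring
    datasets (sensitivity 1), so Laplace noise of scale 1/epsilon
    suffices for epsilon-differential privacy.
    """
    true_count = sum(1 for record in dataset if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical usage: count individuals aged 40 or over, with epsilon = 0.1.
ages = [23, 45, 67, 31, 52]
print(laplace_count(ages, lambda age: age >= 40, epsilon=0.1))
```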

Self-Composability and Safety Under Post-Processing

The \(\epsilon\)-differential privacy model is self-composable as follows. The parallel composition of two functions, i.e., applying them to disjoint subsets of the dataset, where the functions respectively satisfy \(\epsilon_1\)-differential privacy and \(\epsilon_2\)-differential privacy, satisfies \(\max (\epsilon_1, \epsilon_2)\)-differential privacy. Their sequential composition, i.e., applying both to the same dataset, satisfies \((\epsilon_1 + \epsilon_2)\)-differential privacy. The \(\epsilon\)-differential privacy model is also convex and safe under post-processing.
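These rules reduce privacy-budget accounting to simple arithmetic; the sketch below is only an illustration of the two rules, not a library API:

```python
def sequential_budget(epsilons):
    """Sequential composition: the epsilons of successive
    differentially private functions on the same data add up."""
    return sum(epsilons)

def parallel_budget(epsilons):
    """Parallel composition: functions applied to disjoint
    partitions of the data cost only the maximum epsilon."""
    return max(epsilons)

# Hypothetical example: three queries, each 0.1-differentially private.
print(sequential_budget([0.1, 0.1, 0.1]))  # 0.3 when run on the same data
print(parallel_budget([0.1, 0.1, 0.1]))    # 0.1 when run on disjoint partitions
```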

Bibliography

The \(\epsilon\)-differential privacy model was introduced in [1] and the \(\epsilon\)-indistinguishability model in [2].

[1] Cynthia Dwork. Differential privacy. In ICALP, 2006.

[2] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.

This entry was written by Tristan Allard.