# $\epsilon$-Differential Privacy
*Synonyms*: $\epsilon$-indistinguishability.
## In brief
**$\epsilon$-Differential Privacy** is the simpler form of {doc}`L2.differential_privacy`, where $\epsilon$ represents the level of privacy guarantee.
## More in Detail
### The Statistical Databases Context
The seminal $\epsilon$-differential privacy model lays down the basic
notions related to differential privacy by formalizing the intuitive
requirement that *the possible impact of any single individual on the
output of a differentially private function must be limited* (see
{doc}`./L2.differential_privacy`). The initial $\epsilon$-differential
privacy model focuses on the context of statistical databases: the
private dataset is a table $\mathcal{D}$ in which each individual
contributes at most one record, the system answers interactively to a
sequence of statistical queries over $\mathcal{D}$ (e.g., a sequence
of queries containing counts, sums, averages, *etc*), and the
$\epsilon$-differential privacy model aims at limiting the information
leakage about the private dataset.
### Formalizing Differential Privacy
In a nutshell, the $\epsilon$-differential privacy model requires that
the presence/absence of any *possible* individual does not shift any
output probability by more than a factor of $e^\epsilon$. More
precisely, a *random function* $\mathtt{f}$ with range $\mathcal{O}$
satisfies $\epsilon$-differential privacy if and only if for all
possible pairs of datasets ($\mathcal{D}$, $\mathcal{D}'$) such that
$\mathcal{D}'$ is $\mathcal{D}$ with *one record more or one record
less*, and for all $\mathcal{S} \subseteq \mathcal{O}$, then it holds
that: $\mathtt{Pr} [ \mathtt{f} ( \mathcal{D} ) \in \mathcal{S} ] \leq
e^\mathbf{\epsilon} \times \mathtt{Pr} [ \mathtt{f} ( \mathcal{D}' )
\in \mathcal{S} ]$ where $\epsilon>0$ is the privacy parameter.
Let us comment the above definition. First, the function $\mathtt{f}$
can be any arbitrary function, including the usual statistical
functions (e.g., counts, sums) but not restricted to them. Second, the
pairs of datasets whose output distributions must not differ too much
are taken from the full space of the possible datasets; they are not
derived from the actual private dataset. Third, the impact of an
individual is defined based on the presence (or absence) of his/her
record in (or from) any possible dataset. Pairs of datasets that
differ on the presence/absence of a single record are called
*neighboring datasets*. Note that variants might exist (e.g., by
considering that neighboring datasets are datasets that differ on the
value of a single row). Fourth, the value of $\epsilon$ sets the
tolerance of the model to the possible impacts of individuals on the
output of $\mathtt{f}$: the lower the $\epsilon$ the more stringent
the requirement. Common values range from $\epsilon=0.01$ to
$\epsilon=10$.
Please see {doc}`./L2.perturbation_mechanisms` for a synthesis of
**how** common functions can be adapted in order to satisfy
$\epsilon$-differential privacy.
### Self-Composability and Safety Under Post-Processing
The $\epsilon$-differential privacy model is self-composable as
follows. The *parallel composition* of two functions, respectively
satisfying $\epsilon_1$-differential privacy and
$\epsilon_2$-differential privacy, satisfies $\max (\epsilon_1,
\epsilon_2)$-differential privacy. Their *sequential composition*
satisfies $(\epsilon_1 + \epsilon_2)$-differential privacy. The
$\epsilon$-differential privacy model is as well convex and safe under
post-processing.
## Bibliography
The $\epsilon$-differential privacy model was introduced in
{cite:p}`Dwork2006DifferentialP` and the
$\epsilon$-indistinguishability model in
{cite:p}`Dwork2006CalibratingNT`.
```{bibliography}
:style: unsrt
:filter: docname in docnames
```
> This entry was written by Tristan Allard.