$D = \frac{1}{2 \cdot P \cdot (1-P)} \sum_{i=1}^n \frac{t_i}{T} \cdot | p_i - P | \label{equ:dissimilarity}$
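As a minimal sketch, the dissimilarity formula can be evaluated directly from per-unit counts. The function name `dissimilarity` and the list arguments `t` (unit sizes $t_i$) and `m` (minority counts $m_i$) are hypothetical, and the code assumes $p_i = m_i/t_i$ and $P = M/T$ with $T = \sum_i t_i$ and $M = \sum_i m_i$; empty units are skipped.

```python
def dissimilarity(t, m):
    """Dissimilarity index D from unit sizes t[i] and minority counts m[i]."""
    T = sum(t)
    M = sum(m)
    P = M / T  # assumption: overall minority proportion
    if P == 0 or P == 1:
        raise ValueError("D is undefined when only one group is present")
    # Weighted absolute deviation of each unit's proportion from P.
    deviation = sum(
        ti / T * abs(mi / ti - P) for ti, mi in zip(t, m) if ti > 0
    )
    return deviation / (2 * P * (1 - P))


print(dissimilarity([10, 10], [5, 5]))   # uniform distribution: 0.0
print(dissimilarity([10, 10], [10, 0]))  # complete segregation: 1.0
```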

The normalization factor $2 \cdot P \cdot (1-P)$ ensures that the index ranges over $[0, 1]$. Since $D$ measures dispersion of minorities over the units, higher values of the index mean higher segregation. Dissimilarity is at its minimum when, for all $i \in [1, n]$, $p_i = P$, namely when the distribution of the minority group is uniform over units. It is at its maximum when, for all $i \in [1, n]$, either $p_i = 1$ or $p_i = 0$, namely when every unit includes members of only one group (complete segregation).

The second widely adopted index is the *information index*, also known as the *Theil index* in social sciences {cite}`Mora2011` and as normalized mutual information in machine learning {cite}`mitchell1997`. Let the population entropy be $E = - P \cdot \log{P}-(1-P) \cdot \log{(1-P)}$, and the entropy of unit $i$ be $E_i = - p_i \cdot \log{p_i}-(1-p_i) \cdot \log{(1-p_i)}$, with the usual convention $0 \cdot \log{0} = 0$. The information index is the weighted mean fractional deviation of every unit's entropy from the population entropy:

$H = \sum_{i=1}^n \frac{t_i}{T} \cdot \frac{(E-E_i)}{E}$
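The information index can be sketched in the same way. The names `entropy` and `information_index` and the list arguments `t` (unit sizes $t_i$) and `m` (minority counts $m_i$) are hypothetical; the code assumes $p_i = m_i/t_i$ and $P = M/T$, and adopts the convention $0 \cdot \log 0 = 0$ so that single-group units have zero entropy.

```python
import math


def entropy(p):
    """Binary entropy of a proportion p, taking 0 * log(0) as 0."""
    return -sum(q * math.log(q) for q in (p, 1 - p) if q > 0)


def information_index(t, m):
    """Information index H from unit sizes t[i] and minority counts m[i]."""
    T = sum(t)
    M = sum(m)
    E = entropy(M / T)  # population entropy, assuming P = M / T
    if E == 0:
        raise ValueError("H is undefined when only one group is present")
    # Weighted mean fractional deviation of unit entropy from E.
    return sum(
        ti / T * (E - entropy(mi / ti)) / E for ti, mi in zip(t, m) if ti > 0
    )


print(information_index([10, 10], [5, 5]))   # full integration: 0.0
print(information_index([10, 10], [10, 0]))  # complete segregation: 1.0
```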

The information index ranges over $[0, 1]$. Since it denotes a relative reduction in uncertainty in the distribution of groups after considering units, higher values mean higher segregation of groups over the units. The information index reaches its minimum when every unit has the same entropy as the whole population, i.e., $E_i = E$ (full integration), and its maximum when every unit contains only one group (complete segregation).

The third evenness measure is the *Gini index*, defined as the mean absolute difference between minority proportions weighted across all pairs of units, and normalized to the maximum weighted mean difference. In formula:

$\label{eq:Gini} G = \frac{1}{2 \cdot T^2 \cdot P \cdot (1-P)} \cdot \sum_{i=1}^n \sum_{j=1}^n t_i \cdot t_j \cdot |p_i - p_j|$
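A direct, quadratic-time sketch of the Gini formula follows. The function name `gini` and the list arguments `t` (unit sizes $t_i$) and `m` (minority counts $m_i$) are hypothetical, and the code assumes $p_i = m_i/t_i$ and $P = M/T$; empty units are dropped before the pairwise sum.

```python
def gini(t, m):
    """Gini segregation index G from unit sizes t[i] and minority counts m[i]."""
    T = sum(t)
    M = sum(m)
    P = M / T  # assumption: overall minority proportion
    if P == 0 or P == 1:
        raise ValueError("G is undefined when only one group is present")
    # Keep (size, proportion) pairs for non-empty units only.
    units = [(ti, mi / ti) for ti, mi in zip(t, m) if ti > 0]
    # Double sum over all ordered pairs of units.
    pair_sum = sum(
        ti * tj * abs(pi - pj) for ti, pi in units for tj, pj in units
    )
    return pair_sum / (2 * T**2 * P * (1 - P))


print(gini([10, 10], [5, 5]))   # uniform distribution: 0.0
print(gini([10, 10], [10, 0]))  # complete segregation: 1.0
```

The double loop mirrors the formula term by term; for a large number of units, a sort-based computation would be a more economical alternative.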

Here $\frac{1}{T^2} \cdot \sum_{i=1}^n \sum_{j=1}^n t_i \cdot t_j \cdot |p_i - p_j|$ is the weighted mean absolute difference, and the normalization factor $2 \cdot P \cdot (1-P)$ is the maximum value it can take. The definition of the Gini index stems from econometrics, where it is used as a measure of the inequality of income distribution {cite}`gastwirth1971general`. The Gini index ranges over $[0, 1]$, with higher values denoting higher segregation. The maximum and minimum values are reached in the same cases as for the dissimilarity index.

*Exposure indexes.* Exposure indexes measure the degree of potential contact, or possibility of interaction, between members of social groups. The most used measure of exposure is the *isolation index* {cite}`bell1954probability`, defined as the likelihood that a member of the minority group is exposed to another member of the same group in a unit. For a unit $i$, this can be estimated as the product of the likelihood that a member of the minority group is in the unit ($m_i/M$) and the likelihood that she is exposed to another minority member in the unit ($m_i/t_i$, or $p_i$), assuming that the two events are independent. In formula:

$I = \frac{1}{M} \cdot \sum_{i=1}^n m_i \cdot p_i$

The right-hand side can be read as the minority-weighted average of minority proportions in units. The isolation index ranges over $[P, 1]$, with higher values denoting higher segregation. The minimum value is reached when, for all $i \in [1, n]$, $p_i = P$, namely when the distribution of the minority group is uniform over the units. The maximum value is reached when every unit that contains minority members contains no majority member ($p_i = 1$ whenever $m_i > 0$); a special case is a single unit $k \in [1, n]$ with $m_k = t_k = M$, namely a unit containing all minority members and no majority member.

A dual measure is the *interaction index*, which is the likelihood that a member of the minority group is exposed to a member of the majority group in a unit. By reasoning as above, this leads to the formula:

$\mathit{Int} = \frac{1}{M} \cdot \sum_{i=1}^n m_i \cdot (1-p_i)$
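The two exposure formulas can be sketched together. The function names `isolation` and `interaction` and the list arguments `t` (unit sizes $t_i$) and `m` (minority counts $m_i$) are hypothetical, and the code assumes $p_i = m_i/t_i$ and $M = \sum_i m_i$; empty units are skipped.

```python
def isolation(t, m):
    """Isolation index I: minority-weighted average of minority proportions."""
    M = sum(m)
    if M == 0:
        raise ValueError("I is undefined without minority members")
    return sum(mi * (mi / ti) for ti, mi in zip(t, m) if ti > 0) / M


def interaction(t, m):
    """Interaction index Int: minority-weighted average of majority proportions."""
    M = sum(m)
    if M == 0:
        raise ValueError("Int is undefined without minority members")
    return sum(mi * (1 - mi / ti) for ti, mi in zip(t, m) if ti > 0) / M


# Uniform distribution with P = 0.5
print(isolation([10, 10], [5, 5]), interaction([10, 10], [5, 5]))    # 0.5 0.5
# One unit holds all minority members and no majority member
print(isolation([10, 10], [10, 0]), interaction([10, 10], [10, 0]))  # 1.0 0.0
```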

It clearly holds that $I + \mathit{Int} = 1$. Hence, for the interaction index, lower values denote higher segregation. A more general definition of the interaction index applies when more than two groups are considered in the analysis, so that the exposure of the minority group to one of the other groups is worth considering {cite}`massey1988dimensions`.

The key problem of assessing social segregation has been investigated by hypothesis testing, i.e., by formulating one or more possible contexts of segregation against a certain social group, and then empirically testing such hypotheses. Such an approach is currently supported by statistical tools, such as the R packages *OasisR*[^oasis] and *seg*[^seg] {cite}`seg2014`, or by GIS tools such as the *Geo-Segregation Analyzer*[^geo-seg] {cite}`geosa2014`. A tool for multidimensional exploration of segregation indexes has been proposed[^multi-seg] in {cite}`DBLP:conf/edbt/0001R19`.

## Bibliography

```{bibliography}
:style: unsrt
:filter: docname in docnames
```

> This entry was readapted from *Alessandro Baroni and Salvatore Ruggieri. Segregation discovery in a social network of companies. J. Intell. Inf. Syst., 51(1):71–96, 2018* by Salvatore Ruggieri

[^oasis]: [cran.r-project.org/package=OasisR](https://cran.r-project.org/package=OasisR)
[^seg]: [cran.r-project.org/package=seg](https://cran.r-project.org/package=seg)
[^geo-seg]: [geoseganalyzer.ucs.inrs.ca](http://geoseganalyzer.ucs.inrs.ca)
[^multi-seg]: [github.com/ruggieris/SCube](https://github.com/ruggieris/SCube)