Compute the Gini-Simpson index of a compositional vector

This function computes the Gini-Simpson index, a statistical measure of variability known in population genetics as heterozygosity, of avector of non-negative entries which sum to 1. The function returns a number between 0 and 1 which quantifies the variability of the vector. Values of 0 are achieved when the vector is a permutation of (1,0,..., 0). The value approaches 1 as the number of categories K increases when the vector is equal to (1/K, 1/K, ..., 1/K).

Usage

gini_simpson(q, K = length(q), S = diag(K))

Arguments

q: A vector with K=length(q) non-negative entries that sum to 1.
K: Optional; an integer specifying the number of categories in the data. Default is K=length(q).
S: Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry S[i,k] for i!=k is the similarity between category and i and category k, equalling 1 if the categories are to be treated as identical and equaling 0 if they are to be treated as totally dissimilar. The default value is S = diag(ncol(q)).

Value

A numeric value between 0 and 1.

Examples

# Compute unweighted Gini-Simpson index:
gini_simpson(q = c(0.4, 0.3, 0.3))
#> [1] 0.66

# Compute Gini-Simpson index assuming that
# categories 1 and 2 are identical:
similarity_matrix = diag(3)
similarity_matrix[1,2] = 1
similarity_matrix[2,1] = 1
gini_simpson(q = c(0.4, 0.3, 0.3), S = similarity_matrix)
#> [1] 0.42