This function computes the Gini-Simpson index, a statistical measure of variability known in population genetics as heterozygosity, of a vector of non-negative entries that sum to 1. The function returns a number between 0 and 1 that quantifies the variability of the vector. A value of 0 is attained when the vector is a permutation of (1, 0, ..., 0). When the vector is uniform, (1/K, 1/K, ..., 1/K), the value approaches 1 as the number of categories K increases.
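As an illustration of the two limiting cases described above, here is a minimal sketch of the unweighted index using the standard formula 1 - sum(q^2). The helper name `gs_unweighted` is hypothetical and is not the package's own implementation.

```r
# Hypothetical helper (not part of the package): unweighted Gini-Simpson index.
gs_unweighted <- function(q) {
  # q must be non-negative and sum to 1 (up to floating-point tolerance)
  stopifnot(all(q >= 0), isTRUE(all.equal(sum(q), 1)))
  1 - sum(q^2)
}

# A permutation of (1, 0, ..., 0) has no variability, so the index is 0
gs_unweighted(c(0, 1, 0))

# For the uniform vector (1/K, ..., 1/K) the index is 1 - 1/K,
# which approaches 1 as K grows
sapply(c(2, 10, 100), function(K) gs_unweighted(rep(1 / K, K)))
```

With K = 2, 10, and 100 the uniform case gives 0.5, 0.9, and 0.99, so the upper bound of 1 is approached but never reached for finite K.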
Arguments
- q
A vector with K=length(q) non-negative entries that sum to 1.
- K
Optional; an integer specifying the number of categories in the data. Default is K=length(q).
- S
Optional; a K x K similarity matrix with diagonal elements equal to 1 and off-diagonal elements between 0 and 1. Entry S[i,k] for i!=k is the similarity between category i and category k, equaling 1 if the categories are to be treated as identical and equaling 0 if they are to be treated as totally dissimilar. The default value is S = diag(ncol(q)).
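To make the role of S concrete, the sketch below uses one similarity-weighted form, 1 - sum over i,k of S[i,k] * q[i] * q[k], which reduces to the unweighted index when S is the identity matrix. This formula is an assumption chosen because it reproduces the example outputs on this page; the helper name `gs_weighted` and its default `diag(length(q))` are hypothetical and may differ from the package's internals.

```r
# Hypothetical helper (not part of the package): similarity-weighted index.
# Assumes q is a plain numeric vector and S is a K x K matrix.
gs_weighted <- function(q, S = diag(length(q))) {
  stopifnot(nrow(S) == length(q), ncol(S) == length(q))
  # outer(q, q)[i, k] is q[i] * q[k]; weight each pair by its similarity
  1 - sum(S * outer(q, q))
}

q <- c(0.4, 0.3, 0.3)

# Identity similarity: same as the unweighted index
gs_weighted(q)

# Treat categories 1 and 2 as identical
S <- diag(3)
S[1, 2] <- 1
S[2, 1] <- 1
gs_weighted(q, S)

# If every category is identical to every other, there is no variability
gs_weighted(q, matrix(1, 3, 3))
```

Raising an off-diagonal entry of S toward 1 merges the two categories in effect, so the index can only decrease relative to the identity case.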
Examples
# Compute unweighted Gini-Simpson index:
gini_simpson(q = c(0.4, 0.3, 0.3))
#> [1] 0.66
# Compute Gini-Simpson index assuming that
# categories 1 and 2 are identical:
similarity_matrix = diag(3)
similarity_matrix[1,2] = 1
similarity_matrix[2,1] = 1
gini_simpson(q = c(0.4, 0.3, 0.3), S = similarity_matrix)
#> [1] 0.42