| Negative hypergeometric | |
|---|---|
| Probability mass function | ![Several examples of the PMF of the negative hypergeometric probability distribution.](//upload.wikimedia.org/wikipedia/commons/thumb/b/b7/Negative_hypergeometric_pmf.png/300px-Negative_hypergeometric_pmf.png) |
| Cumulative distribution function | ![Several examples of the CDF of the negative hypergeometric probability distribution.](//upload.wikimedia.org/wikipedia/commons/thumb/b/b6/Negative_hypergeometric_cdf.png/300px-Negative_hypergeometric_cdf.png) |
| Parameters | $N$: total number of elements; $K$: total number of 'success' elements; $r$: number of failures when experiment is stopped |
| Support | $k \in \{0, 1, \dots, K\}$: number of successes when experiment is stopped |
| PMF | ![{\displaystyle {\frac {{{k+r-1} \choose {k}}{{N-r-k} \choose {K-k}}}{N \choose K}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/3638a1ef2782b226414ff863090b0c28bff320a3) |
| Mean | ![{\displaystyle r{\frac {K}{N-K+1}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/9425bfe675c5f350bb18df3e32a29697d8e0e670) |
| Variance | ![{\displaystyle r{\frac {(N+1)K}{(N-K+1)(N-K+2)}}[1-{\frac {r}{N-K+1}}]}](https://wikimedia.org/api/rest_v1/media/math/render/svg/51e440acb363f2b562dbb11e50df1f9a41a68fd9) |
In probability theory and statistics, the negative hypergeometric distribution describes probabilities when sampling without replacement from a finite population in which each element can be classified into one of two mutually exclusive categories, such as Pass/Fail or Employed/Unemployed. As random selections are made from the population, each draw shrinks the population, so the probability of success changes from draw to draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a sample of fixed size, in the negative hypergeometric distribution samples are drawn until $r$ failures have been found, and the distribution describes the probability of finding $k$ successes in such a sample. In other words, the negative hypergeometric distribution describes the likelihood of $k$ successes in a sample with exactly $r$ failures.
Definition
There are $N$ elements, of which $K$ are defined as "successes" and the rest are "failures".
Elements are drawn one after the other, without replacement, until $r$ failures are encountered. Then the drawing stops and the number $k$ of successes is counted. The negative hypergeometric distribution, $NHG_{N,K,r}(k)$, is the discrete distribution of this $k$.[1]
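The drawing procedure described above can be sketched as a small simulation. This is only an illustration: the function name `draw_until_r_failures`, the seed, and the values `N=20, K=8, r=3` are assumptions chosen for the example, and the sketch assumes $1 \le r \le N-K$ so that the $r$-th failure is always reached.

```python
import random

def draw_until_r_failures(N, K, r, rng):
    """Simulate one experiment and return the number of successes
    drawn before the r-th failure (population of N with K successes)."""
    population = [1] * K + [0] * (N - K)  # 1 = success, 0 = failure
    rng.shuffle(population)               # a random order of draws without replacement
    successes = failures = 0
    for item in population:
        if item == 1:
            successes += 1
        else:
            failures += 1
            if failures == r:             # stop as soon as the r-th failure appears
                break
    return successes

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
samples = [draw_until_r_failures(N=20, K=8, r=3, rng=rng) for _ in range(10_000)]
print(sum(samples) / len(samples))  # empirical mean number of successes
```

Each simulated value is one realization of the count $k$ in the definition; a histogram of `samples` approximates the distribution.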
The negative hypergeometric distribution is a special case of the beta-binomial distribution[2] with parameters $\alpha = r$ and $\beta = N-K-r+1$ both being integers (and $n = K$).
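The beta-binomial connection can be checked numerically. The sketch below assumes the identification $\alpha = r$, $\beta = N-K-r+1$, $n = K$, and uses the fact that the beta function of positive integers reduces to factorials. The helper names (`beta_int`, `betabinom_pmf`, `nhg_pmf`) and the test values are illustrative, not taken from the cited reference.

```python
from fractions import Fraction
from math import comb, factorial

def beta_int(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))

def betabinom_pmf(k, n, a, b):
    """Beta-binomial pmf with integer shape parameters a and b."""
    return comb(n, k) * beta_int(k + a, n - k + b) / beta_int(a, b)

def nhg_pmf(k, N, K, r):
    """Negative hypergeometric pmf: k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

N, K, r = 15, 6, 4  # illustrative values with 1 <= r <= N - K
agree = all(
    betabinom_pmf(k, K, r, N - K - r + 1) == nhg_pmf(k, N, K, r)
    for k in range(K + 1)
)
print(agree)
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.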
The outcome requires that we observe $k$ successes in the first $k+r-1$ draws and that draw number $k+r$ is a failure. The probability of the former can be found by direct application of the hypergeometric distribution ($k$ successes in $k+r-1$ draws), and the probability of the latter is simply the number of failures remaining, $N-K-(r-1)$, divided by the size of the remaining population, $N-(k+r-1)$. The probability of having exactly $k$ successes up to the $r$th failure (i.e. the drawing stops as soon as the sample includes the predefined number of $r$ failures) is then the product of these two probabilities:
![{\displaystyle {\frac {{\binom {K}{k}}{\binom {N-K}{k+r-1-k}}}{\binom {N}{k+r-1}}}\cdot {\frac {N-K-(r-1)}{N-(k+r-1)}}={\frac {{{k+r-1} \choose {k}}{{N-r-k} \choose {K-k}}}{N \choose K}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/380a7655d128afda45a210a6faddc12ab0946cff)
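As a quick check of this product argument, the following sketch compares the two-factor form with the closed form for every $k$ in the support. The function names and the parameter values are assumptions made for the example, and $1 \le r \le N-K$ is assumed.

```python
from fractions import Fraction
from math import comb

def product_form(k, N, K, r):
    # hypergeometric probability of k successes in the first k + r - 1 draws
    hg = Fraction(comb(K, k) * comb(N - K, r - 1), comb(N, k + r - 1))
    # probability that the next draw is a failure
    next_is_failure = Fraction(N - K - (r - 1), N - (k + r - 1))
    return hg * next_is_failure

def closed_form(k, N, K, r):
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

N, K, r = 12, 5, 3  # illustrative values
same = all(product_form(k, N, K, r) == closed_form(k, N, K, r) for k in range(K + 1))
print(same)
```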
Therefore, a random variable $X$ follows the negative hypergeometric distribution if its probability mass function (pmf) is given by
![{\displaystyle f(k;N,K,r)\equiv \Pr(X=k)={\frac {{{k+r-1} \choose {k}}{{N-r-k} \choose {K-k}}}{N \choose K}}\quad {\text{for }}k=0,1,2,\dotsc ,K}](https://wikimedia.org/api/rest_v1/media/math/render/svg/f23810277254dfaf99a5c08c0647dfc6fcf9505f)
where
- $N$ is the population size,
- $K$ is the number of success states in the population,
- $r$ is the number of failures,
- $k$ is the number of observed successes,
- $\binom{a}{b}$ is a binomial coefficient.
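A minimal sketch of the pmf in Python, using exact rational arithmetic from the standard library. The name `nhg_pmf` and the example values are illustrative assumptions, and the sketch assumes $1 \le r \le N-K$.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k): probability of k successes before the r-th failure."""
    if not 0 <= k <= K:
        return Fraction(0)  # outside the support
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

N, K, r = 20, 8, 3  # illustrative values, not from the article
pmf = [nhg_pmf(k, N, K, r) for k in range(K + 1)]
for k, p in enumerate(pmf):
    print(k, float(p))
print(sum(pmf))  # total probability over the support
```

Using `Fraction` keeps the probabilities exact, which makes it easy to confirm that they add up to one without a floating-point tolerance.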
By design the probabilities sum to 1. To show this explicitly, we have:
![{\displaystyle \sum _{k=0}^{K}\Pr(X=k)=\sum _{k=0}^{K}{\frac {{{k+r-1} \choose {k}}{{N-r-k} \choose {K-k}}}{N \choose K}}={\frac {1}{N \choose K}}\sum _{k=0}^{K}{{k+r-1} \choose {k}}{{N-r-k} \choose {K-k}}={\frac {1}{N \choose K}}{N \choose K}=1,}](https://wikimedia.org/api/rest_v1/media/math/render/svg/fd9e07f28073cec56bf963d6ba07879403b42257)
where we have used the following identity (here with $m = r-1$, $n = N-1$ and upper summation limit $K$),
![{\displaystyle {\begin{aligned}\sum _{j=0}^{k}{\binom {j+m}{j}}{\binom {n-m-j}{k-j}}&=\sum _{j=0}^{k}(-1)^{j}{\binom {-m-1}{j}}(-1)^{k-j}{\binom {m+1+k-n-2}{k-j}}\\&=(-1)^{k}\sum _{j=0}^{k}{\binom {-m-1}{j}}{\binom {k-n-2-(-m-1)}{k-j}}\\&=(-1)^{k}{\binom {k-n-2}{k}}\\&=(-1)^{k}{\binom {k-(n+1)-1}{k}}\\&={\binom {n+1}{k}},\end{aligned}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/d10f999dd364fd08e35c087cdaf824d08b0a71c3)
which can be derived using the binomial identity,
![{\displaystyle {{n \choose k}=(-1)^{k}{k-n-1 \choose k}},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/cd21af83beb2a648a3f1910e805eda49d8e9d331)
and the Chu–Vandermonde identity,
![{\displaystyle \sum _{j=0}^{k}{\binom {m}{j}}{\binom {n-m}{k-j}}={\binom {n}{k}},}](https://wikimedia.org/api/rest_v1/media/math/render/svg/4d15bd56367902ca03af8d4603878b2ebf1c7f07)
which holds for any complex values $m$ and $n$ and any non-negative integer $k$.
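The summation identity $\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j}=\binom{n+1}{k}$ can be spot-checked by brute force for small non-negative integers. This is only a sanity check over an arbitrarily chosen range, not a proof, and the helper name `lhs` is made up for the example.

```python
from math import comb

def lhs(m, n, k):
    """Left-hand side of the identity for non-negative integers with k <= n - m."""
    return sum(comb(j + m, j) * comb(n - m - j, k - j) for j in range(k + 1))

# Restrict to k <= n - m so every binomial coefficient has non-negative arguments.
identity_holds = all(
    lhs(m, n, k) == comb(n + 1, k)
    for n in range(9)
    for m in range(n + 1)
    for k in range(n - m + 1)
)
print(identity_holds)
```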
Expectation
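When counting the number $k$ of successes before $r$ failures, the expected number of successes is $\frac{rK}{N-K+1}$ and can be derived as follows.

$$
\begin{aligned}
E[X] &= \sum_{k=0}^{K} k \Pr(X=k) = \sum_{k=0}^{K} k\, \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \\
&= \frac{r}{\binom{N}{K}} \left[\sum_{k=0}^{K} \frac{k+r}{r} \binom{k+r-1}{r-1}\binom{N-r-k}{K-k}\right] - r \\
&= \frac{r}{\binom{N}{K}} \left[\sum_{k=0}^{K} \binom{k+r}{k}\binom{N-r-k}{K-k}\right] - r \\
&= \frac{r}{\binom{N}{K}} \binom{N+1}{K} - r = \frac{rK}{N-K+1},
\end{aligned}
$$

where we have used the relationship $\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j}=\binom{n+1}{k}$, which we derived above to show that the negative hypergeometric distribution is properly normalized.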
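The closed-form mean $rK/(N-K+1)$ can be compared against a direct summation over the pmf. The sketch below is an illustrative check with made-up helper names and parameter values, assuming $1 \le r \le N-K$.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k): probability of k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def nhg_mean_by_sum(N, K, r):
    """Expected value computed directly from the definition."""
    return sum(k * nhg_pmf(k, N, K, r) for k in range(K + 1))

N, K, r = 20, 8, 3  # illustrative values
mean_sum = nhg_mean_by_sum(N, K, r)
mean_closed = Fraction(r * K, N - K + 1)
print(mean_sum, mean_closed)
```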
Variance
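The variance can be derived by the following calculation.

$$
\begin{aligned}
E[X^2] &= \sum_{k=0}^{K} k^2 \Pr(X=k) = \left[\sum_{k=0}^{K} (k+r)(k+r+1) \Pr(X=k)\right] - (2r+1)E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \left[\sum_{k=0}^{K} \binom{k+r+1}{k}\binom{N-r-k}{K-k}\right] - (2r+1)E[X] - r^2 - r \\
&= \frac{r(r+1)}{\binom{N}{K}} \binom{N+2}{K} - (2r+1)\frac{rK}{N-K+1} - r^2 - r \\
&= \frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}.
\end{aligned}
$$

Then the variance is

$$
\operatorname{Var}[X] = E[X^2] - \left(E[X]\right)^2 = \frac{rK(N+1)(N-K-r+1)}{(N-K+1)^2(N-K+2)}.
$$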
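The variance formula listed for this distribution, $r\frac{(N+1)K}{(N-K+1)(N-K+2)}\left[1-\frac{r}{N-K+1}\right]$, can be checked against moments computed directly from the pmf. The helper names and parameter values below are assumptions for illustration, with $1 \le r \le N-K$.

```python
from fractions import Fraction
from math import comb

def nhg_pmf(k, N, K, r):
    """Pr(X = k): probability of k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def nhg_variance_by_sum(N, K, r):
    """Variance computed as E[X^2] - E[X]^2 from the pmf."""
    probs = [nhg_pmf(k, N, K, r) for k in range(K + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    second_moment = sum(k * k * p for k, p in enumerate(probs))
    return second_moment - mean ** 2

def nhg_variance_closed(N, K, r):
    a = N - K + 1
    return Fraction(r * (N + 1) * K, a * (a + 1)) * (1 - Fraction(r, a))

N, K, r = 20, 8, 3  # illustrative values
print(nhg_variance_by_sum(N, K, r), nhg_variance_closed(N, K, r))
```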
Related distributions
If the drawing stops after a constant number $n$ of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution, $HG_{N,K,n}(k)$. The two functions, read as cumulative distribution functions, are related in the following way:[1]
![{\displaystyle NHG_{N,K,r}(k)=1-HG_{N,N-K,k+r}(r-1)}](https://wikimedia.org/api/rest_v1/media/math/render/svg/b018865bb76971fb6ffe40f9d041f176df70afd2)
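The relation can be verified numerically if both sides are interpreted as cumulative distribution functions: at most $k$ successes occur before the $r$-th failure exactly when the first $k+r$ draws contain at least $r$ failures. The sketch below makes that interpretation explicit; the function names and values are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def nhg_cdf(k, N, K, r):
    """P(X <= k) for the negative hypergeometric distribution."""
    return sum(
        Fraction(comb(i + r - 1, i) * comb(N - r - i, K - i), comb(N, K))
        for i in range(k + 1)
    )

def hg_cdf(x, N, M, n):
    """P(at most x marked items in n draws) from N items of which M are marked."""
    return sum(
        Fraction(comb(M, i) * comb(N - M, n - i), comb(N, n))
        for i in range(x + 1)
    )

N, K, r = 15, 6, 4  # illustrative values with 1 <= r <= N - K
relation_holds = all(
    nhg_cdf(k, N, K, r) == 1 - hg_cdf(r - 1, N, N - K, k + r)
    for k in range(K + 1)
)
print(relation_holds)
```

Here the hypergeometric variable counts failures (the $N-K$ "marked" items) among the first $k+r$ draws.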
The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so that the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so that the probability of success is the same in every draw and the trials are independent. The following table summarizes the four distributions related to drawing items:
| | With replacements | No replacements |
|---|---|---|
| # of successes in constant # of draws | binomial distribution | hypergeometric distribution |
| # of successes in constant # of failures | negative binomial distribution | negative hypergeometric distribution |
Some authors[3][4] define the negative hypergeometric distribution to be the number of draws required to get the $r$th failure. If we let $Y$ denote this number, then it is clear that $Y = X + r$, where $X$ is as defined above. Hence the PMF is
![{\displaystyle \Pr(Y=y)={\binom {y-1}{r-1}}{\frac {\binom {N-y}{N-K-r}}{\binom {N}{N-K}}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/eeb0d15d25df6628c38b6fe49197fd4c1c6cdc1d)
If we denote the number of failures $N-K$ by $M$, this becomes
![{\displaystyle \Pr(Y=y)={\binom {y-1}{r-1}}{\frac {\binom {N-y}{M-r}}{\binom {N}{M}}}.}](https://wikimedia.org/api/rest_v1/media/math/render/svg/957785470d826c28cfa3e0e9e71e3d5c0e7a4631)
The support of $Y$ is the set $\{r, r+1, \dots, N-M+r\}$. It is clear that:
![{\displaystyle E[Y]=E[X]+r={\frac {r(N+1)}{M+1}}}](https://wikimedia.org/api/rest_v1/media/math/render/svg/05ecb17d532fb5fae1a0035199a0f7d9bd302d6d)
and $\operatorname{Var}[X] = \operatorname{Var}[Y]$.
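The alternative parameterization can be checked against the original one: with $Y = X + r$ and $M = N - K$, the pmf of $Y$ at $y$ should equal the pmf of $X$ at $y - r$, the mean should be $r(N+1)/(M+1)$, and the variances should coincide. The sketch below uses hypothetical helper names and illustrative values.

```python
from fractions import Fraction
from math import comb

def x_pmf(k, N, K, r):
    """Pr(X = k): k successes before the r-th failure."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

def y_pmf(y, N, M, r):
    """Pr(Y = y): the r-th failure occurs on draw y (M failures in the population)."""
    return Fraction(comb(y - 1, r - 1) * comb(N - y, M - r), comb(N, M))

def mean_var(values_and_probs):
    mean = sum(v * p for v, p in values_and_probs)
    var = sum(v * v * p for v, p in values_and_probs) - mean ** 2
    return mean, var

N, K, r = 15, 6, 4  # illustrative values
M = N - K
support_y = range(r, N - M + r + 1)

pmfs_match = all(y_pmf(y, N, M, r) == x_pmf(y - r, N, K, r) for y in support_y)
mean_y, var_y = mean_var([(y, y_pmf(y, N, M, r)) for y in support_y])
mean_x, var_x = mean_var([(k, x_pmf(k, N, K, r)) for k in range(K + 1)])
print(pmfs_match, mean_y, var_y == var_x)
```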
References
- [1] Negative hypergeometric distribution, in Encyclopedia of Math.
- [2] Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005). Univariate Discrete Distributions. Wiley. ISBN 0-471-27246-9. §6.2.2 (pp. 253–254).
- [3] Rohatgi, Vijay K.; Saleh, A. K. Md. Ehsanes (2015). An Introduction to Probability and Statistics. John Wiley & Sons.
- [4] Khan, R. A. (1994). A note on the generating function of a negative hypergeometric distribution. Sankhya: The Indian Journal of Statistics B, 56(3), 309–313.