Bootstrap and Jackknife comparison

In this notebook we compare the bootstrap to the jackknife. Bootstrap resampling is generally superior to jackknifing, but the jackknife has two advantages: it is deterministic, which may be helpful, and it exactly removes biases of order 1/N from an estimator (the bootstrap also reduces biases of higher orders, but it does not remove the lowest order exactly).
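To make the last point concrete, here is a minimal sketch of the standard leave-one-out bias formula that the jackknife is based on; the helper name `jackknife_bias` is introduced here for illustration and is not part of resample's API. For the biased variance, whose bias is exactly of order 1/N, the jackknife estimate reproduces the exact bias -s/(N-1):

```python
import numpy as np


def jackknife_bias(fn, data):
    # leave-one-out estimates theta_{-i}
    n = len(data)
    theta_loo = np.array([fn(np.delete(data, i)) for i in range(n)])
    # jackknife bias estimate: (n - 1) * (mean of leave-one-out - full-sample estimate)
    return (n - 1) * (theta_loo.mean() - fn(data))


def biased_var(d):
    return np.var(d, ddof=0)


rng = np.random.default_rng(1)
x = rng.normal(size=20)

# for the biased variance this matches -s / (n - 1) exactly
print(jackknife_bias(biased_var, x))
print(-np.var(x, ddof=0) / (len(x) - 1))
```

Subtracting this estimate from the original statistic yields the jackknife-corrected estimator; for the variance, that correction recovers the unbiased (ddof=1) variance exactly.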

[1]:
from resample import jackknife as j, bootstrap as b
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(size=20)


# return the mean and the (biased) variance of the sample
def fn(d):
    return np.mean(d), np.var(d, ddof=0)  # ddof=0 gives the biased variance


# exact bias of the biased variance
# - we computed:   s = 1/N * sum((x - np.mean(x)) ** 2)
# - unbiased is:   N/(N-1) * s
# - bias is: (1 - N/(N-1)) * s = (N - 1 - N) / (N - 1) * s = -1 / (N - 1) * s


print("estimates           ", np.round(fn(data), 3))
print("std.dev. (jackknife)", np.round(j.variance(fn, data) ** 0.5, 3))
print("std.dev. (bootstrap)", np.round(b.variance(fn, data, random_state=1) ** 0.5, 3))
print("bias (jackknife)    ", np.round(j.bias(fn, data), 3))
print("bias (bootstrap)    ", np.round(b.bias(fn, data, random_state=1), 3))
print("bias (exact)        ", np.round((0, -1 / (len(data) - 1) * fn(data)[1]), 3))
estimates            [0.037 0.333]
std.dev. (jackknife) [0.132 0.098]
std.dev. (bootstrap) [0.145 0.093]
bias (jackknife)     [-0.    -0.018]
bias (bootstrap)     [ 0.    -0.021]
bias (exact)         [ 0.    -0.018]

The standard deviations of the estimates computed by bootstrap and jackknife differ by about 10%. This difference shrinks for larger data sets.
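The agreement between the two standard-error estimates can be probed with a plain-NumPy sketch (the helper names `jackknife_se` and `bootstrap_se` are ours, not resample's API). For the mean, the jackknife standard error coincides exactly with the classical formula s/sqrt(N), while the bootstrap estimate carries extra Monte Carlo noise:

```python
import numpy as np


def jackknife_se(fn, data):
    # leave-one-out standard error: sqrt((n-1)/n * sum((theta_loo - mean)^2))
    n = len(data)
    loo = np.array([fn(np.delete(data, i)) for i in range(n)])
    return np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))


def bootstrap_se(fn, data, n_boot=2000, rng=None):
    # standard deviation of the statistic over bootstrap replicas
    rng = np.random.default_rng(rng)
    reps = np.array(
        [fn(rng.choice(data, size=len(data), replace=True)) for _ in range(n_boot)]
    )
    return reps.std()


rng = np.random.default_rng(1)
for n in (20, 500):
    x = rng.normal(size=n)
    jse = jackknife_se(np.mean, x)
    bse = bootstrap_se(np.mean, x, rng=1)
    print(n, jse, bse, abs(jse - bse) / jse)
```

For small samples the two estimates visibly disagree; as N grows, both converge to the true standard error of the mean and the relative difference becomes small.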

Both resampling methods find no bias for the mean and a small negative bias for the (not bias-corrected) variance. The jackknife estimate is closer to the exact bias, as expected: for sufficiently large N the bias is dominated by the O(1/N) term, which the jackknife removes exactly.