The aim of this document is to describe how to reproduce the results derived in the article “Exploring and correcting the bias in the estimation of the Gini measure of income inequality”, published in the journal Sociological Methods & Research. First (Section 2), we load the various R packages required to compute the estimators described in the article. In Section 3, we compute the parameters of the probability distributions considered in the simulation studies. Section 4 gives examples of how samples can be selected under the various scenarios. Code for computing the various estimators of the Gini index, for both infinite and finite populations, is provided in Section 5. Box plots investigating the effect of skewness on the bias of the estimation of the Gini index can be seen in Section 6. For the various estimators defined in this document, Section 7 provides code for computing the associated empirical measures. In particular, the functions inf.empirical.measures and fin.empirical.measures give: (i) the Relative Bias (RB); (ii) the Relative Root Mean Square Error (RRMSE); (iii) the Bias Ratio (BR); (iv) the expected value based on estimates of the Gini index; and (v) the expected value based on estimates of the coefficient of skewness. Some examples are included that indicate how to carry out the simulation studies. Finally, the various estimators of the Gini index are computed using the ES-SILC data set, with size \(n=26\). The various real data sets can be loaded from the file Datasets.RData.
library(moments, warn.conflicts=FALSE)   # skewness()
library(stats4, warn.conflicts=FALSE)    # required by VGAM
library(splines, warn.conflicts=FALSE)   # required by VGAM
library(VGAM, warn.conflicts=FALSE)      # rpareto(), rdagum()
library(boot, warn.conflicts=FALSE)      # bootstrap resampling
library(sampling, warn.conflicts=FALSE)  # finite-population sampling designs
Simulation studies are based on different Gini values (\(G \in \{0.1,0.2,\ldots,0.8\}\)) for various continuous probability distributions (Pareto, Dagum, Lognormal, Weibull and Gamma). In this section, we compute, for each distribution, the parameter value that yields each target Gini index.
# Vector with Gini values between 0.1 and 0.8
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
LEN <- length(VECTOR.G)
###########################
### PARETO DISTRIBUTION ###
###########################
VECTOR.ALPHA.Par <- (1+VECTOR.G)/(2*VECTOR.G)
VECTOR.ALPHA.Par
## [1] 5.500000 3.000000 2.166667 1.750000 1.500000 1.333333 1.214286 1.125000
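As a quick sanity check (not part of the original supplement), the shape values can be verified against the closed-form Gini index of the Pareto distribution, \(G = 1/(2\alpha - 1)\), which is exactly what the formula above inverts:

```r
# Target Gini values and the corresponding Pareto shape parameters
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.ALPHA.Par <- (1 + VECTOR.G)/(2*VECTOR.G)
# Inverting G = 1/(2*alpha - 1) recovers the target Gini values
all.equal(1/(2*VECTOR.ALPHA.Par - 1), VECTOR.G)  # TRUE
```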
##############################
### LOGNORMAL DISTRIBUTION ###
##############################
VECTOR.SIGMA.Log <- qnorm( (VECTOR.G+1)/2)*sqrt(2)
VECTOR.SIGMA.Log
## [1] 0.1777120 0.3582869 0.5449254 0.7416143 0.9538726 1.1902322 1.4657382
## [8] 1.8123876
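The lognormal values can be checked the same way (this check is ours, not part of the original code): the Gini index of a lognormal distribution with log-scale \(\sigma\) is \(G = 2\Phi(\sigma/\sqrt{2}) - 1\), which the expression above inverts.

```r
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.SIGMA.Log <- qnorm((VECTOR.G + 1)/2)*sqrt(2)
# The lognormal Gini index is G = 2*pnorm(sigma/sqrt(2)) - 1
all.equal(2*pnorm(VECTOR.SIGMA.Log/sqrt(2)) - 1, VECTOR.G)  # TRUE
```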
###################################################
### DAGUM DISTRIBUTIONS (Dagum-p20; Dagum-p0.5) ###
###################################################
GINI.DAGUM <- function(p, GINI, a)
{
  # Gini index of the Dagum(a, p) distribution minus the target value GINI
  gamma(p)*gamma(2*p + 1/a)/(gamma(2*p)*gamma(p + 1/a)) - 1 - GINI
}
## p = 20
# Solve GINI.DAGUM(a) = 0 for each target Gini value
VECTOR.a <- c()
for (i in 1:LEN)
{
  SOL.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI=VECTOR.G[i], p=20)
  VECTOR.a <- c(VECTOR.a, SOL.root$root)
}
VECTOR.a.Dag20 <- VECTOR.a
VECTOR.a.Dag20
## [1] 7.386969 3.852848 2.671827 2.079358 1.722419 1.483396 1.311813 1.182504
## p = 0.5
# Solve GINI.DAGUM(a) = 0 for each target Gini value
VECTOR.a <- c()
for (i in 1:LEN)
{
  SOL.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI=VECTOR.G[i], p=0.5)
  VECTOR.a <- c(VECTOR.a, SOL.root$root)
}
VECTOR.a.Dag0.5 <- VECTOR.a
VECTOR.a.Dag0.5
## [1] 13.381436 6.461639 4.162633 3.018566 2.336341 1.884859 1.565110
## [8] 1.327546
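Since the Dagum case has no convenient inverse, the roots can be validated by substituting them back into the Gini equation; the sketch below (a check we add here, with a tightened `tol` so the residual is small) does this for a single target value.

```r
# Gini index of the Dagum(a, p) distribution minus the target value GINI
GINI.DAGUM <- function(p, GINI, a)
  gamma(p)*gamma(2*p + 1/a)/(gamma(2*p)*gamma(p + 1/a)) - 1 - GINI
# For G = 0.3 and p = 20, the root should satisfy the Gini equation
a.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI = 0.3, p = 20, tol = 1e-8)$root
abs(GINI.DAGUM(p = 20, GINI = 0.3, a = a.root)) < 1e-6  # TRUE
```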
############################
### WEIBULL DISTRIBUTION ###
############################
VECTOR.K.Wei <- 1/log2(1/(1-VECTOR.G))
VECTOR.K.Wei
## [1] 6.5788135 3.1062837 1.9433582 1.3569154 1.0000000 0.7564708 0.5757166
## [8] 0.4306766
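The Weibull shape values admit the same kind of check (again, a check of ours rather than part of the supplement): the Weibull Gini index is \(G = 1 - 2^{-1/k}\), which the expression above inverts.

```r
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.K.Wei <- 1/log2(1/(1 - VECTOR.G))
# The Weibull Gini index is G = 1 - 2^(-1/k)
all.equal(1 - 2^(-1/VECTOR.K.Wei), VECTOR.G)  # TRUE
```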
##########################
### GAMMA DISTRIBUTION ###
##########################
GINI.GAMMA <- function(k, GINI)
{
  # Gini index of the Gamma(k) distribution minus the target value GINI
  gamma((2*k + 1)/2)/(k*gamma(k)*sqrt(pi)) - GINI
}
# Solve GINI.GAMMA(k) = 0 for each target Gini value
VECTOR.K.Gam <- c()
for (L in 1:LEN)
{
  GINI.L <- VECTOR.G[L]
  SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=GINI.L)
  VECTOR.K.Gam <- c(VECTOR.K.Gam, SOL.root$root)
}
VECTOR.K.Gam
## [1] 31.5800009 7.7038277 3.2780415 1.7241161 0.9999945 0.6021600 0.3583336
## [8] 0.1968058
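A useful special case for checking the Gamma roots (our check, not part of the original code): for \(G = 0.5\) the solution is \(k = 1\), i.e. the exponential distribution, whose Gini index is exactly \(1/2\). The fifth root printed above is indeed \(\approx 1\); with a tightened `tol` the root is recovered to high precision.

```r
# Gini index of the Gamma(k) distribution minus the target value GINI
GINI.GAMMA <- function(k, GINI)
  gamma((2*k + 1)/2)/(k*gamma(k)*sqrt(pi)) - GINI
# G = 0.5 corresponds to k = 1 (the exponential distribution)
k.root <- uniroot(GINI.GAMMA, c(0.15, 130), GINI = 0.5, tol = 1e-8)$root
abs(k.root - 1) < 1e-3  # TRUE
```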
This section shows how to select a sample for each Gini index and each population based on the continuous probability distributions used in the simulation studies. We consider samples of size \(N=50\).
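Before drawing the \(N=50\) samples below, one can confirm that the simulated populations really attain the target Gini values. The sketch below is a sanity check we add here (it is not part of the original supplement): `gini.hat` is a helper implementing the standard sorted-data formula for the sample Gini index, applied to a large Pareto sample drawn by inverse-CDF, so it does not require the VGAM package.

```r
set.seed(1)
# Sample Gini index via the sorted-data formula:
# G.hat = 2*sum(i * x_(i)) / (n * sum(x)) - (n + 1)/n
gini.hat <- function(x) {
  n <- length(x)
  x <- sort(x)
  2*sum((1:n)*x)/(n*sum(x)) - (n + 1)/n
}
# Pareto(scale = 1, shape = alpha) via inverse CDF: X = U^(-1/alpha);
# alpha = 3 corresponds to a target Gini index of G = 0.2
x <- runif(1e5)^(-1/3)
gini.hat(x)  # should be close to the target value 0.2
```

With only \(N=50\) observations, as in the simulations below, the estimate is noticeably noisier and biased, which is precisely the phenomenon the article studies.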
N <- 50
set.seed(123)
##############
### PARETO ###
##############
# G = 0.1
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[1])
## [1] 1.254320 1.044198 1.176522 1.022878 1.011222 1.753476 1.123090 1.020910
## [9] 1.114297 1.153188 1.008055 1.154701 1.073336 1.106681 1.511966 1.019377
## [17] 1.290360 1.779125 1.224735 1.008502 1.021510 1.069006 1.084370 1.001045
## [25] 1.079756 1.064652 1.117026 1.099286 1.253069 1.416889 1.006874 1.018868
## [33] 1.069595 1.042482 1.961161 1.143720 1.051551 1.320868 1.231467 1.304648
## [41] 1.424576 1.173633 1.174056 1.198824 1.407748 1.431943 1.303211 1.148947
## [49] 1.272258 1.028275
# G = 0.2
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[2])
## [1] 2.794343 1.312584 1.077700 2.016816 1.212530 1.691757 1.986677
## [8] 1.099029 1.037652 1.387385 1.145604 2.192814 1.375839 1.538912
## [15] 1.070725 1.306393 1.072738 1.071713 1.079769 1.314936 1.098462
## [22] 1.166986 1.120839 11.697485 1.281363 1.656206 1.380836 1.177336
## [29] 1.416562 2.079932 1.601140 1.143920 1.337818 1.082568 2.134247
## [36] 1.319895 1.005065 1.038424 1.040987 1.787628 1.970514 1.152585
## [43] 1.427855 1.150442 1.461441 1.746568 1.085284 2.202499 1.289129
## [50] 1.250403
# G = 0.3
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[3])
## [1] 1.265888 1.661562 1.391729 1.021738 1.399301 1.055066 1.042146 1.257461
## [9] 1.507920 2.422066 1.031353 1.739840 3.643655 1.025089 1.163270 2.459442
## [17] 1.318541 1.021927 1.280268 1.518508 1.221795 1.692407 1.722804 2.012376
## [25] 1.583320 1.007369 2.369885 3.022355 2.462538 1.186794 1.247555 1.054495
## [33] 1.200544 1.151192 1.350946 1.211536 1.094807 1.117364 1.009453 1.461569
## [41] 1.712610 1.509983 8.202119 2.185138 1.082176 1.965971 1.935573 3.271395
## [49] 1.911314 1.154772
# G = 0.4
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[4])
## [1] 1.099200 1.490210 1.717954 2.226304 3.510057 1.712698 1.376128 2.394906
## [9] 1.588790 2.388006 1.482103 1.810420 1.279123 1.752266 1.805931 1.431646
## [17] 1.187443 2.368740 1.658096 2.132709 1.302186 2.632301 1.087377 1.181767
## [25] 1.258989 1.316522 1.758917 1.437585 1.079514 1.362813 1.104937 1.944009
## [33] 1.217849 2.135783 1.346237 1.518735 2.135714 1.386330 1.053263 1.060793
## [41] 2.094761 1.912600 1.008299 1.314123 1.037685 1.546003 1.671826 1.268841
## [49] 2.930574 1.374849
# G = 0.5
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[5])
## [1] 2.598564 1.025908 1.403592 1.556366 1.834158 1.088756 1.961219 2.291736
## [9] 3.250403 3.231162 1.626584 2.500114 2.775596 1.300364 7.606916 1.267405
## [17] 2.006303 1.815059 1.140569 1.058038 2.322516 1.026801 1.235252 1.285165
## [25] 7.101278 1.856838 1.636095 1.471444 1.270539 1.060481 1.377769 1.759621
## [33] 1.504149 6.637470 2.449431 1.850812 2.946208 1.130515 3.497458 1.157103
## [41] 1.495434 1.316101 3.237097 1.356351 2.174459 1.239613 1.845278 1.020965
## [49] 1.022343 1.237169
# G = 0.6
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[6])
## [1] 2.768698 3.094187 1.479735 2.688321 1.607444 1.198745 3.809782
## [8] 1.971934 1.757260 1.111911 1.059606 1.098770 1.344048 1.039083
## [15] 1.641466 1.511436 2.264251 2.210285 18.785899 1.674731 1.109099
## [22] 44.715085 7.190222 3.876572 1.216160 1.259514 1.021626 1.771661
## [29] 7.020824 1.383273 1.230250 4.438206 2.001005 3.061148 8.465679
## [36] 2.003627 7.774536 3.051985 8.849769 1.349916 2.480962 5.593171
## [43] 7.201696 1.100208 1.235562 1.164101 1.013608 5.476225 5.664166
## [50] 1.183474
# G = 0.7
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[7])
## [1] 1.221157 46.564269 1.228264 1.296750 1.462763 1.827381 4.602868
## [8] 52.162617 1.921503 1.792514 2.173448 1.879833 1.320825 10.849017
## [15] 2.347571 1.198266 1.159284 3.264254 2.351923 1.135638 1.139057
## [22] 2.726046 4.848653 1.335157 6.459031 16.302171 1.000491 15.859315
## [29] 2.440828 1.075836 1.487887 2.801186 1.284570 1.161178 2.594089
## [36] 1.791695 1.345584 1.441452 1.436914 1.018614 2.064319 5.755721
## [43] 1.697291 3.414917 1.810345 2.266684 1.013923 2.179291 3.363660
## [50] 1.475959
# G = 0.8
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[8])
## [1] 5.870260 1.029833 1.803506 5.013155 1.525310 1.012653 1.429910
## [8] 2.167141 2.728050 1.173529 5.605483 4.319450 1.101729 2.847553
## [15] 2.459657 1.241560 4.308276 35.968671 2.225355 1.908982 2.153761
## [22] 2.589925 1.135859 2.013252 1.747252 1.033276 1.254879 4.022934
## [29] 2.842082 1.026182 1.610788 1.275044 2.404386 1.262704 1.735947
## [36] 1.083219 4.474915 3.078709 8.106677 3.995558 1.020807 2.948282
## [43] 1.329294 1.239113 7.387902 3.561032 3.196511 7.670509 6.687684
## [50] 1.007855
##############################
### LOGNORMAL DISTRIBUTION ###
##############################
# G = 0.1
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[1])
## [1] 219.3689 187.3974 141.5822 163.4541 137.8777 136.3691 129.0056 133.5304
## [9] 199.0162 146.9950 151.5918 154.9816 184.7541 135.4076 124.4146 199.8948
## [17] 137.2220 130.5169 119.1400 118.1188 134.0212 165.6411 180.7714 168.2998
## [25] 139.1251 149.9974 130.9460 130.6526 173.6797 123.9052 210.0781 146.0500
## [33] 154.1808 130.1587 134.0113 117.4427 143.6661 159.8855 157.2179 129.1677
## [41] 129.0052 135.7416 193.6143 121.2540 143.7651 208.1112 145.7737 116.5523
## [49] 131.8761 161.7856
# G = 0.2
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[2])
## [1] 129.72624 121.35099 131.20736 153.30414 263.15015 143.77769 218.59841
## [8] 186.04544 142.49178 85.69387 123.13613 124.52243 150.94187 236.47540
## [15] 337.50495 258.39206 141.49916 79.09576 129.11523 153.23333 200.88992
## [22] 209.52879 189.64978 90.02537 201.22345 126.46991 158.00545 152.43080
## [29] 173.01989 149.73106 81.66010 193.22916 170.42721 134.93869 154.83029
## [36] 155.71451 160.64362 267.17228 137.21062 157.62450 225.56687 216.52354
## [43] 223.70602 120.67498 304.13247 152.00266 289.70657 91.46801 149.53316
## [50] 232.25317
# G = 0.3
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[3])
## [1] 100.50891 98.47874 88.99377 83.63472 116.95400 177.76598 49.52143
## [8] 166.58631 291.16662 450.48550 301.58258 224.16577 57.91988 106.93527
## [15] 122.50611 217.75445 140.10848 74.74859 371.62744 243.87122 168.91266
## [22] 288.23565 71.55510 212.74575 111.61459 215.42015 143.57486 209.54036
## [29] 307.27942 149.00391 258.39607 77.66402 100.16105 339.63121 182.29902
## [36] 48.50620 70.57679 133.03195 237.88449 140.39800 208.54099 250.28153
## [43] 368.92755 153.01331 144.26815 57.08928 156.66756 108.67745 87.29013
## [50] 134.55385
# G = 0.4
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[4])
## [1] 315.03898 33.85712 108.10723 161.82247 76.52247 190.11463 201.36563
## [8] 144.82120 23.83737 999.27516 127.45293 240.55151 181.82218 317.32051
## [15] 272.15941 127.02887 196.45917 73.61665 280.20077 105.43423 890.97264
## [22] 43.62186 105.20390 273.72215 216.65833 95.85460 70.86475 165.19808
## [29] 146.84673 39.34236 152.26517 170.89982 168.94608 67.86933 211.26373
## [36] 412.55410 208.16925 63.93272 107.43855 191.84263 91.84861 29.95986
## [43] 285.93745 80.22599 96.99307 452.73829 83.58658 277.88479 58.26838
## [50] 114.09898
# G = 0.5
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[5])
## [1] 138.35695 48.67938 81.00663 144.38579 281.39475 30.74074
## [7] 106.31209 305.36731 88.76964 184.34473 237.34703 191.61353
## [13] 276.75277 132.01976 100.02352 11.92662 135.82212 223.72928
## [19] 247.32475 87.38601 810.31170 195.04151 167.41726 499.48639
## [25] 74.78930 96.58605 1460.98204 149.99708 705.01289 37.63175
## [31] 123.75097 212.93100 197.59086 56.86905 151.16484 53.10537
## [37] 292.89909 417.68902 17.77238 482.36162 45.43132 229.01601
## [43] 278.51252 122.64949 80.20962 173.76350 225.55795 344.65996
## [49] 20.95405 31.15899
# G = 0.6
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[6])
## [1] 814.448236 515.806305 249.159912 347.661165 442.150838 6.251892
## [7] 556.400028 83.325295 195.290378 104.448455 418.987352 98.026405
## [13] 275.101428 93.222972 40.420194 626.527976 358.469300 1155.481448
## [19] 160.380370 566.237981 1558.076448 106.162499 30.734354 111.621766
## [25] 115.035552 177.777899 1139.153198 100.666498 231.357038 113.182675
## [31] 152.070023 215.681230 721.176460 171.468064 346.695841 375.036839
## [37] 440.888747 74.913726 1029.024296 94.308662 130.855319 789.299456
## [43] 692.466794 40.554900 52.501556 29.475831 184.277058 180.584516
## [49] 228.921926 286.344568
# G = 0.7
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[7])
## [1] 61.422898 34.587698 668.474356 446.241020 16.247792 129.093733
## [7] 39.915914 7.133637 184.940991 132.144547 128.674010 203.735590
## [13] 541.023579 200.607870 60.127449 50.550695 122.340925 233.785429
## [19] 32.333403 113.278735 612.629611 126.632569 53.319290 99.041271
## [25] 760.353079 332.360566 909.270752 181.977133 270.793317 65.460597
## [31] 360.436172 70.657736 18.500966 179.039092 2571.149555 480.069415
## [37] 818.895397 251.134468 60.825829 110.340149 99.433574 74.664806
## [43] 416.599303 25.661162 528.406512 526.694809 25.613856 378.921031
## [49] 5229.526052 65.579819
# G = 0.8
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[8])
## [1] 686.2856733 35.9578375 1111.0264613 233.4059532 2962.8620237
## [6] 10.5463469 135.2369394 57.1116054 103.8014144 47.4159794
## [11] 32.7450639 423.6337388 20.6741100 2185.5860040 17.2900966
## [16] 178.2513443 389.9362160 429.8308916 85.8943529 171.4151567
## [21] 847.4088092 10.5943329 35.9879671 265.2554126 66.2796330
## [26] 1777.5268327 502.8022526 169.1513925 9.6538744 155.6023516
## [31] 83.6408304 123.2861958 17.4363429 366.4131998 22.5787280
## [36] 98.4942415 296.2759980 35.8723103 426.9241648 13.6531855
## [41] 0.9117397 344.7095654 680.8785597 88.4057752 370.0625980
## [46] 18.2658139 117.8670673 4.3982915 1262.3844179 4319.4328661
###################################################
### DAGUM DISTRIBUTIONS (Dagum-p20; Dagum-p0.5) ###
###################################################
##############
### p = 20 ###
##############
# G = 0.1
rdagum(N, shape1.a = VECTOR.a.Dag20[1], shape2.p = 20)
## [1] 1.934100 1.999221 1.565946 1.740252 1.564476 2.750316 1.296750 1.371851
## [9] 1.816666 1.599844 1.522765 3.764328 1.431743 1.577666 1.310935 1.856300
## [17] 1.503112 1.839532 1.397558 2.220594 1.482537 1.586698 1.338849 1.995635
## [25] 2.270069 1.267034 1.568745 1.414497 1.934520 1.592451 1.172718 1.444238
## [33] 1.471059 1.631884 1.428291 1.270869 1.338059 1.726261 2.180958 1.463188
## [41] 1.310483 1.469120 1.658832 1.536567 1.838349 1.445216 1.619184 1.269502
## [49] 1.393104 1.749375
# G = 0.2
rdagum(N, shape1.a = VECTOR.a.Dag20[2], shape2.p = 20)
## [1] 3.343095 2.402142 2.513124 1.266558 1.809264 3.595768 1.688709 1.972871
## [9] 6.117798 2.533626 1.979263 2.615724 1.542557 1.813923 2.846333 1.647878
## [17] 2.631615 4.426759 1.700591 5.525498 1.518173 2.915847 2.301298 2.509952
## [25] 3.648202 2.486399 2.957576 2.634714 2.074344 2.200333 1.906477 3.502093
## [33] 2.597244 3.023765 2.502460 2.020821 2.950393 2.475872 2.128179 2.410012
## [41] 3.284712 2.431260 1.907994 3.471273 2.166066 2.062597 3.079869 2.943218
## [49] 3.413742 2.515851
# G = 0.3
rdagum(N, shape1.a = VECTOR.a.Dag20[3], shape2.p = 20)
## [1] 1.827992 3.444381 3.668354 5.840640 4.705788 3.721361 2.090140
## [8] 2.327102 4.396897 2.196068 2.540908 3.033911 5.986578 2.640941
## [15] 4.532001 2.225837 5.146224 2.695603 3.691028 1.882913 5.438946
## [22] 2.945687 7.633172 3.005944 12.213000 3.092684 3.453935 3.461858
## [29] 1.689784 4.141110 3.545908 2.892249 3.838975 4.013258 3.272407
## [36] 2.437295 4.604354 4.859770 5.950610 3.635355 2.837199 3.336382
## [43] 3.092568 2.677453 4.215081 9.734216 2.803539 4.189762 3.641976
## [50] 3.914570
# G = 0.4
rdagum(N, shape1.a = VECTOR.a.Dag20[4], shape2.p = 20)
## [1] 2.154015 1.768979 2.931933 2.153012 2.701539 4.827968 3.277703
## [8] 9.029012 3.519394 4.917869 6.287600 3.080179 9.667061 4.576977
## [15] 3.465628 3.801180 3.083919 5.668381 3.027063 3.186901 4.015419
## [22] 3.211999 4.389114 5.926149 6.718343 9.268503 4.207233 18.126665
## [29] 2.072623 5.914768 4.748740 4.100021 11.460590 2.595074 11.504310
## [36] 3.234313 3.371817 10.861729 2.488217 3.472772 45.889577 6.625602
## [43] 4.570092 4.226034 4.733167 8.709375 6.445157 4.764650 2.429294
## [50] 3.484467
# G = 0.5
rdagum(N, shape1.a = VECTOR.a.Dag20[5], shape2.p = 20)
## [1] 4.481721 4.461180 3.015303 6.938278 4.568389 11.952132 7.571603
## [8] 31.196559 3.263275 31.130630 10.878030 24.429327 8.573435 3.907060
## [15] 4.083030 11.093515 8.181566 5.127003 12.461700 5.330613 16.712610
## [22] 8.044621 6.062879 26.675608 6.548711 3.385656 6.578699 62.690883
## [29] 25.070752 6.260650 17.822232 5.025434 13.467091 13.578975 5.786307
## [36] 5.736570 9.139203 17.164662 9.355127 7.170561 3.808093 8.555389
## [43] 2.780511 6.952935 11.364102 11.811322 3.030148 6.333787 6.980384
## [50] 4.778310
# G = 0.6
rdagum(N, shape1.a = VECTOR.a.Dag20[6], shape2.p = 20)
## [1] 11.717066 16.626124 15.582026 4.335996 11.126205 20.487862
## [7] 5.690445 16.094212 24.166356 4.364804 63.892337 17.245349
## [13] 21.525980 5.202746 2.956698 27.444356 39.871946 4.591587
## [19] 9.125048 8.869202 15.013590 3.659717 16.487309 7.809087
## [25] 8.844057 16.115036 8.990650 1308.264189 25.077217 23.132749
## [31] 18.027833 4.281760 24.272262 21.783374 185.595052 3.388814
## [37] 17.192892 8.160934 11.296187 23.389435 2.616822 2.545767
## [43] 336.994132 18.269830 6.699648 21.622758 182.822857 5.328394
## [49] 13.070379 8.448077
# G = 0.7
rdagum(N, shape1.a = VECTOR.a.Dag20[7], shape2.p = 20)
## [1] 79.695180 284.460960 11.630310 7.123482 21.092462 14.586005
## [7] 15.609631 11.074311 10.898591 16.087735 11.524000 105.392427
## [13] 37.912473 6.993283 10.759430 10.755123 3.242754 9.886945
## [19] 10.809672 6.479797 21.742351 2.967250 23.641019 11.052755
## [25] 23.643954 2.985278 3.907498 10.719125 18.184316 6.944127
## [31] 35.807590 21.046174 54.059446 10.623996 10.550908 7.449339
## [37] 9.747652 7.144764 73.721019 12.150302 3.960835 27.525413
## [43] 8.901230 14.456929 19.923286 7.920044 4.074767 10.888873
## [49] 16.637448 167.101582
# G = 0.8
rdagum(N, shape1.a = VECTOR.a.Dag20[8], shape2.p = 20)
## [1] 239.738907 5.654808 16.854396 13.166714 13.001410 10.970494
## [7] 28.201205 10.109794 12.994141 625.028099 42.933849 10.811779
## [13] 48.513963 8.608495 16.967464 7.410756 6.368819 82.522460
## [19] 5.777743 64.299543 9.913184 3.945259 42.038180 12.066284
## [25] 3.862684 35.359906 4.030410 14.413078 9.434296 16.839161
## [31] 25.705635 5.539770 7.707205 6.025807 47.360584 23.788876
## [37] 24.004111 13.018523 14.357853 11.784019 43.242840 75.489695
## [43] 25.530595 21.558666 11.216910 5.345997 21.470980 7.660221
## [49] 34.716899 10.339975
###############
### p = 0.5 ###
###############
# G = 0.1
rdagum(N, shape1.a = VECTOR.a.Dag0.5[1], shape2.p = 0.5)
## [1] 0.9727338 0.7335687 0.8195123 1.0555434 1.0459976 0.6309126 1.0929187
## [8] 0.7363830 0.9612159 0.9099467 0.9421479 1.0100714 0.9100552 1.3248986
## [15] 1.2952169 0.8808015 0.9612218 0.8612678 0.8977689 0.8055791 0.4663359
## [22] 0.5722552 0.6548420 1.0258555 0.9097524 1.0746956 0.9511319 0.7803401
## [29] 0.9613020 1.1081536 1.0408901 0.9226820 0.8923931 0.9786244 1.1031283
## [36] 1.1069438 0.7747455 0.8206568 0.7230097 1.0424038 0.7191917 1.0606405
## [43] 1.0290085 1.1696007 0.9089766 1.1633217 1.0361167 1.0921827 0.8812677
## [50] 0.7090353
# G = 0.2
rdagum(N, shape1.a = VECTOR.a.Dag0.5[2], shape2.p = 0.5)
## [1] 0.6804281 1.0134724 0.7420027 0.7612641 0.3445502 0.8586150 0.5214153
## [8] 1.2202981 0.4482809 1.0751873 1.1659869 0.3610522 0.6789285 0.9465927
## [15] 1.0761262 0.7737018 1.3510091 0.7668778 0.7906180 0.6881549 0.9235229
## [22] 0.8849840 0.3692394 1.0350652 0.6178388 0.9579151 0.8592937 0.9399307
## [29] 0.9278793 0.6869170 0.6342019 0.8709575 2.1414623 0.7214802 0.7172168
## [36] 0.3553711 0.8615336 0.9124555 1.0319899 1.4249365 0.8364252 0.4638383
## [43] 0.6661738 1.2007763 1.1627696 1.1490510 1.3969979 0.9548968 0.8354059
## [50] 0.5992060
# G = 0.3
rdagum(N, shape1.a = VECTOR.a.Dag0.5[3], shape2.p = 0.5)
## [1] 1.00812089 0.30658617 0.73023289 1.59920019 1.24657573 1.01249873
## [7] 0.36787998 0.86529054 2.54112436 0.88789331 0.54606039 0.26661057
## [13] 0.28340793 1.32433235 0.63558901 1.66987795 0.82672276 0.78054325
## [19] 1.68752877 0.34698554 1.18624370 0.64962337 0.82498997 0.58752594
## [25] 0.30179074 1.11153977 0.45809874 0.93434137 0.70514557 0.46270976
## [31] 2.97010244 0.71629341 0.18971744 1.60661460 1.31664662 0.87226531
## [37] 0.57261280 1.16655494 1.72621723 1.26374941 0.37292545 1.92642192
## [43] 0.82546467 0.08111830 0.38483456 0.08454663 1.18197100 1.11544986
## [49] 1.56850181 0.64126730
# G = 0.4
rdagum(N, shape1.a = VECTOR.a.Dag0.5[4], shape2.p = 0.5)
## [1] 2.64429048 1.28083538 1.17619602 0.85976397 2.42906087 0.69482470
## [7] 1.57862451 0.57851697 0.63213737 1.90926658 0.95707929 0.07024996
## [13] 0.30651057 0.68982187 0.60773196 3.92807884 1.55175959 0.51608562
## [19] 0.99763865 1.67444362 0.94494825 0.88452836 0.28615608 0.82806270
## [25] 0.43379089 0.64425201 0.37543249 0.41340176 0.90754293 0.66103020
## [31] 0.30073250 0.43302699 1.01106889 0.84575210 0.26310036 0.17200366
## [37] 2.40142303 0.54140948 0.94693260 1.35763462 0.39808674 0.10883417
## [43] 0.81969965 1.32241415 0.53462853 0.50085076 0.89332385 2.23530785
## [49] 0.82106224 0.70762010
# G = 0.5
rdagum(N, shape1.a = VECTOR.a.Dag0.5[5], shape2.p = 0.5)
## [1] 0.205145089 1.483886391 0.273032189 0.915680665 0.813352977 0.077042569
## [7] 2.577843528 0.751930669 1.569038865 0.651609678 0.332283548 0.777158052
## [13] 1.262645419 0.958256401 0.597649098 0.577513880 0.814549449 0.555189770
## [19] 0.786447284 0.009999016 7.152134766 0.499888296 0.161415929 1.622979958
## [25] 0.684210562 1.707129720 0.060252010 1.705912774 1.038033650 1.173984238
## [31] 1.585151048 0.625919166 2.153711607 0.265825234 0.045579903 0.389939557
## [37] 0.895282698 0.388646330 2.468783155 1.204062036 0.417887401 0.599054616
## [43] 0.511970729 1.195252660 0.604310801 0.027928407 0.859903954 0.296404295
## [49] 1.104985620 0.366360685
# G = 0.6
rdagum(N, shape1.a = VECTOR.a.Dag0.5[6], shape2.p = 0.5)
## [1] 2.24549689 1.68666564 0.50052968 0.06088570 0.69315972 0.69656107
## [7] 10.02639989 0.33410550 0.88869577 2.89083070 0.68755130 0.95529985
## [13] 0.07780113 1.79497445 0.56010047 0.20931693 0.43131884 1.99127219
## [19] 0.55136998 0.90402410 0.27330906 1.44764362 0.23351269 0.37604161
## [25] 0.22224126 2.81653311 0.43488034 0.12299539 2.37351702 0.34896233
## [31] 1.68363817 0.12802551 0.36336691 0.23287587 0.03807255 1.28357697
## [37] 0.18439359 0.40427287 0.32026039 0.74123661 0.62663084 8.27935749
## [43] 0.24178958 1.09023910 5.10267869 0.65639567 0.40538317 1.49209763
## [49] 0.09900223 0.61714941
# G = 0.7
rdagum(N, shape1.a = VECTOR.a.Dag0.5[7], shape2.p = 0.5)
## [1] 0.40744409 0.20094774 0.09565062 2.08932425 0.61036020 0.58773416
## [7] 0.67502355 1.07898882 0.78880238 0.72655311 0.02314300 0.61670626
## [13] 0.66385003 0.76174295 0.74486691 0.04127712 0.03831556 0.93110678
## [19] 1.66436424 0.08665310 1.34493825 0.47315427 11.51737343 0.49411340
## [25] 0.61079786 0.31865059 0.52792947 0.03619845 0.56426139 4.91728273
## [31] 0.54446938 0.39777547 0.01243607 0.01519337 0.02031608 0.09853098
## [37] 0.44553446 1.48313542 0.20772140 1.19637838 0.53362928 0.21636674
## [43] 0.00563657 0.87217690 0.03169057 1.71596262 0.07856381 0.83558358
## [49] 1.72968757 0.18468494
# G = 0.8
rdagum(N, shape1.a = VECTOR.a.Dag0.5[8], shape2.p = 0.5)
## [1] 0.050202723 1.286797455 0.100030046 0.145467545 0.496169043
## [6] 0.393554157 0.253907554 1.362185758 0.550401117 0.061543420
## [11] 18.032871318 0.125438691 0.047057890 0.100540378 0.261790972
## [16] 0.237776645 0.177612392 0.009180611 4.296066191 0.573543421
## [21] 0.106676486 2.493426958 0.382658920 0.517176760 1.424339148
## [26] 0.783525205 0.055338492 0.196586197 5.439030722 0.082980927
## [31] 1.210658320 0.369990374 0.053279645 0.095020857 0.875592477
## [36] 0.051360685 0.307216117 0.357969890 0.709576419 0.131026182
## [41] 0.377734851 0.067953529 2.163357046 0.062350411 0.025992377
## [46] 2.817317537 0.175631235 1.220071169 0.288835657 0.353951419
############################
### WEIBULL DISTRIBUTION ###
############################
# G = 0.1
rweibull(n = N, shape = VECTOR.K.Wei[1], scale = 1)
## [1] 1.0966294 1.1054899 1.1027109 0.9398041 0.9487831 0.8955482 0.9674293
## [8] 1.1749425 1.2856239 1.0641976 0.7578675 1.0430092 0.9035614 0.8990655
## [15] 0.4788531 1.0795370 0.8254862 1.0544327 1.0168796 1.3009305 0.4041450
## [22] 0.9712539 0.3975329 1.0766885 1.1268528 1.0505634 0.8649325 0.9130004
## [29] 0.9801176 0.8392976 0.9143393 0.9128337 0.9073895 1.0636683 1.1325734
## [36] 0.8352050 0.8973416 0.6342538 0.6152625 0.8304620 0.9725724 1.0882420
## [43] 0.9723854 0.8200874 0.6867993 0.7024688 0.7853109 1.0751419 1.1819398
## [50] 0.9545080
# G = 0.2
rweibull(n = N, shape = VECTOR.K.Wei[2], scale = 1)
## [1] 0.8136433 1.3142550 0.3820811 1.2116704 0.4465724 1.2614551 0.7512971
## [8] 0.6344213 0.6547890 0.7425327 1.2807930 0.5554505 0.7902585 0.5743632
## [15] 1.2023125 1.6318224 1.0397293 1.0909560 0.5879152 0.4221470 1.2614298
## [22] 0.4616862 0.8011008 0.9720054 0.7485822 0.6517124 0.8706295 0.9769478
## [29] 0.3392674 1.4267671 1.2243610 0.7204882 0.6913638 1.2931057 0.9180110
## [36] 0.7694110 0.9747184 0.2242161 0.7370142 1.0472410 1.2237296 1.0603968
## [43] 1.3182296 1.5126216 1.0517346 0.6770933 0.3568441 0.9588193 0.8703490
## [50] 1.3760251
# G = 0.3
rweibull(n = N, shape = VECTOR.K.Wei[3], scale = 1)
## [1] 0.6723143 1.0204454 0.9232960 0.1830853 0.6112538 0.5658132 1.3408509
## [8] 1.2701405 0.3524256 0.4105959 0.5631601 1.1304970 1.2574065 0.8353249
## [15] 0.8015362 1.2382822 0.5101415 1.9628220 0.5743467 0.4057912 0.2576583
## [22] 0.6339090 1.1488608 1.2413480 0.3659594 1.9622461 0.3925628 0.7988883
## [29] 0.9093308 0.2567157 1.3549371 0.9848547 1.4212675 1.3909748 0.6167205
## [36] 0.7529012 0.8633020 0.4472539 0.1867102 0.8092558 0.5837988 1.3179036
## [43] 1.0182293 1.7593036 1.3838934 1.5384903 1.1319911 0.9729842 0.6671840
## [50] 0.5797363
# G = 0.4
rweibull(n = N, shape = VECTOR.K.Wei[4], scale = 1)
## [1] 1.08767967 0.08531036 0.47267741 1.55845215 0.99763332 1.14142591
## [7] 0.83343447 0.29914732 1.32079790 0.93323585 0.07321532 1.71802668
## [13] 0.56566356 0.46244297 0.62249070 0.01545715 2.29858491 0.03352232
## [19] 0.40564555 0.87045019 1.59067177 0.84916787 1.46587031 0.88868382
## [25] 0.30259220 2.24057092 1.08907627 0.27267993 2.36409405 0.34217588
## [31] 0.09177086 0.52219959 3.14912258 0.82007405 0.44490515 2.55395879
## [37] 0.26840425 0.40413231 1.23503922 0.40481684 0.50725091 1.51679215
## [43] 0.50986113 0.14412431 0.26073434 0.35690320 0.48738192 0.62224147
## [49] 0.93145184 1.69910932
# G = 0.5
rweibull(n = N, shape = VECTOR.K.Wei[5], scale = 1)
## [1] 0.198250392 0.333069372 0.237845025 3.999980221 1.745887038 0.513079882
## [7] 2.517775072 0.346277836 0.573288480 1.262987378 0.936348639 0.396413942
## [13] 1.819050501 2.220987563 0.628400750 2.271095143 2.370670320 0.446404075
## [19] 0.572625787 1.185685967 1.324225018 0.458073810 0.185416942 2.968980996
## [25] 0.007694491 3.812757347 1.715693890 1.112633909 0.156775786 1.690863212
## [31] 0.228964583 0.152158504 0.750734909 0.433551608 1.123801845 0.763274714
## [37] 0.059317737 1.404603765 3.796150271 0.033482504 1.346885754 0.424484001
## [43] 2.585196874 0.031236727 0.084970562 0.141387847 0.857128414 0.247139188
## [49] 1.204286545 1.239763075
# G = 0.6
rweibull(n = N, shape = VECTOR.K.Wei[6], scale = 1)
## [1] 7.091276e-05 3.227487e-03 6.642938e-01 1.783915e-01 1.151861e+00
## [6] 1.623692e+00 3.805460e-01 1.103691e+00 3.975339e+00 5.018143e-03
## [11] 2.724118e-01 8.021659e-02 7.113277e-01 2.618417e+00 2.833668e-01
## [16] 6.074188e+00 2.197768e+00 5.406257e-02 2.297640e+00 1.490302e+00
## [21] 1.575632e+00 1.223966e+00 1.721929e+00 5.215889e+00 4.890989e-02
## [26] 1.487526e-02 2.364223e-02 7.337406e-01 7.770753e-01 4.258597e+00
## [31] 1.029382e+00 5.346309e-02 9.886247e-01 6.083213e-01 5.536837e-01
## [36] 2.127760e+00 5.301881e-01 2.840456e+00 3.869279e-01 5.168959e-01
## [41] 1.679148e+00 4.998627e-02 1.887297e+00 8.232671e-01 6.257509e-02
## [46] 1.611400e+00 1.317970e+00 6.873731e-02 3.784992e-01 8.430330e-02
# G = 0.7
rweibull(n = N, shape = VECTOR.K.Wei[7], scale = 1)
## [1] 0.592237444 0.022239075 2.715738109 0.275744899 0.141712732
## [6] 2.410658677 0.370238616 0.202013052 0.013603781 1.221734016
## [11] 0.416263838 18.115657773 0.989494786 0.249838724 0.294747057
## [16] 0.061366518 3.742896404 0.212926524 2.265818190 0.335190559
## [21] 0.172629451 0.458262494 0.992276822 0.168623967 0.602747716
## [26] 1.943821263 0.663088937 5.034518570 1.697265365 0.400448759
## [31] 10.292985475 2.153920593 0.251700719 0.147862649 0.091564003
## [36] 5.308517199 4.298048456 0.006506546 0.010262356 0.012928163
## [41] 0.231678014 2.926650137 0.016503793 0.012746942 2.713051803
## [46] 5.300879991 0.375245898 1.011750414 0.643955080 0.629779910
# G = 0.8
rweibull(n = N, shape = VECTOR.K.Wei[8], scale = 1)
## [1] 3.994930e-02 2.219593e-02 1.724725e+00 6.827818e-02 4.386402e-01
## [6] 1.015844e+01 1.576384e+01 2.321477e+00 2.649634e+00 7.080428e-01
## [11] 3.033911e-01 8.015085e-02 2.283815e+00 5.966757e-03 4.792874e-01
## [16] 1.959981e+00 1.354731e+00 1.254570e-01 6.961699e-02 2.597738e-01
## [21] 8.656272e-03 7.700056e-01 2.904447e+00 5.179642e-01 9.298103e-01
## [26] 2.170066e+00 2.670368e-03 5.339343e-01 2.913978e+01 1.369539e-03
## [31] 9.587958e-01 3.860485e+01 1.091982e-04 9.408407e-01 5.153538e-05
## [36] 4.821076e+00 2.776404e-01 3.303197e-02 3.155741e+00 8.252536e-02
## [41] 2.707620e-01 3.447355e+01 2.038120e+00 5.174183e-01 2.515725e-03
## [46] 3.970787e-06 2.811218e+00 3.132991e-01 1.469905e-01 5.234601e-03
##########################
### GAMMA DISTRIBUTION ###
##########################
# G = 0.1
rgamma(n = N, shape = VECTOR.K.Gam[1], scale = 1)
## [1] 34.63168 27.00038 28.41345 34.69285 37.49449 25.81374 24.36601 32.74267
## [9] 32.21542 28.74403 31.02044 30.32568 29.55222 23.33838 33.67359 35.90658
## [17] 27.33807 20.18409 22.82648 36.04005 25.23121 44.93018 27.49963 26.50892
## [25] 26.54987 21.73886 22.46125 29.37303 31.00386 36.96266 27.16381 27.50749
## [33] 26.86216 34.19346 27.63495 34.21468 39.40432 33.27543 37.21970 34.64967
## [41] 33.54442 33.26967 38.69593 25.74359 27.22599 30.41632 28.67746 35.42144
## [49] 29.63288 23.65935
# G = 0.2
rgamma(n = N, shape = VECTOR.K.Gam[2], scale = 1)
## [1] 10.339464 8.392638 11.563390 7.183345 11.083696 4.725019 8.359876
## [8] 6.472208 5.081425 7.335643 9.957855 6.913409 5.376258 6.837318
## [15] 10.035225 5.928422 7.010587 6.899897 2.850106 9.238046 4.147690
## [22] 7.100980 9.870433 6.436940 8.102035 5.056679 9.397179 5.843702
## [29] 6.366496 4.763882 8.389102 9.049077 4.301848 9.036459 7.802253
## [36] 11.877998 4.717460 9.255811 5.332222 3.370841 5.496392 4.154450
## [43] 8.203489 5.722501 4.933585 19.177219 7.771466 9.665247 10.865428
## [50] 5.443321
# G = 0.3
rgamma(n = N, shape = VECTOR.K.Gam[3], scale = 1)
## [1] 3.7724631 3.4936209 1.8627591 2.3146208 2.9573218 1.2024368 4.3495636
## [8] 0.8930237 4.8122282 2.5500276 5.7443087 1.8925940 5.9788771 7.8752462
## [15] 2.0473102 3.4995891 1.5258269 7.6173220 3.8588901 3.3603524 5.5996564
## [22] 4.8323514 2.4401219 2.7720772 5.4246966 1.9633057 6.8477152 2.9644242
## [29] 5.0038969 4.2057250 1.1722392 4.1961260 2.4776547 3.6048484 3.6301791
## [36] 2.0193636 0.3352092 2.7242666 2.5338895 3.9088356 3.5717306 3.5257127
## [43] 3.5644615 3.7844592 3.9458804 5.7752217 3.9073178 7.1276106 3.1425242
## [50] 0.2409909
# G = 0.4
rgamma(n = N, shape = VECTOR.K.Gam[4], scale = 1)
## [1] 0.2452192 1.1300639 1.4169767 0.6778939 1.3884554 0.1456747 0.7931100
## [8] 3.6556269 0.9269505 3.1995700 5.1107374 0.9267827 0.2722695 0.5081242
## [15] 1.5333889 1.8513418 1.0987622 1.2593836 2.6347716 0.9748723 1.5692000
## [22] 2.0758283 0.6549810 0.6427581 2.3418768 1.1158532 2.0652245 0.4882630
## [29] 1.8105879 1.6982807 3.2677248 0.1650683 1.9411048 2.3447135 3.1456201
## [36] 0.7328015 1.0505221 3.2079448 2.2133208 1.7344136 0.4519682 0.5585969
## [43] 2.0264175 1.7377048 1.0109876 0.2001713 1.4213732 0.4779559 2.7281793
## [50] 0.1177665
# G = 0.5
rgamma(n = N, shape = VECTOR.K.Gam[5], scale = 1)
## [1] 0.04934946 1.68841500 0.62247764 2.05932206 1.05753278 1.82163078
## [7] 0.30121129 1.14435237 0.38105183 0.68435428 1.48151280 0.18310251
## [13] 0.18757649 0.80008880 1.36850923 1.55650540 1.66685735 0.25100687
## [19] 1.01360156 0.06175226 0.04457459 0.20745047 0.98830358 1.52219076
## [25] 2.69518643 0.57721706 0.28802751 0.60881026 1.94008078 0.04216691
## [31] 0.15510919 0.98421483 2.45598520 1.01911155 0.04535016 0.90372297
## [37] 0.58416714 1.14381197 1.70262039 0.83432134 2.19316628 0.77332109
## [43] 2.54619215 0.65016729 0.08349626 0.17352799 0.33650049 0.22392166
## [49] 0.92663822 3.57523533
# G = 0.6
rgamma(n = N, shape = VECTOR.K.Gam[6], scale = 1)
## [1] 2.136827874 0.544901731 1.006575221 0.311428476 1.673919302 0.132624177
## [7] 0.118647262 0.443093202 1.112380598 0.132001432 0.051687700 0.728891279
## [13] 0.298016872 0.119966452 0.457350248 0.006305465 0.001465703 1.100608969
## [19] 1.209050987 0.232051982 0.663074122 0.748821232 0.960546628 0.853658364
## [25] 0.215535316 0.007671702 0.326261216 1.118028600 2.352172826 1.006885791
## [31] 0.168457014 0.011183765 1.182786617 0.264811235 0.419009631 0.722529193
## [37] 2.354158315 0.276775206 0.001780794 0.195088566 0.029536470 0.349494818
## [43] 0.241708259 1.752095723 1.316467390 0.084276294 0.035481699 0.207074986
## [49] 2.230421413 1.902984531
# G = 0.7
rgamma(n = N, shape = VECTOR.K.Gam[7], scale = 1)
## [1] 4.213045e-02 7.914715e-05 3.792429e-03 8.387965e-01 5.361176e-02
## [6] 2.240461e-01 3.685452e-01 2.142289e-01 2.202561e-01 7.921302e-01
## [11] 5.441748e-05 3.149630e-04 3.033587e-02 6.241676e-02 8.919675e-01
## [16] 5.052141e-02 1.882529e-01 3.026228e-02 8.178127e-01 1.331303e+00
## [21] 7.849344e-01 1.222614e-01 9.343051e-01 4.295278e-01 2.088232e+00
## [26] 3.450247e-01 3.398099e-02 5.652823e-03 1.097582e-02 4.299467e-02
## [31] 1.339899e-01 2.092384e-01 2.682824e-02 2.166857e-02 5.332922e-03
## [36] 1.368756e-01 1.143787e+00 6.076660e-03 4.606802e-03 2.748152e-01
## [41] 7.144623e-02 7.678407e-02 8.603165e-03 2.511104e-04 2.357906e-01
## [46] 1.229911e-03 8.540354e-02 2.567427e-02 5.371906e-02 2.980774e-02
# G = 0.8
rgamma(n = N, shape = VECTOR.K.Gam[8], scale = 1)
## [1] 9.139057e-03 8.054584e-02 1.447501e-04 2.406610e-04 2.264403e-02
## [6] 1.543411e-04 3.775935e-01 1.365707e-01 8.561590e-03 1.427175e+00
## [11] 1.491474e-02 2.471447e-01 1.223016e-06 4.169344e-01 1.610903e-06
## [16] 1.032937e-02 3.749964e-10 1.129319e-02 6.263633e-05 3.903581e-03
## [21] 2.979187e-04 8.504979e-02 2.076813e-02 8.044114e-02 4.295114e-01
## [26] 6.369536e-01 4.033731e-02 3.204659e-02 2.851867e-01 5.099091e-01
## [31] 2.212640e-02 3.544844e-03 1.148871e-01 3.049286e-01 9.210711e-06
## [36] 3.295236e-04 3.987036e-02 7.126610e-06 1.247532e-02 1.401004e+00
## [41] 7.477153e-02 1.293020e-01 3.397923e-02 8.715096e-02 1.258967e-01
## [46] 9.072160e-04 1.804427e-04 9.305137e-02 1.876970e-01 8.002716e-03
For both infinite and finite populations, this section describes the expressions for the various estimators of the Gini index, and provides the corresponding R codes for computing such estimators.
Examples for infinite populations are based on this sample:
# Sample selection with size n = 50 from the Pareto distribution. G = 0.5.
set.seed(123)
Y.n <- rpareto(n=50, scale= 1, shape=1.5)
Y.n
## [1] 2.295250 1.171846 1.814962 1.086477 1.041768 7.839706 1.530569
## [8] 1.078833 1.487090 1.686424 1.029854 1.694550 1.296274 1.450159
## [15] 4.553238 1.072905 2.546479 8.268442 2.102910 1.031529 1.081160
## [22] 1.277203 1.345811 1.003838 1.324933 1.258233 1.500487 1.414946
## [29] 2.286870 3.588377 1.025436 1.070943 1.279788 1.164801 11.818134
## [36] 1.636207 1.202389 2.774286 2.145609 2.651401 3.660281 1.798669
## [43] 1.801051 1.944332 3.504223 3.730162 2.640708 1.663793 2.417921
## [50] 1.107643
\[\widehat{G}_{n}^{a}=
\frac{1}{n\overline{y}}\sum_{i \in
S}[\{2\widehat{F}_{n}(y_{i})-1\}y_{i}]= \frac{1}{T}\sum_{i \in
S}[\{2\widehat{F}_{n}(y_{i})-1\}y_{i}],\]
where \(T=\sum_{i \in S}y_{i}\).
Gn.a <- function(y)
{
Funct.FDi <- function(Yi, Y) mean(Y<=Yi)
FDi <- sapply(y, Funct.FDi, Y=y)
Output <- (1/sum(y))*sum( (2*FDi-1)*y)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.a(Y.n)
## [1] 0.3803843
\[\widehat{G}_{n}^{b}=\frac{1}{2n^{2}\overline{y}}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|= \frac{1}{2nT}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|.\]
Gn.b <- function(y)
{
n <- length(y)
Funct.ABSi <- function(Yi, Y) sum(abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y)
Output <- sum(ABSi)/(2*n*sum(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b(Y.n)
## [1] 0.3603843
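Note that the outputs of Gn.a and Gn.b differ by exactly \(1/n=0.02\): for samples without ties, \(\widehat{G}_{n}^{a}=\widehat{G}_{n}^{b}+1/n\). A self-contained check of this identity (with illustrative values, using closed forms equivalent to the two functions above):

```r
# Check that Gn.a = Gn.b + 1/n for a tie-free sample.
y <- c(2.3, 1.2, 1.8, 5.0, 1.1)                # any sample without ties
n <- length(y)
Ga <- sum((2*rank(y)/n - 1)*y)/sum(y)          # closed form of Gn.a
Gb <- sum(abs(outer(y, y, "-")))/(2*n*sum(y))  # double-sum form of Gn.b
all.equal(Ga, Gb + 1/n)                        # TRUE when observations are distinct
```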
\[\widehat{G}_{n}^{b1} = \displaystyle \frac{2}{ n^{2}\overline{y} } \sum_{i \in S} i y_{(i)} - \frac{n+1}{n} =\frac{2}{ nT } \sum_{i=1}^{n} i y_{(i)} - \frac{n+1}{n}.\]
Gn.b1 <- function(y)
{
n <- length(y)
Output <- (2/(n*sum(y)))*sum((1:n)*sort(y))-(n+1)/n
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b1(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{b2} = \displaystyle\frac{2\widehat{\beta}}{n} - \frac{n+1}{n}.\]
Gn.b2 <- function(y)
{
n <- length(y)
Beta <- sum((1:n)*sort(y))/sum(y)
Output <- 2*Beta/n - (n+1)/n
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b2(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{b3} = \displaystyle \frac{2}{n\overline{y}} cov(i, y_{(i)}) = \frac{2}{T} cov(i, y_{(i)}).\]
Gn.b3 <- function(y)
{
n <- length(y)
Output <- (2/sum(y))*(mean((1:n)*sort(y))-mean(1:n)*mean(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b3(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{c}= \frac{n}{n-1}\widehat{G}_{n}^{b}=\frac{1}{2(n-1)T}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|.\]
Gn.c <- function(y)
{
n <- length(y)
Funct.ABSi <- function(Yi, Y) sum(abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y)
Output <- sum(ABSi)/(2*(n-1)*sum(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c1} = \displaystyle \frac{2}{ n(n-1)\overline{y} } \sum_{i \in S} i y_{(i)} - \frac{n+1}{n-1} =\frac{2}{ (n-1)T } \sum_{i=1}^{n} i y_{(i)} - \frac{n+1}{n-1}.\]
Gn.c1 <- function(y)
{
n <- length(y)
Output <- (2/((n-1)*sum(y)))*sum((1:n)*sort(y))-(n+1)/(n-1)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c1(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c2} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n} |y_{i_{1}}-y_{i_{2}}|.\]
Gn.c2 <- function(y)
{
n <- length(y)
Funct.apply <- function(x, y) y[x]
Matrix.Y <- apply(combn(n,2),2,Funct.apply, y = y)
SUM <- sum(abs(Matrix.Y[1,]-Matrix.Y[2,]))
Output <-(1/(2*mean(y))) * SUM / choose(n,2)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c2(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c3} =1 - \displaystyle \frac{\overline{z}}{\overline{y}}.\]
Gn.c3 <- function(y)
{
n <- length(y)
Funct.Min <- function(i, Y) sum(pmin(Y[-i],Y[i]))
zi <- sapply(1:n, Funct.Min, Y=y)/(n-1)
Output <- 1- mean(zi)/mean(y)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c3(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{b2.Jo}= n\widehat{G}_{n}^{b2}-(n-1)\bar{G}_{n}^{b2.J},\] where \[\bar{G}_{n}^{b2.J}=\frac{1}{n}\sum_{k=1}^{n}\widehat{G}_{n}^{b2}(k)\] and \[\widehat{G}_{n}^{b2}(k)=\widehat{G}_{n}^{b2} + \frac{2}{n\overline{y}-y_{(k)}}\left\{\frac{y_{(k)}\widehat{\beta}}{n} + \displaystyle\frac{\sum_{i =1}^{n}iy_{(i)}}{n(n-1)} - \frac{n\overline{y}-\sum_{i=1}^{k}y_{(i)}+ky_{(k)}}{n-1} \right\} - \frac{1}{n(n-1)}.\]
Gn.b2.JO <- function(y)
{
y <- sort(y)
n <- length(y)
SUM <- sum(y)
SUM.iy <- sum((1:n)*y)
Beta <- SUM.iy/SUM
G.b <- 2*Beta/n - (n+1)/n
G.k <- G.b + (2/(SUM-y))*(y*Beta/n + SUM.iy/(n*(n-1)) - (SUM - cumsum(y) + (1:n)*y)/(n-1) ) - 1/(n*(n-1))
G.bar <- mean(G.k)
Output <- n*G.b - (n - 1)*G.bar
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b2.JO(Y.n)
## [1] 0.3742264
\[\widehat{G}_{n}^{c.Jo}= n\widehat{G}_{n}^{c}-(n-1)\bar{G}_{n}^{c.J},\] where \[\bar{G}_{n}^{c.J}=\frac{1}{n}\sum_{k=1}^{n}\widehat{G}_{n}^{c}(k),\] and \(\widehat{G}_{n}^{c}(k)\) is the estimator \(\widehat{G}_{n}^{c}\) computed from the observations \(\{y_{(i)}:i \in S\}\) after removing the \(k\)-th unit.
GINI.Jackk.c <- function(y)
{
N <- length(y)
Theta <- sum((1:N)*sort(y))/sum(y)
Output <- 2*Theta/(N-1) - (N+1)/(N-1)
Output
}
Gn.c.JO <- function(y)
{
y <- sort(y)
n <- length(y)
SUM <- sum(y)
SUM.iy <- sum((1:n)*y)
Beta <- SUM.iy/SUM
G.c <- 2*Beta/(n-1) - (n+1)/(n-1)
SOL.Jackk <- bootstrap::jackknife(y, GINI.Jackk.c) # Requires the package bootstrap (not loaded above).
Output <- G.c - SOL.Jackk$jack.bias
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c.JO(Y.n)
## [1] 0.3743616
\[\widehat{G}_{n}^{c.Ba}= \widehat{G}_{n}^{c}-(\bar{G}_{n}^{c.B}-\widehat{G}_{n}^{c})=2\widehat{G}_{n}^{c}-\bar{G}_{n}^{c.B},\] where \(\bar{G}_{n}^{c.B}\) is the empirical expected value of \(\widehat{G}_{n}^{c}\) based on \(B\) bootstrap samples. The empirical expected value based on \(B\) bootstrap samples of a given statistic \(\widehat{\theta}_{n}\) is defined by \[ E_{B}[\widehat{\theta}_{n}]=\bar{\theta}_{n}^{B}=\frac{1}{B}\sum_{b=1}^{B}\widehat{\theta}_{n}^{(b)}, \] where \(\widehat{\theta}_{n}^{(b)}\) is the statistic \(\widehat{\theta}_{n}\) evaluated at the \(b\)-th bootstrap sample \(S^{(b)}\).
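As an aside, the empirical expected value \(E_{B}[\widehat{\theta}_{n}]\) can be sketched directly, without the boot package. The following illustration (using the sample mean as the statistic, purely for clarity) is not part of the original code:

```r
# Empirical bootstrap expected value of a statistic: the average of the
# statistic over B resamples drawn with replacement from the sample.
E.B <- function(y, statistic, B = 1000)
{
  reps <- replicate(B, statistic(sample(y, length(y), replace = TRUE)))
  mean(reps)
}
set.seed(1)
E.B(1:10, mean)   # close to mean(1:10) = 5.5
```

The functions below rely instead on the boot package, whose output also stores the statistic evaluated at the original sample (component t0) and at each bootstrap sample (component t).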
# Computes two statistics (Gini index and coefficient of skewness) used by the bootstrap-based estimators
GINI.c.Ske.boot <- function(Y, indices)
{
Y <- Y[indices]
N.boot <- length(Y)
SUM.Y <- sum(Y)
Nk <- rank(Y)
Gini <- (2/(N.boot*SUM.Y))*sum(Nk*Y) - (N.boot+1)/N.boot
Gini <- N.boot*Gini/(N.boot-1)
c(Gini,skewness(Y))
}
Gn.c.Ba <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
Output <- 2*SOL.boot$t0[1] - MEAN.t
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Ba(Y.n)
## [1] 0.3795352
\[\widehat{G}_{n}^{c.Bm}= \frac{\left(\widehat{G}_{n}^{c}\right)^2}{\bar{G}_{n}^{c.B}}.\]
Gn.c.Bm <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
Output <- SOL.boot$t0[1]^2/MEAN.t
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Bm(Y.n)
## [1] 0.3814144
\[ \widehat{G}_{n}^{c.Bam} = \left\{ \begin{array}{ll} 2\widehat{G}_{n}^{c}-\bar{G}_{n}^{c.B} & \quad \mbox{if } \widehat{G}_{n}^{c} \geq \bar{G}_{n}^{c.B} \\ & \\ \widehat{G}_{n}^{c} \exp\left\{-\left(\bar{G}_{n}^{c.B} - \widehat{G}_{n}^{c}\right)/\bar{G}_{n}^{c.B} \right\} & \quad \mbox{otherwise} \end{array} \right. \]
Gn.c.Bam <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
if (SOL.boot$t0[1] >= MEAN.t) Output <- 2*SOL.boot$t0[1] - MEAN.t
else Output <- SOL.boot$t0[1]*exp(-(MEAN.t - SOL.boot$t0[1])/MEAN.t)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Bam(Y.n)
## [1] 0.3820706
\[\widehat{G}_{n}^{c.Bp}= \widehat{G}_{n}^{c}-\widehat{q}_{opt}(\widehat{G}_{n}^{c}; \bar{G}_{n}^{c.B}; \widehat{\gamma}_{n}; \bar{\gamma}_{n}^{B}),\] where \[ \widehat{\gamma}_{n}=\displaystyle\frac{\mu_{n.3}}{\sigma_{n}^3} \] is the coefficient of skewness, \(\sigma_{n}=(\mu_{n.2})^{1/2}\) is the sample standard deviation, and \(\mu_{n.\alpha}=n^{-1}\sum_{i \in S}(y_{i}-\overline{y})^\alpha\) is the \(\alpha\)-th central moment based on \(S\).
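In the code below, \(\widehat{\gamma}_{n}\) is computed by the function skewness from the package moments. A direct transcription of the formula above (an equivalent sketch, not part of the original code) is:

```r
# Coefficient of skewness from the n^{-1}-based central moments defined above.
Skew.n <- function(y)
{
  mu2 <- mean((y - mean(y))^2)   # second central moment
  mu3 <- mean((y - mean(y))^3)   # third central moment
  mu3/mu2^(3/2)
}
Skew.n(c(1, 2, 3, 10))   # positive for this right-skewed sample
```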
Gn.c.Bp <- function(y, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
Num.Functions <- 7 # Number of bias functions.
n <- length(y)
SOL.Boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
Values.Boot.t.G <- SOL.Boot$t[,1]
Exp.G <- mean(SOL.Boot$t[,1])
Exp.Gamma <- mean(SOL.Boot$t[,2])
G.Hat <- SOL.Boot$t0[1]
Gamma.Hat <- SOL.Boot$t0[2]
L.G <- quantile(Values.Boot.t.G, 0.005, names=F)
U.G <- quantile(Values.Boot.t.G, 0.995, names=F)
Vector.GT <- runif(Value.T, L.G, U.G)
TV <- Value.T - Value.V
POS.TV <- sample(Value.T, TV)
POS.V <- setdiff(1:Value.T, POS.TV)
MATRIX.T <- data.frame(Y=double(TV), Gt.HAT=double(TV), Gamma1t.HAT=double(TV), Exp.Gt=double(TV), Exp.Gamma1t=double(TV))
t <- 0
Vector.Y <- c()
for (tv in POS.TV)
{
t <- t + 1
Gt <- Vector.GT[tv]
if (Distribution==1) # LOGNORMAL
{
Sigmat <- qnorm( (Gt+1)/2)*sqrt(2)
Yr <- rlnorm(n, meanlog =5, sdlog = Sigmat)
}
if (Distribution==2) # PARETO
{
Alphat <- (1+Gt)/(2*Gt)
Yr <- rpareto(n, scale= 1, shape=Alphat)
}
if (Distribution==3) # Dagum p=20
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=20)
Parameter.at <- SOL.root$root
Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 20)
}
if (Distribution==4) # Dagum p=0.5
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=0.5)
Parameter.at <- SOL.root$root
Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 0.5)
}
SOL.boot.t <- boot(Yr, GINI.c.Ske.boot, R = R.boot)
Exp.G.t <- mean(SOL.boot.t$t[,1])
Exp.Gamma.t <- mean(SOL.boot.t$t[,2])
G.Hat.t <- SOL.boot.t$t0[1]
Gamma.Hat.t <- SOL.boot.t$t0[2]
MATRIX.T[t,2] <- G.Hat.t
MATRIX.T[t,3] <- Gamma.Hat.t
MATRIX.T[t,4] <- Exp.G.t
MATRIX.T[t,5] <- Exp.Gamma.t
Sum.Gr.Par <- 0
for (r in 1:R)
{
if (Distribution==1) Yr <- rlnorm(n, meanlog =5, sdlog = Sigmat)
if (Distribution==2) Yr <- rpareto(n, scale= 1, shape=Alphat)
if (Distribution==3) Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 20)
if (Distribution==4) Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 0.5)
Gr.Par <- (2/(n*sum(Yr)))*sum(rank(Yr)*Yr) - (n+1)/n
Gr.Par <- n*Gr.Par/(n-1)
Sum.Gr.Par <- Sum.Gr.Par + Gr.Par
}
Exp.Gr.Par <- Sum.Gr.Par/R
Vector.Y <-c(Vector.Y, Exp.Gr.Par - Gt)
} # for (tv in POS.TV)
MATRIX.T[,1] <- Vector.Y
# Training group
X1 <- MATRIX.T[,4]-MATRIX.T[,2]
X2 <- MATRIX.T[,5]-MATRIX.T[,3]
Func.q.Par1 <- lm(Y ~ X1, MATRIX.T)
Func.q.Par2 <- lm(Y ~ X1 + Gamma1t.HAT, MATRIX.T)
Func.q.Par3 <- lm(Y ~ X1 + Exp.Gamma1t, MATRIX.T)
Func.q.Par4 <- lm(Y ~ X2, MATRIX.T)
Func.q.Par5 <- lm(Y ~ X1 + X2, MATRIX.T)
Func.q.Par6 <- lm(Y ~ Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Func.q.Par7 <- lm(Y ~ Gt.HAT + Exp.Gt + Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Coef.q.Par1 <- Func.q.Par1$coefficients
Coef.q.Par2 <- Func.q.Par2$coefficients
Coef.q.Par3 <- Func.q.Par3$coefficients
Coef.q.Par4 <- Func.q.Par4$coefficients
Coef.q.Par5 <- Func.q.Par5$coefficients
Coef.q.Par6 <- Func.q.Par6$coefficients
Coef.q.Par7 <- Func.q.Par7$coefficients
SUM.MSE.Par <- double(Num.Functions)
for (i in 1:Value.V)
{
Gi <- Vector.GT[i]
if (Distribution==1)
{
Sigmai <- qnorm( (Gi+1)/2)*sqrt(2)
Yi <- rlnorm(n, meanlog =5, sdlog = Sigmai)
}
if (Distribution==2)
{
Alphai <- (1+Gi)/(2*Gi)
Yi <- rpareto(n, scale= 1, shape=Alphai)
}
if (Distribution==3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p=20)
Parameter.ai <- SOL.root$root
Yi <- rdagum(n, shape1.a = Parameter.ai, shape2.p = 20)
}
if (Distribution==4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p=0.5)
Parameter.ai <- SOL.root$root
Yi <- rdagum(n, shape1.a = Parameter.ai, shape2.p = 0.5)
}
SOL.boot.i <- boot(Yi, GINI.c.Ske.boot, R = R.boot)
Exp.G.i <- mean(SOL.boot.i$t[,1])
Exp.Gamma.i <- mean(SOL.boot.i$t[,2])
G.Hat.i <- SOL.boot.i$t0[1]
Gamma.Hat.i <- SOL.boot.i$t0[2]
qv1 <- Coef.q.Par1[[1]] + Coef.q.Par1[[2]]*(Exp.G.i-G.Hat.i)
qv2 <- Coef.q.Par2[[1]] + Coef.q.Par2[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par2[[3]]*Gamma.Hat.i
qv3 <- Coef.q.Par3[[1]] + Coef.q.Par3[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par3[[3]]*Exp.Gamma.i
qv4 <- Coef.q.Par4[[1]] + Coef.q.Par4[[2]]*(Exp.Gamma.i-Gamma.Hat.i)
qv5 <- Coef.q.Par5[[1]] + Coef.q.Par5[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par5[[3]]*(Exp.Gamma.i-Gamma.Hat.i)
qv6 <- Coef.q.Par6[[1]] + Coef.q.Par6[[2]]*Gamma.Hat.i + Coef.q.Par6[[3]]*Exp.Gamma.i
qv7 <- Coef.q.Par7[[1]] + Coef.q.Par7[[2]]*G.Hat.i + Coef.q.Par7[[3]]*Exp.G.i +
Coef.q.Par7[[4]]*Gamma.Hat.i + Coef.q.Par7[[5]]*Exp.Gamma.i
Estv1 <- G.Hat.i - qv1
Estv2 <- G.Hat.i - qv2
Estv3 <- G.Hat.i - qv3
Estv4 <- G.Hat.i - qv4
Estv5 <- G.Hat.i - qv5
Estv6 <- G.Hat.i - qv6
Estv7 <- G.Hat.i - qv7
SUM.MSE.Par <- SUM.MSE.Par + c( (Estv1-Gi)^2, (Estv2-Gi)^2, (Estv3-Gi)^2,
(Estv4-Gi)^2, (Estv5-Gi)^2, (Estv6-Gi)^2, (Estv7-Gi)^2)
} # for (i in 1:Value.V)
OPT.MSE.Par <- which.min(SUM.MSE.Par)
if (OPT.MSE.Par ==1) q.opt.Par <- Coef.q.Par1[[1]] + Coef.q.Par1[[2]]*(Exp.G-G.Hat)
if (OPT.MSE.Par ==2) q.opt.Par <- Coef.q.Par2[[1]] + Coef.q.Par2[[2]]*(Exp.G-G.Hat) + Coef.q.Par2[[3]]*Gamma.Hat
if (OPT.MSE.Par ==3) q.opt.Par <- Coef.q.Par3[[1]] + Coef.q.Par3[[2]]*(Exp.G-G.Hat) + Coef.q.Par3[[3]]*Exp.Gamma
if (OPT.MSE.Par ==4) q.opt.Par <- Coef.q.Par4[[1]] + Coef.q.Par4[[2]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE.Par ==5) q.opt.Par <- Coef.q.Par5[[1]] + Coef.q.Par5[[2]]*(Exp.G-G.Hat) + Coef.q.Par5[[3]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE.Par ==6) q.opt.Par <- Coef.q.Par6[[1]] + Coef.q.Par6[[2]]*Gamma.Hat +
Coef.q.Par6[[3]] * Exp.Gamma
if (OPT.MSE.Par ==7) q.opt.Par <- Coef.q.Par7[[1]] + Coef.q.Par7[[2]]*G.Hat +
Coef.q.Par7[[3]] * Exp.G +
Coef.q.Par7[[4]] * Gamma.Hat +
Coef.q.Par7[[5]] * Exp.Gamma
Output <- G.Hat - q.opt.Par
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
## R: number of replications.
## Value.T: Total number of samples (Training + Validation group).
## Value.V: number of samples in the Validation group.
## Distribution: Probabilistic distribution for the parametric bootstrap.
### 1 = Lognormal; 2 = Pareto; 3 = Dagum with p = 20; 4 = Dagum with p = 0.5.
# Usage
Gn.c.Bp(Y.n, Distribution = 2)
## [1] 0.3864234
Examples for finite populations are based on this sample, selected under a randomized systematic sampling design. The effect of the design is increased by generating inclusion probabilities \(\pi_i\) with a correlation of \(0.7\) between \(\pi_i\) and \(y_i\).
# Population, sample selection and survey weights (randomized systematic sampling).
# n = 50 and G = 0.5.
set.seed(123)
N <- 10000 # Population size.
n <- 50
Labels <- (1:N)
Cor <- 0.7
Y.N <- rpareto(n=N, scale= 1, shape=1.5)
Residual.SD <- sqrt(1-Cor^2)*sqrt(var(Y.N))
Size.Variable.U <- 1 + Cor * Y.N + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Y.n <- Y.N[Sample]
Prob.S <- Prob.U[Sample]
w.n <- 1/Prob.S
n <- length(Sample)
Y.n
## [1] 1.256281 1.153969 1.274485 1.007599 3.253916 1.148925 17.619978
## [8] 1.390136 8.977209 1.711177 1.515922 1.216494 1.924554 2.066960
## [15] 1.182792 4.700734 1.277821 1.951636 2.607654 1.038222 1.283669
## [22] 1.898321 6.621856 16.672066 2.893945 2.263599 1.313010 1.392572
## [29] 1.014446 3.824858 2.699629 1.777259 1.270751 1.030797 2.119897
## [36] 2.272461 1.219897 3.254782 1.201556 1.134711 1.334193 1.555776
## [43] 1.571195 6.969796 1.004846 1.091881 2.716155 1.049775 1.205693
## [50] 1.074515
w.n
## [1] 201.2816 189.7074 166.6904 207.3157 138.3292 278.9388 140.5038 209.0983
## [9] 238.6331 205.3096 314.5301 219.3074 206.2540 189.6478 144.7267 262.5656
## [17] 186.2906 215.6584 239.8989 172.3857 169.0149 267.6195 157.5581 162.2901
## [25] 156.1198 253.0959 173.3194 142.1086 164.6035 205.1006 156.5991 197.0950
## [33] 151.5781 260.0406 224.7287 322.2452 179.0356 269.4481 298.5258 190.6057
## [41] 175.9704 191.0797 229.3220 228.9040 179.0971 147.8013 377.6547 249.1136
## [49] 307.9467 167.5910
\[\widehat{G}_{w}^{a}= \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1.\]
Gw.a <- function(y,w)
{
N.Hat <- sum(w)
Funct.FDi <- function(Yi, Y, W, N.Hat) (1/N.Hat)*sum(W*(Y<=Yi))
FDi <- sapply(y, Funct.FDi, Y=y, W=w, N.Hat=N.Hat)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- (2/(N.Hat*Mean.y))*sum(w*y*FDi) - 1
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.a(Y.n, w.n)
## [1] 0.4473034
\[\widehat{G}_{w}^{b}=\frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|.\]
Gw.b <- function(y,w)
{
N.Hat <- sum(w)
Funct.ABSi <- function(Yi, Y, W) sum(W*abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y, W=w)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- sum(w*ABSi)/(2*N.Hat^2*Mean.y)
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.b(Y.n, w.n)
## [1] 0.4268247
\[\widehat{G}_{w}^{c} =1 - \displaystyle \frac{\overline{z}_{w}}{\overline{y}_{w}}.\]
# Function Gw.c
Gw.c <- function(y,w)
{
N.Hat <- sum(w)
n <- length(y)
Funct.Min <- function(i, Y, W, N.Hat) sum(W[-i]*pmin(Y[-i],Y[i]))/(N.Hat-W[i])
zi <- sapply(1:n, Funct.Min, Y=y, W=w, N.Hat=N.Hat)
Mean.z <- (1/N.Hat)*sum(w*zi)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- 1- Mean.z/Mean.y
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.c(Y.n, w.n)
## [1] 0.4352065
\[\widehat{G}_{w}^{c.Ba}= \widehat{G}_{w}^{c}-(\bar{G}_{w}^{c.B}-\widehat{G}_{w}^{c})=2\widehat{G}_{w}^{c}-\bar{G}_{w}^{c.B}.\]
First, we create some functions required to compute the various estimators for the case of finite populations. In particular, the function Rescaled.Bootstrap computes the estimates of the Gini index for a set of bootstrap samples.
MU.w <- function(y,w,ALPHA)
{
N.Hat <- sum(w)
Mean.Hat <- (1/N.Hat)*sum(w*y)
(1/N.Hat)*sum(w*(y-Mean.Hat)^ALPHA)
}
Ske.W <- function(y,w) MU.w(y,w,3)/(MU.w(y,w,2)^(3/2))
BOOTSTRAP.WEIGHTS <- function(n, Weight.S)
{
SAMPLE <- table(sample(n, n, replace = T)) # Multiplicities of the resampled units.
Ri <- as.integer(SAMPLE)
UNITS <- as.integer(names(SAMPLE))
UNITS.OUT <- setdiff(1:n, UNITS) # Units not selected in the resample.
LEN.OUT <- length(UNITS.OUT)
Ri2 <- c(Ri, rep(0,LEN.OUT))
UNITS2 <- c(UNITS,UNITS.OUT)
ORDER <- order(UNITS2)
Ri.order <- Ri2[ORDER] # Multiplicities in the original sample order.
OUTPUT <- Ri.order * Weight.S * n / (n-1) # Rescaled bootstrap weights.
OUTPUT
}
Rescaled.Bootstrap <- function(y, R.boot=1000, w)
{
n <- length(y)
Sum.Ske <- 0
Vector.Gini.B <- c()
for (i in 1:R.boot)
{
w.b <- BOOTSTRAP.WEIGHTS(n, w)
Vector.Gini.B <- c(Vector.Gini.B, Gw.c(y, w.b))
Sum.Ske <- Sum.Ske + Ske.W(y,w.b)
}
list(Mean.Gini =mean(Vector.Gini.B), Mean.Ske = Sum.Ske/R.boot, Vector.Gini.B = Vector.Gini.B)
}
# Usage
Rescaled.Bootstrap(Y.n, R.boot = 50, w= w.n)
## $Mean.Gini
## [1] 0.4421744
##
## $Mean.Ske
## [1] 3.387153
##
## $Vector.Gini.B
## [1] 0.3217424 0.4774353 0.4500154 0.3835550 0.4229068 0.2654164 0.4687924
## [8] 0.5061761 0.4460293 0.3617398 0.4529941 0.3588451 0.4760658 0.4765871
## [15] 0.4778994 0.4336448 0.4411065 0.4088126 0.4607407 0.5327704 0.4221788
## [22] 0.4470892 0.4895719 0.3985759 0.4257491 0.3859401 0.4804542 0.4818865
## [29] 0.4718366 0.5603562 0.4309820 0.2805123 0.4692137 0.4152673 0.3765406
## [36] 0.4918474 0.3956425 0.4542818 0.5026744 0.3778840 0.4168256 0.4680288
## [43] 0.4881710 0.4264974 0.4413882 0.5134525 0.4210026 0.5144043 0.5329497
## [50] 0.5042383
The function Gw.c.Ba gives the bias-corrected estimator of the Gini index based on the additive bootstrap.
Gw.c.Ba <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Output <- 2*Gw.c(y, w) - SOL.ResBoot$Mean.Gini
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: Number of bootstrap samples.
# Usage
Gw.c.Ba(Y.n, w.n)
## [1] 0.4400799
\[\widehat{G}_{w}^{c.Bm}= \frac{\left(\widehat{G}_{w}^{c}\right)^2}{\bar{G}_{w}^{c.B}}.\]
Gw.c.Bm <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Output <- Gw.c(y, w)^2/SOL.ResBoot$Mean.Gini
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: number of bootstrap samples.
# Usage
Gw.c.Bm(Y.n, w.n)
## [1] 0.4405649
\[ \widehat{G}_{w}^{c.Bam} = \left\{ \begin{array}{ll} 2\widehat{G}_{w}^{c}-\bar{G}_{w}^{c.B} & \quad \mbox{if } \widehat{G}_{w}^{c} \geq \bar{G}_{w}^{c.B} \\ & \\ \widehat{G}_{w}^{c} \exp\left\{-\left(\bar{G}_{w}^{c.B} - \widehat{G}_{w}^{c}\right)/\bar{G}_{w}^{c.B} \right\} & \quad \mbox{otherwise} \end{array} \right. \]
Gw.c.Bam <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Gc <- Gw.c(y, w)
if (Gc >= SOL.ResBoot$Mean.Gini) Output <- 2*Gc - SOL.ResBoot$Mean.Gini
else Output <- Gc*exp(-(SOL.ResBoot$Mean.Gini - Gc)/SOL.ResBoot$Mean.Gini)
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: number of bootstrap samples.
# Usage
Gw.c.Bam(Y.n, w.n)
## [1] 0.4429545
\[\widehat{G}_{w}^{c.Bp}= \widehat{G}_{w}^{c}-\widehat{q}_{opt}(\widehat{G}_{w}^{c}; \bar{G}_{w}^{c.B}; \widehat{\gamma}_{w}; \bar{\gamma}_{w}^{B}),\] where \[\widehat{\gamma}_{w}=\displaystyle\frac{\mu_{w.3}}{\sigma_{w}^3}\] is the weighted coefficient of skewness, \(\sigma_{w}=(\mu_{w.2})^{1/2}\) and \(\mu_{w.\alpha}=\widehat{N}^{-1}\sum_{i \in S}w_{i}(y_{i}-\overline{y}_{w})^\alpha\).
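The weighted skewness \(\widehat{\gamma}_{w}\) is implemented by the function Ske.W defined earlier. As a quick sanity check (not part of the original code; MU.w and Ske.W are redefined here so the sketch is self-contained), with constant weights it reduces to the unweighted coefficient of skewness:

```r
# With equal weights, the weighted central moments reduce to the n^{-1}-based
# central moments, so Ske.W matches the unweighted skewness.
MU.w <- function(y, w, ALPHA)
{
  N.Hat <- sum(w)
  Mean.Hat <- (1/N.Hat)*sum(w*y)
  (1/N.Hat)*sum(w*(y - Mean.Hat)^ALPHA)
}
Ske.W <- function(y, w) MU.w(y, w, 3)/(MU.w(y, w, 2)^(3/2))
y <- c(1, 2, 3, 10)
all.equal(Ske.W(y, rep(1, 4)),
          mean((y - mean(y))^3)/mean((y - mean(y))^2)^(3/2))   # TRUE
```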
Gw.c.Bp <- function(y, w, N, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
Num.Functions <- 7 # Number of bias functions.
n <- length(y)
Labels <- (1:N)
Cor <- 0.7
Gamma.Hat <- Ske.W(y, w)
G.Hat <- Gw.c(y,w)
SOL.Boot <- Rescaled.Bootstrap(y, R.boot, w)
# Step 1. Generate T parameters from interval [G.L, G.U]
Values.Boot.t.G <- SOL.Boot$Vector.Gini.B
Exp.G <- SOL.Boot$Mean.Gini
Exp.Gamma <- SOL.Boot$Mean.Ske
L.G <- quantile(Values.Boot.t.G, 0.005, names=F)
U.G <- quantile(Values.Boot.t.G, 0.995, names=F)
Vector.GT <- runif(Value.T, L.G, U.G)
TV <- Value.T - Value.V
POS.TV <- sample(Value.T, TV)
POS.V <- setdiff(1:Value.T, POS.TV)
MATRIX.T <- data.frame(Y=double(TV), Gt.HAT=double(TV), Gamma1t.HAT=double(TV), Exp.Gt=double(TV), Exp.Gamma1t=double(TV))
t <- 0
Vector.Y.F <- c()
for (tv in POS.TV)
{
t <- t + 1
Gt <- Vector.GT[tv]
if (Distribution==1) # LOGNORMAL
{
Sigmat <- qnorm( (Gt+1)/2)*sqrt(2)
Yr.FU <- rlnorm(N, meanlog =5, sdlog = Sigmat)
}
if (Distribution==2) # PARETO
{
Alphat <- (1+Gt)/(2*Gt)
Yr.FU <- rpareto(N, scale= 1, shape=Alphat)
}
if (Distribution==3) # Dagum p=20
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=20)
Parameter.at <- SOL.root$root
Yr.FU <- rdagum(N, shape1.a = Parameter.at, shape2.p = 20)
}
if (Distribution==4) # Dagum p=0.5
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=0.5)
Parameter.at <- SOL.root$root
Yr.FU <- rdagum(N, shape1.a = Parameter.at, shape2.p = 0.5)
}
Residual.SD <- sqrt(1 -Cor^2)*sqrt(var(Yr.FU))
Size.Variable.U <- 1 + Cor * Yr.FU + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
## Step 4 (Expected values based on Bootstrap samples)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yr.F <- Yr.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.t <- length(Sample)
SOL.boot.t <- Rescaled.Bootstrap(Yr.F, R.boot, Weight.S)
Exp.G.t <- SOL.boot.t$Mean.Gini
Exp.Gamma.t <- SOL.boot.t$Mean.Ske
G.Hat.t <- Gw.c(Yr.F, Weight.S)
Gamma.Hat.t <- Ske.W(Yr.F, Weight.S)
MATRIX.T[t,2] <- G.Hat.t
MATRIX.T[t,3] <- Gamma.Hat.t
MATRIX.T[t,4] <- Exp.G.t
MATRIX.T[t,5] <- Exp.Gamma.t
# Step 5 (Expected values based on Replications)
Sum.Gr <- 0
for (r in 1:R)
{
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yr.F <- Yr.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.r <- length(Sample)
Gr.F <- Gw.c(Yr.F, Weight.S)
Sum.Gr <- Sum.Gr + Gr.F
}
rm(Yr.FU)
Exp.Gr <- Sum.Gr/R
Vector.Y.F <-c(Vector.Y.F, Exp.Gr - Gt)
} # for (tv in POS.TV)
MATRIX.T[,1] <- Vector.Y.F
# Training group
X1.F <- MATRIX.T[,4]-MATRIX.T[,2]
X2.F <- MATRIX.T[,5]-MATRIX.T[,3]
Func.q.Par1.F <- lm(Y ~ X1.F, MATRIX.T)
Func.q.Par2.F <- lm(Y ~ X1.F + Gamma1t.HAT, MATRIX.T)
Func.q.Par3.F <- lm(Y ~ X1.F + Exp.Gamma1t, MATRIX.T)
Func.q.Par4.F <- lm(Y ~ X2.F, MATRIX.T)
Func.q.Par5.F <- lm(Y ~ X1.F + X2.F, MATRIX.T)
Func.q.Par6.F <- lm(Y ~ Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Func.q.Par7.F <- lm(Y ~ Gt.HAT + Exp.Gt + Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Coef.q.Par1.F <- Func.q.Par1.F$coefficients
Coef.q.Par2.F <- Func.q.Par2.F$coefficients
Coef.q.Par3.F <- Func.q.Par3.F$coefficients
Coef.q.Par4.F <- Func.q.Par4.F$coefficients
Coef.q.Par5.F <- Func.q.Par5.F$coefficients
Coef.q.Par6.F <- Func.q.Par6.F$coefficients
Coef.q.Par7.F <- Func.q.Par7.F$coefficients
SUM.MSE <- double(Num.Functions)
for (i in 1:Value.V)
{
Gi <- Vector.GT[i]
if (Distribution==1)
{
Sigmai <- qnorm( (Gi+1)/2)*sqrt(2)
Yi.FU <- rlnorm(N, meanlog =5, sdlog = Sigmai)
}
if (Distribution==2)
{
Alphai <- (1+Gi)/(2*Gi)
Yi.FU <- rpareto(N, scale= 1, shape=Alphai)
}
if (Distribution==3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p = 20)
Parameter.ai <- SOL.root$root
Yi.FU <- rdagum(N, shape1.a = Parameter.ai, shape2.p = 20)
}
if (Distribution==4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p = 0.5)
Parameter.ai <- SOL.root$root
Yi.FU <- rdagum(N, shape1.a = Parameter.ai, shape2.p = 0.5)
}
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Yi.FU))
Size.Variable.U <- 1 + Cor * Yi.FU + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yi.F <- Yi.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.i <- length(Sample)
rm(Yi.FU)
SOL.boot.i <- Rescaled.Bootstrap(Yi.F, R.boot, Weight.S)
Exp.G.i <- SOL.boot.i$Mean.Gini
Exp.Gamma.i <- SOL.boot.i$Mean.Ske
G.Hat.i <- Gw.c(Yi.F, Weight.S)
Gamma.Hat.i <- Ske.W(Yi.F, Weight.S)
qv1 <- Coef.q.Par1.F[[1]] + Coef.q.Par1.F[[2]]*(Exp.G.i-G.Hat.i)
qv2 <- Coef.q.Par2.F[[1]] + Coef.q.Par2.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par2.F[[3]]*Gamma.Hat.i
qv3 <- Coef.q.Par3.F[[1]] + Coef.q.Par3.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par3.F[[3]]*Exp.Gamma.i
qv4 <- Coef.q.Par4.F[[1]] + Coef.q.Par4.F[[2]]*(Exp.Gamma.i-Gamma.Hat.i)
qv5 <- Coef.q.Par5.F[[1]] + Coef.q.Par5.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par5.F[[3]]*(Exp.Gamma.i-Gamma.Hat.i)
qv6 <- Coef.q.Par6.F[[1]] + Coef.q.Par6.F[[2]]*Gamma.Hat.i + Coef.q.Par6.F[[3]]*Exp.Gamma.i
qv7 <- Coef.q.Par7.F[[1]] + Coef.q.Par7.F[[2]]*G.Hat.i + Coef.q.Par7.F[[3]]*Exp.G.i + Coef.q.Par7.F[[4]]*Gamma.Hat.i + Coef.q.Par7.F[[5]]*Exp.Gamma.i
Estv1 <- G.Hat.i - qv1
Estv2 <- G.Hat.i - qv2
Estv3 <- G.Hat.i - qv3
Estv4 <- G.Hat.i - qv4
Estv5 <- G.Hat.i - qv5
Estv6 <- G.Hat.i - qv6
Estv7 <- G.Hat.i - qv7
SUM.MSE <- SUM.MSE + c( (Estv1-Gi)^2, (Estv2-Gi)^2, (Estv3-Gi)^2,
(Estv4-Gi)^2, (Estv5-Gi)^2, (Estv6-Gi)^2, (Estv7-Gi)^2)
} # for (i in 1:Value.V)
OPT.MSE <- which.min(SUM.MSE)
if (OPT.MSE ==1) q.opt <- Coef.q.Par1.F[[1]] + Coef.q.Par1.F[[2]]*(Exp.G-G.Hat)
if (OPT.MSE ==2) q.opt <- Coef.q.Par2.F[[1]] + Coef.q.Par2.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par2.F[[3]]*Gamma.Hat
if (OPT.MSE ==3) q.opt <- Coef.q.Par3.F[[1]] + Coef.q.Par3.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par3.F[[3]]*Exp.Gamma
if (OPT.MSE ==4) q.opt <- Coef.q.Par4.F[[1]] + Coef.q.Par4.F[[2]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE ==5) q.opt <- Coef.q.Par5.F[[1]] + Coef.q.Par5.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par5.F[[3]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE ==6) q.opt <- Coef.q.Par6.F[[1]] + Coef.q.Par6.F[[2]] * Gamma.Hat + Coef.q.Par6.F[[3]] * Exp.Gamma
if (OPT.MSE ==7) q.opt <- Coef.q.Par7.F[[1]] + Coef.q.Par7.F[[2]]*G.Hat + Coef.q.Par7.F[[3]] * Exp.G + Coef.q.Par7.F[[4]] * Gamma.Hat + Coef.q.Par7.F[[5]] * Exp.Gamma
Output <- G.Hat - q.opt
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the survey weights.
## N: population size.
## R.boot: number of bootstrap samples.
## R: number of replications.
## Value.T: Total number of samples (Training + Validation group).
## Value.V: number of samples in the Validation group.
## Distribution: Probabilistic distribution for the parametric bootstrap.
### 1 = Lognormal; 2 = Pareto; 3 = Dagum with p = 20; 4 = Dagum with p = 0.5.
# Usage
Gw.c.Bp(Y.n, w.n, N, Distribution = 2)
## [1] 0.4532425
This section uses box plots to show the effect of skewness on the bias of the Gini index estimator \(\widehat{G}_n^c\). We consider samples of size \(n=50\) randomly selected from the most skewed (Pareto) and the least skewed (Gamma) distributions in this study. Each box plot is based on a total of 1000 estimates of \(\widehat{G}_n^c\).
PLOT.BOX <- function(R)
{
VECTOR.G <- seq(0.2, 0.8, by = 0.2)
LEN <- length(VECTOR.G)
VECTOR.ALPHA.Par <- (1+VECTOR.G)/(2*VECTOR.G)
VECTOR.K.Gam <- c()
for (L in 1:LEN)
{
GINI.L <- VECTOR.G[L]
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=GINI.L)
VECTOR.K.Gam <- c(VECTOR.K.Gam, SOL.root$root)
}
N <- 50 # sample size (n)
MATRIX.Est.Par <- array(dim=c(LEN, R, 1))
MATRIX.Est.Gam <- array(dim=c(LEN, R, 1))
for (L in 1:LEN)
{
ALPHA.Par <- VECTOR.ALPHA.Par[L]
K.Gam <- VECTOR.K.Gam[L]
Gini.Teo <- VECTOR.G[L]
for (r in 1:R)
{
Y.Par <- rpareto(N, scale= 1, shape=ALPHA.Par)
Y.Gam <- rgamma(n=N,shape=K.Gam,scale=1)
#####################################
########## ESTIMATORS ###############
#####################################
#### Pareto
MATRIX.Est.Par[L,r,1] <- Gn.c1(Y.Par)
#### Gamma
MATRIX.Est.Gam[L,r,1] <- Gn.c1(Y.Gam)
}
} # for (L in 1:LEN)
DATA <- data.frame(EST_PAR_0.2=MATRIX.Est.Par[1,,],
EST_GAM_0.2=MATRIX.Est.Gam[1,,],
EST_PAR_0.4=MATRIX.Est.Par[2,,],
EST_GAM_0.4=MATRIX.Est.Gam[2,,],
EST_PAR_0.6=MATRIX.Est.Par[3,,],
EST_GAM_0.6=MATRIX.Est.Gam[3,,],
EST_PAR_0.8=MATRIX.Est.Par[4,,],
EST_GAM_0.8=MATRIX.Est.Gam[4,,])
boxplot(DATA, las=2, ylab = "Estimation of the Gini index",
at = c(1,2, 4,5, 7,8, 10,11), col=rep(c("snow2","blue"),4), xaxt="n", xlab="G")
mtext("0.2", side=1, at= 1.5, line=1.15)
mtext("0.4", side=1, at= 4.5, line=1.15)
mtext("0.6", side=1, at= 7.5, line=1.15)
mtext("0.8", side=1, at= 10.5, line=1.15)
VECTOR.G<-c(0.2,0.4,0.6,0.8)
LEN<-length(VECTOR.G)
for (L in 1:LEN)
{
abline(h=VECTOR.G[L],lty=2)
}
legend("topleft",legend = c("Pareto", "Gamma"), fill = c("snow2", "blue"), bty = "n" )
}
set.seed(123)
R <- 1000
PLOT.BOX(R)
estimator | Expression |
---|---|
1 | \(\widehat{G}_{n}^{a}\) |
2 | \(\widehat{G}_{n}^{b1}\) |
3 | \(\widehat{G}_{n}^{c1}\) |
4 | \(\widehat{G}_{n}^{c.JO}\) |
5 | \(\widehat{G}_{n}^{c.Ba}\) |
6 | \(\widehat{G}_{n}^{c.Bm}\) |
7 | \(\widehat{G}_{n}^{c.Bam}\) |
8 | \(\widehat{G}_{n}^{c.Bp}\) |
Distribution | Probabilistic distribution |
---|---|
1 | Lognormal |
2 | Pareto |
3 | Dagum (\(p=20\)) |
4 | Dagum (\(p=0.5\)) |
5 | Weibull |
6 | Gamma |
inf.empirical.measures <- function(G, estimator, Sample.size, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
if ((estimator == 8) && ((Distribution == 5) || (Distribution == 6))) stop("Give a distribution between 1 and 4 for estimator 8.")
if (Distribution == 1) SIGMA.Log <- qnorm((G + 1)/2)*sqrt(2)
if (Distribution == 2) ALPHA.Par <- (1 + G) / (2*G)
if (Distribution == 3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=20)
a.Dag <- SOL.root$root
}
if (Distribution == 4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=0.5)
a.Dag <- SOL.root$root
}
if (Distribution == 5) K.Wei <- 1/log2(1/(1-G))
if (Distribution == 6)
{
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=G)
K.Gam <- SOL.root$root
}
Sum <- 0
Sum2 <- 0
MSE <- 0
Sum.Ske <- 0
for (i in 1:R)
{
if (Distribution == 1) Sample <- rlnorm(Sample.size, meanlog = 5, sdlog = SIGMA.Log)
if (Distribution == 2) Sample <- rpareto(Sample.size, scale = 1, shape = ALPHA.Par)
if (Distribution == 3) Sample <- rdagum(Sample.size, shape1.a = a.Dag, shape2.p = 20)
if (Distribution == 4) Sample <- rdagum(Sample.size, shape1.a = a.Dag, shape2.p = 0.5)
if (Distribution == 5) Sample <- rweibull(n = Sample.size, shape = K.Wei, scale = 1)
if (Distribution == 6) Sample <- rgamma(n = Sample.size, shape = K.Gam, scale = 1)
if (estimator == 1) SOL <- Gn.a(Sample)
if (estimator == 2) SOL <- Gn.b1(Sample)
if (estimator == 3) SOL <- Gn.c1(Sample)
if (estimator == 4) SOL <- Gn.c.JO(Sample)
if (estimator == 5) SOL <- Gn.c.Ba(Sample, R.boot = R.boot)
if (estimator == 6) SOL <- Gn.c.Bm(Sample, R.boot = R.boot)
if (estimator == 7) SOL <- Gn.c.Bam(Sample, R.boot = R.boot)
if (estimator == 8) SOL <- Gn.c.Bp(Sample, R.boot = R.boot, R = R, Value.T = Value.T, Value.V = Value.V, Distribution = Distribution)
Sum <- Sum + SOL
Sum2 <- Sum2 + SOL^2
MSE <- MSE + (SOL - G)^2
Sum.Ske <- Sum.Ske + skewness(Sample)
}
Mean <- Sum/R
RB <- round(100*(Mean - G)/G, 2)
RRMSE <- round(100*sqrt(MSE/R)/G, 2)
Var <- Sum2/R - Mean^2
BR <- round((Mean - G)/sqrt(Var), 2)
Mean.Ske <- Sum.Ske/R
list(Expected.Gini = Mean, RB=RB, RRMSE = RRMSE, BR = BR, Expected.Skewness=Mean.Ske)
}
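The formulas used inside inf.empirical.measures can be verified in isolation. The helper below (empirical.measures.check, an illustrative name not part of the original code) recomputes RB, RRMSE and BR from a vector of estimates, without the rounding applied in the function above:

```r
# Illustrative helper: recompute the empirical measures from raw estimates.
empirical.measures.check <- function(estimates, G)
{
Mean <- mean(estimates)
RB <- 100*(Mean - G)/G                        # Relative Bias (%)
RRMSE <- 100*sqrt(mean((estimates - G)^2))/G  # Relative Root Mean Square Error (%)
Var <- mean(estimates^2) - Mean^2             # variance, as accumulated in the loop above
BR <- (Mean - G)/sqrt(Var)                    # Bias Ratio
list(RB = RB, RRMSE = RRMSE, BR = BR)
}
empirical.measures.check(c(0.38, 0.41, 0.40, 0.42), G = 0.4)
# $RB is 0.625 and $RRMSE is 3.75 for this toy vector
```

The toy vector stands in for R estimates of the Gini index obtained in a simulation run.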
# Simulation study for estimator = 1 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 1, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.4101402
##
## $RB
## [1] 2.54
##
## $RRMSE
## [1] 10.78
##
## $BR
## [1] 0.24
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 2 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 2, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.3901402
##
## $RB
## [1] -2.46
##
## $RRMSE
## [1] 10.77
##
## $BR
## [1] -0.24
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 3 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.3981022
##
## $RB
## [1] -0.47
##
## $RRMSE
## [1] 10.71
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 3 (G = 0.4; n = 50; Pareto)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 2)
## $Expected.Gini
## [1] 0.3708247
##
## $RB
## [1] -7.29
##
## $RRMSE
## [1] 25.85
##
## $BR
## [1] -0.29
##
## $Expected.Skewness
## [1] 3.632704
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 20)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 3)
## $Expected.Gini
## [1] 0.3828812
##
## $RB
## [1] -4.28
##
## $RRMSE
## [1] 20.13
##
## $BR
## [1] -0.22
##
## $Expected.Skewness
## [1] 3.047151
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 0.5)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 4)
## $Expected.Gini
## [1] 0.3963076
##
## $RB
## [1] -0.92
##
## $RRMSE
## [1] 12.85
##
## $BR
## [1] -0.07
##
## $Expected.Skewness
## [1] 1.961689
# Simulation study for estimator = 3 (G = 0.4; n = 50; Weibull)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 5)
## $Expected.Gini
## [1] 0.3984262
##
## $RB
## [1] -0.39
##
## $RRMSE
## [1] 9.08
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 1.090686
# Simulation study for estimator = 3 (G = 0.4; n = 50; Gamma)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 6)
## $Expected.Gini
## [1] 0.3980721
##
## $RB
## [1] -0.48
##
## $RRMSE
## [1] 9.08
##
## $BR
## [1] -0.05
##
## $Expected.Skewness
## [1] 1.251534
estimator | Expression |
---|---|
1 | \(\widehat{G}_{w}^{a}\) |
2 | \(\widehat{G}_{w}^{b}\) |
3 | \(\widehat{G}_{w}^{c}\) |
4 | \(\widehat{G}_{w}^{c.Ba}\) |
5 | \(\widehat{G}_{w}^{c.Bm}\) |
6 | \(\widehat{G}_{w}^{c.Bam}\) |
7 | \(\widehat{G}_{w}^{c.Bp}\) |
fin.empirical.measures <- function(G, estimator, Sample.size, N, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
if ((estimator == 7) && ((Distribution == 5) || (Distribution == 6))) stop("Give a distribution between 1 and 4 for estimator 7.")
if (Distribution == 1) SIGMA.Log <- qnorm((G + 1)/2)*sqrt(2)
if (Distribution == 2) ALPHA.Par <- (1 + G) / (2*G)
if (Distribution == 3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=20)
a.Dag <- SOL.root$root
}
if (Distribution == 4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=0.5)
a.Dag <- SOL.root$root
}
if (Distribution == 5) K.Wei <- 1/log2(1/(1-G))
if (Distribution == 6)
{
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=G)
K.Gam <- SOL.root$root
}
Labels <- 1:N
Cor <- 0.7
Sum <- 0
Sum2 <- 0
MSE <- 0
Sum.Ske <- 0
for (i in 1:R)
{
Yi.FU <- switch(Distribution,
rlnorm(N, meanlog = 5, sdlog = SIGMA.Log),
rpareto(N, scale = 1, shape = ALPHA.Par),
rdagum(N, shape1.a = a.Dag, shape2.p = 20),
rdagum(N, shape1.a = a.Dag, shape2.p = 0.5),
rweibull(n = N, shape = K.Wei, scale = 1),
rgamma(n = N, shape = K.Gam, scale = 1))
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Yi.FU))
Size.Variable.U <- 1 + Cor * Yi.FU + rnorm(n = N, sd = Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, Sample.size)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Y.n <- Yi.FU[Sample]
Prob.S <- Prob.U[Sample]
w <- 1/Prob.S
Sample.Size <- length(Sample)
rm(Yi.FU)
if (estimator == 1) SOL <- Gw.a(Y.n, w)
if (estimator == 2) SOL <- Gw.b(Y.n, w)
if (estimator == 3) SOL <- Gw.c(Y.n, w)
if (estimator == 4) SOL <- Gw.c.Ba(Y.n, w, R.boot = R.boot)
if (estimator == 5) SOL <- Gw.c.Bm(Y.n, w, R.boot = R.boot)
if (estimator == 6) SOL <- Gw.c.Bam(Y.n, w, R.boot = R.boot)
if (estimator == 7) SOL <- Gw.c.Bp(Y.n, w, N = N, R.boot = R.boot, R = R, Value.T = Value.T, Value.V = Value.V, Distribution = Distribution)
Sum <- Sum + SOL
Sum2 <- Sum2 + SOL^2
MSE <- MSE + (SOL - G)^2
Sum.Ske <- Sum.Ske + skewness(Sample)
}
Mean <- Sum/R
RB <- round(100*(Mean - G)/G, 2)
RRMSE <- round(100*sqrt(MSE/R)/G, 2)
Var <- Sum2/R - Mean^2
BR <- round((Mean - G)/sqrt(Var), 2)
Mean.Ske <- Sum.Ske/R
list(Expected.Gini = Mean, RB=RB, RRMSE = RRMSE, BR = BR, Expected.Skewness=Mean.Ske)
}
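Each replication of fin.empirical.measures draws an unequal-probability sample via a size variable correlated with the study variable. That design can be sketched in isolation with the sampling package; the population size, sample size and Gamma parameters below are arbitrary, chosen only for the demonstration:

```r
library(sampling)
set.seed(1)
N <- 1000; n <- 50; Cor <- 0.7           # illustrative population and sample sizes
Y.U <- rgamma(N, shape = 2, scale = 1)   # study variable for the whole population
# Size variable approximately correlated (Cor) with Y.U:
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Y.U))
Size.U <- 1 + Cor*Y.U + rnorm(N, sd = Residual.SD)
Size.U <- Size.U + abs(min(Size.U)) + 1  # shift so that all sizes are positive
Prob.U <- inclusionprobabilities(Size.U, n)    # first-order inclusion probabilities
Sample <- (1:N)[UPrandomsystematic(Prob.U) == 1]
w <- 1/Prob.U[Sample]                    # design weights
length(Sample)                           # fixed-size design: exactly n units
```

Because the inclusion probabilities sum to \(n\), randomized systematic sampling returns a sample of exactly \(n\) units, and the design weights satisfy \(w \geq 1\).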
# Simulation study for estimator = 1 (G = 0.4; n = 50; Lognormal)
N <- 10000
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 1, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.4113975
##
## $RB
## [1] 2.85
##
## $RRMSE
## [1] 9.16
##
## $BR
## [1] 0.33
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 2 (G = 0.4; n = 50; Lognormal)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 2, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.3923017
##
## $RB
## [1] -1.92
##
## $RRMSE
## [1] 9.03
##
## $BR
## [1] -0.22
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 3 (G = 0.4; n = 50; Lognormal)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.3988507
##
## $RB
## [1] -0.29
##
## $RRMSE
## [1] 8.92
##
## $BR
## [1] -0.03
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 3 (G = 0.4; n = 50; Pareto)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 2)
## $Expected.Gini
## [1] 0.3929852
##
## $RB
## [1] -1.75
##
## $RRMSE
## [1] 19.2
##
## $BR
## [1] -0.09
##
## $Expected.Skewness
## [1] 0.004515528
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 20)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 3)
## $Expected.Gini
## [1] 0.3936652
##
## $RB
## [1] -1.58
##
## $RRMSE
## [1] 13.67
##
## $BR
## [1] -0.12
##
## $Expected.Skewness
## [1] -0.01088615
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 0.5)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 4)
## $Expected.Gini
## [1] 0.3982037
##
## $RB
## [1] -0.45
##
## $RRMSE
## [1] 10.81
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 0.001194609
# Simulation study for estimator = 3 (G = 0.4; n = 50; Weibull)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 5)
## $Expected.Gini
## [1] 0.4023592
##
## $RB
## [1] 0.59
##
## $RRMSE
## [1] 9.28
##
## $BR
## [1] 0.06
##
## $Expected.Skewness
## [1] 0.00450743
# Simulation study for estimator = 3 (G = 0.4; n = 50; Gamma)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 6)
## $Expected.Gini
## [1] 0.3993217
##
## $RB
## [1] -0.17
##
## $RRMSE
## [1] 8.92
##
## $BR
## [1] -0.02
##
## $Expected.Skewness
## [1] 0.0034934
In this section, we use the data set ES-SILC, with sample size \(n=26\), to compute estimators of the Gini index. The various real data sets described in the manuscript can be loaded using the file Datasets.RData.
# ES-SILC with n = 26.
SILCn26<- c(6974.50, 10539.60, 9157.30, 25193.40, 23033.40, 19379.80, 30839.70,
71958.30, 227173.60, 74166.80, 26155.50, 132202.10, 16219.50, 45297.20, 19217.74,
23238.30, 23070.50, 38165.50, 15111.90, 67500.70, 28608.10, 2291.40, 28934.00,
9375.90, 15858.00, 22555.10)
# Sample size (n)
length(SILCn26)
## [1] 26
# Estimation of coefficient of skewness
skewness(SILCn26)
## [1] 2.807518
# Estimation of the Gini index using Gn.c
Gn.c(SILCn26)
## [1] 0.5178801
# Estimation of the Gini index using Gn.c.JO
Gn.c.JO(SILCn26)
## [1] 0.5343978
# Kolmogorov-Smirnov (KS) p-value for the Lognormal distribution
lSILCn26 <- log(SILCn26)
mean.MLE <- mean(lSILCn26)                    # MLE of meanlog
sd.MLE <- sqrt(mean(lSILCn26^2) - mean.MLE^2) # MLE of sdlog (divisor n, not n-1)
ks.test(SILCn26, "plnorm", mean.MLE, sd.MLE)
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: SILCn26
## D = 0.13826, p-value = 0.6527
## alternative hypothesis: two-sided
# Estimation of the Gini index using Gn.c.Bp (Lognormal distribution)
set.seed(123)
Gn.c.Bp(SILCn26, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution = 1)
## [1] 0.525814
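As a cross-check of the point estimate obtained with Gn.c, the sketch below computes the sample Gini index from its order-statistics form with the \(n/(n-1)\) correction. This is an assumption about the definition of \(\widehat{G}_n^c\), so it reproduces the value of Gn.c above only if that is indeed its definition; Gini.c.pairwise is included solely to verify the algebraic identity between the order-statistics and mean-difference forms:

```r
# Assumed definition: plug-in Gini estimator rescaled by n/(n-1).
Gini.c.check <- function(y)
{
n <- length(y)
y <- sort(y)
G <- 2*sum((1:n)*y)/(n^2*mean(y)) - (n + 1)/n  # order-statistics form
n/(n - 1)*G
}
# Equivalent mean-difference (pairwise) form, for verification:
Gini.c.pairwise <- function(y)
{
n <- length(y)
(n/(n - 1))*sum(abs(outer(y, y, "-")))/(2*n^2*mean(y))
}
```

Running Gini.c.check(SILCn26) with the data defined above gives a value directly comparable with Gn.c(SILCn26).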