The aim of this document is to describe how to reproduce the results derived in the article “Exploring and correcting the bias in the estimation of the Gini measure of income inequality”, published in the journal Sociological Methods & Research. First (Section 2), we load the various R packages required to compute the estimators described in the article. In Section 3, we compute the parameters of the probability distributions considered in the simulation studies. Section 4 gives examples of how samples can be selected under the various scenarios. Code for computing the various estimators of the Gini index, for both infinite and finite populations, is provided in Section 5. Box plots investigating the effect of skewness on the bias of the estimation of the Gini index can be seen in Section 6. For the various estimators defined in this document, Section 7 provides code for computing the associated empirical measures. In particular, the functions inf.empirical.measures and fin.empirical.measures give: (i) the Relative Bias (RB); (ii) the Relative Root Mean Square Error (RRMSE); (iii) the Bias Ratio (BR); (iv) the expected value based on estimates of the Gini index; and (v) the expected value based on estimates of the coefficient of skewness. Some examples are included that indicate how to carry out the simulation studies. Finally, the various estimators of the Gini index are computed using the ES-SILC data set, with size \(n=26\). The various real data sets can be loaded from the file Datasets.RData.
library(moments, warn.conflicts=FALSE)   # skewness()
library(stats4, warn.conflicts=FALSE)    # required by VGAM
library(splines, warn.conflicts=FALSE)   # required by VGAM
library(VGAM, warn.conflicts=FALSE)      # rpareto(), rdagum()
library(boot, warn.conflicts=FALSE)      # bootstrap resampling
library(sampling, warn.conflicts=FALSE)  # finite-population sampling designs
Simulation studies are based on different Gini values (\(G \in \{0.1,0.2,\ldots,0.8\}\)) for various continuous probability distributions (Pareto, Dagum, Lognormal, Weibull and Gamma). In this section, we compute, for each distribution, the parameter value that yields each target Gini index.
# Vector with Gini values between 0.1 and 0.8
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
LEN <- length(VECTOR.G)
###########################
### PARETO DISTRIBUTION ###
###########################
VECTOR.ALPHA.Par <- (1+VECTOR.G)/(2*VECTOR.G)
VECTOR.ALPHA.Par
## [1] 5.500000 3.000000 2.166667 1.750000 1.500000 1.333333 1.214286 1.125000
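As a quick sanity check (not part of the original supplement), the shape values can be verified against the closed-form Gini index of the Pareto distribution, \(G = 1/(2\alpha - 1)\), which is exactly what the formula above inverts:

```r
# Target Gini values and the corresponding Pareto shape parameters
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.ALPHA.Par <- (1 + VECTOR.G)/(2*VECTOR.G)
# Inverting G = 1/(2*alpha - 1) recovers the target Gini values
all.equal(1/(2*VECTOR.ALPHA.Par - 1), VECTOR.G)  # TRUE
```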
##############################
### LOGNORMAL DISTRIBUTION ###
##############################
VECTOR.SIGMA.Log <- qnorm( (VECTOR.G+1)/2)*sqrt(2)
VECTOR.SIGMA.Log
## [1] 0.1777120 0.3582869 0.5449254 0.7416143 0.9538726 1.1902322 1.4657382
## [8] 1.8123876
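The lognormal values can be checked the same way (this check is ours, not part of the original code): the Gini index of a lognormal distribution with log-scale \(\sigma\) is \(G = 2\Phi(\sigma/\sqrt{2}) - 1\), which the expression above inverts.

```r
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.SIGMA.Log <- qnorm((VECTOR.G + 1)/2)*sqrt(2)
# The lognormal Gini index is G = 2*pnorm(sigma/sqrt(2)) - 1
all.equal(2*pnorm(VECTOR.SIGMA.Log/sqrt(2)) - 1, VECTOR.G)  # TRUE
```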
###################################################
### DAGUM DISTRIBUTIONS (Dagum-p20; Dagum-p0.5) ###
###################################################
GINI.DAGUM <- function(p, GINI, a)
{
  # Gini index of the Dagum(a, p) distribution minus the target value GINI
  gamma(p)*gamma(2*p + 1/a)/(gamma(2*p)*gamma(p + 1/a)) - 1 - GINI
}
## p = 20
# Solve GINI.DAGUM(a) = 0 for each target Gini value
VECTOR.a <- c()
for (i in 1:LEN)
{
  SOL.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI=VECTOR.G[i], p=20)
  VECTOR.a <- c(VECTOR.a, SOL.root$root)
}
VECTOR.a.Dag20 <- VECTOR.a
VECTOR.a.Dag20
## [1] 7.386969 3.852848 2.671827 2.079358 1.722419 1.483396 1.311813 1.182504
## p = 0.5
# Solve GINI.DAGUM(a) = 0 for each target Gini value
VECTOR.a <- c()
for (i in 1:LEN)
{
  SOL.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI=VECTOR.G[i], p=0.5)
  VECTOR.a <- c(VECTOR.a, SOL.root$root)
}
VECTOR.a.Dag0.5 <- VECTOR.a
VECTOR.a.Dag0.5
## [1] 13.381436 6.461639 4.162633 3.018566 2.336341 1.884859 1.565110
## [8] 1.327546
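Since the Dagum case has no convenient inverse, the roots can be validated by substituting them back into the Gini equation; the sketch below (a check we add here, with a tightened `tol` so the residual is small) does this for a single target value.

```r
# Gini index of the Dagum(a, p) distribution minus the target value GINI
GINI.DAGUM <- function(p, GINI, a)
  gamma(p)*gamma(2*p + 1/a)/(gamma(2*p)*gamma(p + 1/a)) - 1 - GINI
# For G = 0.3 and p = 20, the root should satisfy the Gini equation
a.root <- uniroot(GINI.DAGUM, c(0.1, 16), GINI = 0.3, p = 20, tol = 1e-8)$root
abs(GINI.DAGUM(p = 20, GINI = 0.3, a = a.root)) < 1e-6  # TRUE
```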
############################
### WEIBULL DISTRIBUTION ###
############################
VECTOR.K.Wei <- 1/log2(1/(1-VECTOR.G))
VECTOR.K.Wei
## [1] 6.5788135 3.1062837 1.9433582 1.3569154 1.0000000 0.7564708 0.5757166
## [8] 0.4306766
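The Weibull shape values admit the same kind of check (again, a check of ours rather than part of the supplement): the Weibull Gini index is \(G = 1 - 2^{-1/k}\), which the expression above inverts.

```r
VECTOR.G <- seq(0.1, 0.8, by = 0.1)
VECTOR.K.Wei <- 1/log2(1/(1 - VECTOR.G))
# The Weibull Gini index is G = 1 - 2^(-1/k)
all.equal(1 - 2^(-1/VECTOR.K.Wei), VECTOR.G)  # TRUE
```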
##########################
### GAMMA DISTRIBUTION ###
##########################
GINI.GAMMA <- function(k, GINI)
{
  # Gini index of the Gamma(k) distribution minus the target value GINI
  gamma((2*k + 1)/2)/(k*gamma(k)*sqrt(pi)) - GINI
}
# Solve GINI.GAMMA(k) = 0 for each target Gini value
VECTOR.K.Gam <- c()
for (L in 1:LEN)
{
  GINI.L <- VECTOR.G[L]
  SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=GINI.L)
  VECTOR.K.Gam <- c(VECTOR.K.Gam, SOL.root$root)
}
VECTOR.K.Gam
## [1] 31.5800009 7.7038277 3.2780415 1.7241161 0.9999945 0.6021600 0.3583336
## [8] 0.1968058
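A useful special case for checking the Gamma roots (our check, not part of the original code): for \(G = 0.5\) the solution is \(k = 1\), i.e. the exponential distribution, whose Gini index is exactly \(1/2\). The fifth root printed above is indeed \(\approx 1\); with a tightened `tol` the root is recovered to high precision.

```r
# Gini index of the Gamma(k) distribution minus the target value GINI
GINI.GAMMA <- function(k, GINI)
  gamma((2*k + 1)/2)/(k*gamma(k)*sqrt(pi)) - GINI
# G = 0.5 corresponds to k = 1 (the exponential distribution)
k.root <- uniroot(GINI.GAMMA, c(0.15, 130), GINI = 0.5, tol = 1e-8)$root
abs(k.root - 1) < 1e-3  # TRUE
```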
This section shows how to select a sample for each Gini index and each population based on the continuous probability distributions used in the simulation studies. We consider samples of size \(N=50\).
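Before drawing the \(N=50\) samples below, one can confirm that the simulated populations really attain the target Gini values. The sketch below is a sanity check we add here (it is not part of the original supplement): `gini.hat` is a helper implementing the standard sorted-data formula for the sample Gini index, applied to a large Pareto sample drawn by inverse-CDF, so it does not require the VGAM package.

```r
set.seed(1)
# Sample Gini index via the sorted-data formula:
# G.hat = 2*sum(i * x_(i)) / (n * sum(x)) - (n + 1)/n
gini.hat <- function(x) {
  n <- length(x)
  x <- sort(x)
  2*sum((1:n)*x)/(n*sum(x)) - (n + 1)/n
}
# Pareto(scale = 1, shape = alpha) via inverse CDF: X = U^(-1/alpha);
# alpha = 3 corresponds to a target Gini index of G = 0.2
x <- runif(1e5)^(-1/3)
gini.hat(x)  # should be close to the target value 0.2
```

With only \(N=50\) observations, as in the simulations below, the estimate is noticeably noisier and biased, which is precisely the phenomenon the article studies.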
N <- 50
set.seed(123)
##############
### PARETO ###
##############
# G = 0.1
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[1])
## [1] 1.254320 1.044198 1.176522 1.022878 1.011222 1.753476 1.123090 1.020910
## [9] 1.114297 1.153188 1.008055 1.154701 1.073336 1.106681 1.511966 1.019377
## [17] 1.290360 1.779125 1.224735 1.008502 1.021510 1.069006 1.084370 1.001045
## [25] 1.079756 1.064652 1.117026 1.099286 1.253069 1.416889 1.006874 1.018868
## [33] 1.069595 1.042482 1.961161 1.143720 1.051551 1.320868 1.231467 1.304648
## [41] 1.424576 1.173633 1.174056 1.198824 1.407748 1.431943 1.303211 1.148947
## [49] 1.272258 1.028275
# G = 0.2
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[2])
## [1] 2.794343 1.312584 1.077700 2.016816 1.212530 1.691757 1.986677
## [8] 1.099029 1.037652 1.387385 1.145604 2.192814 1.375839 1.538912
## [15] 1.070725 1.306393 1.072738 1.071713 1.079769 1.314936 1.098462
## [22] 1.166986 1.120839 11.697485 1.281363 1.656206 1.380836 1.177336
## [29] 1.416562 2.079932 1.601140 1.143920 1.337818 1.082568 2.134247
## [36] 1.319895 1.005065 1.038424 1.040987 1.787628 1.970514 1.152585
## [43] 1.427855 1.150442 1.461441 1.746568 1.085284 2.202499 1.289129
## [50] 1.250403
# G = 0.3
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[3])
## [1] 1.265888 1.661562 1.391729 1.021738 1.399301 1.055066 1.042146 1.257461
## [9] 1.507920 2.422066 1.031353 1.739840 3.643655 1.025089 1.163270 2.459442
## [17] 1.318541 1.021927 1.280268 1.518508 1.221795 1.692407 1.722804 2.012376
## [25] 1.583320 1.007369 2.369885 3.022355 2.462538 1.186794 1.247555 1.054495
## [33] 1.200544 1.151192 1.350946 1.211536 1.094807 1.117364 1.009453 1.461569
## [41] 1.712610 1.509983 8.202119 2.185138 1.082176 1.965971 1.935573 3.271395
## [49] 1.911314 1.154772
# G = 0.4
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[4])
## [1] 1.099200 1.490210 1.717954 2.226304 3.510057 1.712698 1.376128 2.394906
## [9] 1.588790 2.388006 1.482103 1.810420 1.279123 1.752266 1.805931 1.431646
## [17] 1.187443 2.368740 1.658096 2.132709 1.302186 2.632301 1.087377 1.181767
## [25] 1.258989 1.316522 1.758917 1.437585 1.079514 1.362813 1.104937 1.944009
## [33] 1.217849 2.135783 1.346237 1.518735 2.135714 1.386330 1.053263 1.060793
## [41] 2.094761 1.912600 1.008299 1.314123 1.037685 1.546003 1.671826 1.268841
## [49] 2.930574 1.374849
# G = 0.5
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[5])
## [1] 2.598564 1.025908 1.403592 1.556366 1.834158 1.088756 1.961219 2.291736
## [9] 3.250403 3.231162 1.626584 2.500114 2.775596 1.300364 7.606916 1.267405
## [17] 2.006303 1.815059 1.140569 1.058038 2.322516 1.026801 1.235252 1.285165
## [25] 7.101278 1.856838 1.636095 1.471444 1.270539 1.060481 1.377769 1.759621
## [33] 1.504149 6.637470 2.449431 1.850812 2.946208 1.130515 3.497458 1.157103
## [41] 1.495434 1.316101 3.237097 1.356351 2.174459 1.239613 1.845278 1.020965
## [49] 1.022343 1.237169
# G = 0.6
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[6])
## [1] 2.768698 3.094187 1.479735 2.688321 1.607444 1.198745 3.809782
## [8] 1.971934 1.757260 1.111911 1.059606 1.098770 1.344048 1.039083
## [15] 1.641466 1.511436 2.264251 2.210285 18.785899 1.674731 1.109099
## [22] 44.715085 7.190222 3.876572 1.216160 1.259514 1.021626 1.771661
## [29] 7.020824 1.383273 1.230250 4.438206 2.001005 3.061148 8.465679
## [36] 2.003627 7.774536 3.051985 8.849769 1.349916 2.480962 5.593171
## [43] 7.201696 1.100208 1.235562 1.164101 1.013608 5.476225 5.664166
## [50] 1.183474
# G = 0.7
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[7])
## [1] 1.221157 46.564269 1.228264 1.296750 1.462763 1.827381 4.602868
## [8] 52.162617 1.921503 1.792514 2.173448 1.879833 1.320825 10.849017
## [15] 2.347571 1.198266 1.159284 3.264254 2.351923 1.135638 1.139057
## [22] 2.726046 4.848653 1.335157 6.459031 16.302171 1.000491 15.859315
## [29] 2.440828 1.075836 1.487887 2.801186 1.284570 1.161178 2.594089
## [36] 1.791695 1.345584 1.441452 1.436914 1.018614 2.064319 5.755721
## [43] 1.697291 3.414917 1.810345 2.266684 1.013923 2.179291 3.363660
## [50] 1.475959
# G = 0.8
rpareto(N, scale = 1, shape = VECTOR.ALPHA.Par[8])
## [1] 5.870260 1.029833 1.803506 5.013155 1.525310 1.012653 1.429910
## [8] 2.167141 2.728050 1.173529 5.605483 4.319450 1.101729 2.847553
## [15] 2.459657 1.241560 4.308276 35.968671 2.225355 1.908982 2.153761
## [22] 2.589925 1.135859 2.013252 1.747252 1.033276 1.254879 4.022934
## [29] 2.842082 1.026182 1.610788 1.275044 2.404386 1.262704 1.735947
## [36] 1.083219 4.474915 3.078709 8.106677 3.995558 1.020807 2.948282
## [43] 1.329294 1.239113 7.387902 3.561032 3.196511 7.670509 6.687684
## [50] 1.007855
##############################
### LOGNORMAL DISTRIBUTION ###
##############################
# G = 0.1
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[1])
## [1] 219.3689 187.3974 141.5822 163.4541 137.8777 136.3691 129.0056 133.5304
## [9] 199.0162 146.9950 151.5918 154.9816 184.7541 135.4076 124.4146 199.8948
## [17] 137.2220 130.5169 119.1400 118.1188 134.0212 165.6411 180.7714 168.2998
## [25] 139.1251 149.9974 130.9460 130.6526 173.6797 123.9052 210.0781 146.0500
## [33] 154.1808 130.1587 134.0113 117.4427 143.6661 159.8855 157.2179 129.1677
## [41] 129.0052 135.7416 193.6143 121.2540 143.7651 208.1112 145.7737 116.5523
## [49] 131.8761 161.7856
# G = 0.2
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[2])
## [1] 129.72624 121.35099 131.20736 153.30414 263.15015 143.77769 218.59841
## [8] 186.04544 142.49178 85.69387 123.13613 124.52243 150.94187 236.47540
## [15] 337.50495 258.39206 141.49916 79.09576 129.11523 153.23333 200.88992
## [22] 209.52879 189.64978 90.02537 201.22345 126.46991 158.00545 152.43080
## [29] 173.01989 149.73106 81.66010 193.22916 170.42721 134.93869 154.83029
## [36] 155.71451 160.64362 267.17228 137.21062 157.62450 225.56687 216.52354
## [43] 223.70602 120.67498 304.13247 152.00266 289.70657 91.46801 149.53316
## [50] 232.25317
# G = 0.3
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[3])
## [1] 100.50891 98.47874 88.99377 83.63472 116.95400 177.76598 49.52143
## [8] 166.58631 291.16662 450.48550 301.58258 224.16577 57.91988 106.93527
## [15] 122.50611 217.75445 140.10848 74.74859 371.62744 243.87122 168.91266
## [22] 288.23565 71.55510 212.74575 111.61459 215.42015 143.57486 209.54036
## [29] 307.27942 149.00391 258.39607 77.66402 100.16105 339.63121 182.29902
## [36] 48.50620 70.57679 133.03195 237.88449 140.39800 208.54099 250.28153
## [43] 368.92755 153.01331 144.26815 57.08928 156.66756 108.67745 87.29013
## [50] 134.55385
# G = 0.4
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[4])
## [1] 315.03898 33.85712 108.10723 161.82247 76.52247 190.11463 201.36563
## [8] 144.82120 23.83737 999.27516 127.45293 240.55151 181.82218 317.32051
## [15] 272.15941 127.02887 196.45917 73.61665 280.20077 105.43423 890.97264
## [22] 43.62186 105.20390 273.72215 216.65833 95.85460 70.86475 165.19808
## [29] 146.84673 39.34236 152.26517 170.89982 168.94608 67.86933 211.26373
## [36] 412.55410 208.16925 63.93272 107.43855 191.84263 91.84861 29.95986
## [43] 285.93745 80.22599 96.99307 452.73829 83.58658 277.88479 58.26838
## [50] 114.09898
# G = 0.5
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[5])
## [1] 138.35695 48.67938 81.00663 144.38579 281.39475 30.74074
## [7] 106.31209 305.36731 88.76964 184.34473 237.34703 191.61353
## [13] 276.75277 132.01976 100.02352 11.92662 135.82212 223.72928
## [19] 247.32475 87.38601 810.31170 195.04151 167.41726 499.48639
## [25] 74.78930 96.58605 1460.98204 149.99708 705.01289 37.63175
## [31] 123.75097 212.93100 197.59086 56.86905 151.16484 53.10537
## [37] 292.89909 417.68902 17.77238 482.36162 45.43132 229.01601
## [43] 278.51252 122.64949 80.20962 173.76350 225.55795 344.65996
## [49] 20.95405 31.15899
# G = 0.6
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[6])
## [1] 814.448236 515.806305 249.159912 347.661165 442.150838 6.251892
## [7] 556.400028 83.325295 195.290378 104.448455 418.987352 98.026405
## [13] 275.101428 93.222972 40.420194 626.527976 358.469300 1155.481448
## [19] 160.380370 566.237981 1558.076448 106.162499 30.734354 111.621766
## [25] 115.035552 177.777899 1139.153198 100.666498 231.357038 113.182675
## [31] 152.070023 215.681230 721.176460 171.468064 346.695841 375.036839
## [37] 440.888747 74.913726 1029.024296 94.308662 130.855319 789.299456
## [43] 692.466794 40.554900 52.501556 29.475831 184.277058 180.584516
## [49] 228.921926 286.344568
# G = 0.7
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[7])
## [1] 61.422898 34.587698 668.474356 446.241020 16.247792 129.093733
## [7] 39.915914 7.133637 184.940991 132.144547 128.674010 203.735590
## [13] 541.023579 200.607870 60.127449 50.550695 122.340925 233.785429
## [19] 32.333403 113.278735 612.629611 126.632569 53.319290 99.041271
## [25] 760.353079 332.360566 909.270752 181.977133 270.793317 65.460597
## [31] 360.436172 70.657736 18.500966 179.039092 2571.149555 480.069415
## [37] 818.895397 251.134468 60.825829 110.340149 99.433574 74.664806
## [43] 416.599303 25.661162 528.406512 526.694809 25.613856 378.921031
## [49] 5229.526052 65.579819
# G = 0.8
rlnorm(N, meanlog = 5, sdlog = VECTOR.SIGMA.Log[8])
## [1] 686.2856733 35.9578375 1111.0264613 233.4059532 2962.8620237
## [6] 10.5463469 135.2369394 57.1116054 103.8014144 47.4159794
## [11] 32.7450639 423.6337388 20.6741100 2185.5860040 17.2900966
## [16] 178.2513443 389.9362160 429.8308916 85.8943529 171.4151567
## [21] 847.4088092 10.5943329 35.9879671 265.2554126 66.2796330
## [26] 1777.5268327 502.8022526 169.1513925 9.6538744 155.6023516
## [31] 83.6408304 123.2861958 17.4363429 366.4131998 22.5787280
## [36] 98.4942415 296.2759980 35.8723103 426.9241648 13.6531855
## [41] 0.9117397 344.7095654 680.8785597 88.4057752 370.0625980
## [46] 18.2658139 117.8670673 4.3982915 1262.3844179 4319.4328661
###################################################
### DAGUM DISTRIBUTIONS (Dagum-p20; Dagum-p0.5) ###
###################################################
##############
### p = 20 ###
##############
# G = 0.1
rdagum(N, shape1.a = VECTOR.a.Dag20[1], shape2.p = 20)
## [1] 1.934100 1.999221 1.565946 1.740252 1.564476 2.750316 1.296750 1.371851
## [9] 1.816666 1.599844 1.522765 3.764328 1.431743 1.577666 1.310935 1.856300
## [17] 1.503112 1.839532 1.397558 2.220594 1.482537 1.586698 1.338849 1.995635
## [25] 2.270069 1.267034 1.568745 1.414497 1.934520 1.592451 1.172718 1.444238
## [33] 1.471059 1.631884 1.428291 1.270869 1.338059 1.726261 2.180958 1.463188
## [41] 1.310483 1.469120 1.658832 1.536567 1.838349 1.445216 1.619184 1.269502
## [49] 1.393104 1.749375
# G = 0.2
rdagum(N, shape1.a = VECTOR.a.Dag20[2], shape2.p = 20)
## [1] 3.343095 2.402142 2.513124 1.266558 1.809264 3.595768 1.688709 1.972871
## [9] 6.117798 2.533626 1.979263 2.615724 1.542557 1.813923 2.846333 1.647878
## [17] 2.631615 4.426759 1.700591 5.525498 1.518173 2.915847 2.301298 2.509952
## [25] 3.648202 2.486399 2.957576 2.634714 2.074344 2.200333 1.906477 3.502093
## [33] 2.597244 3.023765 2.502460 2.020821 2.950393 2.475872 2.128179 2.410012
## [41] 3.284712 2.431260 1.907994 3.471273 2.166066 2.062597 3.079869 2.943218
## [49] 3.413742 2.515851
# G = 0.3
rdagum(N, shape1.a = VECTOR.a.Dag20[3], shape2.p = 20)
## [1] 1.827992 3.444381 3.668354 5.840640 4.705788 3.721361 2.090140
## [8] 2.327102 4.396897 2.196068 2.540908 3.033911 5.986578 2.640941
## [15] 4.532001 2.225837 5.146224 2.695603 3.691028 1.882913 5.438946
## [22] 2.945687 7.633172 3.005944 12.213000 3.092684 3.453935 3.461858
## [29] 1.689784 4.141110 3.545908 2.892249 3.838975 4.013258 3.272407
## [36] 2.437295 4.604354 4.859770 5.950610 3.635355 2.837199 3.336382
## [43] 3.092568 2.677453 4.215081 9.734216 2.803539 4.189762 3.641976
## [50] 3.914570
# G = 0.4
rdagum(N, shape1.a = VECTOR.a.Dag20[4], shape2.p = 20)
## [1] 2.154015 1.768979 2.931933 2.153012 2.701539 4.827968 3.277703
## [8] 9.029012 3.519394 4.917869 6.287600 3.080179 9.667061 4.576977
## [15] 3.465628 3.801180 3.083919 5.668381 3.027063 3.186901 4.015419
## [22] 3.211999 4.389114 5.926149 6.718343 9.268503 4.207233 18.126665
## [29] 2.072623 5.914768 4.748740 4.100021 11.460590 2.595074 11.504310
## [36] 3.234313 3.371817 10.861729 2.488217 3.472772 45.889577 6.625602
## [43] 4.570092 4.226034 4.733167 8.709375 6.445157 4.764650 2.429294
## [50] 3.484467
# G = 0.5
rdagum(N, shape1.a = VECTOR.a.Dag20[5], shape2.p = 20)
## [1] 4.481721 4.461180 3.015303 6.938278 4.568389 11.952132 7.571603
## [8] 31.196559 3.263275 31.130630 10.878030 24.429327 8.573435 3.907060
## [15] 4.083030 11.093515 8.181566 5.127003 12.461700 5.330613 16.712610
## [22] 8.044621 6.062879 26.675608 6.548711 3.385656 6.578699 62.690883
## [29] 25.070752 6.260650 17.822232 5.025434 13.467091 13.578975 5.786307
## [36] 5.736570 9.139203 17.164662 9.355127 7.170561 3.808093 8.555389
## [43] 2.780511 6.952935 11.364102 11.811322 3.030148 6.333787 6.980384
## [50] 4.778310
# G = 0.6
rdagum(N, shape1.a = VECTOR.a.Dag20[6], shape2.p = 20)
## [1] 11.717066 16.626124 15.582026 4.335996 11.126205 20.487862
## [7] 5.690445 16.094212 24.166356 4.364804 63.892337 17.245349
## [13] 21.525980 5.202746 2.956698 27.444356 39.871946 4.591587
## [19] 9.125048 8.869202 15.013590 3.659717 16.487309 7.809087
## [25] 8.844057 16.115036 8.990650 1308.264189 25.077217 23.132749
## [31] 18.027833 4.281760 24.272262 21.783374 185.595052 3.388814
## [37] 17.192892 8.160934 11.296187 23.389435 2.616822 2.545767
## [43] 336.994132 18.269830 6.699648 21.622758 182.822857 5.328394
## [49] 13.070379 8.448077
# G = 0.7
rdagum(N, shape1.a = VECTOR.a.Dag20[7], shape2.p = 20)
## [1] 79.695180 284.460960 11.630310 7.123482 21.092462 14.586005
## [7] 15.609631 11.074311 10.898591 16.087735 11.524000 105.392427
## [13] 37.912473 6.993283 10.759430 10.755123 3.242754 9.886945
## [19] 10.809672 6.479797 21.742351 2.967250 23.641019 11.052755
## [25] 23.643954 2.985278 3.907498 10.719125 18.184316 6.944127
## [31] 35.807590 21.046174 54.059446 10.623996 10.550908 7.449339
## [37] 9.747652 7.144764 73.721019 12.150302 3.960835 27.525413
## [43] 8.901230 14.456929 19.923286 7.920044 4.074767 10.888873
## [49] 16.637448 167.101582
# G = 0.8
rdagum(N, shape1.a = VECTOR.a.Dag20[8], shape2.p = 20)
## [1] 239.738907 5.654808 16.854396 13.166714 13.001410 10.970494
## [7] 28.201205 10.109794 12.994141 625.028099 42.933849 10.811779
## [13] 48.513963 8.608495 16.967464 7.410756 6.368819 82.522460
## [19] 5.777743 64.299543 9.913184 3.945259 42.038180 12.066284
## [25] 3.862684 35.359906 4.030410 14.413078 9.434296 16.839161
## [31] 25.705635 5.539770 7.707205 6.025807 47.360584 23.788876
## [37] 24.004111 13.018523 14.357853 11.784019 43.242840 75.489695
## [43] 25.530595 21.558666 11.216910 5.345997 21.470980 7.660221
## [49] 34.716899 10.339975
###############
### p = 0.5 ###
###############
# G = 0.1
rdagum(N, shape1.a = VECTOR.a.Dag0.5[1], shape2.p = 0.5)
## [1] 0.9727338 0.7335687 0.8195123 1.0555434 1.0459976 0.6309126 1.0929187
## [8] 0.7363830 0.9612159 0.9099467 0.9421479 1.0100714 0.9100552 1.3248986
## [15] 1.2952169 0.8808015 0.9612218 0.8612678 0.8977689 0.8055791 0.4663359
## [22] 0.5722552 0.6548420 1.0258555 0.9097524 1.0746956 0.9511319 0.7803401
## [29] 0.9613020 1.1081536 1.0408901 0.9226820 0.8923931 0.9786244 1.1031283
## [36] 1.1069438 0.7747455 0.8206568 0.7230097 1.0424038 0.7191917 1.0606405
## [43] 1.0290085 1.1696007 0.9089766 1.1633217 1.0361167 1.0921827 0.8812677
## [50] 0.7090353
# G = 0.2
rdagum(N, shape1.a = VECTOR.a.Dag0.5[2], shape2.p = 0.5)
## [1] 0.6804281 1.0134724 0.7420027 0.7612641 0.3445502 0.8586150 0.5214153
## [8] 1.2202981 0.4482809 1.0751873 1.1659869 0.3610522 0.6789285 0.9465927
## [15] 1.0761262 0.7737018 1.3510091 0.7668778 0.7906180 0.6881549 0.9235229
## [22] 0.8849840 0.3692394 1.0350652 0.6178388 0.9579151 0.8592937 0.9399307
## [29] 0.9278793 0.6869170 0.6342019 0.8709575 2.1414623 0.7214802 0.7172168
## [36] 0.3553711 0.8615336 0.9124555 1.0319899 1.4249365 0.8364252 0.4638383
## [43] 0.6661738 1.2007763 1.1627696 1.1490510 1.3969979 0.9548968 0.8354059
## [50] 0.5992060
# G = 0.3
rdagum(N, shape1.a = VECTOR.a.Dag0.5[3], shape2.p = 0.5)
## [1] 1.00812089 0.30658617 0.73023289 1.59920019 1.24657573 1.01249873
## [7] 0.36787998 0.86529054 2.54112436 0.88789331 0.54606039 0.26661057
## [13] 0.28340793 1.32433235 0.63558901 1.66987795 0.82672276 0.78054325
## [19] 1.68752877 0.34698554 1.18624370 0.64962337 0.82498997 0.58752594
## [25] 0.30179074 1.11153977 0.45809874 0.93434137 0.70514557 0.46270976
## [31] 2.97010244 0.71629341 0.18971744 1.60661460 1.31664662 0.87226531
## [37] 0.57261280 1.16655494 1.72621723 1.26374941 0.37292545 1.92642192
## [43] 0.82546467 0.08111830 0.38483456 0.08454663 1.18197100 1.11544986
## [49] 1.56850181 0.64126730
# G = 0.4
rdagum(N, shape1.a = VECTOR.a.Dag0.5[4], shape2.p = 0.5)
## [1] 2.64429048 1.28083538 1.17619602 0.85976397 2.42906087 0.69482470
## [7] 1.57862451 0.57851697 0.63213737 1.90926658 0.95707929 0.07024996
## [13] 0.30651057 0.68982187 0.60773196 3.92807884 1.55175959 0.51608562
## [19] 0.99763865 1.67444362 0.94494825 0.88452836 0.28615608 0.82806270
## [25] 0.43379089 0.64425201 0.37543249 0.41340176 0.90754293 0.66103020
## [31] 0.30073250 0.43302699 1.01106889 0.84575210 0.26310036 0.17200366
## [37] 2.40142303 0.54140948 0.94693260 1.35763462 0.39808674 0.10883417
## [43] 0.81969965 1.32241415 0.53462853 0.50085076 0.89332385 2.23530785
## [49] 0.82106224 0.70762010
# G = 0.5
rdagum(N, shape1.a = VECTOR.a.Dag0.5[5], shape2.p = 0.5)
## [1] 0.205145089 1.483886391 0.273032189 0.915680665 0.813352977 0.077042569
## [7] 2.577843528 0.751930669 1.569038865 0.651609678 0.332283548 0.777158052
## [13] 1.262645419 0.958256401 0.597649098 0.577513880 0.814549449 0.555189770
## [19] 0.786447284 0.009999016 7.152134766 0.499888296 0.161415929 1.622979958
## [25] 0.684210562 1.707129720 0.060252010 1.705912774 1.038033650 1.173984238
## [31] 1.585151048 0.625919166 2.153711607 0.265825234 0.045579903 0.389939557
## [37] 0.895282698 0.388646330 2.468783155 1.204062036 0.417887401 0.599054616
## [43] 0.511970729 1.195252660 0.604310801 0.027928407 0.859903954 0.296404295
## [49] 1.104985620 0.366360685
# G = 0.6
rdagum(N, shape1.a = VECTOR.a.Dag0.5[6], shape2.p = 0.5)
## [1] 2.24549689 1.68666564 0.50052968 0.06088570 0.69315972 0.69656107
## [7] 10.02639989 0.33410550 0.88869577 2.89083070 0.68755130 0.95529985
## [13] 0.07780113 1.79497445 0.56010047 0.20931693 0.43131884 1.99127219
## [19] 0.55136998 0.90402410 0.27330906 1.44764362 0.23351269 0.37604161
## [25] 0.22224126 2.81653311 0.43488034 0.12299539 2.37351702 0.34896233
## [31] 1.68363817 0.12802551 0.36336691 0.23287587 0.03807255 1.28357697
## [37] 0.18439359 0.40427287 0.32026039 0.74123661 0.62663084 8.27935749
## [43] 0.24178958 1.09023910 5.10267869 0.65639567 0.40538317 1.49209763
## [49] 0.09900223 0.61714941
# G = 0.7
rdagum(N, shape1.a = VECTOR.a.Dag0.5[7], shape2.p = 0.5)
## [1] 0.40744409 0.20094774 0.09565062 2.08932425 0.61036020 0.58773416
## [7] 0.67502355 1.07898882 0.78880238 0.72655311 0.02314300 0.61670626
## [13] 0.66385003 0.76174295 0.74486691 0.04127712 0.03831556 0.93110678
## [19] 1.66436424 0.08665310 1.34493825 0.47315427 11.51737343 0.49411340
## [25] 0.61079786 0.31865059 0.52792947 0.03619845 0.56426139 4.91728273
## [31] 0.54446938 0.39777547 0.01243607 0.01519337 0.02031608 0.09853098
## [37] 0.44553446 1.48313542 0.20772140 1.19637838 0.53362928 0.21636674
## [43] 0.00563657 0.87217690 0.03169057 1.71596262 0.07856381 0.83558358
## [49] 1.72968757 0.18468494
# G = 0.8
rdagum(N, shape1.a = VECTOR.a.Dag0.5[8], shape2.p = 0.5)
## [1] 0.050202723 1.286797455 0.100030046 0.145467545 0.496169043
## [6] 0.393554157 0.253907554 1.362185758 0.550401117 0.061543420
## [11] 18.032871318 0.125438691 0.047057890 0.100540378 0.261790972
## [16] 0.237776645 0.177612392 0.009180611 4.296066191 0.573543421
## [21] 0.106676486 2.493426958 0.382658920 0.517176760 1.424339148
## [26] 0.783525205 0.055338492 0.196586197 5.439030722 0.082980927
## [31] 1.210658320 0.369990374 0.053279645 0.095020857 0.875592477
## [36] 0.051360685 0.307216117 0.357969890 0.709576419 0.131026182
## [41] 0.377734851 0.067953529 2.163357046 0.062350411 0.025992377
## [46] 2.817317537 0.175631235 1.220071169 0.288835657 0.353951419
############################
### WEIBULL DISTRIBUTION ###
############################
# G = 0.1
rweibull(n = N, shape = VECTOR.K.Wei[1], scale = 1)
## [1] 1.0966294 1.1054899 1.1027109 0.9398041 0.9487831 0.8955482 0.9674293
## [8] 1.1749425 1.2856239 1.0641976 0.7578675 1.0430092 0.9035614 0.8990655
## [15] 0.4788531 1.0795370 0.8254862 1.0544327 1.0168796 1.3009305 0.4041450
## [22] 0.9712539 0.3975329 1.0766885 1.1268528 1.0505634 0.8649325 0.9130004
## [29] 0.9801176 0.8392976 0.9143393 0.9128337 0.9073895 1.0636683 1.1325734
## [36] 0.8352050 0.8973416 0.6342538 0.6152625 0.8304620 0.9725724 1.0882420
## [43] 0.9723854 0.8200874 0.6867993 0.7024688 0.7853109 1.0751419 1.1819398
## [50] 0.9545080
# G = 0.2
rweibull(n = N, shape = VECTOR.K.Wei[2], scale = 1)
## [1] 0.8136433 1.3142550 0.3820811 1.2116704 0.4465724 1.2614551 0.7512971
## [8] 0.6344213 0.6547890 0.7425327 1.2807930 0.5554505 0.7902585 0.5743632
## [15] 1.2023125 1.6318224 1.0397293 1.0909560 0.5879152 0.4221470 1.2614298
## [22] 0.4616862 0.8011008 0.9720054 0.7485822 0.6517124 0.8706295 0.9769478
## [29] 0.3392674 1.4267671 1.2243610 0.7204882 0.6913638 1.2931057 0.9180110
## [36] 0.7694110 0.9747184 0.2242161 0.7370142 1.0472410 1.2237296 1.0603968
## [43] 1.3182296 1.5126216 1.0517346 0.6770933 0.3568441 0.9588193 0.8703490
## [50] 1.3760251
# G = 0.3
rweibull(n = N, shape = VECTOR.K.Wei[3], scale = 1)
## [1] 0.6723143 1.0204454 0.9232960 0.1830853 0.6112538 0.5658132 1.3408509
## [8] 1.2701405 0.3524256 0.4105959 0.5631601 1.1304970 1.2574065 0.8353249
## [15] 0.8015362 1.2382822 0.5101415 1.9628220 0.5743467 0.4057912 0.2576583
## [22] 0.6339090 1.1488608 1.2413480 0.3659594 1.9622461 0.3925628 0.7988883
## [29] 0.9093308 0.2567157 1.3549371 0.9848547 1.4212675 1.3909748 0.6167205
## [36] 0.7529012 0.8633020 0.4472539 0.1867102 0.8092558 0.5837988 1.3179036
## [43] 1.0182293 1.7593036 1.3838934 1.5384903 1.1319911 0.9729842 0.6671840
## [50] 0.5797363
# G = 0.4
rweibull(n = N, shape = VECTOR.K.Wei[4], scale = 1)
## [1] 1.08767967 0.08531036 0.47267741 1.55845215 0.99763332 1.14142591
## [7] 0.83343447 0.29914732 1.32079790 0.93323585 0.07321532 1.71802668
## [13] 0.56566356 0.46244297 0.62249070 0.01545715 2.29858491 0.03352232
## [19] 0.40564555 0.87045019 1.59067177 0.84916787 1.46587031 0.88868382
## [25] 0.30259220 2.24057092 1.08907627 0.27267993 2.36409405 0.34217588
## [31] 0.09177086 0.52219959 3.14912258 0.82007405 0.44490515 2.55395879
## [37] 0.26840425 0.40413231 1.23503922 0.40481684 0.50725091 1.51679215
## [43] 0.50986113 0.14412431 0.26073434 0.35690320 0.48738192 0.62224147
## [49] 0.93145184 1.69910932
# G = 0.5
rweibull(n = N, shape = VECTOR.K.Wei[5], scale = 1)
## [1] 0.198250392 0.333069372 0.237845025 3.999980221 1.745887038 0.513079882
## [7] 2.517775072 0.346277836 0.573288480 1.262987378 0.936348639 0.396413942
## [13] 1.819050501 2.220987563 0.628400750 2.271095143 2.370670320 0.446404075
## [19] 0.572625787 1.185685967 1.324225018 0.458073810 0.185416942 2.968980996
## [25] 0.007694491 3.812757347 1.715693890 1.112633909 0.156775786 1.690863212
## [31] 0.228964583 0.152158504 0.750734909 0.433551608 1.123801845 0.763274714
## [37] 0.059317737 1.404603765 3.796150271 0.033482504 1.346885754 0.424484001
## [43] 2.585196874 0.031236727 0.084970562 0.141387847 0.857128414 0.247139188
## [49] 1.204286545 1.239763075
# G = 0.6
rweibull(n = N, shape = VECTOR.K.Wei[6], scale = 1)
## [1] 7.091276e-05 3.227487e-03 6.642938e-01 1.783915e-01 1.151861e+00
## [6] 1.623692e+00 3.805460e-01 1.103691e+00 3.975339e+00 5.018143e-03
## [11] 2.724118e-01 8.021659e-02 7.113277e-01 2.618417e+00 2.833668e-01
## [16] 6.074188e+00 2.197768e+00 5.406257e-02 2.297640e+00 1.490302e+00
## [21] 1.575632e+00 1.223966e+00 1.721929e+00 5.215889e+00 4.890989e-02
## [26] 1.487526e-02 2.364223e-02 7.337406e-01 7.770753e-01 4.258597e+00
## [31] 1.029382e+00 5.346309e-02 9.886247e-01 6.083213e-01 5.536837e-01
## [36] 2.127760e+00 5.301881e-01 2.840456e+00 3.869279e-01 5.168959e-01
## [41] 1.679148e+00 4.998627e-02 1.887297e+00 8.232671e-01 6.257509e-02
## [46] 1.611400e+00 1.317970e+00 6.873731e-02 3.784992e-01 8.430330e-02
# G = 0.7
rweibull(n = N, shape = VECTOR.K.Wei[7], scale = 1)
## [1] 0.592237444 0.022239075 2.715738109 0.275744899 0.141712732
## [6] 2.410658677 0.370238616 0.202013052 0.013603781 1.221734016
## [11] 0.416263838 18.115657773 0.989494786 0.249838724 0.294747057
## [16] 0.061366518 3.742896404 0.212926524 2.265818190 0.335190559
## [21] 0.172629451 0.458262494 0.992276822 0.168623967 0.602747716
## [26] 1.943821263 0.663088937 5.034518570 1.697265365 0.400448759
## [31] 10.292985475 2.153920593 0.251700719 0.147862649 0.091564003
## [36] 5.308517199 4.298048456 0.006506546 0.010262356 0.012928163
## [41] 0.231678014 2.926650137 0.016503793 0.012746942 2.713051803
## [46] 5.300879991 0.375245898 1.011750414 0.643955080 0.629779910
# G = 0.8
rweibull(n = N, shape = VECTOR.K.Wei[8], scale = 1)
## [1] 3.994930e-02 2.219593e-02 1.724725e+00 6.827818e-02 4.386402e-01
## [6] 1.015844e+01 1.576384e+01 2.321477e+00 2.649634e+00 7.080428e-01
## [11] 3.033911e-01 8.015085e-02 2.283815e+00 5.966757e-03 4.792874e-01
## [16] 1.959981e+00 1.354731e+00 1.254570e-01 6.961699e-02 2.597738e-01
## [21] 8.656272e-03 7.700056e-01 2.904447e+00 5.179642e-01 9.298103e-01
## [26] 2.170066e+00 2.670368e-03 5.339343e-01 2.913978e+01 1.369539e-03
## [31] 9.587958e-01 3.860485e+01 1.091982e-04 9.408407e-01 5.153538e-05
## [36] 4.821076e+00 2.776404e-01 3.303197e-02 3.155741e+00 8.252536e-02
## [41] 2.707620e-01 3.447355e+01 2.038120e+00 5.174183e-01 2.515725e-03
## [46] 3.970787e-06 2.811218e+00 3.132991e-01 1.469905e-01 5.234601e-03
##########################
### GAMMA DISTRIBUTION ###
##########################
# G = 0.1
rgamma(n = N, shape = VECTOR.K.Gam[1], scale = 1)
## [1] 34.63168 27.00038 28.41345 34.69285 37.49449 25.81374 24.36601 32.74267
## [9] 32.21542 28.74403 31.02044 30.32568 29.55222 23.33838 33.67359 35.90658
## [17] 27.33807 20.18409 22.82648 36.04005 25.23121 44.93018 27.49963 26.50892
## [25] 26.54987 21.73886 22.46125 29.37303 31.00386 36.96266 27.16381 27.50749
## [33] 26.86216 34.19346 27.63495 34.21468 39.40432 33.27543 37.21970 34.64967
## [41] 33.54442 33.26967 38.69593 25.74359 27.22599 30.41632 28.67746 35.42144
## [49] 29.63288 23.65935
# G = 0.2
rgamma(n = N, shape = VECTOR.K.Gam[2], scale = 1)
## [1] 10.339464 8.392638 11.563390 7.183345 11.083696 4.725019 8.359876
## [8] 6.472208 5.081425 7.335643 9.957855 6.913409 5.376258 6.837318
## [15] 10.035225 5.928422 7.010587 6.899897 2.850106 9.238046 4.147690
## [22] 7.100980 9.870433 6.436940 8.102035 5.056679 9.397179 5.843702
## [29] 6.366496 4.763882 8.389102 9.049077 4.301848 9.036459 7.802253
## [36] 11.877998 4.717460 9.255811 5.332222 3.370841 5.496392 4.154450
## [43] 8.203489 5.722501 4.933585 19.177219 7.771466 9.665247 10.865428
## [50] 5.443321
# G = 0.3
rgamma(n = N, shape = VECTOR.K.Gam[3], scale = 1)
## [1] 3.7724631 3.4936209 1.8627591 2.3146208 2.9573218 1.2024368 4.3495636
## [8] 0.8930237 4.8122282 2.5500276 5.7443087 1.8925940 5.9788771 7.8752462
## [15] 2.0473102 3.4995891 1.5258269 7.6173220 3.8588901 3.3603524 5.5996564
## [22] 4.8323514 2.4401219 2.7720772 5.4246966 1.9633057 6.8477152 2.9644242
## [29] 5.0038969 4.2057250 1.1722392 4.1961260 2.4776547 3.6048484 3.6301791
## [36] 2.0193636 0.3352092 2.7242666 2.5338895 3.9088356 3.5717306 3.5257127
## [43] 3.5644615 3.7844592 3.9458804 5.7752217 3.9073178 7.1276106 3.1425242
## [50] 0.2409909
# G = 0.4
rgamma(n = N, shape = VECTOR.K.Gam[4], scale = 1)
## [1] 0.2452192 1.1300639 1.4169767 0.6778939 1.3884554 0.1456747 0.7931100
## [8] 3.6556269 0.9269505 3.1995700 5.1107374 0.9267827 0.2722695 0.5081242
## [15] 1.5333889 1.8513418 1.0987622 1.2593836 2.6347716 0.9748723 1.5692000
## [22] 2.0758283 0.6549810 0.6427581 2.3418768 1.1158532 2.0652245 0.4882630
## [29] 1.8105879 1.6982807 3.2677248 0.1650683 1.9411048 2.3447135 3.1456201
## [36] 0.7328015 1.0505221 3.2079448 2.2133208 1.7344136 0.4519682 0.5585969
## [43] 2.0264175 1.7377048 1.0109876 0.2001713 1.4213732 0.4779559 2.7281793
## [50] 0.1177665
# G = 0.5
rgamma(n = N, shape = VECTOR.K.Gam[5], scale = 1)
## [1] 0.04934946 1.68841500 0.62247764 2.05932206 1.05753278 1.82163078
## [7] 0.30121129 1.14435237 0.38105183 0.68435428 1.48151280 0.18310251
## [13] 0.18757649 0.80008880 1.36850923 1.55650540 1.66685735 0.25100687
## [19] 1.01360156 0.06175226 0.04457459 0.20745047 0.98830358 1.52219076
## [25] 2.69518643 0.57721706 0.28802751 0.60881026 1.94008078 0.04216691
## [31] 0.15510919 0.98421483 2.45598520 1.01911155 0.04535016 0.90372297
## [37] 0.58416714 1.14381197 1.70262039 0.83432134 2.19316628 0.77332109
## [43] 2.54619215 0.65016729 0.08349626 0.17352799 0.33650049 0.22392166
## [49] 0.92663822 3.57523533
# G = 0.6
rgamma(n = N, shape = VECTOR.K.Gam[6], scale = 1)
## [1] 2.136827874 0.544901731 1.006575221 0.311428476 1.673919302 0.132624177
## [7] 0.118647262 0.443093202 1.112380598 0.132001432 0.051687700 0.728891279
## [13] 0.298016872 0.119966452 0.457350248 0.006305465 0.001465703 1.100608969
## [19] 1.209050987 0.232051982 0.663074122 0.748821232 0.960546628 0.853658364
## [25] 0.215535316 0.007671702 0.326261216 1.118028600 2.352172826 1.006885791
## [31] 0.168457014 0.011183765 1.182786617 0.264811235 0.419009631 0.722529193
## [37] 2.354158315 0.276775206 0.001780794 0.195088566 0.029536470 0.349494818
## [43] 0.241708259 1.752095723 1.316467390 0.084276294 0.035481699 0.207074986
## [49] 2.230421413 1.902984531
# G = 0.7
rgamma(n = N, shape = VECTOR.K.Gam[7], scale = 1)
## [1] 4.213045e-02 7.914715e-05 3.792429e-03 8.387965e-01 5.361176e-02
## [6] 2.240461e-01 3.685452e-01 2.142289e-01 2.202561e-01 7.921302e-01
## [11] 5.441748e-05 3.149630e-04 3.033587e-02 6.241676e-02 8.919675e-01
## [16] 5.052141e-02 1.882529e-01 3.026228e-02 8.178127e-01 1.331303e+00
## [21] 7.849344e-01 1.222614e-01 9.343051e-01 4.295278e-01 2.088232e+00
## [26] 3.450247e-01 3.398099e-02 5.652823e-03 1.097582e-02 4.299467e-02
## [31] 1.339899e-01 2.092384e-01 2.682824e-02 2.166857e-02 5.332922e-03
## [36] 1.368756e-01 1.143787e+00 6.076660e-03 4.606802e-03 2.748152e-01
## [41] 7.144623e-02 7.678407e-02 8.603165e-03 2.511104e-04 2.357906e-01
## [46] 1.229911e-03 8.540354e-02 2.567427e-02 5.371906e-02 2.980774e-02
# G = 0.8
rgamma(n = N, shape = VECTOR.K.Gam[8], scale = 1)
## [1] 9.139057e-03 8.054584e-02 1.447501e-04 2.406610e-04 2.264403e-02
## [6] 1.543411e-04 3.775935e-01 1.365707e-01 8.561590e-03 1.427175e+00
## [11] 1.491474e-02 2.471447e-01 1.223016e-06 4.169344e-01 1.610903e-06
## [16] 1.032937e-02 3.749964e-10 1.129319e-02 6.263633e-05 3.903581e-03
## [21] 2.979187e-04 8.504979e-02 2.076813e-02 8.044114e-02 4.295114e-01
## [26] 6.369536e-01 4.033731e-02 3.204659e-02 2.851867e-01 5.099091e-01
## [31] 2.212640e-02 3.544844e-03 1.148871e-01 3.049286e-01 9.210711e-06
## [36] 3.295236e-04 3.987036e-02 7.126610e-06 1.247532e-02 1.401004e+00
## [41] 7.477153e-02 1.293020e-01 3.397923e-02 8.715096e-02 1.258967e-01
## [46] 9.072160e-04 1.804427e-04 9.305137e-02 1.876970e-01 8.002716e-03
For both infinite and finite populations, this section describes the expressions for the various estimators of the Gini index, and provides the corresponding R codes for computing such estimators.
Examples for infinite populations are based on this sample:
# Sample selection with size n = 50 from the Pareto distribution. G = 0.5.
set.seed(123)
Y.n <- rpareto(n=50, scale= 1, shape=1.5)
Y.n
## [1] 2.295250 1.171846 1.814962 1.086477 1.041768 7.839706 1.530569
## [8] 1.078833 1.487090 1.686424 1.029854 1.694550 1.296274 1.450159
## [15] 4.553238 1.072905 2.546479 8.268442 2.102910 1.031529 1.081160
## [22] 1.277203 1.345811 1.003838 1.324933 1.258233 1.500487 1.414946
## [29] 2.286870 3.588377 1.025436 1.070943 1.279788 1.164801 11.818134
## [36] 1.636207 1.202389 2.774286 2.145609 2.651401 3.660281 1.798669
## [43] 1.801051 1.944332 3.504223 3.730162 2.640708 1.663793 2.417921
## [50] 1.107643
\[\widehat{G}_{n}^{a}=
\frac{1}{n\overline{y}}\sum_{i \in
S}[\{2\widehat{F}_{n}(y_{i})-1\}y_{i}]= \frac{1}{T}\sum_{i \in
S}[\{2\widehat{F}_{n}(y_{i})-1\}y_{i}],\]
where \(T=\sum_{i \in S}y_{i}\).
Gn.a <- function(y)
{
Funct.FDi <- function(Yi, Y) mean(Y<=Yi)
FDi <- sapply(y, Funct.FDi, Y=y)
Output <- (1/sum(y))*sum( (2*FDi-1)*y)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.a(Y.n)
## [1] 0.3803843
\[\widehat{G}_{n}^{b}=\frac{1}{2n^{2}\overline{y}}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|= \frac{1}{2nT}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|.\]
Gn.b <- function(y)
{
n <- length(y)
Funct.ABSi <- function(Yi, Y) sum(abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y)
Output <- sum(ABSi)/(2*n*sum(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b(Y.n)
## [1] 0.3603843
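Note that the outputs of Gn.a and Gn.b differ by exactly \(1/n=0.02\): for samples without ties, \(\widehat{G}_{n}^{a}=\widehat{G}_{n}^{b}+1/n\). A self-contained check of this identity (with illustrative values, using closed forms equivalent to the two functions above):

```r
# Check that Gn.a = Gn.b + 1/n for a tie-free sample.
y <- c(2.3, 1.2, 1.8, 5.0, 1.1)                # any sample without ties
n <- length(y)
Ga <- sum((2*rank(y)/n - 1)*y)/sum(y)          # closed form of Gn.a
Gb <- sum(abs(outer(y, y, "-")))/(2*n*sum(y))  # double-sum form of Gn.b
all.equal(Ga, Gb + 1/n)                        # TRUE when observations are distinct
```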
\[\widehat{G}_{n}^{b1} = \displaystyle \frac{2}{ n^{2}\overline{y} } \sum_{i \in S} i y_{(i)} - \frac{n+1}{n} =\frac{2}{ nT } \sum_{i=1}^{n} i y_{(i)} - \frac{n+1}{n}.\]
Gn.b1 <- function(y)
{
n <- length(y)
Output <- (2/(n*sum(y)))*sum((1:n)*sort(y))-(n+1)/n
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b1(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{b2} = \displaystyle\frac{2\widehat{\beta}}{n} - \frac{n+1}{n}.\]
Gn.b2 <- function(y)
{
n <- length(y)
Beta <- sum((1:n)*sort(y))/sum(y)
Output <- 2*Beta/n - (n+1)/n
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b2(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{b3} = \displaystyle \frac{2}{n\overline{y}} cov(i, y_{(i)}) = \frac{2}{T} cov(i, y_{(i)}).\]
Gn.b3 <- function(y)
{
n <- length(y)
Output <- (2/sum(y))*(mean((1:n)*sort(y))-mean(1:n)*mean(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b3(Y.n)
## [1] 0.3603843
\[\widehat{G}_{n}^{c}= \frac{n}{n-1}\widehat{G}_{n}^{b}=\frac{1}{2(n-1)T}\sum_{i \in S}\sum_{j \in S}|y_{i}-y_{j}|.\]
Gn.c <- function(y)
{
n <- length(y)
Funct.ABSi <- function(Yi, Y) sum(abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y)
Output <- sum(ABSi)/(2*(n-1)*sum(y))
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c1} = \displaystyle \frac{2}{ n(n-1)\overline{y} } \sum_{i \in S} i y_{(i)} - \frac{n+1}{n-1} =\frac{2}{ (n-1)T } \sum_{i=1}^{n} i y_{(i)} - \frac{n+1}{n-1}.\]
Gn.c1 <- function(y)
{
n <- length(y)
Output <- (2/((n-1)*sum(y)))*sum((1:n)*sort(y))-(n+1)/(n-1)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c1(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c2} = \displaystyle \frac{1}{2\overline{y}}\binom{n}{2}^{-1}\sum_{1 \leq i_{1} < i_{2} \leq n} |y_{i_{1}}-y_{i_{2}}|.\]
Gn.c2 <- function(y)
{
n <- length(y)
Funct.apply <- function(x, y) y[x]
Matrix.Y <- apply(combn(n,2),2,Funct.apply, y = y)
SUM <- sum(abs(Matrix.Y[1,]-Matrix.Y[2,]))
Output <-(1/(2*mean(y))) * SUM / choose(n,2)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c2(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{c3} =1 - \displaystyle \frac{\overline{z}}{\overline{y}}.\]
Gn.c3 <- function(y)
{
n <- length(y)
Funct.Min <- function(i, Y) sum(pmin(Y[-i],Y[i]))
zi <- sapply(1:n, Funct.Min, Y=y)/(n-1)
Output <- 1- mean(zi)/mean(y)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c3(Y.n)
## [1] 0.3677391
\[\widehat{G}_{n}^{b2.Jo}= n\widehat{G}_{n}^{b2}-(n-1)\bar{G}_{n}^{b2.J},\] where \[\bar{G}_{n}^{b2.J}=\frac{1}{n}\sum_{k=1}^{n}\widehat{G}_{n}^{b2}(k)\] and \[\widehat{G}_{n}^{b2}(k)=\widehat{G}_{n}^{b2} + \frac{2}{n\overline{y}-y_{(k)}}\left\{\frac{y_{(k)}\widehat{\beta}}{n} + \displaystyle\frac{\sum_{i =1}^{n}iy_{(i)}}{n(n-1)} - \frac{n\overline{y}-\sum_{i=1}^{k}y_{(i)}+ky_{(k)}}{n-1} \right\} - \frac{1}{n(n-1)}.\]
Gn.b2.JO <- function(y)
{
y <- sort(y)
n <- length(y)
SUM <- sum(y)
SUM.iy <- sum((1:n)*y)
Beta <- SUM.iy/SUM
G.b <- 2*Beta/n - (n+1)/n
G.k <- G.b + (2/(SUM-y))*(y*Beta/n + SUM.iy/(n*(n-1)) - (SUM - cumsum(y) + (1:n)*y)/(n-1) ) - 1/(n*(n-1))
G.bar <- mean(G.k)
Output <- n*G.b - (n - 1)*G.bar
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.b2.JO(Y.n)
## [1] 0.3742264
\[\widehat{G}_{n}^{c.Jo}= n\widehat{G}_{n}^{c}-(n-1)\bar{G}_{n}^{c.J},\] where \[\bar{G}_{n}^{c.J}=\frac{1}{n}\sum_{k=1}^{n}\widehat{G}_{n}^{c}(k),\] and \(\widehat{G}_{n}^{c}(k)\) is the estimator \(\widehat{G}_{n}^{c}\) computed from the observations \(\{y_{(i)}:i \in S\}\) after removing the \(k\)-th unit.
GINI.Jackk.c <- function(y)
{
N <- length(y)
Theta <- sum((1:N)*sort(y))/sum(y)
Output <- 2*Theta/(N-1) - (N+1)/(N-1)
Output
}
Gn.c.JO <- function(y)
{
y <- sort(y)
n <- length(y)
SUM <- sum(y)
SUM.iy <- sum((1:n)*y)
Beta <- SUM.iy/SUM
G.c <- 2*Beta/(n-1) - (n+1)/(n-1)
SOL.Jackk <- bootstrap::jackknife(y, GINI.Jackk.c) # Requires the package bootstrap (not loaded above).
Output <- G.c - SOL.Jackk$jack.bias
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
# Usage
Gn.c.JO(Y.n)
## [1] 0.3743616
\[\widehat{G}_{n}^{c.Ba}= \widehat{G}_{n}^{c}-(\bar{G}_{n}^{c.B}-\widehat{G}_{n}^{c})=2\widehat{G}_{n}^{c}-\bar{G}_{n}^{c.B},\] where \(\bar{G}_{n}^{c.B}\) is the empirical expected value of \(\widehat{G}_{n}^{c}\) based on \(B\) bootstrap samples. The empirical expected value based on \(B\) bootstrap samples of a given statistic \(\widehat{\theta}_{n}\) is defined by \[ E_{B}[\widehat{\theta}_{n}]=\bar{\theta}_{n}^{B}=\frac{1}{B}\sum_{b=1}^{B}\widehat{\theta}_{n}^{(b)}, \] where \(\widehat{\theta}_{n}^{(b)}\) is the statistic \(\widehat{\theta}_{n}\) evaluated at the \(b\)-th bootstrap sample \(S^{(b)}\).
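As an aside, the empirical expected value \(E_{B}[\widehat{\theta}_{n}]\) can be sketched directly, without the boot package. The following illustration (using the sample mean as the statistic, purely for clarity) is not part of the original code:

```r
# Empirical bootstrap expected value of a statistic: the average of the
# statistic over B resamples drawn with replacement from the sample.
E.B <- function(y, statistic, B = 1000)
{
  reps <- replicate(B, statistic(sample(y, length(y), replace = TRUE)))
  mean(reps)
}
set.seed(1)
E.B(1:10, mean)   # close to mean(1:10) = 5.5
```

The functions below rely instead on the boot package, whose output also stores the statistic evaluated at the original sample (component t0) and at each bootstrap sample (component t).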
# Computes two statistics (Gini index and coefficient of skewness) used by the bootstrap-based estimators
GINI.c.Ske.boot <- function(Y, indices)
{
Y <- Y[indices]
N.boot <- length(Y)
SUM.Y <- sum(Y)
Nk <- rank(Y)
Gini <- (2/(N.boot*SUM.Y))*sum(Nk*Y) - (N.boot+1)/N.boot
Gini <- N.boot*Gini/(N.boot-1)
c(Gini,skewness(Y))
}
Gn.c.Ba <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
Output <- 2*SOL.boot$t0[1] - MEAN.t
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Ba(Y.n)
## [1] 0.3795352
\[\widehat{G}_{n}^{c.Bm}= \frac{\left(\widehat{G}_{n}^{c}\right)^2}{\bar{G}_{n}^{c.B}}.\]
Gn.c.Bm <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
Output <- SOL.boot$t0[1]^2/MEAN.t
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Bm(Y.n)
## [1] 0.3814144
\[ \widehat{G}_{n}^{c.Bam} = \left\{ \begin{array}{ll} 2\widehat{G}_{n}^{c}-\bar{G}_{n}^{c.B} & \quad \mbox{if } \widehat{G}_{n}^{c} \geq \bar{G}_{n}^{c.B} \\ & \\ \widehat{G}_{n}^{c} \exp\left\{-\left(\bar{G}_{n}^{c.B} - \widehat{G}_{n}^{c}\right)/\bar{G}_{n}^{c.B} \right\} & \quad \mbox{otherwise} \end{array} \right. \]
Gn.c.Bam <- function(y, R.boot = 1000)
{
SOL.boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
MEAN.t <- mean(SOL.boot$t[,1])
if (SOL.boot$t0[1] >= MEAN.t) Output <- 2*SOL.boot$t0[1] - MEAN.t
else Output <- SOL.boot$t0[1]*exp(-(MEAN.t - SOL.boot$t0[1])/MEAN.t)
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
# Usage
Gn.c.Bam(Y.n)
## [1] 0.3820706
\[\widehat{G}_{n}^{c.Bp}= \widehat{G}_{n}^{c}-\widehat{q}_{opt}(\widehat{G}_{n}^{c}; \bar{G}_{n}^{c.B}; \widehat{\gamma}_{n}; \bar{\gamma}_{n}^{B}),\] where \[ \widehat{\gamma}_{n}=\displaystyle\frac{\mu_{n.3}}{\sigma_{n}^3} \] is the coefficient of skewness, \(\sigma_{n}=(\mu_{n.2})^{1/2}\) is the sample standard deviation, and \(\mu_{n.\alpha}=n^{-1}\sum_{i \in S}(y_{i}-\overline{y})^\alpha\) is the \(\alpha\)-th central moment based on \(S\).
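In the code below, \(\widehat{\gamma}_{n}\) is computed by the function skewness from the package moments. A direct transcription of the formula above (an equivalent sketch, not part of the original code) is:

```r
# Coefficient of skewness from the n^{-1}-based central moments defined above.
Skew.n <- function(y)
{
  mu2 <- mean((y - mean(y))^2)   # second central moment
  mu3 <- mean((y - mean(y))^3)   # third central moment
  mu3/mu2^(3/2)
}
Skew.n(c(1, 2, 3, 10))   # positive for this right-skewed sample
```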
Gn.c.Bp <- function(y, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
Num.Functions <- 7 # Number of bias functions.
n <- length(y)
SOL.Boot <- boot(y, GINI.c.Ske.boot, R = R.boot)
Values.Boot.t.G <- SOL.Boot$t[,1]
Exp.G <- mean(SOL.Boot$t[,1])
Exp.Gamma <- mean(SOL.Boot$t[,2])
G.Hat <- SOL.Boot$t0[1]
Gamma.Hat <- SOL.Boot$t0[2]
L.G <- quantile(Values.Boot.t.G, 0.005, names=F)
U.G <- quantile(Values.Boot.t.G, 0.995, names=F)
Vector.GT <- runif(Value.T, L.G, U.G)
TV <- Value.T - Value.V
POS.TV <- sample(Value.T, TV)
POS.V <- setdiff(1:Value.T, POS.TV)
MATRIX.T <- data.frame(Y=double(TV), Gt.HAT=double(TV), Gamma1t.HAT=double(TV), Exp.Gt=double(TV), Exp.Gamma1t=double(TV))
t <- 0
Vector.Y <- c()
for (tv in POS.TV)
{
t <- t + 1
Gt <- Vector.GT[tv]
if (Distribution==1) # LOGNORMAL
{
Sigmat <- qnorm( (Gt+1)/2)*sqrt(2)
Yr <- rlnorm(n, meanlog =5, sdlog = Sigmat)
}
if (Distribution==2) # PARETO
{
Alphat <- (1+Gt)/(2*Gt)
Yr <- rpareto(n, scale= 1, shape=Alphat)
}
if (Distribution==3) # Dagum p=20
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=20)
Parameter.at <- SOL.root$root
Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 20)
}
if (Distribution==4) # Dagum p=0.5
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=0.5)
Parameter.at <- SOL.root$root
Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 0.5)
}
SOL.boot.t <- boot(Yr, GINI.c.Ske.boot, R = R.boot)
Exp.G.t <- mean(SOL.boot.t$t[,1])
Exp.Gamma.t <- mean(SOL.boot.t$t[,2])
G.Hat.t <- SOL.boot.t$t0[1]
Gamma.Hat.t <- SOL.boot.t$t0[2]
MATRIX.T[t,2] <- G.Hat.t
MATRIX.T[t,3] <- Gamma.Hat.t
MATRIX.T[t,4] <- Exp.G.t
MATRIX.T[t,5] <- Exp.Gamma.t
Sum.Gr.Par <- 0
for (r in 1:R)
{
if (Distribution==1) Yr <- rlnorm(n, meanlog =5, sdlog = Sigmat)
if (Distribution==2) Yr <- rpareto(n, scale= 1, shape=Alphat)
if (Distribution==3) Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 20)
if (Distribution==4) Yr <- rdagum(n, shape1.a = Parameter.at, shape2.p = 0.5)
Gr.Par <- (2/(n*sum(Yr)))*sum(rank(Yr)*Yr) - (n+1)/n
Gr.Par <- n*Gr.Par/(n-1)
Sum.Gr.Par <- Sum.Gr.Par + Gr.Par
}
Exp.Gr.Par <- Sum.Gr.Par/R
Vector.Y <-c(Vector.Y, Exp.Gr.Par - Gt)
} # for (tv in POS.TV)
MATRIX.T[,1] <- Vector.Y
# Training group
X1 <- MATRIX.T[,4]-MATRIX.T[,2]
X2 <- MATRIX.T[,5]-MATRIX.T[,3]
Func.q.Par1 <- lm(Y ~ X1, MATRIX.T)
Func.q.Par2 <- lm(Y ~ X1 + Gamma1t.HAT, MATRIX.T)
Func.q.Par3 <- lm(Y ~ X1 + Exp.Gamma1t, MATRIX.T)
Func.q.Par4 <- lm(Y ~ X2, MATRIX.T)
Func.q.Par5 <- lm(Y ~ X1 + X2, MATRIX.T)
Func.q.Par6 <- lm(Y ~ Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Func.q.Par7 <- lm(Y ~ Gt.HAT + Exp.Gt + Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Coef.q.Par1 <- Func.q.Par1$coefficients
Coef.q.Par2 <- Func.q.Par2$coefficients
Coef.q.Par3 <- Func.q.Par3$coefficients
Coef.q.Par4 <- Func.q.Par4$coefficients
Coef.q.Par5 <- Func.q.Par5$coefficients
Coef.q.Par6 <- Func.q.Par6$coefficients
Coef.q.Par7 <- Func.q.Par7$coefficients
SUM.MSE.Par <- double(Num.Functions)
for (i in 1:Value.V)
{
Gi <- Vector.GT[i]
if (Distribution==1)
{
Sigmai <- qnorm( (Gi+1)/2)*sqrt(2)
Yi <- rlnorm(n, meanlog =5, sdlog = Sigmai)
}
if (Distribution==2)
{
Alphai <- (1+Gi)/(2*Gi)
Yi <- rpareto(n, scale= 1, shape=Alphai)
}
if (Distribution==3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p=20)
Parameter.ai <- SOL.root$root
Yi <- rdagum(n, shape1.a = Parameter.ai, shape2.p = 20)
}
if (Distribution==4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p=0.5)
Parameter.ai <- SOL.root$root
Yi <- rdagum(n, shape1.a = Parameter.ai, shape2.p = 0.5)
}
SOL.boot.i <- boot(Yi, GINI.c.Ske.boot, R = R.boot)
Exp.G.i <- mean(SOL.boot.i$t[,1])
Exp.Gamma.i <- mean(SOL.boot.i$t[,2])
G.Hat.i <- SOL.boot.i$t0[1]
Gamma.Hat.i <- SOL.boot.i$t0[2]
qv1 <- Coef.q.Par1[[1]] + Coef.q.Par1[[2]]*(Exp.G.i-G.Hat.i)
qv2 <- Coef.q.Par2[[1]] + Coef.q.Par2[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par2[[3]]*Gamma.Hat.i
qv3 <- Coef.q.Par3[[1]] + Coef.q.Par3[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par3[[3]]*Exp.Gamma.i
qv4 <- Coef.q.Par4[[1]] + Coef.q.Par4[[2]]*(Exp.Gamma.i-Gamma.Hat.i)
qv5 <- Coef.q.Par5[[1]] + Coef.q.Par5[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par5[[3]]*(Exp.Gamma.i-Gamma.Hat.i)
qv6 <- Coef.q.Par6[[1]] + Coef.q.Par6[[2]]*Gamma.Hat.i + Coef.q.Par6[[3]]*Exp.Gamma.i
qv7 <- Coef.q.Par7[[1]] + Coef.q.Par7[[2]]*G.Hat.i + Coef.q.Par7[[3]]*Exp.G.i +
Coef.q.Par7[[4]]*Gamma.Hat.i + Coef.q.Par7[[5]]*Exp.Gamma.i
Estv1 <- G.Hat.i - qv1
Estv2 <- G.Hat.i - qv2
Estv3 <- G.Hat.i - qv3
Estv4 <- G.Hat.i - qv4
Estv5 <- G.Hat.i - qv5
Estv6 <- G.Hat.i - qv6
Estv7 <- G.Hat.i - qv7
SUM.MSE.Par <- SUM.MSE.Par + c( (Estv1-Gi)^2, (Estv2-Gi)^2, (Estv3-Gi)^2,
(Estv4-Gi)^2, (Estv5-Gi)^2, (Estv6-Gi)^2, (Estv7-Gi)^2)
} # for (i in 1:Value.V)
OPT.MSE.Par <- which.min(SUM.MSE.Par)
if (OPT.MSE.Par ==1) q.opt.Par <- Coef.q.Par1[[1]] + Coef.q.Par1[[2]]*(Exp.G-G.Hat)
if (OPT.MSE.Par ==2) q.opt.Par <- Coef.q.Par2[[1]] + Coef.q.Par2[[2]]*(Exp.G-G.Hat) + Coef.q.Par2[[3]]*Gamma.Hat
if (OPT.MSE.Par ==3) q.opt.Par <- Coef.q.Par3[[1]] + Coef.q.Par3[[2]]*(Exp.G-G.Hat) + Coef.q.Par3[[3]]*Exp.Gamma
if (OPT.MSE.Par ==4) q.opt.Par <- Coef.q.Par4[[1]] + Coef.q.Par4[[2]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE.Par ==5) q.opt.Par <- Coef.q.Par5[[1]] + Coef.q.Par5[[2]]*(Exp.G-G.Hat) + Coef.q.Par5[[3]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE.Par ==6) q.opt.Par <- Coef.q.Par6[[1]] + Coef.q.Par6[[2]]*Gamma.Hat +
Coef.q.Par6[[3]] * Exp.Gamma
if (OPT.MSE.Par ==7) q.opt.Par <- Coef.q.Par7[[1]] + Coef.q.Par7[[2]]*G.Hat +
Coef.q.Par7[[3]] * Exp.G +
Coef.q.Par7[[4]] * Gamma.Hat +
Coef.q.Par7[[5]] * Exp.Gamma
Output <- G.Hat - q.opt.Par
Output
}
# Argument
## y: vector with the sample observations of the variable Y.
## R.boot: number of bootstrap samples.
## R: number of replications.
## Value.T: Total number of samples (Training + Validation group).
## Value.V: number of samples in the Validation group.
## Distribution: Probabilistic distribution for the parametric bootstrap.
### 1 = Lognormal; 2 = Pareto; 3 = Dagum with p = 20; 4 = Dagum with p = 0.5.
# Usage
Gn.c.Bp(Y.n, Distribution = 2)
## [1] 0.3864234
Examples for finite populations are based on this sample, selected under a randomized systematic sampling design. The effect of the design is increased by generating inclusion probabilities \(\pi_i\) with a correlation of \(0.7\) between \(\pi_i\) and \(y_i\).
# Population, sample selection and survey weights (randomized systematic sampling).
# n = 50 and G = 0.5.
set.seed(123)
N <- 10000 # Population size.
n <- 50
Labels <- (1:N)
Cor <- 0.7
Y.N <- rpareto(n=N, scale= 1, shape=1.5)
Residual.SD <- sqrt(1-Cor^2)*sqrt(var(Y.N))
Size.Variable.U <- 1 + Cor * Y.N + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Y.n <- Y.N[Sample]
Prob.S <- Prob.U[Sample]
w.n <- 1/Prob.S
n <- length(Sample)
Y.n
## [1] 1.256281 1.153969 1.274485 1.007599 3.253916 1.148925 17.619978
## [8] 1.390136 8.977209 1.711177 1.515922 1.216494 1.924554 2.066960
## [15] 1.182792 4.700734 1.277821 1.951636 2.607654 1.038222 1.283669
## [22] 1.898321 6.621856 16.672066 2.893945 2.263599 1.313010 1.392572
## [29] 1.014446 3.824858 2.699629 1.777259 1.270751 1.030797 2.119897
## [36] 2.272461 1.219897 3.254782 1.201556 1.134711 1.334193 1.555776
## [43] 1.571195 6.969796 1.004846 1.091881 2.716155 1.049775 1.205693
## [50] 1.074515
w.n
## [1] 201.2816 189.7074 166.6904 207.3157 138.3292 278.9388 140.5038 209.0983
## [9] 238.6331 205.3096 314.5301 219.3074 206.2540 189.6478 144.7267 262.5656
## [17] 186.2906 215.6584 239.8989 172.3857 169.0149 267.6195 157.5581 162.2901
## [25] 156.1198 253.0959 173.3194 142.1086 164.6035 205.1006 156.5991 197.0950
## [33] 151.5781 260.0406 224.7287 322.2452 179.0356 269.4481 298.5258 190.6057
## [41] 175.9704 191.0797 229.3220 228.9040 179.0971 147.8013 377.6547 249.1136
## [49] 307.9467 167.5910
\[\widehat{G}_{w}^{a}= \frac{2}{\widehat{N}\overline{y}_{w}}\sum_{i \in S}w_{i}y_{i}\widehat{F}_{w}(y_{i})-1.\]
Gw.a <- function(y,w)
{
N.Hat <- sum(w)
Funct.FDi <- function(Yi, Y, W, N.Hat) (1/N.Hat)*sum(W*(Y<=Yi))
FDi <- sapply(y, Funct.FDi, Y=y, W=w, N.Hat=N.Hat)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- (2/(N.Hat*Mean.y))*sum(w*y*FDi) - 1
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.a(Y.n, w.n)
## [1] 0.4473034
\[\widehat{G}_{w}^{b}=\frac{1}{2\widehat{N}^{2}\overline{y}_{w}}\sum_{i \in S}\sum_{j \in S}w_{i}w_{j}|y_{i}-y_{j}|.\]
Gw.b <- function(y,w)
{
N.Hat <- sum(w)
Funct.ABSi <- function(Yi, Y, W) sum(W*abs(Y-Yi))
ABSi <- sapply(y, Funct.ABSi, Y=y, W=w)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- sum(w*ABSi)/(2*N.Hat^2*Mean.y)
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.b(Y.n, w.n)
## [1] 0.4268247
\[\widehat{G}_{w}^{c} =1 - \displaystyle \frac{\overline{z}_{w}}{\overline{y}_{w}}.\]
# Function Gw.c
Gw.c <- function(y,w)
{
N.Hat <- sum(w)
n <- length(y)
Funct.Min <- function(i, Y, W, N.Hat) sum(W[-i]*pmin(Y[-i],Y[i]))/(N.Hat-W[i])
zi <- sapply(1:n, Funct.Min, Y=y, W=w, N.Hat=N.Hat)
Mean.z <- (1/N.Hat)*sum(w*zi)
Mean.y <- (1/N.Hat)*sum(w*y)
Output <- 1- Mean.z/Mean.y
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
# Usage
Gw.c(Y.n, w.n)
## [1] 0.4352065
\[\widehat{G}_{w}^{c.Ba}= \widehat{G}_{w}^{c}-(\bar{G}_{w}^{c.B}-\widehat{G}_{w}^{c})=2\widehat{G}_{w}^{c}-\bar{G}_{w}^{c.B}.\]
First, we create some functions required to compute the various estimators for the case of finite populations. In particular, the function Rescaled.Bootstrap computes the estimates of the Gini index for a set of bootstrap samples.
MU.w <- function(y,w,ALPHA)
{
N.Hat <- sum(w)
Mean.Hat <- (1/N.Hat)*sum(w*y)
(1/N.Hat)*sum(w*(y-Mean.Hat)^ALPHA)
}
Ske.W <- function(y,w) MU.w(y,w,3)/(MU.w(y,w,2)^(3/2))
BOOTSTRAP.WEIGHTS <- function(n, Weight.S)
{
SAMPLE <- table(sample(n, n, replace = T)) # Multiplicities of the resampled units.
Ri <- as.integer(SAMPLE)
UNITS <- as.integer(names(SAMPLE))
UNITS.OUT <- setdiff(1:n, UNITS) # Units not selected in the resample.
LEN.OUT <- length(UNITS.OUT)
Ri2 <- c(Ri, rep(0,LEN.OUT))
UNITS2 <- c(UNITS,UNITS.OUT)
ORDER <- order(UNITS2)
Ri.order <- Ri2[ORDER] # Multiplicities in the original sample order.
OUTPUT <- Ri.order * Weight.S * n / (n-1) # Rescaled bootstrap weights.
OUTPUT
}
Rescaled.Bootstrap <- function(y, R.boot=1000, w)
{
n <- length(y)
Sum.Ske <- 0
Vector.Gini.B <- c()
for (i in 1:R.boot)
{
w.b <- BOOTSTRAP.WEIGHTS(n, w)
Vector.Gini.B <- c(Vector.Gini.B, Gw.c(y, w.b))
Sum.Ske <- Sum.Ske + Ske.W(y,w.b)
}
list(Mean.Gini =mean(Vector.Gini.B), Mean.Ske = Sum.Ske/R.boot, Vector.Gini.B = Vector.Gini.B)
}
# Usage
Rescaled.Bootstrap(Y.n, R.boot = 50, w= w.n)
## $Mean.Gini
## [1] 0.4421744
##
## $Mean.Ske
## [1] 3.387153
##
## $Vector.Gini.B
## [1] 0.3217424 0.4774353 0.4500154 0.3835550 0.4229068 0.2654164 0.4687924
## [8] 0.5061761 0.4460293 0.3617398 0.4529941 0.3588451 0.4760658 0.4765871
## [15] 0.4778994 0.4336448 0.4411065 0.4088126 0.4607407 0.5327704 0.4221788
## [22] 0.4470892 0.4895719 0.3985759 0.4257491 0.3859401 0.4804542 0.4818865
## [29] 0.4718366 0.5603562 0.4309820 0.2805123 0.4692137 0.4152673 0.3765406
## [36] 0.4918474 0.3956425 0.4542818 0.5026744 0.3778840 0.4168256 0.4680288
## [43] 0.4881710 0.4264974 0.4413882 0.5134525 0.4210026 0.5144043 0.5329497
## [50] 0.5042383
The function Gw.c.Ba gives the bias-corrected estimator of the Gini index based on the additive bootstrap.
Gw.c.Ba <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Output <- 2*Gw.c(y, w) - SOL.ResBoot$Mean.Gini
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: Number of bootstrap samples.
# Usage
Gw.c.Ba(Y.n, w.n)
## [1] 0.4400799
\[\widehat{G}_{w}^{c.Bm}= \frac{\left(\widehat{G}_{w}^{c}\right)^2}{\bar{G}_{w}^{c.B}}.\]
Gw.c.Bm <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Output <- Gw.c(y, w)^2/SOL.ResBoot$Mean.Gini
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: number of bootstrap samples.
# Usage
Gw.c.Bm(Y.n, w.n)
## [1] 0.4405649
\[ \widehat{G}_{w}^{c.Bam} = \left\{ \begin{array}{ll} 2\widehat{G}_{w}^{c}-\bar{G}_{w}^{c.B} & \quad \mbox{if } \widehat{G}_{w}^{c} \geq \bar{G}_{w}^{c.B} \\ & \\ \widehat{G}_{w}^{c} \exp\left\{-\left(\bar{G}_{w}^{c.B} - \widehat{G}_{w}^{c}\right)/\bar{G}_{w}^{c.B} \right\} & \quad \mbox{otherwise} \end{array} \right. \]
Gw.c.Bam <- function(y, w, R.boot = 1000)
{
SOL.ResBoot <- Rescaled.Bootstrap(y, R.boot, w)
Gc <- Gw.c(y, w)
if (Gc >= SOL.ResBoot$Mean.Gini) Output <- 2*Gc - SOL.ResBoot$Mean.Gini
else Output <- Gc*exp(-(SOL.ResBoot$Mean.Gini - Gc)/SOL.ResBoot$Mean.Gini)
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the sample survey weights.
## R.boot: number of bootstrap samples.
# Usage
Gw.c.Bam(Y.n, w.n)
## [1] 0.4429545
\[\widehat{G}_{w}^{c.Bp}= \widehat{G}_{w}^{c}-\widehat{q}_{opt}(\widehat{G}_{w}^{c}; \bar{G}_{w}^{c.B}; \widehat{\gamma}_{w}; \bar{\gamma}_{w}^{B}),\] where \[\widehat{\gamma}_{w}=\displaystyle\frac{\mu_{w.3}}{\sigma_{w}^3}\] is the weighted coefficient of skewness, \(\sigma_{w}=(\mu_{w.2})^{1/2}\) and \(\mu_{w.\alpha}=\widehat{N}^{-1}\sum_{i \in S}w_{i}(y_{i}-\overline{y}_{w})^\alpha\).
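The weighted skewness \(\widehat{\gamma}_{w}\) is implemented by the function Ske.W defined earlier. As a quick sanity check (not part of the original code; MU.w and Ske.W are redefined here so the sketch is self-contained), with constant weights it reduces to the unweighted coefficient of skewness:

```r
# With equal weights, the weighted central moments reduce to the n^{-1}-based
# central moments, so Ske.W matches the unweighted skewness.
MU.w <- function(y, w, ALPHA)
{
  N.Hat <- sum(w)
  Mean.Hat <- (1/N.Hat)*sum(w*y)
  (1/N.Hat)*sum(w*(y - Mean.Hat)^ALPHA)
}
Ske.W <- function(y, w) MU.w(y, w, 3)/(MU.w(y, w, 2)^(3/2))
y <- c(1, 2, 3, 10)
all.equal(Ske.W(y, rep(1, 4)),
          mean((y - mean(y))^3)/mean((y - mean(y))^2)^(3/2))   # TRUE
```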
Gw.c.Bp <- function(y, w, N, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
Num.Functions <- 7 # Number of bias functions.
n <- length(y)
Labels <- (1:N)
Cor <- 0.7
Gamma.Hat <- Ske.W(y, w)
G.Hat <- Gw.c(y,w)
SOL.Boot <- Rescaled.Bootstrap(y, R.boot, w)
# Step 1. Generate T parameters from interval [G.L, G.U]
Values.Boot.t.G <- SOL.Boot$Vector.Gini.B
Exp.G <- SOL.Boot$Mean.Gini
Exp.Gamma <- SOL.Boot$Mean.Ske
L.G <- quantile(Values.Boot.t.G, 0.005, names=F)
U.G <- quantile(Values.Boot.t.G, 0.995, names=F)
Vector.GT <- runif(Value.T, L.G, U.G)
TV <- Value.T - Value.V
POS.TV <- sample(Value.T, TV)
POS.V <- setdiff(1:Value.T, POS.TV)
MATRIX.T <- data.frame(Y=double(TV), Gt.HAT=double(TV), Gamma1t.HAT=double(TV), Exp.Gt=double(TV), Exp.Gamma1t=double(TV))
t <- 0
Vector.Y.F <- c()
for (tv in POS.TV)
{
t <- t + 1
Gt <- Vector.GT[tv]
if (Distribution==1) # LOGNORMAL
{
Sigmat <- qnorm( (Gt+1)/2)*sqrt(2)
Yr.FU <- rlnorm(N, meanlog =5, sdlog = Sigmat)
}
if (Distribution==2) # PARETO
{
Alphat <- (1+Gt)/(2*Gt)
Yr.FU <- rpareto(N, scale= 1, shape=Alphat)
}
if (Distribution==3) # Dagum p=20
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=20)
Parameter.at <- SOL.root$root
Yr.FU <- rdagum(N, shape1.a = Parameter.at, shape2.p = 20)
}
if (Distribution==4) # Dagum p=0.5
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gt, p=0.5)
Parameter.at <- SOL.root$root
Yr.FU <- rdagum(N, shape1.a = Parameter.at, shape2.p = 0.5)
}
Residual.SD <- sqrt(1 -Cor^2)*sqrt(var(Yr.FU))
Size.Variable.U <- 1 + Cor * Yr.FU + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
## Step 4 (Expected values based on Bootstrap samples)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yr.F <- Yr.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.t <- length(Sample)
SOL.boot.t <- Rescaled.Bootstrap(Yr.F, R.boot, Weight.S)
Exp.G.t <- SOL.boot.t$Mean.Gini
Exp.Gamma.t <- SOL.boot.t$Mean.Ske
G.Hat.t <- Gw.c(Yr.F, Weight.S)
Gamma.Hat.t <- Ske.W(Yr.F, Weight.S)
MATRIX.T[t,2] <- G.Hat.t
MATRIX.T[t,3] <- Gamma.Hat.t
MATRIX.T[t,4] <- Exp.G.t
MATRIX.T[t,5] <- Exp.Gamma.t
# Step 5 (Expected values based on Replications)
Sum.Gr <- 0
for (r in 1:R)
{
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yr.F <- Yr.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.r <- length(Sample)
Gr.F <- Gw.c(Yr.F, Weight.S)
Sum.Gr <- Sum.Gr + Gr.F
}
rm(Yr.FU)
Exp.Gr <- Sum.Gr/R
Vector.Y.F <-c(Vector.Y.F, Exp.Gr - Gt)
} # for (tv in POS.TV)
MATRIX.T[,1] <- Vector.Y.F
# Training group
X1.F <- MATRIX.T[,4]-MATRIX.T[,2]
X2.F <- MATRIX.T[,5]-MATRIX.T[,3]
Func.q.Par1.F <- lm(Y ~ X1.F, MATRIX.T)
Func.q.Par2.F <- lm(Y ~ X1.F + Gamma1t.HAT, MATRIX.T)
Func.q.Par3.F <- lm(Y ~ X1.F + Exp.Gamma1t, MATRIX.T)
Func.q.Par4.F <- lm(Y ~ X2.F, MATRIX.T)
Func.q.Par5.F <- lm(Y ~ X1.F + X2.F, MATRIX.T)
Func.q.Par6.F <- lm(Y ~ Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Func.q.Par7.F <- lm(Y ~ Gt.HAT + Exp.Gt + Gamma1t.HAT + Exp.Gamma1t, MATRIX.T)
Coef.q.Par1.F <- Func.q.Par1.F$coefficients
Coef.q.Par2.F <- Func.q.Par2.F$coefficients
Coef.q.Par3.F <- Func.q.Par3.F$coefficients
Coef.q.Par4.F <- Func.q.Par4.F$coefficients
Coef.q.Par5.F <- Func.q.Par5.F$coefficients
Coef.q.Par6.F <- Func.q.Par6.F$coefficients
Coef.q.Par7.F <- Func.q.Par7.F$coefficients
SUM.MSE <- double(Num.Functions)
for (i in 1:Value.V)
{
Gi <- Vector.GT[i]
if (Distribution==1)
{
Sigmai <- qnorm( (Gi+1)/2)*sqrt(2)
Yi.FU <- rlnorm(N, meanlog =5, sdlog = Sigmai)
}
if (Distribution==2)
{
Alphai <- (1+Gi)/(2*Gi)
Yi.FU <- rpareto(N, scale= 1, shape=Alphai)
}
if (Distribution==3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p = 20)
Parameter.ai <- SOL.root$root
Yi.FU <- rdagum(N, shape1.a = Parameter.ai, shape2.p = 20)
}
if (Distribution==4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=Gi, p = 0.5)
Parameter.ai <- SOL.root$root
Yi.FU <- rdagum(N, shape1.a = Parameter.ai, shape2.p = 0.5)
}
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Yi.FU))
Size.Variable.U <- 1 + Cor * Yi.FU + rnorm(n=N,sd=Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, n)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Yi.F <- Yi.FU[Sample]
Prob.S <- Prob.U[Sample]
Weight.S <- 1/Prob.S
Sample.Size.i <- length(Sample)
rm(Yi.FU)
SOL.boot.i <- Rescaled.Bootstrap(Yi.F, R.boot, Weight.S)
Exp.G.i <- SOL.boot.i$Mean.Gini
Exp.Gamma.i <- SOL.boot.i$Mean.Ske
G.Hat.i <- Gw.c(Yi.F, Weight.S)
Gamma.Hat.i <- Ske.W(Yi.F, Weight.S)
qv1 <- Coef.q.Par1.F[[1]] + Coef.q.Par1.F[[2]]*(Exp.G.i-G.Hat.i)
qv2 <- Coef.q.Par2.F[[1]] + Coef.q.Par2.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par2.F[[3]]*Gamma.Hat.i
qv3 <- Coef.q.Par3.F[[1]] + Coef.q.Par3.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par3.F[[3]]*Exp.Gamma.i
qv4 <- Coef.q.Par4.F[[1]] + Coef.q.Par4.F[[2]]*(Exp.Gamma.i-Gamma.Hat.i)
qv5 <- Coef.q.Par5.F[[1]] + Coef.q.Par5.F[[2]]*(Exp.G.i-G.Hat.i) + Coef.q.Par5.F[[3]]*(Exp.Gamma.i-Gamma.Hat.i)
qv6 <- Coef.q.Par6.F[[1]] + Coef.q.Par6.F[[2]]*Gamma.Hat.i + Coef.q.Par6.F[[3]]*Exp.Gamma.i
qv7 <- Coef.q.Par7.F[[1]] + Coef.q.Par7.F[[2]]*G.Hat.i + Coef.q.Par7.F[[3]]*Exp.G.i + Coef.q.Par7.F[[4]]*Gamma.Hat.i + Coef.q.Par7.F[[5]]*Exp.Gamma.i
Estv1 <- G.Hat.i - qv1
Estv2 <- G.Hat.i - qv2
Estv3 <- G.Hat.i - qv3
Estv4 <- G.Hat.i - qv4
Estv5 <- G.Hat.i - qv5
Estv6 <- G.Hat.i - qv6
Estv7 <- G.Hat.i - qv7
SUM.MSE <- SUM.MSE + c( (Estv1-Gi)^2, (Estv2-Gi)^2, (Estv3-Gi)^2,
(Estv4-Gi)^2, (Estv5-Gi)^2, (Estv6-Gi)^2, (Estv7-Gi)^2)
} # for (i in 1:Value.V)
OPT.MSE <- which.min(SUM.MSE)
if (OPT.MSE ==1) q.opt <- Coef.q.Par1.F[[1]] + Coef.q.Par1.F[[2]]*(Exp.G-G.Hat)
if (OPT.MSE ==2) q.opt <- Coef.q.Par2.F[[1]] + Coef.q.Par2.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par2.F[[3]]*Gamma.Hat
if (OPT.MSE ==3) q.opt <- Coef.q.Par3.F[[1]] + Coef.q.Par3.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par3.F[[3]]*Exp.Gamma
if (OPT.MSE ==4) q.opt <- Coef.q.Par4.F[[1]] + Coef.q.Par4.F[[2]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE ==5) q.opt <- Coef.q.Par5.F[[1]] + Coef.q.Par5.F[[2]]*(Exp.G-G.Hat) + Coef.q.Par5.F[[3]]*(Exp.Gamma-Gamma.Hat)
if (OPT.MSE ==6) q.opt <- Coef.q.Par6.F[[1]] + Coef.q.Par6.F[[2]] * Gamma.Hat + Coef.q.Par6.F[[3]] * Exp.Gamma
if (OPT.MSE ==7) q.opt <- Coef.q.Par7.F[[1]] + Coef.q.Par7.F[[2]]*G.Hat + Coef.q.Par7.F[[3]] * Exp.G + Coef.q.Par7.F[[4]] * Gamma.Hat + Coef.q.Par7.F[[5]] * Exp.Gamma
Output <- G.Hat - q.opt
Output
}
# Arguments
## y: vector with the sample observations of the variable Y.
## w: vector with the survey weights.
## N: population size.
## R.boot: number of bootstrap samples.
## R: number of replications.
## Value.T: Total number of samples (Training + Validation group).
## Value.V: number of samples in the Validation group.
## Distribution: Probabilistic distribution for the parametric bootstrap.
### 1 = Lognormal; 2 = Pareto; 3 = Dagum with p = 20; 4 = Dagum with p = 0.5.
# Usage
Gw.c.Bp(Y.n, w.n, N, Distribution = 2)
## [1] 0.4532425
This section uses box plots to show the effect of skewness on the bias of the Gini index estimator \(\widehat{G}_n^c\). We consider samples of size \(n=50\) randomly selected from the most skewed (Pareto) and the least skewed (Gamma) distributions in this study. Each box plot is based on a total of 1000 estimates of \(\widehat{G}_n^c\).
PLOT.BOX <- function(R)
{
VECTOR.G <- seq(0.2, 0.8, by = 0.2)
LEN <- length(VECTOR.G)
VECTOR.ALPHA.Par <- (1+VECTOR.G)/(2*VECTOR.G)
VECTOR.K.Gam <- c()
for (L in 1:LEN)
{
GINI.L <- VECTOR.G[L]
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=GINI.L)
VECTOR.K.Gam <- c(VECTOR.K.Gam, SOL.root$root)
}
N <- 50 # sample size (n)
MATRIX.Est.Par <- array(dim=c(LEN, R, 1))
MATRIX.Est.Gam <- array(dim=c(LEN, R, 1))
for (L in 1:LEN)
{
ALPHA.Par <- VECTOR.ALPHA.Par[L]
K.Gam <- VECTOR.K.Gam[L]
Gini.Teo <- VECTOR.G[L]
for (r in 1:R)
{
Y.Par <- rpareto(N, scale= 1, shape=ALPHA.Par)
Y.Gam <- rgamma(n=N,shape=K.Gam,scale=1)
#####################################
########## ESTIMATORS ###############
#####################################
#### Pareto
MATRIX.Est.Par[L,r,1] <- Gn.c1(Y.Par)
#### Gamma
MATRIX.Est.Gam[L,r,1] <- Gn.c1(Y.Gam)
}
} # for (L in 1:LEN)
DATA <- data.frame(EST_PAR_0.2=MATRIX.Est.Par[1,,],
EST_GAM_0.2=MATRIX.Est.Gam[1,,],
EST_PAR_0.4=MATRIX.Est.Par[2,,],
EST_GAM_0.4=MATRIX.Est.Gam[2,,],
EST_PAR_0.6=MATRIX.Est.Par[3,,],
EST_GAM_0.6=MATRIX.Est.Gam[3,,],
EST_PAR_0.8=MATRIX.Est.Par[4,,],
EST_GAM_0.8=MATRIX.Est.Gam[4,,])
boxplot(DATA, las=2, ylab = "Estimation of the Gini index",
at = c(1,2, 4,5, 7,8, 10,11), col=rep(c("snow2","blue"),4), xaxt="n", xlab="G")
mtext("0.2", side=1, at= 1.5, line=1.15)
mtext("0.4", side=1, at= 4.5, line=1.15)
mtext("0.6", side=1, at= 7.5, line=1.15)
mtext("0.8", side=1, at= 10.5, line=1.15)
VECTOR.G<-c(0.2,0.4,0.6,0.8)
LEN<-length(VECTOR.G)
for (L in 1:LEN)
{
abline(h=VECTOR.G[L],lty=2)
}
legend("topleft",legend = c("Pareto", "Gamma"), fill = c("snow2", "blue"), bty = "n" )
}
set.seed(123)
R <- 1000
PLOT.BOX(R)
estimator | Expression |
---|---|
1 | \(\widehat{G}_{n}^{a}\) |
2 | \(\widehat{G}_{n}^{b1}\) |
3 | \(\widehat{G}_{n}^{c1}\) |
4 | \(\widehat{G}_{n}^{c.JO}\) |
5 | \(\widehat{G}_{n}^{c.Ba}\) |
6 | \(\widehat{G}_{n}^{c.Bm}\) |
7 | \(\widehat{G}_{n}^{c.Bam}\) |
8 | \(\widehat{G}_{n}^{c.Bp}\) |
Distribution | Probabilistic distribution |
---|---|
1 | Lognormal |
2 | Pareto |
3 | Dagum (\(p=20\)) |
4 | Dagum (\(p=0.5\)) |
5 | Weibull |
6 | Gamma |
inf.empirical.measures <- function(G, estimator, Sample.size, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
if ((estimator == 8) && ((Distribution == 5) || (Distribution == 6))) stop("Give a distribution between 1 and 4 for estimator 8.")
if (Distribution == 1) SIGMA.Log <- qnorm((G + 1)/2)*sqrt(2)
if (Distribution == 2) ALPHA.Par <- (1 + G) / (2*G)
if (Distribution == 3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=20)
a.Dag <- SOL.root$root
}
if (Distribution == 4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=0.5)
a.Dag <- SOL.root$root
}
if (Distribution == 5) K.Wei <- 1/log2(1/(1-G))
if (Distribution == 6)
{
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=G)
K.Gam <- SOL.root$root
}
Sum <- 0
Sum2 <- 0
MSE <- 0
Sum.Ske <- 0
for (i in 1:R)
{
if (Distribution == 1) Sample <- rlnorm(Sample.size, meanlog = 5, sdlog = SIGMA.Log)
if (Distribution == 2) Sample <- rpareto(Sample.size, scale = 1, shape = ALPHA.Par)
if (Distribution == 3) Sample <- rdagum(Sample.size, shape1.a = a.Dag, shape2.p = 20)
if (Distribution == 4) Sample <- rdagum(Sample.size, shape1.a = a.Dag, shape2.p = 0.5)
if (Distribution == 5) Sample <- rweibull(n = Sample.size, shape = K.Wei, scale = 1)
if (Distribution == 6) Sample <- rgamma(n = Sample.size, shape = K.Gam, scale = 1)
if (estimator == 1) SOL <- Gn.a(Sample)
if (estimator == 2) SOL <- Gn.b1(Sample)
if (estimator == 3) SOL <- Gn.c1(Sample)
if (estimator == 4) SOL <- Gn.c.JO(Sample)
if (estimator == 5) SOL <- Gn.c.Ba(Sample, R.boot = R.boot)
if (estimator == 6) SOL <- Gn.c.Bm(Sample, R.boot = R.boot)
if (estimator == 7) SOL <- Gn.c.Bam(Sample, R.boot = R.boot)
if (estimator == 8) SOL <- Gn.c.Bp(Sample, R.boot = R.boot, R = R, Value.T = Value.T, Value.V = Value.V, Distribution = Distribution)
Sum <- Sum + SOL
Sum2 <- Sum2 + SOL^2
MSE <- MSE + (SOL - G)^2
Sum.Ske <- Sum.Ske + skewness(Sample)
}
Mean <- Sum/R
RB <- round(100*(Mean - G)/G, 2)
RRMSE <- round(100*sqrt(MSE/R)/G, 2)
Var <- Sum2/R - Mean^2
BR <- round((Mean - G)/sqrt(Var), 2)
Mean.Ske <- Sum.Ske/R
list(Expected.Gini = Mean, RB=RB, RRMSE = RRMSE, BR = BR, Expected.Skewness=Mean.Ske)
}
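The formulas used inside inf.empirical.measures can be verified in isolation. The helper below (empirical.measures.check, an illustrative name not part of the original code) recomputes RB, RRMSE and BR from a vector of estimates, without the rounding applied in the function above:

```r
# Illustrative helper: recompute the empirical measures from raw estimates.
empirical.measures.check <- function(estimates, G)
{
Mean <- mean(estimates)
RB <- 100*(Mean - G)/G                        # Relative Bias (%)
RRMSE <- 100*sqrt(mean((estimates - G)^2))/G  # Relative Root Mean Square Error (%)
Var <- mean(estimates^2) - Mean^2             # variance, as accumulated in the loop above
BR <- (Mean - G)/sqrt(Var)                    # Bias Ratio
list(RB = RB, RRMSE = RRMSE, BR = BR)
}
empirical.measures.check(c(0.38, 0.41, 0.40, 0.42), G = 0.4)
# $RB is 0.625 and $RRMSE is 3.75 for this toy vector
```

The toy vector stands in for R estimates of the Gini index obtained in a simulation run.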
# Simulation study for estimator = 1 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 1, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.4101402
##
## $RB
## [1] 2.54
##
## $RRMSE
## [1] 10.78
##
## $BR
## [1] 0.24
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 2 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 2, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.3901402
##
## $RB
## [1] -2.46
##
## $RRMSE
## [1] 10.77
##
## $BR
## [1] -0.24
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 3 (G = 0.4; n = 50; Lognormal)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 1)
## $Expected.Gini
## [1] 0.3981022
##
## $RB
## [1] -0.47
##
## $RRMSE
## [1] 10.71
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 1.960247
# Simulation study for estimator = 3 (G = 0.4; n = 50; Pareto)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 2)
## $Expected.Gini
## [1] 0.3708247
##
## $RB
## [1] -7.29
##
## $RRMSE
## [1] 25.85
##
## $BR
## [1] -0.29
##
## $Expected.Skewness
## [1] 3.632704
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 20)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 3)
## $Expected.Gini
## [1] 0.3828812
##
## $RB
## [1] -4.28
##
## $RRMSE
## [1] 20.13
##
## $BR
## [1] -0.22
##
## $Expected.Skewness
## [1] 3.047151
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 0.5)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 4)
## $Expected.Gini
## [1] 0.3963076
##
## $RB
## [1] -0.92
##
## $RRMSE
## [1] 12.85
##
## $BR
## [1] -0.07
##
## $Expected.Skewness
## [1] 1.961689
# Simulation study for estimator = 3 (G = 0.4; n = 50; Weibull)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 5)
## $Expected.Gini
## [1] 0.3984262
##
## $RB
## [1] -0.39
##
## $RRMSE
## [1] 9.08
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 1.090686
# Simulation study for estimator = 3 (G = 0.4; n = 50; Gamma)
set.seed(123)
inf.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, Distribution = 6)
## $Expected.Gini
## [1] 0.3980721
##
## $RB
## [1] -0.48
##
## $RRMSE
## [1] 9.08
##
## $BR
## [1] -0.05
##
## $Expected.Skewness
## [1] 1.251534
estimator | Expression |
---|---|
1 | \(\widehat{G}_{w}^{a}\) |
2 | \(\widehat{G}_{w}^{b}\) |
3 | \(\widehat{G}_{w}^{c}\) |
4 | \(\widehat{G}_{w}^{c.Ba}\) |
5 | \(\widehat{G}_{w}^{c.Bm}\) |
6 | \(\widehat{G}_{w}^{c.Bam}\) |
7 | \(\widehat{G}_{w}^{c.Bp}\) |
fin.empirical.measures <- function(G, estimator, Sample.size, N, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution)
{
if ((estimator == 7) && ((Distribution == 5) || (Distribution == 6))) stop("Give a distribution between 1 and 4 for estimator 7.")
if (Distribution == 1) SIGMA.Log <- qnorm((G + 1)/2)*sqrt(2)
if (Distribution == 2) ALPHA.Par <- (1 + G) / (2*G)
if (Distribution == 3)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=20)
a.Dag <- SOL.root$root
}
if (Distribution == 4)
{
SOL.root <- uniroot(GINI.DAGUM, c(0.1, 1000), GINI=G, p=0.5)
a.Dag <- SOL.root$root
}
if (Distribution == 5) K.Wei <- 1/log2(1/(1-G))
if (Distribution == 6)
{
SOL.root <- uniroot(GINI.GAMMA, c(0.15,130), GINI=G)
K.Gam <- SOL.root$root
}
Labels <- 1:N
Cor <- 0.7
Sum <- 0
Sum2 <- 0
MSE <- 0
Sum.Ske <- 0
for (i in 1:R)
{
Yi.FU <- switch(Distribution,
rlnorm(N, meanlog = 5, sdlog = SIGMA.Log),
rpareto(N, scale = 1, shape = ALPHA.Par),
rdagum(N, shape1.a = a.Dag, shape2.p = 20),
rdagum(N, shape1.a = a.Dag, shape2.p = 0.5),
rweibull(n = N, shape = K.Wei, scale = 1),
rgamma(n = N, shape = K.Gam, scale = 1))
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Yi.FU))
Size.Variable.U <- 1 + Cor * Yi.FU + rnorm(n = N, sd = Residual.SD)
Size.Variable.U <- Size.Variable.U + abs(min(Size.Variable.U)) + 1
Prob.U <- inclusionprobabilities(Size.Variable.U, Sample.size)
Sample <- Labels[UPrandomsystematic(Prob.U)==1]
Y.n <- Yi.FU[Sample]
Prob.S <- Prob.U[Sample]
w <- 1/Prob.S
Sample.Size <- length(Sample)
rm(Yi.FU)
if (estimator == 1) SOL <- Gw.a(Y.n, w)
if (estimator == 2) SOL <- Gw.b(Y.n, w)
if (estimator == 3) SOL <- Gw.c(Y.n, w)
if (estimator == 4) SOL <- Gw.c.Ba(Y.n, w, R.boot = R.boot)
if (estimator == 5) SOL <- Gw.c.Bm(Y.n, w, R.boot = R.boot)
if (estimator == 6) SOL <- Gw.c.Bam(Y.n, w, R.boot = R.boot)
if (estimator == 7) SOL <- Gw.c.Bp(Y.n, w, N = N, R.boot = R.boot, R = R, Value.T = Value.T, Value.V = Value.V, Distribution = Distribution)
Sum <- Sum + SOL
Sum2 <- Sum2 + SOL^2
MSE <- MSE + (SOL - G)^2
Sum.Ske <- Sum.Ske + skewness(Sample)
}
Mean <- Sum/R
RB <- round(100*(Mean - G)/G, 2)
RRMSE <- round(100*sqrt(MSE/R)/G, 2)
Var <- Sum2/R - Mean^2
BR <- round((Mean - G)/sqrt(Var), 2)
Mean.Ske <- Sum.Ske/R
list(Expected.Gini = Mean, RB=RB, RRMSE = RRMSE, BR = BR, Expected.Skewness=Mean.Ske)
}
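Each replication of fin.empirical.measures draws an unequal-probability sample via a size variable correlated with the study variable. That design can be sketched in isolation with the sampling package; the population size, sample size and Gamma parameters below are arbitrary, chosen only for the demonstration:

```r
library(sampling)
set.seed(1)
N <- 1000; n <- 50; Cor <- 0.7           # illustrative population and sample sizes
Y.U <- rgamma(N, shape = 2, scale = 1)   # study variable for the whole population
# Size variable approximately correlated (Cor) with Y.U:
Residual.SD <- sqrt(1 - Cor^2)*sqrt(var(Y.U))
Size.U <- 1 + Cor*Y.U + rnorm(N, sd = Residual.SD)
Size.U <- Size.U + abs(min(Size.U)) + 1  # shift so that all sizes are positive
Prob.U <- inclusionprobabilities(Size.U, n)    # first-order inclusion probabilities
Sample <- (1:N)[UPrandomsystematic(Prob.U) == 1]
w <- 1/Prob.U[Sample]                    # design weights
length(Sample)                           # fixed-size design: exactly n units
```

Because the inclusion probabilities sum to \(n\), randomized systematic sampling returns a sample of exactly \(n\) units, and the design weights satisfy \(w \geq 1\).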
# Simulation study for estimator = 1 (G = 0.4; n = 50; Lognormal)
N <- 10000
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 1, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.4113975
##
## $RB
## [1] 2.85
##
## $RRMSE
## [1] 9.16
##
## $BR
## [1] 0.33
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 2 (G = 0.4; n = 50; Lognormal)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 2, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.3923017
##
## $RB
## [1] -1.92
##
## $RRMSE
## [1] 9.03
##
## $BR
## [1] -0.22
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 3 (G = 0.4; n = 50; Lognormal)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 1)
## $Expected.Gini
## [1] 0.3988507
##
## $RB
## [1] -0.29
##
## $RRMSE
## [1] 8.92
##
## $BR
## [1] -0.03
##
## $Expected.Skewness
## [1] -0.002321214
# Simulation study for estimator = 3 (G = 0.4; n = 50; Pareto)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 2)
## $Expected.Gini
## [1] 0.3929852
##
## $RB
## [1] -1.75
##
## $RRMSE
## [1] 19.2
##
## $BR
## [1] -0.09
##
## $Expected.Skewness
## [1] 0.004515528
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 20)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 3)
## $Expected.Gini
## [1] 0.3936652
##
## $RB
## [1] -1.58
##
## $RRMSE
## [1] 13.67
##
## $BR
## [1] -0.12
##
## $Expected.Skewness
## [1] -0.01088615
# Simulation study for estimator = 3 (G = 0.4; n = 50; Dagum with p = 0.5)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 4)
## $Expected.Gini
## [1] 0.3982037
##
## $RB
## [1] -0.45
##
## $RRMSE
## [1] 10.81
##
## $BR
## [1] -0.04
##
## $Expected.Skewness
## [1] 0.001194609
# Simulation study for estimator = 3 (G = 0.4; n = 50; Weibull)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 5)
## $Expected.Gini
## [1] 0.4023592
##
## $RB
## [1] 0.59
##
## $RRMSE
## [1] 9.28
##
## $BR
## [1] 0.06
##
## $Expected.Skewness
## [1] 0.00450743
# Simulation study for estimator = 3 (G = 0.4; n = 50; Gamma)
set.seed(123)
fin.empirical.measures(G = 0.4, estimator = 3, Sample.size = 50, N = N, Distribution = 6)
## $Expected.Gini
## [1] 0.3993217
##
## $RB
## [1] -0.17
##
## $RRMSE
## [1] 8.92
##
## $BR
## [1] -0.02
##
## $Expected.Skewness
## [1] 0.0034934
In this section, we use the data set ES-SILC, with sample size \(n=26\), to compute estimators of the Gini index. The various real data sets described in the manuscript can be loaded using the file Datasets.RData.
# ES-SILC with n = 26.
SILCn26<- c(6974.50, 10539.60, 9157.30, 25193.40, 23033.40, 19379.80, 30839.70,
71958.30, 227173.60, 74166.80, 26155.50, 132202.10, 16219.50, 45297.20, 19217.74,
23238.30, 23070.50, 38165.50, 15111.90, 67500.70, 28608.10, 2291.40, 28934.00,
9375.90, 15858.00, 22555.10)
# Sample size (n)
length(SILCn26)
## [1] 26
# Estimation of coefficient of skewness
skewness(SILCn26)
## [1] 2.807518
# Estimation of the Gini index using Gn.c
Gn.c(SILCn26)
## [1] 0.5178801
# Estimation of the Gini index using Gn.c.JO
Gn.c.JO(SILCn26)
## [1] 0.5343978
# Kolmogorov-Smirnov (KS) p-value for the Lognormal distribution
lSILCn26 <- log(SILCn26)
mean.MLE <- mean(lSILCn26)                    # MLE of meanlog
sd.MLE <- sqrt(mean(lSILCn26^2) - mean.MLE^2) # MLE of sdlog (divisor n, not n-1)
ks.test(SILCn26, "plnorm", mean.MLE, sd.MLE)
##
## Exact one-sample Kolmogorov-Smirnov test
##
## data: SILCn26
## D = 0.13826, p-value = 0.6527
## alternative hypothesis: two-sided
# Estimation of the Gini index using Gn.c.Bp (Lognormal distribution)
set.seed(123)
Gn.c.Bp(SILCn26, R.boot = 1000, R = 1000, Value.T = 200, Value.V = 140, Distribution = 1)
## [1] 0.525814
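As a cross-check of the point estimate obtained with Gn.c, the sketch below computes the sample Gini index from its order-statistics form with the \(n/(n-1)\) correction. This is an assumption about the definition of \(\widehat{G}_n^c\), so it reproduces the value of Gn.c above only if that is indeed its definition; Gini.c.pairwise is included solely to verify the algebraic identity between the order-statistics and mean-difference forms:

```r
# Assumed definition: plug-in Gini estimator rescaled by n/(n-1).
Gini.c.check <- function(y)
{
n <- length(y)
y <- sort(y)
G <- 2*sum((1:n)*y)/(n^2*mean(y)) - (n + 1)/n  # order-statistics form
n/(n - 1)*G
}
# Equivalent mean-difference (pairwise) form, for verification:
Gini.c.pairwise <- function(y)
{
n <- length(y)
(n/(n - 1))*sum(abs(outer(y, y, "-")))/(2*n^2*mean(y))
}
```

Running Gini.c.check(SILCn26) with the data defined above gives a value directly comparable with Gn.c(SILCn26).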