Sampling Procedure
The sample size for this survey was determined using outputs from the 2010 Household Income and Expenditure Survey (HIES), in particular per capita monthly total expenditure. The mean, standard deviation and standard error of per capita expenditure were computed from the 2010 HIES, and the distribution of the population across the 6 provinces of Vanuatu, taken from the 2016 Census, was used as a base. Based on the accuracy of this variable of interest within each province, the sample size per province was adjusted to obtain an expected sampling error of around 5% within each province.
The sampling frame is the 2016 Vanuatu census, which was used to compute the probability of selection of the Enumeration Areas (EAs). Selection started with a random selection of EAs with probability proportional to size. Then, within each selected EA, 10 households were randomly selected with uniform probability.
Within each selected EA, the household listing was updated by the team before random selection and interview.
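The two-stage selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used by the survey team: it assumes systematic PPS selection on the household counts from the frame (the source only says "probability proportional to size"), and the function names and the EA sizes are hypothetical.

```python
import random


def select_eas_pps(ea_sizes, n_eas, rng):
    """Systematic PPS selection (an assumed variant of PPS).

    ea_sizes maps an EA identifier to its household count in the frame.
    An EA larger than the sampling interval can be selected more than once.
    """
    total = sum(ea_sizes.values())
    step = total / n_eas                      # sampling interval
    start = rng.random() * step               # random start in [0, step)
    targets = [start + k * step for k in range(n_eas)]

    selected = []
    cumulative = 0.0
    remaining = iter(targets)
    target = next(remaining, None)
    for ea_id, size in ea_sizes.items():
        cumulative += size
        # Select this EA for every target that falls inside its size range.
        while target is not None and target < cumulative:
            selected.append(ea_id)
            target = next(remaining, None)
    return selected


def select_households(listing, n_households, rng):
    """Second stage: equal-probability selection from the updated listing."""
    return rng.sample(listing, min(n_households, len(listing)))


rng = random.Random(2019)                     # fixed seed for reproducibility
frame = {f"EA{i:03d}": 20 + i for i in range(50)}   # hypothetical frame
eas = select_eas_pps(frame, 5, rng)
households = select_households(list(range(40)), 10, rng)
```

In this sketch a larger EA covers a wider part of the cumulative size scale, so it is more likely to contain one of the equally spaced targets, which is what gives the selection its proportional-to-size property.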
i) The only variable considered is per capita total household expenditure (the variable of interest): in addition to being one of the main indicators derived from the Household Income and Expenditure Survey (HIES), it is likely to be highly correlated with many other variables of interest (e.g. poverty). Using this variable, a list of relevant indicators was calculated from the 2010 HIES dataset. These indicators provide information on:
- (a) The status of the household expenditure distribution within each province
- (b) The efficiency provided by the 2010 HIES sample design
- (c) The accuracy of the estimates calculated from the 2010 HIES dataset (especially per capita household expenditure, our variable of interest)
ii) The original dataset was trimmed on the variable of interest: the lowest and the highest percentiles (the 1% of households with the lowest and the 1% with the highest per capita total household expenditure) were removed from the analysis as outliers. The trimmed dataset contains 4,289 households (out of 4,377 completed households).
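The trimming step can be illustrated with a short sketch. The rounding rule is an assumption (the source does not state how the 1% cut-offs were computed), and `trim_outliers` is a hypothetical helper. With rounding up, 44 households are dropped at each end of 4,377 records, which is consistent with the 4,289 households reported.

```python
import math


def trim_outliers(values, lower_pct=1.0, upper_pct=1.0):
    """Drop the lowest lower_pct% and highest upper_pct% of the values.

    The number of records dropped at each end is rounded up (assumed rule).
    Returns the remaining values in ascending order.
    """
    ordered = sorted(values)
    n = len(ordered)
    k_low = math.ceil(n * lower_pct / 100)
    k_high = math.ceil(n * upper_pct / 100)
    return ordered[k_low:n - k_high]


# Placeholder data standing in for per capita expenditure of 4,377 households.
expenditure = list(range(4377))
trimmed = trim_outliers(expenditure)
print(len(trimmed))
```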
iii) The 2010 Vanuatu HIES sample was based on a stratified multi-stage selection:
- Stratification: geographical provinces (by urban / rural locations)
- First stage of selection: Enumeration Areas (EAs), with probability of selection proportional to size
- Second stage: households, with uniform probability of selection within the EAs
iv) The mean and standard deviation describe the status of the variable of interest within each stratum. The intracluster correlation (ρ) and the design effect (DEFF) indicate the efficiency of the sampling strategy, and the standard error and relative standard error (SE/RSE) of the variable of interest show its accuracy.
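The source does not give the formulas behind these indicators. Under the usual textbook approximation for cluster samples (an assumption here, which ignores the finite population correction and unequal weights), they relate as follows. In this notation, $\bar{m}$ is the average number of households interviewed per EA, $\rho$ the intracluster correlation, $s$ the standard deviation, $n$ the number of households in the stratum and $\bar{y}$ the mean of the variable of interest.

```latex
\begin{align}
\mathrm{DEFF} &\approx 1 + (\bar{m} - 1)\,\rho \\
\mathrm{SE}(\bar{y}) &\approx \sqrt{\mathrm{DEFF}}\;\frac{s}{\sqrt{n}} \\
\mathrm{RSE}(\bar{y}) &= \frac{\mathrm{SE}(\bar{y})}{\bar{y}}
\end{align}
```

Under this approximation, the design effect grows with both the intracluster correlation and the number of households interviewed per EA, so a stratum can have a high DEFF from either source.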
v) The purpose of this analysis is to draw insights from the 2010 HIES sample design in order to improve the 2019 survey. There is little point in increasing the sample size in strata where the sample design is not efficient (the gain in accuracy would be minor compared to the related cost).
vi) The challenges in the 2019 Vanuatu baseline survey:
- Meet precision targets in each stratum (provincial level), including Penama, where Ambae island had been evacuated at the time of the sample design
- Keep an acceptable sample size (due to budget constraints)
- Follow international recommendations (12 months of field operation)
- Enhance the monitoring and supervision of the field staff and simplify management of the logistics in the field
==> Optimize the variance/cost ratio of the survey design
vii) Table 1 from the Document Sample Design (provided as External Resources) presents the Vanuatu 2010 HIES survey specifications, efficiency and accuracy in each stratum (for the variable of interest). It shows that improvements can be made in Torba and Shefa rural (where the RSE is higher than 5%), and that the intraclass correlation is high in Malampa, Shefa rural and Tafea (which leads to a high design effect in those strata). In Torba, the high design effect comes from the high number of households interviewed in each selected EA (on average, 33 households were interviewed per selected EA in this stratum).
- Torba: the sample size is adequate; the number of households to interview within each EA just needs to be reduced (and, to keep a similar sample size, the number of EAs to select in the province will be increased)
- Malampa: given the high intracluster correlation in this province, a higher number of EAs to select is required (with the same number of households per EA to interview).
- Shefa rural: keep the same number of households to interview within each EA, and increase the number of EAs to select (this will lead to a higher sample size)
- Tafea: similar to Malampa province, the high intraclass correlation indicates that the number of EAs to select has to be increased (therefore the sample size as well).
The sample size has to be increased in Malampa, Shefa rural and Tafea; for the other strata, the 2019 design will have to be similar to the 2010 design (in order to provide at least the same level of accuracy).
viii) The 2019 Vanuatu baseline survey follows international recommendations in terms of data collection schedule (12-month coverage) and provides for better management and supervision of the field staff. In this context, the field staff will work in teams, given that:
- A team is made of 1 supervisor (team leader) and 2 or 3 interviewers
- Each interviewer will be responsible for 5 interviews per round
- A round of survey is a 1 week period
- 1 EA is covered during 1 round; after the round is completed, the team moves to the next EA for the next round
- A team completes 32 rounds during the 12-month field operation period (roughly, every 2 rounds/2 weeks of work are followed by 1 round/1 week of rest)
ix) Table 3 from the Document Sample Design (provided as External Resources) presents a survey schedule starting in February 2019 and ending in February 2020. During this period, the teams will be in the field for 32 working weeks (corresponding to 32 different selected EAs), with a 3-week period of rest over the Christmas period.
x) The number of interviewers per team and the number of teams per province determine the total sample size within each province. A team of 3 interviewers can achieve 480 households over the period, while a team of 2 interviewers can achieve only 320.
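The workload arithmetic behind these figures can be written out explicitly. The constants come from item viii) (5 interviews per interviewer per round, 32 rounds per team); the function names and the example team composition are hypothetical.

```python
INTERVIEWS_PER_INTERVIEWER_PER_ROUND = 5   # from item viii)
ROUNDS_PER_TEAM = 32                       # one EA per round


def households_per_team(n_interviewers):
    """Households a team can complete over the whole field period."""
    return (n_interviewers
            * INTERVIEWS_PER_INTERVIEWER_PER_ROUND
            * ROUNDS_PER_TEAM)


def province_sample_size(team_sizes):
    """Total households for a province, given interviewers per team."""
    return sum(households_per_team(k) for k in team_sizes)


print(households_per_team(3))          # team of 3 interviewers
print(households_per_team(2))          # team of 2 interviewers
print(province_sample_size([2, 2]))    # hypothetical province with two teams of 2
```

Because each interviewer adds a fixed 160 households over the period, provincial sample sizes under this scheme can only take values that are multiples of 160.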
xi) The intraclass correlation is used to calculate the precision loss due to clustering. Like the standard deviation, the intracluster correlation is considered to be a true population parameter, and therefore transferable between designs. We have to accept the hypothesis that this correlation factor has not changed during the period 2010-2019, and therefore can be used to predict DEFF and RSE for the next survey given an adjusted design (based on the conclusions provided by the 2010 design). Table 2 from the Document Sample Design (provided as External Resources) predicts the design effect and sampling error of the variable of interest given the new sample design that is based on:
- the sample size within each stratum
- the number of teams within each stratum
- the number of interviewers per team
In order to allow more flexibility in the sample size, it is preferable to set up some teams of 3 interviewers, each of which can achieve 480 households (a good sample size for Torba and Sanma urban), and some teams of 2 interviewers, each of which will achieve 320 households (2 such teams will be required in other provinces).
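The prediction described in item xi) can be sketched as below. This is an illustration under stated assumptions, not the calculation behind Table 2: it uses the common approximation DEFF = 1 + (m - 1) × ρ, treats the intracluster correlation and the standard deviation as carried over from 2010, and all numeric inputs are made-up placeholders rather than values from the survey.

```python
import math


def predict_design(n_households, households_per_ea, rho, mean, sd):
    """Predict DEFF and RSE of the mean for a candidate design.

    Assumes DEFF = 1 + (m - 1) * rho and SE = sqrt(DEFF) * sd / sqrt(n),
    ignoring the finite population correction and unequal weights.
    """
    deff = 1 + (households_per_ea - 1) * rho
    se = math.sqrt(deff) * sd / math.sqrt(n_households)
    return deff, se / mean


# Placeholder inputs: same sample size, different households per EA.
rho, mean, sd = 0.1, 100.0, 80.0
deff_33, rse_33 = predict_design(480, 33, rho, mean, sd)   # many households per EA
deff_15, rse_15 = predict_design(480, 15, rho, mean, sd)   # team of 3 x 5 interviews
print(f"33 per EA: DEFF={deff_33:.2f}, RSE={rse_33:.1%}")
print(f"15 per EA: DEFF={deff_15:.2f}, RSE={rse_15:.1%}")
```

For a fixed sample size, spreading the interviews over more EAs with fewer households in each lowers the predicted design effect, which is the reasoning applied to Torba in item vii).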
xii) The proposed design in Table 2 from the Document Sample Design (provided as External Resources) shows a total sample size of 4,640 households and a higher level of accuracy of the estimate of the variable of interest in all strata. Only Shefa rural shows an RSE higher than 5%, which is still acceptable. The high intraclass correlation in Shefa rural inflates the variance of the estimates; lowering the RSE further would require either increasing the sample size or decreasing the number of households interviewed per EA, which is not recommended on logistical and financial grounds.