Column "A"
assumes Simple Random Sampling methodology. If you are using Telephone, Internet or Postal data
collection methodology then the sample size calculated in column "A" is an initial sample size. If
clustering of sampling units is not used for this type of data collection then you can ignore
column "B" and go straight to column "C", which requires expected values for Response Rates,
Eligibility Rates and Coverage Rates.
CONFIDENCE LEVEL - Probability
Statistical inference from surveys is typically expressed in the form of Confidence Intervals: a statement
that the population value lies within +/- a given % of the estimated sample value. If the estimated
proportion from the sample is 30% then we may say, with 95% confidence (or, put another way, with a 95 in
100 chance), that the population value lies between 27% and 33% i.e. +/- 3%.
In this box you need to enter the confidence level that you have in your statement about the population
value. Usually this is 95% but it could be 99% or 90%.
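As an illustration of how a confidence level feeds into the calculation, the sketch below converts a confidence level into the two-sided standard normal multiplier (often written z). The function name is hypothetical and the use of the normal approximation is an assumption; the calculator's own internals are not documented here.

```python
from statistics import NormalDist


def z_for_confidence(confidence_level_pct):
    """Two-sided standard normal multiplier for a confidence level in percent."""
    alpha = 1 - confidence_level_pct / 100       # probability left in both tails
    return NormalDist().inv_cdf(1 - alpha / 2)   # upper-tail cut-off point


for level in (90, 95, 99):
    print(level, round(z_for_confidence(level), 3))
```

A higher confidence level gives a larger multiplier, which in turn means a wider interval for the same sample size.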
PRECISION REQUIRED - Confidence Interval
Precision required is often referred to as the margin of error. If we are asked to provide an estimate
within +/- 4% then in the box for precision required we need to enter 4. The confidence interval and
the confidence level are part of the same statement.
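To show how the confidence level and the margin of error combine into one statement, here is a minimal sketch that assumes a simple random sample and the usual normal approximation for a proportion. The function name and the sample of 897 people are hypothetical, chosen so that the result lands close to the 30% +/- 3% example used above.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(proportion_pct, sample_size, confidence_level_pct=95):
    """Return (lower, upper, margin) in percentage points for a simple random sample."""
    z = NormalDist().inv_cdf(0.5 + confidence_level_pct / 200)
    p = proportion_pct / 100
    margin = 100 * z * sqrt(p * (1 - p) / sample_size)
    return proportion_pct - margin, proportion_pct + margin, margin


# Hypothetical survey: 30% observed in a simple random sample of 897 people.
low, high, margin = confidence_interval(30, 897)
print(f"{low:.1f}% to {high:.1f}% (+/- {margin:.1f}%)")
```

Raising the confidence level to 99% for the same sample widens the interval, which is why the two values always have to be quoted together.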
POPULATION SIZE
Precision of the sample estimate depends mainly on the sample size and hardly at all on the population size.
We have included population size in the Sample Size Calculator but you will quickly realise
that population size has very little effect on the final sample size. However, for populations
that have only a few thousand members, population size makes a small difference when calculating the
required sample size.
If your population is large then in this box you can enter the value 1,000,000.
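The limited influence of population size can be checked with a short sketch. It assumes the textbook sample size formula for a proportion with a finite population correction; whether the calculator uses exactly this formula is an assumption, and the function and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size(precision_pct, population_size, proportion_pct=50,
                confidence_level_pct=95):
    """Simple random sample size for estimating a proportion (assumed formula)."""
    z = NormalDist().inv_cdf(0.5 + confidence_level_pct / 200)
    p = proportion_pct / 100
    e = precision_pct / 100
    n0 = z ** 2 * p * (1 - p) / e ** 2            # size for a very large population
    n = n0 / (1 + (n0 - 1) / population_size)     # finite population correction
    return ceil(n)


# Required precision of +/- 4% at 95% confidence for several population sizes.
for population in (2_000, 10_000, 100_000, 1_000_000):
    print(population, sample_size(4, population))
```

Under these assumptions the required sample barely changes once the population runs into the tens of thousands, while a population of 2,000 needs noticeably fewer interviews.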
ESTIMATED PROPORTION IN THE POPULATION
To calculate the required sample size we need to know the variability in the population of the variable
under study. Variability in the population differs from one variable to another and it is not
known in advance. When estimating proportions, variability is highest when the proportion is equal
to 50%.
In this box, as a default value, we have set the proportion to be equal to 50%, and for this value the sample size will be at its maximum.
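The reason 50% is the most cautious default can be seen from the variance of a proportion, p x (1 - p), which drives the sample size. The sketch below simply tabulates that quantity; the function name is hypothetical.

```python
def proportion_variance(proportion_pct):
    """Variance term p * (1 - p) for a proportion given in percent."""
    p = proportion_pct / 100
    return p * (1 - p)


for pct in (10, 30, 50, 70, 90):
    print(pct, round(proportion_variance(pct), 4))

# The proportion (in whole percent) with the largest variance term.
print(max(range(1, 100), key=proportion_variance))
```

Because the term falls away symmetrically on both sides of 50%, entering a proportion further from 50% gives a smaller required sample size.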
Column "B"
allows us to adjust the sample size calculated in column "A" for the
effect of clustering.
You will need to provide two parameters in order to calculate the Design effect on the sample size: the
Intra-cluster Correlation Coefficient and the number of interviews required within each cluster. Intra-cluster
correlation values are portable and can be "borrowed" from previous surveys (if available). Knowing the
intra-cluster correlation and the number of interviews per cluster allows us to calculate the Design effect.
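A common way to combine these two parameters is the approximation Deff = 1 + (m - 1) x ICC, where m is the number of interviews per cluster. The sketch below assumes that formula (the text does not state which one the calculator uses), and the input values are hypothetical.

```python
def design_effect(icc, interviews_per_cluster):
    """Approximate design effect from clustering: 1 + (m - 1) * ICC."""
    return 1 + (interviews_per_cluster - 1) * icc


# Hypothetical inputs: ICC of 0.05 and 15 interviews in each cluster.
deff = design_effect(0.05, 15)
print(round(deff, 2))

# Inflate a simple random sample size of 600 to allow for clustering.
print(round(600 * deff))
```

With one interview per cluster the formula returns 1, so the clustered design is then no less precise than a simple random sample.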
INTRA-CLUSTER CORRELATION
Intra-cluster (Intraclass) correlation measures how similar people within the same cluster are
compared to how similar they are to people in other clusters. Income or voting
patterns tend to have higher intra-cluster correlations because households that are next to each other
tend to have similar financial status or voting preferences. Behavioural and attitudinal variables
tend to have smaller intra-cluster correlations. The smaller the geographical size of the first-stage
sampling unit, the higher the intra-cluster correlations for various characteristics of interest.
Values of intra-cluster correlation depend on the geographical size of clusters, and every variable
within one survey will have a different intra-cluster correlation coefficient. Some values of intra-cluster
correlation are listed below (they have been collected and averaged from various sources and can serve
as an indication of the possible effects of cluster homogeneity on the design effect).
Ethnic origin = from 0.06 to 0.25 or 0.27
Tenure = from 0.20 to 0.40
Party political identification =>
    Conservative = 0.08
    Labour = 0.12
    Liberal Democrat = 0.04
Religion = 0.21 for non-Christians and 0.03 for Christians
NUMBER OF INTERVIEWS WITHIN CLUSTER
It is statistically more efficient to use a sample design that selects a large number of first-stage
sampling units (clusters) and a smaller number of households or persons within each selected cluster. If
the geographical area selected to represent the cluster is small it is advisable to select as
few respondents as possible from each cluster and to increase the number of clusters. This decision will
obviously depend on the available budget, which has to be traded off against the loss of precision of the survey estimates.
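The trade-off can be illustrated by holding the total number of interviews fixed and varying how they are spread across clusters. This is a sketch only: it assumes the approximation Deff = 1 + (m - 1) x ICC, a hypothetical ICC of 0.05, and hypothetical designs of 1,000 interviews each.

```python
def effective_sample_size(n_clusters, interviews_per_cluster, icc):
    """Number of simple random sample interviews giving the same precision."""
    total_interviews = n_clusters * interviews_per_cluster
    deff = 1 + (interviews_per_cluster - 1) * icc   # assumed approximation
    return total_interviews / deff


# Three hypothetical designs, each with 1,000 interviews in total.
for n_clusters, per_cluster in ((200, 5), (100, 10), (50, 20)):
    n_eff = effective_sample_size(n_clusters, per_cluster, icc=0.05)
    print(n_clusters, per_cluster, round(n_eff))
```

Under these assumptions the design with more clusters and fewer interviews per cluster keeps more of its nominal sample size, at the price of visiting more locations.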
DESIGN EFFECT (Deff)
The Design effect (Deff) is a factor that summarises the effects of complex sample designs, such as
clustering and stratification, by comparing the variance from the complex sample design with the variance
of a simple random sample of the same size. If Deff=1 then the precision of, for example, a stratified
cluster sample is the same as the precision of a simple random sample. When clustering is used in the
sample design then in most cases Deff>1, which reflects losses due to clustering i.e. homogeneity within
the sampled clusters.
*** Effect on Confidence Interval ***
If your survey results are based on a cluster sample design and you use the simple random sample formula
to calculate confidence intervals (these are provided as a standard output of many standard statistical
packages) then your confidence intervals will be too narrow: they need to be widened by a factor equal to
the SQUARE ROOT of Deff. For example, if the design effect calculated in column "B" is 1.7 then you need
to adjust the confidence interval by a factor of 1.3 (SQUARE ROOT of 1.7=1.3). If you estimated that your
population proportion lies within +/- 3% then a design factor of 1.3 will increase the width of the
interval to 1.3 x 3 = +/- 3.9%. Note that the Design factor, often denoted as Deft, is the square root of
Deff and applies to standard errors; like intra-cluster correlations, its values can be
"borrowed" from other similar surveys.
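The arithmetic in this example can be reproduced in a few lines. The function name is hypothetical; the calculation itself is just the square root adjustment described above.

```python
from math import sqrt


def adjust_margin_for_design(margin_pct, deff):
    """Widen a simple random sample margin of error by Deft = sqrt(Deff)."""
    deft = sqrt(deff)
    return margin_pct * deft


deft = sqrt(1.7)
print(round(deft, 1))                                # design factor, rounded as in the text
print(round(adjust_margin_for_design(3, 1.7), 1))    # adjusted margin in % points
```

Working with the unrounded square root gives a margin very slightly above the 3.9% obtained from the rounded factor of 1.3, a difference that disappears at one decimal place.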
Column "C"
allows us to adjust the sample sizes calculated in columns "A" and "B" for Response
Rates, Eligibility Rates and Coverage Rates.
RESPONSE RATE
Response rate refers to the percentage of eligible units in the sample that complete an interview. If we
expect that 20% of the sampled elements will not respond then our response rate will be 80.
ELIGIBILITY RATE
If the population of interest is males 24-54 and we expect that, on average, 85% of sampled dwellings
contain an eligible person then in the box enter an eligibility rate of 85.
COVERAGE RATE
If we are sure that our sampling frame covers all elements of the survey population then in this box
we need to write 100. Deliberate exclusions are not regarded as non-coverage. For example, if we want
to survey adults 18+ then the exclusion of the population under 18 would not be considered a coverage problem.
But if our sampling frame fails to include all eligible members of the population then the coverage rate
will be less than 100%.
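One plausible way to apply the three rates is to divide the required number of completed interviews by each rate in turn, giving the number of units to select from the frame. This is an assumption about how the adjustment works, not a description of the calculator's internals, and the function and parameter names are hypothetical.

```python
from math import ceil


def sample_to_issue(required_interviews, response_rate_pct=100,
                    eligibility_rate_pct=100, coverage_rate_pct=100):
    """Units to select so that the expected completed interviews meet the target."""
    # Assumed adjustment: the expected yield is the product of the three rates.
    expected_yield = ((response_rate_pct / 100)
                      * (eligibility_rate_pct / 100)
                      * (coverage_rate_pct / 100))
    return ceil(required_interviews / expected_yield)


# 600 interviews needed, 80% response, 85% eligibility, full coverage.
print(sample_to_issue(600, response_rate_pct=80, eligibility_rate_pct=85))
```

With all three rates left at 100 the function returns the required number of interviews unchanged.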