Sample Size Calculator


Column "A"

assumes Simple Random Sampling methodology. If you are using a Telephone, Internet or Postal data collection methodology then the sample size calculated in column "A" is an initial sample size. If clustering of sampling units is not used for this type of data collection, you can ignore column "B" and go straight to column "C", which requires expected values for Response Rates, Eligibility Rates and Coverage Rates.

CONFIDENCE LEVEL - Probability

Statistical inference from surveys is typically expressed in the form of Confidence Intervals: a statement that the population value lies within +/- a given percentage of the estimated sample value. If the estimated proportion from the sample is 30% then we may say, with 95% confidence (in other words, there is a 95 in 100 chance), that the population value lies between 27% and 33%, i.e. +/- 3%.

In this box you need to enter the confidence level that you want to attach to your statement about the population value. Usually this is 95% but it could be 99% or 90%.
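The confidence level enters a sample size calculation through a critical value of the standard normal distribution. The sketch below is an illustration only, assuming the usual two-sided normal approximation; the function name `z_for_confidence` is hypothetical and is not taken from the calculator itself.

```python
from statistics import NormalDist


def z_for_confidence(confidence_pct: float) -> float:
    """Two-sided standard normal critical value for a confidence level in percent."""
    alpha = 1 - confidence_pct / 100          # probability left outside the interval
    return NormalDist().inv_cdf(1 - alpha / 2)


# The three confidence levels mentioned in the text.
for level in (90, 95, 99):
    print(level, round(z_for_confidence(level), 3))
```

For 90%, 95% and 99% the critical values are approximately 1.645, 1.960 and 2.576, so a higher confidence level calls for a larger sample at the same precision.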

PRECISION REQUIRED - Confidence Interval

The precision required is often referred to as the margin of error. If we are asked to provide an estimate within +/- 4% then we need to enter 4 in the box for precision required. The confidence interval and the confidence level are part of the same statement.

POPULATION SIZE

For large populations, the precision of the sample estimate depends almost entirely on the sample size and not on the population size.

We have included population size in the Sample Size Calculator but you will quickly realise that population size has very little effect on the final sample size. However, for populations that have only a few thousand members, population size makes a small difference when calculating the required sample size.

If your population is large then you can enter the value 1,000,000 in this box.
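The effect of population size can be illustrated with a short sketch. The calculator's own formula is not given here, so this assumes the standard textbook formula for estimating a proportion under simple random sampling, with a finite population correction; the function name `sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(confidence_pct, margin_pct, proportion_pct=50.0, population=None):
    """Simple random sample size for a proportion (assumed textbook formula)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_pct / 100) / 2)
    p = proportion_pct / 100
    e = margin_pct / 100
    n = z ** 2 * p * (1 - p) / e ** 2          # size for an effectively infinite population
    if population is not None:
        n = n / (1 + (n - 1) / population)     # finite population correction
    return math.ceil(n)


# 95% confidence, +/- 4% precision, 50% proportion, varying population size.
for size in (2_000, 20_000, 1_000_000, None):
    print(size, sample_size(95, 4, population=size))
```

Under these assumptions the required sample for a population of 1,000,000 is almost identical to the uncorrected figure, while a population of 2,000 gives a noticeably smaller one.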

ESTIMATED PROPORTION IN THE POPULATION

To calculate the required sample size we need to know the variability in the population of the variable under study. Variability in the population differs from one variable to another and is not known in advance. When estimating proportions, variability is highest when the proportion is equal to 50%.

In this box we have set the proportion to 50% as the default value; for this value the sample size will be at its maximum.
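Why 50% gives the largest sample can be seen from the variance of a proportion, p x (1 - p), which is the quantity that drives the required sample size in the usual formula. A minimal sketch, with the hypothetical helper name `proportion_variance`:

```python
def proportion_variance(proportion_pct: float) -> float:
    """Variance term p * (1 - p) for a proportion given in percent."""
    p = proportion_pct / 100
    return p * (1 - p)


# The variance term rises towards 50% and falls symmetrically after it.
for pct in (10, 30, 50, 70, 90):
    print(pct, round(proportion_variance(pct), 4))
```

Because the term peaks at 50%, leaving the default in place is the conservative choice when the true proportion is unknown.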

Column "B"

allows us to adjust the sample size calculated in column "A" for the effect of clustering. You will need to provide two parameters in order to calculate the Design effect on the sample size: the Intra-cluster Correlation Coefficient and the number of interviews required within each cluster. Intra-cluster correlation values are portable and can be "borrowed" from previous surveys (if available). Knowing the intra-cluster correlation and the number of interviews per cluster allows us to calculate the Design effect.

INTRA-CLUSTER CORRELATION

Intra-cluster (Intraclass) correlation measures how similar people within the same cluster are compared with people in other clusters. Income or voting patterns tend to have higher intra-cluster correlations because households that are next to each other tend to have similar financial status or voting preferences. Behavioural and attitudinal variables tend to have smaller intra-cluster correlations. The smaller the geographical size of the first-stage sampling unit, the higher the intra-cluster correlations for various characteristics of interest.

Values of intra-cluster correlation depend on the geographical size of clusters, and every variable within one survey will have a different intra-cluster correlation coefficient. Some values of intra-cluster correlation are listed below (they have been collected and averaged from various sources and can serve as an indication of the possible effects of cluster homogeneity on the design effect).

                                Ethnic origin = from 0.06 to 0.25 or 0.27

                                Tenure = from 0.20 to 0.40

                                Party political identification =>

                                    Conservative = 0.08
                                    Labour = 0.12
                                    Liberal Democrat = 0.04

                                Religion = 0.21 for non Christians and 0.03 for Christians
                            

NUMBER OF INTERVIEWS WITHIN CLUSTER

A sample design that selects a large number of first-stage sampling units (clusters) and a smaller number of households or persons within each selected cluster is generally more efficient. If the geographical area selected to represent the cluster is small, it is advisable to select as few respondents as possible from each cluster and to increase the number of clusters. This decision will obviously depend on the available budget and will be traded off against the loss of precision of the survey estimates.

DESIGN EFFECT (Deff)

The Design effect (Deff) is a factor that summarises the effects of complex sample designs, such as clustering and stratification, by comparing the variance from the complex sample design with the variance of a simple random sample of the same size. If Deff=1 then the precision of, for example, a stratified cluster sample is the same as the precision of a simple random sample. When clustering is used in the sample design, in most cases Deff>1, reflecting the loss of precision due to clustering, i.e. the homogeneity of the sample clusters.
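The two inputs of column "B" can be combined into a design effect as sketched below. The calculator's exact formula is not stated in this text, so the sketch assumes the widely used approximation Deff = 1 + (m - 1) x rho, where m is the number of interviews per cluster and rho is the intra-cluster correlation. The function names and the example values (rho = 0.05, m = 15, a starting sample of 601) are hypothetical.

```python
import math


def design_effect(icc: float, interviews_per_cluster: int) -> float:
    """Approximate design effect due to clustering: 1 + (m - 1) * rho (assumed formula)."""
    return 1 + (interviews_per_cluster - 1) * icc


def clustered_sample_size(srs_sample_size: int, icc: float, interviews_per_cluster: int) -> int:
    """Inflate a simple random sample size by the design effect."""
    return math.ceil(srs_sample_size * design_effect(icc, interviews_per_cluster))


# Hypothetical inputs: rho = 0.05 with 15 interviews per cluster gives Deff of about 1.7.
print(round(design_effect(0.05, 15), 2))
print(clustered_sample_size(601, 0.05, 15))
```

With one interview per cluster the formula returns Deff = 1, which matches the point that taking fewer interviews from more clusters limits the loss of precision.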

*** Effect on Confidence Interval ***
If your survey results are based on a cluster sample design and you use the simple random sample formula to calculate confidence intervals (these are a standard output of many statistical packages), then the intervals will be too narrow unless you widen them by a factor equal to the SQUARE ROOT of Deff. For example, if the design effect calculated in column "B" is 1.7 then you need to multiply the confidence interval by a factor of 1.3 (SQUARE ROOT of 1.7 = 1.3). So if you estimated that your population proportion lies within +/- 3%, a design factor of 1.3 increases the width of the interval to 1.3 x 3 = +/- 3.9%. Note that the design factor, often denoted Deft, relates to standard errors, and its values can be "borrowed" from other similar surveys.
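The adjustment described above is simple arithmetic and can be reproduced as follows. This is a minimal sketch using the figures from the example (Deff = 1.7, margin of +/- 3%); the function names are hypothetical.

```python
import math


def design_factor(deff: float) -> float:
    """Design factor (Deft): the square root of the design effect."""
    return math.sqrt(deff)


def adjusted_margin(srs_margin_pct: float, deff: float) -> float:
    """Widen a simple-random-sample margin of error by the design factor."""
    return srs_margin_pct * design_factor(deff)


print(round(design_factor(1.7), 1))         # design factor, rounded as in the text
print(round(adjusted_margin(3.0, 1.7), 1))  # adjusted margin in percentage points
```

Without rounding the design factor first, the adjusted margin is about 3.91%, which agrees with the +/- 3.9% quoted above.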

Column "C"

allows us to adjust calculated sample sizes in columns "A" and "B" for Response Rates, Eligibility Rates and Coverage Rates.

RESPONSE RATE

Response rate refers to the percentage of eligible units in the sample that complete an interview. If we expect that 20% of the sampled elements will not respond then our response rate will be 80.

ELIGIBILITY RATE

If the population of interest is males aged 24-54 and we expect, on average, 85% eligibility per sampled dwelling, then enter an eligibility rate of 85 in this box.

COVERAGE RATE

If we are sure that our sampling frame covers all elements of the survey population then we need to enter 100 in this box. Deliberate exclusions are not regarded as non-coverage. For example, if we want to survey adults aged 18+ then the exclusion of the population under 18 would not be considered a coverage problem. But if our sampling frame fails to include all eligible members of the population then the coverage rate will be less than 100%.
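The three rates in column "C" can be applied to a required number of completed interviews as sketched below. The calculator's own adjustment is not spelled out in this text, so the sketch assumes the common approach of dividing by the product of the three rates. The function name and the starting figure of 601 are hypothetical; the rates of 80, 85 and 100 are the ones used in the examples above.

```python
import math


def fieldwork_sample_size(required_completes: int,
                          response_rate_pct: float,
                          eligibility_rate_pct: float,
                          coverage_rate_pct: float) -> int:
    """Units to select so that the expected completed interviews meet the target.

    Assumes the three rates act independently, so the target is divided by their product.
    """
    rates = (response_rate_pct, eligibility_rate_pct, coverage_rate_pct)
    for rate in rates:
        if not 0 < rate <= 100:
            raise ValueError("rates must be greater than 0 and at most 100")
    combined = math.prod(rate / 100 for rate in rates)
    return math.ceil(required_completes / combined)


# Hypothetical target of 601 completes with 80% response, 85% eligibility, 100% coverage.
print(fieldwork_sample_size(601, 80, 85, 100))
```

When all three rates are 100 the function returns the target unchanged, so any rate below 100 can only increase the number of units to be selected.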