Often sampling is done in order to estimate the proportion of a population that has a specific characteristic, such as the proportion of all items coming off an assembly line that are defective or the proportion of all people entering a retail store who make a purchase before leaving. The population proportion is denoted \(p\) and the sample proportion is denoted \(\hat{p}\). Thus if in reality \(43\%\) of people entering a store make a purchase before leaving,
\[p = 0.43 \nonumber \]
while if in a sample of \(200\) people entering the store, \(78\) make a purchase,
\[\hat{p}=\frac{78}{200}=0.39 \nonumber \]
The sample proportion is a random variable: it varies from sample to sample in a way that cannot be predicted with certainty. Viewed as a random variable it will be written \(\hat{P}\). It has a mean \(\mu_{\hat{P}}\) and a standard deviation \(\sigma_{\hat{P}}\). Here are formulas for their values.
Suppose random samples of size \(n\) are drawn from a population in which the proportion with a characteristic of interest is \(p\). The mean \(\mu_{\hat{P}}\) and standard deviation \(\sigma_{\hat{P}}\) of the sample proportion \(\hat{P}\) satisfy
\[\mu_{\hat{P}}=p \quad \text{and} \quad \sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}} \nonumber \]
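As a small illustration, the two formulas can be evaluated directly. The sketch below is not part of the original text: the function name `sample_proportion_mean_sd` is hypothetical, and the inputs reuse the store example (\(p = 0.43\), \(n = 200\)) purely for demonstration.

```python
import math


def sample_proportion_mean_sd(p, n):
    """Return (mean, standard deviation) of the sample proportion.

    p is the population proportion and n the sample size. The name and
    signature are illustrative assumptions, not taken from the text.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive integer")
    mean = p                             # mean of the sample proportion
    sd = math.sqrt(p * (1.0 - p) / n)    # standard deviation of the sample proportion
    return mean, sd


# Store example from the text: p = 0.43, samples of size 200.
mean, sd = sample_proportion_mean_sd(0.43, 200)
print(f"mean = {mean:.2f}, sd = {sd:.4f}")
```

Note that the standard deviation shrinks as \(n\) grows, so larger samples give sample proportions that cluster more tightly around \(p\).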
The Central Limit Theorem has an analogue for the sample proportion \(\hat{p}\). To see how, imagine that every element of the population that has the characteristic of interest is labeled with a \(1\), and that every element that does not is labeled with a \(0\). This gives a numerical population consisting entirely of zeros and ones. Clearly the proportion of the population with the special characteristic is the proportion of the numerical population that are ones; in symbols, writing \(N\) for the size of the population,
\[p=\frac{\text{number of } 1\text{s}}{N} \nonumber \]
But of course the sum of all the zeros and ones is simply the number of ones, so the mean \(\mu\) of the numerical population of size \(N\) is
\[\mu=\frac{\sum x}{N}=\frac{\text{number of } 1\text{s}}{N} \nonumber \]
Thus the population proportion \(p\) is the same as the mean \(\mu\) of the corresponding population of zeros and ones. In the same way the sample proportion \(\hat{p}\) is the same as the sample mean \(\bar{x}\). Thus the Central Limit Theorem applies to \(\hat{p}\). However, the condition that the sample be large is a little more complicated than just being of size at least \(30\).
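The zero-one labeling can be made concrete with a short sketch. The population below (430 ones and 570 zeros) is a made-up illustration, and `proportion_as_mean` is a hypothetical helper name; the point is only that a proportion is the mean of the labels, for the population and for a sample alike.

```python
import random


def proportion_as_mean(labels):
    """Mean of a list of 0/1 labels, which equals the proportion of ones."""
    return sum(labels) / len(labels)


# Hypothetical numerical population: 1 = has the characteristic, 0 = does not.
population = [1] * 430 + [0] * 570
p = proportion_as_mean(population)      # population proportion = population mean

# Draw one random sample of size 200 without replacement (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(population, 200)
p_hat = proportion_as_mean(sample)      # sample proportion = sample mean

print(f"p = {p}, p_hat = {p_hat}")
```

Changing the seed changes `p_hat` but not `p`, which is the sense in which the sample proportion is a random variable.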
For large samples, the sample proportion is approximately normally distributed, with mean \(\mu_{\hat{P}}=p\) and standard deviation \(\sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}\).
A sample is large if the interval \(\left [ p-3\sigma_{\hat{P}},\, p+3\sigma_{\hat{P}} \right ]\) lies wholly within the interval \([0,1]\).
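In actual practice \(p\) is not known, hence neither is \(\sigma_{\hat{P}}\). In that case, in order to check that the sample is sufficiently large, we substitute the known quantity \(\hat{p}\) for \(p\). This means checking that the interval
\[\left [ \hat{p}-3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\, \hat{p}+3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right ] \nonumber \]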
lies wholly within the interval \([0,1]\). This is illustrated in the examples.
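The large-sample check can be sketched as a small function. This is one possible way to code the criterion, with hypothetical names `three_sigma_interval` and `is_large_sample`; the argument may be the true \(p\) or, when \(p\) is unknown, the observed \(\hat{p}\).

```python
import math


def three_sigma_interval(p, n):
    """Return the interval (p - 3*sd, p + 3*sd) for samples of size n."""
    sd = math.sqrt(p * (1.0 - p) / n)
    return p - 3.0 * sd, p + 3.0 * sd


def is_large_sample(p, n):
    """True if the three-standard-deviation interval lies wholly within [0, 1]."""
    low, high = three_sigma_interval(p, n)
    return low >= 0.0 and high <= 1.0


# p unknown: substitute the observed sample proportion 78/200 = 0.39.
low, high = three_sigma_interval(0.39, 200)
print(f"[{low:.4f}, {high:.4f}] large enough: {is_large_sample(0.39, 200)}")
```

The check depends on both the proportion and the sample size: proportions near \(0\) or \(1\) need larger samples before the interval fits inside \([0,1]\).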
Figure \(\PageIndex\) shows that when \(p = 0.1\), a sample of size \(15\) is too small but a sample of size \(100\) is acceptable.
Figure \(\PageIndex\) shows that when \(p=0.5\) a sample of size \(15\) is acceptable.
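The three cases mentioned with the figures can also be checked by hand with the interval criterion. The arithmetic below is a worked sketch added for illustration, with values rounded to four decimal places.

For \(p=0.1\) and \(n=15\):
\[\sigma_{\hat{P}}=\sqrt{\frac{(0.1)(0.9)}{15}}\approx 0.0775, \qquad \left [ 0.1-0.2324,\, 0.1+0.2324 \right ]=[-0.1324,\, 0.3324] \nonumber \]
which extends below \(0\), so the sample is too small.

For \(p=0.1\) and \(n=100\):
\[\sigma_{\hat{P}}=\sqrt{\frac{(0.1)(0.9)}{100}}=0.03, \qquad \left [ 0.1-0.09,\, 0.1+0.09 \right ]=[0.01,\, 0.19] \nonumber \]
which lies wholly within \([0,1]\).

For \(p=0.5\) and \(n=15\):
\[\sigma_{\hat{P}}=\sqrt{\frac{(0.5)(0.5)}{15}}\approx 0.1291, \qquad \left [ 0.5-0.3873,\, 0.5+0.3873 \right ]=[0.1127,\, 0.8873] \nonumber \]
which also lies wholly within \([0,1]\).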
Suppose that in a population of voters in a certain region \(38\%\) are in favor of a particular bond issue. Nine hundred randomly selected voters are asked if they favor the bond issue.
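Verify that the sample proportion \(\hat{p}\) computed from samples of size \(900\) meets the condition that its sampling distribution be approximately normal.
The information given is that \(p=0.38\) and \(n=900\). First compute the mean and standard deviation of \(\hat{P}\):
\[\mu_{\hat{P}}=p=0.38 \quad \text{and} \quad \sigma_{\hat{P}}=\sqrt{\frac{p(1-p)}{n}}=\sqrt{\frac{(0.38)(0.62)}{900}}=0.01618 \nonumber \]
Then \(3\sigma_{\hat{P}}=3(0.01618)=0.04854\approx 0.05\) so
\[\left [ p-3\sigma_{\hat{P}},\, p+3\sigma_{\hat{P}} \right ]=[0.38-0.05,0.38+0.05]=[0.33,0.43]\nonumber \]
which lies wholly within the interval \([0,1]\), so it is safe to assume that \(\hat{P}\) is approximately normally distributed.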
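Once approximate normality is established, probabilities about the sample proportion can be computed from the normal distribution. As an illustration, the sketch below estimates the chance that a sample of size \(900\) gives a sample proportion within \(5\) percentage points of \(p = 0.38\). The helper names `normal_cdf` and `prob_within` are hypothetical, and the standard normal cumulative distribution is obtained from `math.erf` rather than from a table, so the result may differ slightly from a table-based answer.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(p, n, margin):
    """Normal-approximation probability that the sample proportion
    falls within `margin` of p for samples of size n."""
    sd = math.sqrt(p * (1.0 - p) / n)
    z = margin / sd                      # margin expressed in standard deviations
    return normal_cdf(z) - normal_cdf(-z)


# Bond issue example: p = 0.38, n = 900, margin of 5 percentage points.
prob = prob_within(0.38, 900, 0.05)
print(f"P(0.33 <= sample proportion <= 0.43) is approximately {prob:.4f}")
```

Because \(0.05\) is a little more than three standard deviations here, the probability is very close to \(1\).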
An online retailer claims that \(90\%\) of all orders are shipped within \(12\) hours of being received. A consumer group placed \(121\) orders of different sizes and at different times of day; \(102\) orders were shipped within \(12\) hours.
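The sample proportion is \(\hat{p}=102/121\approx 0.84\). Taking \(p=0.90\), which corresponds to assuming that the retailer's claim is valid, and \(n=121\), the standard deviation is \(\sigma_{\hat{P}}=\sqrt{(0.90)(0.10)/121}\approx 0.0273\), so \(3\sigma_{\hat{P}}\approx 0.08\). Because the interval
\[\left [ p-3\sigma_{\hat{P}},\, p+3\sigma_{\hat{P}} \right ]=[0.90-0.08,0.90+0.08]=[0.82,0.98]\nonumber \]
lies wholly within the interval \([0,1]\), it is appropriate to use the normal distribution to compute probabilities related to the sample proportion \(\hat{P}\).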
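Under the retailer's claim, the observed value corresponds to a \(z\)-score of \(\frac{0.84-0.90}{\sqrt{(0.90)(0.10)/121}}\approx -2.20\), and \(P(Z\leq -2.20)\approx 0.0139\). A random sample of size \(121\) therefore has only about a \(1.4\%\) chance of producing a sample proportion as low as the one that was observed, \(\hat{p}=0.84\), when taken from a population in which the actual proportion is \(0.90\). This is so unlikely that it is reasonable to conclude that the actual value of \(p\) is less than the \(90\%\) claimed.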
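The tail probability behind this conclusion can be sketched in a few lines. The names `normal_cdf` and `prob_at_most` are hypothetical, the observed proportion is taken as the rounded value \(0.84\), and the normal cumulative distribution is computed with `math.erf` instead of a table.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_at_most(observed, p, n):
    """Normal-approximation probability that the sample proportion is
    at most `observed` when the population proportion is p."""
    sd = math.sqrt(p * (1.0 - p) / n)
    z = (observed - p) / sd              # z-score of the observed proportion
    return normal_cdf(z)


# Retailer example: claimed p = 0.90, n = 121, observed proportion rounded to 0.84.
prob = prob_at_most(0.84, 0.90, 121)
print(f"P(sample proportion <= 0.84) is approximately {prob:.4f}")
```

Using the unrounded proportion \(102/121\) instead of \(0.84\) gives a somewhat larger probability, roughly \(0.018\), which is still small enough to cast doubt on the claim.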
This page titled 6.3: The Sample Proportion is shared under a CC BY-NC-SA 3.0 license and was authored, remixed, and/or curated by Anonymous via source content that was edited to the style and standards of the LibreTexts platform.