Extended sampling. Seminar topic: sampling in sociological research. Key concepts. Practical calculation examples

The concept of “representativeness” in sociological surveys (public opinion polls) has an almost magical effect on people. The term “representation” itself has, in addition to its scientific meaning, a clearly political one.

What is the reason? It is assumed that a sample (a group of people selected for a survey) can represent the entire population. In all-Russian surveys, the general population is the entire population of the country. Now imagine that we are talking about a political decision: supporting a bill or voting in an election. A sample survey gives us an excellent mechanism of political representation, one in which a small group of people can represent the opinion or position of the entire population of the country. That is why so much importance is attached to the representativeness of a study.

The concept of representativeness is, of course, used not only in political research. The term is used almost always when talking about large-scale research, be it in the field of marketing, economic behavior or education.

Representative survey methodology

How, after interviewing 1,500 people, can one draw conclusions about all Russians, of whom there are more than 140 million (more than 110 million of them voters)? The technology behind representative surveys rests on statistical laws, most directly on the law of large numbers, or Bernoulli's theorem.

In simplified form, its meaning is as follows. Suppose we have some attribute, for example the amount of precipitation per day in Yekaterinburg during the twentieth century. If we write down all its values along with their frequencies (this is called a distribution), and then randomly draw a sufficiently large number of cases (not all the days of the 20th century, but quite a lot of them), we will see that the distribution in our sample is very similar to the distribution for the entire 20th century. Thus, if we sample some units from a population, they can indeed represent the entire population, and there is in fact no need to collect data on all cases.

However, there is a key condition: this is true only if the selection is strictly random. The only problem here is deviation from randomness. If we take precipitation data only for recent years (for example, because these data are easier to find), or survey 1,500 of our acquaintances (because they are easier to contact) rather than random people, then the sample will, of course, not be representative.
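The idea can be tried out on a toy example. The sketch below is only an illustration: the "precipitation" data are invented (36,525 days split into three made-up categories), and the names `population` and `distribution` are assumptions introduced here, not anything from a real dataset.

```python
import random
from collections import Counter

# Synthetic stand-in for "daily precipitation in the 20th century":
# 36,525 days, each labelled dry / light / heavy. The shares are invented.
population = ["dry"] * 21915 + ["light"] * 10958 + ["heavy"] * 3652

def distribution(values):
    """Return the share of each category in a list of values."""
    counts = Counter(values)
    return {category: counts[category] / len(values) for category in counts}

rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1000, 5000):
    sample = rng.sample(population, n)  # simple random sample, no repeats
    print(n, {k: round(v, 3) for k, v in sorted(distribution(sample).items())})

print("all", {k: round(v, 3) for k, v in sorted(distribution(population).items())})
```

As the sample grows, the printed shares should settle close to those of the full list, which is the behaviour the law of large numbers describes.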

Imagine that out of 143.5 million Russians you randomly select the 1,500 people you need. Then, for example, the proportion of middle managers among them will be approximately equal to the proportion of middle managers in the population, which shows that your sample can represent the entire population. Could it be that these two indicators will be very different? For example, among Russians it is 14%, but in the sample it will be only 1%? Theoretically, this is possible, but the probability of this is so small that it can be neglected (much like meeting a dragon on the street).

Moreover, the most pleasant thing about this probability is not even that it is small, but that for random processes it can be calculated. We can tell how likely our sample value is to deviate from the population value by 13 percentage points (as in the example above), and how likely it is to deviate by, say, 2.5 percentage points. Usually, however, the calculation runs the other way: first we fix the probability with which we want our value to stay close to the value in the general population (most often 95%), and then we look at how large the deviation can be for a particular sample size. This deviation is called the confidence interval, sometimes the sampling error or statistical error, and it is often listed next to survey results.
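A rough sketch of such a calculation for the manager example (a population share of 14% and a sample of 1,500) is given below. It assumes that each respondent is drawn independently, which makes the number of managers in the sample binomial. That is a reasonable approximation when the population is far larger than the sample, but it is an assumption of this sketch, as are the function names.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k hits in n independent random draws."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def prob_deviation_at_least(n, p, delta):
    """Chance that the sample share differs from p by delta or more."""
    tolerance = 1e-12  # guards against floating-point noise at the boundary
    return sum(binomial_pmf(k, n, p)
               for k in range(n + 1)
               if abs(k / n - p) >= delta - tolerance)

n, p = 1500, 0.14  # sample size and population share from the example
print(prob_deviation_at_least(n, p, 0.13))   # off by 13 points or more
print(prob_deviation_at_least(n, p, 0.025))  # off by 2.5 points or more
```

The first number should come out vanishingly small and the second well under a few percent, which is the sense in which large deviations "can be neglected".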

So, the probability of deviation, the magnitude of the deviation (the confidence interval) and the sample size are related. Based on this, the formula for calculating the sample size is as follows:

$$ n = \frac{z^2}{4\Delta^2} $$

where n is the sample size, Δ is the confidence interval (expressed as a fraction, so 2.5% is 0.025), and z is the value of the normal distribution function for a given probability of deviation (for a 5% probability this value is 1.96).

This is a simplified formula; real surveys use slightly more complex ones. It may also fail if the share being estimated is very different from 50% (for example, it is not suitable for estimating the proportion of people with a rare disease in a country).

This is what happens if you substitute some values into this formula:

In other words, if we took a random sample of 1,600 Russians and measured some indicator, for example the willingness to vote for a certain politician, then with a probability of 95% our estimate will differ from the willingness to vote for him among all Russians by no more than 2.45 percentage points.
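The figure for 1,600 respondents can be reproduced with a few lines of code. The sketch assumes the simplified relationship is Δ = z / (2√n), the form that holds for a share close to 50% and that yields 2.45% for n = 1,600. The function name and the list of sample sizes are illustrative choices.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence level, as quoted in the text

def margin_of_error(n, z=Z_95):
    """Half-width of the interval for a share near 50%: z / (2 * sqrt(n))."""
    return z / (2 * math.sqrt(n))

for n in (100, 400, 1600, 2500, 10000, 100000):
    print(f"n = {n:>6}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Because the sample size sits under a square root, quadrupling the sample only halves the error.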

Sample size

So, the larger the sample, the more likely we are to be close to the population proportion. It would seem that we should therefore try to bring the sample closer to 143.5 million. In fact, as can be seen from the table, the nature of random processes is such that beyond a certain point the probability of falling into the interval begins to grow very slowly, and this point comes quite quickly. Once we have sampled 1,500 units, no matter how much we increase the sample size, the probability that our sample value will land close to the population value grows very, very slowly.

In fact, there is little practical difference between 1,500 and 10,000 respondents. At around 1,500, we can already say that our estimates will differ from the share in the general population by no more than 2–3 percentage points. If we increase the sample further, this possible error will decrease, but only slightly. In other words, a sample of 100,000 is better than a sample of 2,500, but the gain is so small that, in the case of social surveys, it is not economically justifiable. Enlarging a sample is usually expensive, so it does not make sense to inflate it in order to gain one percentage point in the size of the confidence interval.

It is important that the size of the population does not appear in the formula at all. When the population is large (more than 20,000), it has virtually no effect on the sample size. So we do not need to know how many people live in Russia to build a representative sample. Clearly, choosing 1,500 out of 2,000 most likely makes no sense: it is easier to examine all 2,000 and get an exact figure. But by drawing a sample where one is needed, we get the opportunity to generalize its results to the general population. For the same reason, the sample size will not differ between large and small countries.
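How little the population size matters can be checked with the standard finite population correction. The text does not give this correction, so it is brought in here as an assumption. The sketch also assumes the simplified sample size n0 = z² / (4Δ²) for a share near 50% and a target error of 2.5 percentage points.

```python
def required_sample(delta, population=None, z=1.96):
    """Sample size for a share near 50%, optionally corrected for a finite population."""
    n0 = z ** 2 / (4 * delta ** 2)  # size for an effectively infinite population
    if population is None:
        return n0
    return n0 / (1 + (n0 - 1) / population)  # standard finite population correction

for size in (2_000, 20_000, 1_000_000, 143_500_000):
    print(f"population {size:>11,}: n = {required_sample(0.025, size):.0f}")
```

For very large populations the corrected size is practically the uncorrected one. The correction only becomes noticeable when the population is not much bigger than the sample itself.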

Representativeness and accuracy

To understand the meaning of the concept of “representativeness,” consider a sample of 15 people. Oddly enough, if you drew it at random, it is also representative. Moreover, you can draw a sample of one unit. Imagine a box of balls from which you randomly draw one ball. If this ball is randomly selected, it too represents all the balls in the box. It simply represents them imprecisely. Why? Because there is a very high probability of making a mistake. Next time we may pull out a different ball and get a different picture of the balls in the box. To represent imprecisely means to have a wide range of estimates.

In the same way, 15 people represent any general population, but they do not represent it accurately, because the error and the confidence interval are very large. We would have to add +/- 33% to get a 95% chance of falling into the interval. If we are willing to accept this, we take 15 people, find that 7 of them are middle managers, and obtain the estimate that 7/15, that is, 47% +/- 33%, is the share of managers in the general population. This is an absolutely correct conclusion. It just has no value: we could have said as much without any survey. Therefore, when planning a sample, it makes sense to aim for a size that is reasonable from a cost-effectiveness perspective.

All that has been said is intended to convey one simple idea, which is very often not realized: sample size is not related to its representativeness.

A small sample is imprecise, but it can still be representative. The sample sizes that are used today in mass surveys in Russia almost always have fairly high accuracy.

What threatens the representativeness of the sample is not its size, but bias, that is, deviation from the principle of randomness.

Violation of the randomness principle

If we begin to select units in a non-random manner, for example because something prevents us from selecting them randomly, the sample becomes unrepresentative. Imagine that we want to draw balls from our box at random, but it turns out that some of the balls bite. A mechanism in which we take only those balls that let themselves be taken violates randomness and therefore violates representativeness. In this case, no matter how many balls we take from the box (even if we take all the balls that do not bite), we will have an unrepresentative sample, because it will not include any of the ones that bite: they will simply stay out of our sample.

The biggest problem with biting balls is that they may differ from those that come into our hands, and differ in precisely the way that interests us. This situation is called sampling bias.

It is necessary to distinguish the situation of inaccurate representation, described above, from the situation of non-representation. These are different problems, and they have different solutions. You cannot solve one of them by solving the other. If a sample lacks representativeness, there is no point in enlarging it. Moreover, large samples in social surveys tend to accumulate errors, so large sample sizes can only make the representation problem worse.
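The difference between the two problems can be seen in a small simulation. Everything in it is invented for illustration: a population of 100,000 in which 60% are willing to answer and 40% are "biting balls" who never answer, with different levels of support for a politician in the two groups.

```python
import random

# Invented population: 1 = supports the politician, 0 = does not.
# "Willing" people answer surveys; "biting" people never do.
willing = [1] * 36_000 + [0] * 24_000  # 60% support among those who answer
biting = [1] * 12_000 + [0] * 28_000   # 30% support among those who refuse
everyone = willing + biting
true_share = sum(everyone) / len(everyone)  # 48,000 / 100,000 = 0.48

def estimate(pool, n, rng):
    """Share of supporters in a simple random sample of size n from pool."""
    return sum(rng.sample(pool, n)) / n

rng = random.Random(7)  # fixed seed so the run is repeatable
print("true share:", true_share)
for n in (1_500, 15_000, 50_000):
    print(n, "random:", round(estimate(everyone, n, rng), 3),
          "willing only:", round(estimate(willing, n, rng), 3))
```

Under these assumptions the random sample should hover around the true 48% at every size, while the "willing only" estimate stays near 60% however large the sample becomes. Enlarging a biased sample sharpens the wrong answer.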

Why representativeness is impossible

In the notes to tables with survey results, you can often read that “the sample size is 1,600 people, the sample is representative by gender and age.” From the above it is clear that these are two different parameters: the claim of representativeness is not related to sample size. What it really means is that certain procedures were followed to ensure a match between the sample and the population. For example, to ensure representativeness by gender, men and women are recruited into the sample in the same proportions as exist among Russians according to census data. But representativeness by gender does not mean representativeness, for example, by political views.

Why do we have to equalize the sample by gender and other socio-demographic categories? Because true representativeness can only be ensured by a random sample, and it is impossible to implement it in practice for a variety of reasons. Once you try to do this, you will encounter many problems - no matter what method you choose to use. Some respondents will be completely inaccessible to your method (for example, for personal interviews, houses with intercoms and security are a big problem), another part will be absent, not answer, or will prefer to mind their own business. There are people who have language problems and cannot speak to us. There are people who don't understand why this is needed, and they don't want to talk to us. All of these are serious violations of randomness that make its implementation impossible.

Those who reduce the problem of representation in mass surveys to statistics forget that people are very peculiar balls. There are balls that run and hide. There are balls that bite. They are not passive objects; they fight back. They say, “I don't want to take your survey,” thereby violating randomness. Therefore, in the strict sense of the word, representativeness in mass surveys is, of course, impossible in any form.

A mechanism has been developed by which the appearance of representativeness is usually ensured: we align the sample in some categories and pretend that it is also aligned in all other possible categories. In fact, we have no reason to say this. But the problem is that there is no way to check this - again due to the fact that some balls bite. To check for bias, the reviewer would have to go to those we didn't interview and interview them. But they, as we remember, do not want to be questioned at all. It is impossible to interview those who categorically do not answer. Therefore, everyone works on the assumption that if we have balanced the sample along two or three parameters, it is representative of the entire population, although there is no good basis for this assumption.

Representative sampling is a technology borrowed by sociologists from statistics. Therefore, it inevitably contains elements of a mathematical and statistical picture of the world. Perhaps the strongest assumption is that the sample survey itself is politically and sociologically neutral: participation and non-participation in the survey does not carry political meaning and is not related to other sociologically important parameters. But today, polls have become one of the main political institutions and have become a key intermediary between large corporations and consumers. Under these conditions, it is no longer possible to believe in their political sterility. However, we still know little about how surveys are understood in modern societies and what they actually represent.

One of the main components of a well-designed study is defining the sample and understanding what a representative sample is. Think of a cake: you do not have to eat the whole dessert to know how it tastes. A small piece is enough.

So, the cake is the general population (that is, all respondents who are eligible for the survey). It can be defined geographically (for example, only residents of the Moscow region), by gender (women only), or by age (Russians over 65 years old).

Calculating the size of the general population is difficult: you need data from the population census or from preliminary surveys. Therefore, the general population is usually “estimated”, and the sample population, or sample, is calculated from the resulting number.

What is a representative sample?

A sample is a clearly defined number of respondents. Its structure should match the structure of the general population as closely as possible in terms of the main selection characteristics.

For example, if potential respondents are the entire population of Russia, where 54% are women and 46% are men, then the sample should contain exactly the same percentage. If the parameters coincide, then the sample can be called representative. This means that inaccuracies and errors in the study are reduced to a minimum.
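Turning population shares into respondent counts is simple arithmetic, sketched below. The function name and the largest-remainder rounding rule are choices made for this illustration, not a method named in the text. The 54% / 46% split is the one from the example, and the sample size of 1,600 is an arbitrary illustration.

```python
def quota_counts(sample_size, shares):
    """Split sample_size across groups in proportion to their population shares.

    Uses the largest-remainder rule so that the counts add up exactly.
    """
    exact = {group: sample_size * share for group, share in shares.items()}
    counts = {group: int(value) for group, value in exact.items()}
    leftover = sample_size - sum(counts.values())
    # Hand the remaining places to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - counts[g], reverse=True)
    for group in by_remainder[:leftover]:
        counts[group] += 1
    return counts

print(quota_counts(1600, {"women": 0.54, "men": 0.46}))
```

For a sample of 1,600 this gives 864 women and 736 men.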

The sample size is determined taking into account the requirements of accuracy and economy. These requirements pull in opposite directions: the larger the sample, the more accurate the result, but the higher the accuracy, the greater the cost of the study. Conversely, the smaller the sample, the lower the cost, and the less accurately and more haphazardly the properties of the general population are reproduced.

Therefore, sociologists devised a formula for calculating the sample size.

Confidence level and margin of error

What do the terms “confidence level” (confidence probability) and “margin of error” (confidence error) mean? The confidence level indicates how reliable the measurement is, and the margin of error is the possible error in the research results. For example, for a population of more than 500,000 people (say, the residents of Novokuznetsk), the sample will be 384 people at a confidence level of 95% and a margin of error of 5%.

What follows from this? If 100 studies are conducted with such a sample (384 people), then in 95 of them the answers obtained will, according to the laws of statistics, be within ±5% of the true value in the population. We thus obtain a representative sample with a minimal probability of statistical error.
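Both claims can be checked with a short sketch. It assumes the simplified formula n = z² / (4Δ²) for a share near 50% with z = 1.96. The repeated studies are simulated with an invented true share of 50%, and the function names are illustrative.

```python
import random

def sample_size(delta, z=1.96):
    """Simplified size for a share near 50%: z^2 / (4 * delta^2)."""
    return z ** 2 / (4 * delta ** 2)

def coverage(n, delta, true_share=0.5, studies=2000, seed=1):
    """Fraction of simulated studies whose estimate lands within +/- delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        yes = sum(rng.random() < true_share for _ in range(n))
        if abs(yes / n - true_share) <= delta:
            hits += 1
    return hits / studies

print(round(sample_size(0.05)))  # about 384 respondents
print(coverage(384, 0.05))       # should come out close to 0.95
```

The simulation treats every respondent as an independent random draw. Real surveys depart from this assumption, so the 95% figure is best read as a property of ideal random sampling.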


Empirical studies are considered one of the main means of studying social relations and processes. They provide reliable, complete and representative information.

Specifics of techniques

Empirical methods provide fact-recording knowledge. They help establish and generalize facts through direct or indirect registration of events characteristic of the relationships, objects, and phenomena being studied. Empirical methods differ from theoretical ones in that the subject of analysis is:

  1. Behavior of individuals and their groups.
  2. Products of human activity.
  3. Verbal actions of individuals, their judgments, views, opinions.

Sample studies

An empirical study is always focused on obtaining objective and accurate information and quantitative data. In carrying it out, it is therefore necessary to ensure that the information is representative and, accordingly, that the sample population is correct. This means that the selection must be carried out in such a way that the data obtained from a narrow group reflect the trends occurring in the general mass of respondents. For example, when 200-300 people are surveyed, the data obtained can be extrapolated to the entire urban population. Sample indicators make it possible to take a different approach to the study of socio-economic processes in the region and in the country as a whole.

Terminology

To better understand the issues surrounding sample studies, it is necessary to clarify some definitions. The unit of observation is the direct source of information. It can be an individual, a group, a document, an organization, and so on. The general population is the set of observation units, all of which must be relevant to the problem being studied. Only part of it is subject to direct analysis, which is carried out in accordance with established methods of collecting information. This part of the entire array of respondents is denoted by the concept of “sample population”. Its ability to reflect the key parameters of the total mass of people is called representativeness. In some cases the sample and the population do not match; this is referred to as the representativeness error.

Ensuring representativeness

Issues related to it are discussed in detail within statistics. The problems are complex. On the one hand, we are talking about quantitative representation of the general population. This means, in particular, that groups of respondents should be represented in optimal numbers, sufficient for normal representation. On the other hand, we also mean qualitative representation, which presupposes a certain composition of subjects forming the sample population. This means, for example, that we cannot speak of representativeness if only men or only women, only elderly people or only young people are surveyed. The study should cover all the groups represented.

Sample characteristics

This term is considered in two aspects. First, it is defined as the set of elements from the general array of people whose opinions are being studied: this is the sample population. It is also the process of forming a certain category of respondents while ensuring the required representativeness. In practice, there are several types and kinds of selection. Let's look at them.

Types

There are three of them:

  1. Spontaneous sample population. This is a set of respondents selected on the principle of voluntariness. Units from the total mass of people end up in the study group by chance. Spontaneous selection is used quite often in practice, for example in surveys in the press or at the post office. However, this technique has a significant drawback: it cannot qualitatively represent the entire general population. It is used for reasons of economy, and in some surveys it is the only possible option.
  2. Probability sample population. This is one of the main techniques used in research. The key principle of such selection is to ensure that each observation unit has the opportunity to pass from the general mass of individuals into the narrow group. Various techniques are used for this, for example a lottery, mechanical selection, or a table of random numbers.
  3. Stratified (quota) sampling. It is based on building a qualitative model of the total mass of respondents, after which units are selected into the sample population. For example, selection is carried out by age or gender, by segment of the population, and so on.

Kinds

The following samples exist:

Additionally

Samples can also be dependent or independent. In the first case, the experimental procedure and the results obtained for one group of respondents have a certain effect on another group. Independent samples, accordingly, involve no such effect. One important point should be noted here: a single group of subjects that has undergone a psychological examination twice (even if the two examinations were aimed at different qualities, characteristics, or signs) is by default considered dependent.

Probability selections

Let's look at some types of samples:

  1. Random. It assumes homogeneity of the total population, an equal probability of selection for all elements, and the presence of a complete list of elements. Typically, the selection process uses a table of random numbers.
  2. Mechanical. This variety of random sample involves ordering the units by some criterion, for example by phone number, in alphabetical order, by date of birth, and so on. The first element is selected at random. Then every k-th element is selected until n elements have been chosen. The size of the total population is N = k*n.
  3. Stratified. This sample is used when the overall population is heterogeneous. The population is divided into strata (groups), and within each of them selection is carried out mechanically or randomly.
  4. Serial. Groups are selected at random, and within them all objects are studied.
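The four designs listed above can be sketched in a few lines each. This is a minimal illustration under stated assumptions: the sampling frame is a hypothetical list of 1,000 unit IDs, the strata and clusters are invented, the mechanical step is taken as k = N // n, and the stratified version draws the same fraction from every stratum.

```python
import random

def simple_random(frame, n, rng):
    """Every unit on the list has the same chance of selection."""
    return rng.sample(frame, n)

def mechanical(frame, n, rng):
    """Random start, then every k-th unit, where k = N // n."""
    k = len(frame) // n
    start = rng.randrange(k)
    return frame[start::k][:n]

def stratified(strata, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    return [unit
            for units in strata.values()
            for unit in rng.sample(units, round(len(units) * fraction))]

def serial(clusters, n_clusters, rng):
    """Pick whole groups at random and take everyone inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)
frame = list(range(1000))                              # hypothetical list of unit IDs
strata = {"urban": frame[:700], "rural": frame[700:]}  # invented strata
clusters = {f"block-{i}": frame[i * 50:(i + 1) * 50] for i in range(20)}

print(len(simple_random(frame, 100, rng)))
print(mechanical(frame, 100, rng)[:5])
print(len(stratified(strata, 0.1, rng)))
print(len(serial(clusters, 2, rng)))
```

All four functions need a complete list of units (or of groups) to draw from, which is the practical requirement that separates probability designs from non-probability ones.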

Non-probability selections

They involve sampling not according to the principle of randomness, but according to subjective criteria: typicality, availability, equal representation, and so on. The following selections fall into this category:

Nuance

To ensure representativeness, an accurate and complete list of population units is necessary. The unit of observation is, as a rule, a single person. It is best to select from the list by numbering the units and using a table of random numbers. However, the quasi-random method is also used quite often: it involves selecting every n-th element from the list.

Influencing factors

The volume of a sample population is the number of its units. According to experts, it does not have to be large. No doubt, the larger the number of respondents, the more accurate the result. At the same time, a large volume does not always guarantee success, for example when the total pool of respondents is heterogeneous. A population is considered homogeneous if the controlled parameter, for example the literacy level, is distributed evenly, that is, without gaps or concentrations. In that case it is enough to interview several people, and the survey results will allow us to conclude that most people have a normal level of literacy. It follows that the representativeness of information is influenced not so much by quantitative characteristics as by the qualitative characteristics of the population, in particular its level of homogeneity.

Errors

Errors are the deviation of the average parameters of the sample population from the corresponding values for the total mass of respondents. In practice, they are identified by comparison. When adults are surveyed, information from censuses, statistics, and the results of past surveys is usually used as the control parameters. Comparing the average values of the general and sample populations, determining the error from this comparison, and reducing the deviation is called control of representativeness.

Conclusions

Sample research is a way of collecting data about people’s attitudes and behavior by surveying specially selected groups of respondents. This technique is considered reliable and economical, although it requires a certain skill. The sample population serves as its basis. It is a certain share of the total mass of people, selected using special techniques with the aim of obtaining information about the entire population. The latter, in turn, consists of all possible social objects or of that group of them which is to be studied. Often the population is so large that interviewing every member would be costly and cumbersome, so a reduced model of it is used. The sample population includes all those who receive questionnaires; they are called respondents and are, in fact, the object of study. Simply put, it is made up of the people who are surveyed.

Conclusion

The objectives of the survey are determined by the specific categories included in the population. The sample, that is, the specific share of the total mass of people, consists of subjects assigned to groups on the basis of mathematical calculations. To select units, a description of the object in the original population is necessary. Once the number of subjects has been determined, the method of forming groups is chosen. The results of the survey make it possible to describe the characteristic being studied for all members of the general mass of people. As practice shows, most studies are sample-based rather than comprehensive.

Research usually begins with some assumption that requires verification using facts. This assumption - a hypothesis - is formulated in relation to the connection of phenomena or properties in a certain set of objects.

To test such assumptions against facts, it is necessary to measure the corresponding properties in those who bear them. But it is impossible to measure anxiety in all women and men, just as it is impossible to measure aggressiveness in all adolescents. Therefore, research is limited to a relatively small group of representatives of the relevant populations.

The population is the entire set of objects in relation to which a research hypothesis is formulated.

For example, all men; or all women; or all the inhabitants of a city. The general populations in relation to which the researcher is going to draw conclusions based on the results of the study may be more modest in number, for example, all first-graders of a given school.

Thus, the general population is a set of potential subjects that, although not infinite in number, is as a rule inaccessible to exhaustive study.

A sample, or sample population, is a group of objects limited in number (in psychology, subjects or respondents), specially selected from the general population in order to study its properties. Accordingly, studying the properties of the general population by means of a sample is called a sample study. Almost all psychological studies are sample studies, and their conclusions extend to general populations.

Thus, after a hypothesis has been formulated and the corresponding populations have been identified, the researcher faces the problem of organizing a sample. The sample should be such that generalizing the conclusions of the sample study, that is, extending them to the general population, is justified. The main criteria for the validity of research conclusions are the representativeness of the sample and the statistical reliability of the (empirical) results.

The representativeness of a sample is its ability to represent the phenomena under study fully enough in terms of their variability in the general population.

Of course, only the general population can give a complete picture of the phenomenon being studied, in all its range and nuances of variability. Therefore, representativeness is always limited to the extent that the sample is limited. And it is the representativeness of the sample that is the main criterion in determining the boundaries of generalization of research findings. However, there are techniques that make it possible to obtain a sample representativeness sufficient for the researcher (These techniques are studied in the course “Experimental Psychology”).


The first and main technique is simple random (randomized) selection. It involves ensuring such conditions that each member of the population has equal chances with others to be included in the sample. Random selection ensures that a variety of representatives of the general population can be included in the sample. In this case, special measures are taken to prevent the emergence of any pattern during selection. And this allows us to hope that ultimately, in the sample, the property being studied will be represented, if not in all, then in its maximum possible diversity.

The second way to ensure representativeness is stratified random sampling, or selection based on the properties of the general population. It involves a preliminary determination of those qualities that can influence the variability of the property being studied (this could be gender, level of income or education, etc.). Then the percentage ratio of the number of groups (strata) differing in these qualities in the general population is determined and an identical percentage ratio of the corresponding groups in the sample is ensured. Next, subjects are selected into each subgroup of the sample according to the principle of simple random selection.

Statistical significance, or statistical significance, the results of a study are determined using statistical inference methods.

Are we insured against making mistakes when making decisions, when drawing certain conclusions from the research results? Of course not. After all, our decisions are based on the results of the study of the sample population, as well as on the level of our psychological knowledge. We are not completely immune from mistakes. In statistics, such errors are considered acceptable if they occur no more often than in one case out of 1000 (probability of error α = 0.001 or the associated confidence probability of a correct conclusion p = 0.999); in one case out of 100 (probability of error α = 0.01 or the associated confidence probability of a correct conclusion p = 0.99) or in five cases out of 100 (probability of error α = 0.05 or the associated confidence probability of a correct conclusion output p=0.95). It is at the last two levels that decisions are made in psychology.

Sometimes, when talking about statistical significance, they use the concept of “level of significance” (denoted as α). The numerical values ​​of p and α complement each other up to 1,000 - a complete set of events: either we made the right conclusion, or we made a mistake. These levels are not calculated, they are given. The level of significance can be understood as a kind of “red” line,” the intersection of which will allow us to talk about this event as non-random. In every good scientific report or publication, the conclusions drawn should be accompanied by an indication of the p or α values ​​at which the conclusions were drawn.
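The pairing of α and p, and where the "red line" falls for each level, can be tabulated with Python's standard library. The sketch assumes a two-sided criterion on the standard normal distribution, which is one common convention rather than something the text specifies. The function name is illustrative.

```python
from statistics import NormalDist

def critical_z(alpha):
    """Two-sided critical value of the standard normal for error level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.001):
    p = 1 - alpha  # confidence probability paired with alpha
    print(f"alpha = {alpha}: p = {p:.3f}, z = {critical_z(alpha):.3f}")
```

For α = 0.05 the critical value is about 1.96. Stricter levels push the line further out.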

Methods of statistical inference are discussed in detail in the Mathematical Statistics course. For now, we simply note that they impose certain requirements on the number of subjects, that is, the sample size.

Unfortunately, there are no strict guidelines for pre-determining the required sample size. Moreover, the researcher usually receives the answer to the question about the necessary and sufficient number too late - only after analyzing the data of an already surveyed sample. However, the most general recommendations can be formulated:

1. The largest sample size is required when developing a diagnostic technique - from 200 to 1000-2500 people.

2. If two samples need to be compared, their total size must be at least 50 people; the samples being compared should be approximately equal in size.

3. If the relationship between any properties is being studied, then the sample size should be at least 30-35 people.

4. The greater the variability of the property being studied, the larger the sample size should be. Variability can be reduced by increasing the homogeneity of the sample, for example, by gender, age, etc. This, of course, reduces the ability to generalize the conclusions.

Dependent and independent samples. A common research situation is one in which a property of interest to the researcher is studied in two or more samples for the purpose of later comparison. These samples can stand in different relationships to each other, depending on how they were formed. Independent samples are characterized by the fact that the probability of selecting any subject in one sample does not depend on the selection of any subject in another sample. In contrast, dependent samples are characterized by the fact that each subject in one sample is matched, according to a certain criterion, with a subject in another sample.

In general, dependent samples involve pairwise selection of subjects into compared samples, and independent samples imply an independent selection of subjects.

It should be noted that cases of “partially dependent” (or “partially independent”) samples are unacceptable: this unpredictably violates their representativeness.

In conclusion, we note that two paradigms of psychological research can be distinguished.

The so-called R-methodology involves the study of the variability of a certain (psychological) property under the influence of some factor or other property. Here the sample is a set of subjects.

The other approach, Q-methodology, involves the study of the variability of a subject (individual) under the influence of various stimuli (conditions, situations, etc.). It corresponds to the situation in which the sample is a set of stimuli.

General population

The totality of objects of observation (people, households, enterprises, settlements, etc.) that possess a certain set of characteristics (gender, age, income, number, turnover, etc.) and are limited in space and time.
Examples of populations:

  • All residents of Moscow (10.6 million people according to the 2002 census)
  • Male Muscovites (4.9 million people according to the 2002 census)
  • Legal entities in Russia (2.2 million at the beginning of 2005)
  • Retail outlets selling food products (20 thousand at the beginning of 2008), etc.

Sample (Sample Population)

A portion of a population selected for study in order to draw conclusions about the entire population. In order for the conclusion obtained by studying the sample to be extended to the entire population, the sample must have the property of representativeness.

Representativeness of the sample

The property of a sample to correctly reflect the population. The same sample can be representative and unrepresentative for different populations.
Example:

  • A sample consisting entirely of Muscovites who own a car does not represent the entire population of Moscow.
  • A sample of Russian enterprises with up to 100 employees does not represent all enterprises in Russia.
  • A sample of Muscovites shopping at the market does not represent the purchasing behavior of all Muscovites.

At the same time, these samples (subject to other conditions) can perfectly represent Muscovites who own cars, small and medium-sized Russian enterprises, and buyers who make purchases in markets, respectively.
It is important to understand that sample representativeness and sampling error are different phenomena. Representativeness, unlike error, does not depend in any way on the sample size.
Example:
No matter how much we increase the number of Muscovites who are car owners surveyed, we will not be able to represent all Muscovites with this sample.

Sampling error (confidence interval)

The deviation of the results obtained using sample observation from the true data of the general population.
There are two types of sampling error - statistical and systematic. Statistical error depends on sample size. The larger the sample size, the lower it is.
Example:
For a simple random sample of 400 units, the maximum statistical error (at a 95% confidence level) is 5%; for a sample of 600 units it is 4%; for a sample of 1100 units it is 3%.
Usually, when people talk about sampling error, they mean statistical error.
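
The figures in this example can be reproduced with the standard approximation for the margin of error of a proportion, z * sqrt(p(1 - p) / n), taken at the worst case p = 0.5. The formula is not given in the text, so treat the sketch below as an assumption about how such figures are usually obtained; z = 1.96 is the normal quantile for a 95% confidence level, and the finite population correction is ignored.

```python
import math

def max_margin_of_error(n, z=1.96):
    """Maximum statistical error of a proportion for a simple random sample.

    Assumes the worst case p = 0.5 (so p * (1 - p) = 0.25) and a population
    much larger than the sample. z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(0.25 / n)

for n in (400, 600, 1100):
    print(n, f"{max_margin_of_error(n):.1%}")
# 400 4.9%
# 600 4.0%
# 1100 3.0%
```

Because the error shrinks with the square root of n, halving it requires roughly four times as many respondents.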
Systematic error depends on various factors that constantly influence the study and bias the results of the study in a certain direction.
Example:

  • Any probability sample will underestimate the proportion of people with high incomes who lead an active lifestyle, because such people are much harder to find in any specific place (for example, at home).
  • Respondents refusing to answer questions is another source of systematic error: the share of “refuseniks” in Moscow ranges from 50% to 80% across different surveys.

In some cases, when the true distributions are known, the systematic error can be leveled out by introducing quotas or reweighting the data, but in most real studies it can be quite problematic to even estimate it.
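
One common form of reweighting gives each group a weight equal to its share in the population divided by its share in the sample. The sketch below assumes this simple scheme; all group names and numbers are invented for illustration and do not come from any real survey.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight of each group = population share / share among respondents."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / n)
        for group, count in sample_counts.items()
    }

# Hypothetical figures: women are over-represented among respondents.
sample_counts = {"women": 650, "men": 350}
population_shares = {"women": 0.55, "men": 0.45}  # assumed known, e.g. from a census
weights = poststratification_weights(sample_counts, population_shares)

# Hypothetical share of "yes" answers observed within each group.
yes_rate = {"women": 0.40, "men": 0.60}

n = sum(sample_counts.values())
unweighted = sum(sample_counts[g] * yes_rate[g] for g in sample_counts) / n
weighted = sum(sample_counts[g] * weights[g] * yes_rate[g] for g in sample_counts) / n
```

With these numbers the unweighted estimate is 0.47 and the weighted estimate is 0.49. The correction works only for characteristics whose true distribution is known; it cannot repair bias along dimensions that were not measured.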

Sample types

Samples are divided into two types:

  • probabilistic
  • non-probabilistic
1. Probability samples
1.1 Random sampling (simple random sampling)
Such a sample assumes a homogeneous general population, an equal probability of accessing any element, and the availability of a complete list of all elements. When selecting elements, a table of random numbers is used as a rule.
1.2 Mechanical (systematic) sampling
A type of random sample in which the list is ordered by some characteristic (alphabetical order, phone number, date of birth, etc.). The first element is selected randomly, and then every k-th element is selected, where k is the step, until n elements are chosen. The population size in this case is N = n*k.
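
A minimal sketch of selection with a fixed step, assuming an ordered list of N = 1000 hypothetical units and a desired sample of n = 50, so that the step is k = N / n = 20. The function name and the data are illustrative only.

```python
import random

def systematic_sample(ordered_units, n, seed=None):
    """Pick a random start among the first k units, then take every k-th unit.

    Assumes 0 < n <= len(ordered_units).
    """
    k = len(ordered_units) // n               # selection step
    start = random.Random(seed).randrange(k)  # random position in the first interval
    return ordered_units[start::k][:n]

units = list(range(1, 1001))  # hypothetical ordered list of N = 1000 units
sample = systematic_sample(units, n=50, seed=7)
```

Only the starting point is random; every later element is fixed by the step, so a periodic pattern in the ordering of the list can bias the result.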
1.3 Stratified (zoned) sampling
It is used in case of heterogeneity of the population. The general population is divided into groups (strata). In each stratum, selection is carried out randomly or mechanically.
1.4 Serial (cluster or nested) sampling
In serial sampling, the units of selection are not the objects themselves but groups (clusters or nests). Groups are selected randomly. The objects within the selected groups are all examined.
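
The two-stage logic (random choice of groups, complete survey inside them) can be sketched as follows. The clusters here are ten hypothetical apartment buildings with 30 households each; the names are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # select groups, not objects
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: 10 buildings, 30 households in each.
clusters = {
    f"building_{b}": [f"b{b}_hh{h}" for h in range(30)]
    for b in range(10)
}
surveyed = cluster_sample(clusters, n_clusters=3, seed=42)
```

Three buildings are drawn at random, and all 30 households in each of them enter the sample, giving 90 households in total.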

2. Non-probability samples
Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.
2.1. Quota sampling
Initially, a number of groups of objects are identified (for example, men aged 20-30, 31-45 and 46-60; persons with income up to 30 thousand rubles, from 30 to 60 thousand rubles, and over 60 thousand rubles). For each group, the number of objects to be examined is specified. This number is most often set either in proportion to the group's previously known share in the general population or equal for every group. Within groups, objects are selected randomly. Quota sampling is used quite often.
2.2. Snowball method
The sample is constructed as follows. Each respondent, starting with the first, is asked for the contact information of friends, colleagues and acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research objects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with a high income, respondents belonging to the same professional group, respondents with similar hobbies or interests, etc.).
2.3 Spontaneous sampling
The most accessible respondents are surveyed. Typical examples of spontaneous samples are surveys in newspapers and magazines, questionnaires given to respondents for self-completion, and most online surveys. The size and composition of a spontaneous sample are not known in advance and are determined by only one parameter: the activity of respondents.
2.4 Sample of typical cases
Units of the general population that have an average (typical) value of the characteristic are selected. This raises the problem of selecting a feature and determining its typical value.

Course of lectures on the theory of statistics

More detailed information on sample observations can be obtained by viewing.
