Population vs Sample
The population includes all objects of interest, whereas the sample is only a portion of the population. Parameters are associated with populations and statistics with samples. Parameters are usually denoted using Greek letters (mu, sigma), while statistics are usually denoted using Roman letters (x, s).
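As a rough illustration of the parameter/statistic distinction, the sketch below builds a made-up population, computes its parameters (mu, sigma), and then computes the matching statistics (x_bar, s) from a sample. The population of 1,000 simulated heights, the sample size of 50, and all variable names are assumptions chosen only for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 simulated heights in cm.
population = [random.gauss(170, 10) for _ in range(1000)]

# Parameters describe the whole population (Greek letters).
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics describe a sample (Roman letters).
sample = random.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The statistics will usually land close to the parameters without matching them exactly, which is why they serve as estimates.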
There are several reasons why we don't work with populations. They are usually large, and it is often impossible to get data for every object we're studying. Sampling is rarely free, and the more items surveyed, the larger the cost.
We compute statistics and use them to estimate parameters. The computation is the first part of the statistics course (Descriptive Statistics) and the estimation is the second part (Inferential Statistics).

Discrete vs Continuous
Discrete variables are usually obtained by counting. There are a finite or countable number of choices available with discrete data. You can't have 2.63 people in the room.
Continuous variables are usually obtained by measuring. Length, weight, and time are all examples of continuous variables. Since continuous variables are real numbers, we usually round them. This implies a boundary depending on the number of decimal places. For example, 64 is really anything in 63.5 <= x < 64.5. Likewise, if there are two decimal places, then 64.03 is really anything in 64.025 <= x < 64.035. Boundaries always have one more decimal place than the data and end in a 5.

Levels of Measurement
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. These go from lowest level to highest level. Data is classified according to the highest level which it fits. Each additional level adds something the previous level didn't have.
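Returning to the rounding boundaries described under Discrete vs Continuous, the rule "one more decimal place, ending in a 5" can be sketched as a small helper. The function name `rounding_boundaries` is hypothetical, and the value is passed as a string so that the number of reported decimal places is preserved.

```python
from decimal import Decimal

def rounding_boundaries(reported):
    """Return (lower, upper) with lower <= x < upper for a rounded value.

    `reported` is a string such as "64" or "64.03".
    """
    value = Decimal(reported)
    # Number of decimal places in the reported value (0 for "64", 2 for "64.03").
    places = -value.as_tuple().exponent
    # Half of the last reported unit: 0.5 for 0 places, 0.005 for 2 places.
    half_unit = Decimal(5).scaleb(-(places + 1))
    return value - half_unit, value + half_unit

print(rounding_boundaries("64"))     # boundaries 63.5 and 64.5
print(rounding_boundaries("64.03"))  # boundaries 64.025 and 64.035
```

Using `Decimal` rather than `float` keeps the boundaries exact, so they print with the expected trailing 5.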
Types of Sampling
There are five types of sampling: Random, Systematic, Convenience, Cluster, and Stratified.
Understanding Sampling Methods (Visuals and Code)
Sampling is the process of selecting a subset (a predetermined number of observations) from a larger population. It is a common technique that lets us run experiments and draw conclusions about the population without having to study the entire population. In this blog, we will go through two types of sampling methods: probability sampling (random, stratified, cluster, systematic, and multistage) and non-probability sampling (convenience, voluntary, and snowball).
Random Sampling
Under random sampling, every element of the population has an equal probability of being selected. In the figure, all the points collectively represent the entire population, and every point has an equal chance of being selected.
Figure: Random Sampling
You can implement it using Python as shown below:
    import random

    population = 100
    data = range(population)
    # draw 5 distinct elements without replacement
    print(random.sample(data, 5))
    # one possible output: [4, 19, 82, 45, 41]

Stratified Sampling
Under stratified sampling, we group the entire population into subpopulations by some common property, for example the class labels in a typical ML classification task. We then randomly sample from each group individually, so that the groups keep the same ratio in the sample as they had in the entire population. In the figure, we have two groups with a count ratio of x to 4x based on colour; we randomly sample from the yellow and green sets separately, and the final set represents the groups in that same ratio.
Figure: Stratified Sampling
You can implement it easily using Python's sklearn library as shown below:
    from sklearn.model_selection import train_test_split

    # `population` is assumed here to be a pandas DataFrame with a 'label' column;
    # test_size=0.9 sets 90% aside, leaving a 10% sample with the label ratio preserved
    stratified_sample, _ = train_test_split(population, test_size=0.9, stratify=population[['label']])
You can also implement it without the library.

Cluster Sampling
In cluster sampling, we divide the entire population into subgroups, each of which has characteristics similar to those of the population as a whole. Then, instead of sampling individuals, we randomly select entire subgroups. In the figure, we have 4 clusters with similar properties (size and shape); we randomly select two clusters and treat them as the sample.
Figure: Cluster Sampling
Real-life example: a class of 120 students divided into groups of 12 for a common class project. Clustering parameters like designation, class, and topic are all similar here as well.
You can implement it using Python as shown below:
    import random
    import numpy as np

    clusters = 5
    pop_size = 100
    sample_clusters = 2

    # assign cluster ids 1 to 5 sequentially, 20 consecutive elements per cluster
    cluster_ids = np.repeat(np.arange(1, clusters + 1), pop_size // clusters)
    # pick whole clusters at random (random.sample needs a sequence, not a set)
    cluster_to_select = random.sample(range(1, clusters + 1), sample_clusters)
    # positions of all elements that belong to a selected cluster
    indexes = [i for i, x in enumerate(cluster_ids) if x in cluster_to_select]
    elements = list(range(1, pop_size + 1))
    cluster_associated_elements = [elements[i] for i in indexes]
    print(cluster_associated_elements)

Systematic Sampling
Systematic sampling selects items from the population at regular, predefined intervals (fixed and periodic), for example every 5th element or every 21st element. It is often easier to carry out than plain random sampling and spreads the sample evenly over the list, but it can be biased if the list has a periodic pattern that matches the interval. In the figure, we sample every 9th and then every 7th element in order, and then repeat this pattern.
Figure: Systematic Sampling
You can implement it using Python as shown below:
    population = 100
    step = 5
    # every 5th element, starting from element 1
    sample = list(range(1, population, step))
    print(sample)

Multistage Sampling
Under multistage sampling, we stack multiple sampling methods one after the other. For example, at the first stage, cluster sampling can be used to choose clusters from the population, and then random sampling can be used to choose elements from the selected clusters to form the final set.
Figure: Multi-stage Sampling
You can implement it using Python as shown below:
    import random
    import numpy as np

    clusters = 5
    pop_size = 100
    sample_clusters = 2
    sample_size = 5

    # first stage: assign cluster ids 1 to 5 (20 elements each) and pick whole clusters
    cluster_ids = np.repeat(np.arange(1, clusters + 1), pop_size // clusters)
    cluster_to_select = random.sample(range(1, clusters + 1), sample_clusters)
    indexes = [i for i, x in enumerate(cluster_ids) if x in cluster_to_select]
    elements = list(range(1, pop_size + 1))
    cluster_associated_elements = [elements[i] for i in indexes]
    # second stage: randomly sample sample_size elements from the pooled selected clusters
    print(random.sample(cluster_associated_elements, sample_size))

Convenience Sampling
Under convenience sampling, the researcher includes only those individuals who are most accessible and available to participate in the study. In the figure, the blue dot is the researcher and the orange dots are the most accessible people in the researcher's vicinity.
Figure: Convenience Sampling

Voluntary Sampling
Under voluntary sampling, interested people choose to take part themselves, usually by filling in some sort of survey form. A good example is the YouTube survey asking “Have you seen any of these ads”, which has been shown a lot recently. Here, the researcher conducting the survey has no say in who takes part. In the figure, the blue dot is the researcher and the orange ones are those who voluntarily agreed to take part in the study.
Figure: Voluntary Sampling

Snowball Sampling
Under snowball sampling, the final set is chosen via other participants, i.e. the researcher asks known contacts to find people who would like to participate in the study. In the figure, the blue dot is the researcher, the orange ones are known contacts of the researcher, and the yellow ones (contacts of the orange ones) are other people who agreed to participate in the study.
Figure: Snowball Sampling
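The snowball idea can also be sketched in code, although the post gives no implementation for it. The sketch below is only an illustration under stated assumptions: the `contacts` referral network, the names in it, and the `snowball_sample` helper are all hypothetical. Wave 1 corresponds to the researcher's known contacts, and wave 2 to the people those contacts bring in.

```python
# Hypothetical referral network: who each person can refer.
contacts = {
    "researcher": ["ana", "ben"],
    "ana": ["cy", "dee"],
    "ben": ["eli"],
    "cy": [],
    "dee": ["fay"],
    "eli": [],
    "fay": [],
}

def snowball_sample(contacts, seed_person, waves):
    """Collect participants wave by wave through referrals."""
    recruited = []
    seen = {seed_person}
    frontier = [seed_person]
    for _ in range(waves):
        next_frontier = []
        for person in frontier:
            for referral in contacts.get(person, []):
                if referral not in seen:  # never recruit the same person twice
                    seen.add(referral)
                    recruited.append(referral)
                    next_frontier.append(referral)
        frontier = next_frontier
    return recruited

print(snowball_sample(contacts, "researcher", waves=2))
# ['ana', 'ben', 'cy', 'dee', 'eli']
```

Because recruitment follows existing relationships, the selection chance of any individual is not known, which is what places this method among the non-probability designs.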
I hope you enjoyed reading this. That's it for this blog. Thank you for your time!

In which type of sampling does every element in the population have a known non-zero probability?
Probability sampling is a technique in which every unit in the population has a chance (non-zero probability) of being selected in the sample, and this chance can be accurately determined.
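To make "a chance that can be accurately determined" concrete, here is a minimal sketch of proportional stratified sampling written without any library helper, in which each unit's selection probability can be written down exactly. The yellow/green population of 20 and 80 units, the 10% sampling fraction, and all variable names are assumptions made for the illustration.

```python
import random
from collections import defaultdict

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical labelled population: 20 yellow and 80 green units (ratio x to 4x).
population = [("yellow", i) for i in range(20)] + [("green", i) for i in range(80)]
fraction = 0.1  # sample 10% of every stratum

# Group the units into strata by label.
strata = defaultdict(list)
for unit in population:
    strata[unit[0]].append(unit)

sample = []
inclusion_probability = {}
for label, units in strata.items():
    n_h = round(len(units) * fraction)        # units to draw from this stratum
    sample.extend(random.sample(units, n_h))  # simple random draw inside the stratum
    inclusion_probability[label] = n_h / len(units)

print(inclusion_probability)  # known, non-zero chance for every unit
print(len(sample))
```

Every unit in a stratum has the same chance, n_h divided by the stratum size, of ending up in the sample, so the probability is both non-zero and known in advance.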
Which type of sampling is where every element in the population being studied has a known chance of being selected for the study?
In simple random sampling (SRS), each sampling unit of a population has an equal chance of being included in the sample. Consequently, each possible sample also has an equal chance of being selected.
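The two statements about SRS can be checked by brute force on a tiny case. The sketch below enumerates every possible sample of size 2 from a hypothetical population of 5 units; the population, the sample size, and the variable names are assumptions chosen to keep the enumeration small.

```python
from itertools import combinations
from math import comb

population = [1, 2, 3, 4, 5]  # hypothetical population of N = 5 units
n = 2                         # sample size

# Every possible sample of size n (order ignored, no repeats).
all_samples = list(combinations(population, n))

# Under SRS each possible sample is equally likely: 1 / C(N, n).
p_sample = 1 / comb(len(population), n)

# A unit's inclusion probability is the share of samples that contain it.
inclusion = {
    unit: sum(unit in s for s in all_samples) / len(all_samples)
    for unit in population
}

print(len(all_samples), p_sample)  # 10 possible samples, each with probability 0.1
print(inclusion)                   # every unit: 0.4, i.e. n / N
```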
What type of sampling is when you number every element in the population and then select every kth element from the list?
Systematic sampling: a method of sampling from a list of the population so that the sample is made up of every kth member on the list, after randomly selecting a starting point from 1 to k.
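A minimal sketch of this definition, including the random starting point, might look as follows. The helper name `systematic_sample`, the numbered list of 100 members, and the interval k = 5 are assumptions for the example.

```python
import random

def systematic_sample(frame, k):
    """Take every k-th member of `frame` after a random start between 1 and k."""
    start = random.randint(1, k)  # 1-based position of the first pick
    return frame[start - 1::k]    # then every k-th member after it

frame = list(range(1, 101))       # hypothetical numbered list of 100 members
sample = systematic_sample(frame, k=5)
print(sample)
```

With 100 members and k = 5, the sample always has 20 members; only the starting point changes from run to run.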
What is the type of sampling technique where each element of population?
Simple Random Sampling
Simple random sampling requires using randomly generated numbers to choose a sample. More specifically, it initially requires a sampling frame, a list or database of all members of a population.