Which of the following statements is true for correlation analysis: (A) it is a bivariate analysis, (B) it is a multivariate analysis, (C) it is a univariate analysis, or (D) both A and C?

Chi-square goodness of fit tests are often used in genetics. One common application is to check if two genes are linked (i.e., if the assortment is independent). When genes are linked, the allele inherited for one gene affects the allele inherited for another gene.

Suppose that you want to know if the genes for pea texture (R = round, r = wrinkled) and color (Y = yellow, y = green) are linked. You perform a dihybrid cross between two heterozygous (RY / ry) pea plants. The hypotheses you’re testing with your experiment are:

  • Null hypothesis (H0): The population of offspring have an equal probability of inheriting all possible genotypic combinations.
    • This would suggest that the genes are unlinked.
  • Alternative hypothesis (Ha): The population of offspring do not have an equal probability of inheriting all possible genotypic combinations.
    • This would suggest that the genes are linked.

You observe 100 peas:

  • 78 round and yellow peas
  • 6 round and green peas
  • 4 wrinkled and yellow peas
  • 12 wrinkled and green peas

Step 1: Calculate the expected frequencies

To calculate the expected values, you can make a Punnett square. If the two genes are unlinked, the probability of each genotypic combination is equal.

   | RY   | ry   | Ry   | rY
RY | RRYY | RrYy | RRYy | RrYY
ry | RrYy | rryy | Rryy | rrYy
Ry | RRYy | Rryy | RRyy | RrYy
rY | RrYY | rrYy | RrYy | rrYY

The expected phenotypic ratios are therefore 9 round and yellow: 3 round and green: 3 wrinkled and yellow: 1 wrinkled and green.
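The 9:3:3:1 ratio can be checked by enumerating the 16 cells of the Punnett square. The following is a minimal sketch, assuming independent assortment and complete dominance of R over r and Y over y; the names `gametes` and `phenotype` are illustrative and not from the original.

```python
from collections import Counter
from itertools import product

# The four gametes a heterozygous (RrYy) parent can produce,
# assuming the two genes assort independently.
gametes = ["RY", "Ry", "rY", "ry"]

def phenotype(g1, g2):
    """Phenotype of the offspring formed from two gametes."""
    # One dominant allele (R or Y) is enough to show the dominant trait.
    texture = "round" if "R" in (g1[0], g2[0]) else "wrinkled"
    color = "yellow" if "Y" in (g1[1], g2[1]) else "green"
    return f"{texture} and {color}"

# Count phenotypes over all 4 x 4 = 16 gamete pairings.
counts = Counter(phenotype(a, b) for a, b in product(gametes, repeat=2))
for name, n in counts.items():
    print(f"{name}: {n}/16")
```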

From this, you can calculate the expected phenotypic frequencies for 100 peas:

Phenotype           | Observed | Expected
Round and yellow    | 78       | 100 * (9/16) = 56.25
Round and green     | 6        | 100 * (3/16) = 18.75
Wrinkled and yellow | 4        | 100 * (3/16) = 18.75
Wrinkled and green  | 12       | 100 * (1/16) = 6.25

Step 2: Calculate chi-square

Phenotype           | Observed | Expected | O − E  | (O − E)² | (O − E)² / E
Round and yellow    | 78       | 56.25    | 21.75  | 473.06   | 8.41
Round and green     | 6        | 18.75    | −12.75 | 162.56   | 8.67
Wrinkled and yellow | 4        | 18.75    | −14.75 | 217.56   | 11.60
Wrinkled and green  | 12       | 6.25     | 5.75   | 33.06    | 5.29

Χ2 = 8.41 + 8.67 + 11.60 + 5.29 = 33.97

Step 3: Find the critical chi-square value

Since there are four groups (round and yellow, round and green, wrinkled and yellow, wrinkled and green), there are three degrees of freedom.

For a test of significance at α = .05 and df = 3, the Χ2 critical value is 7.82.

Step 4: Compare the chi-square value to the critical value

Χ2 = 33.97

Critical value = 7.82

The Χ2 value is greater than the critical value.

Step 5: Decide whether to reject the null hypothesis

The Χ2 value is greater than the critical value, so we reject the null hypothesis that the population of offspring have an equal probability of inheriting all possible genotypic combinations. There is a significant difference between the observed and expected genotypic frequencies (p < .05).

The data supports the alternative hypothesis that the offspring do not have an equal probability of inheriting all possible genotypic combinations, which suggests that the genes are linked.
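The five steps can also be run as a short script. This is a sketch rather than a definitive implementation: it takes the critical value of 7.82 from the text instead of computing it, and the helper `p_value_df3` is an addition not found in the original, using a closed-form tail probability that holds only for exactly three degrees of freedom.

```python
import math

# Observed counts from the 100 peas in the example.
observed = {
    "round and yellow": 78,
    "round and green": 6,
    "wrinkled and yellow": 4,
    "wrinkled and green": 12,
}
# Expected 9:3:3:1 phenotypic ratio under the null hypothesis.
ratio = {
    "round and yellow": 9,
    "round and green": 3,
    "wrinkled and yellow": 3,
    "wrinkled and green": 1,
}

def chi_square_statistic(observed, ratio):
    """Sum of (O - E)^2 / E over all groups."""
    n = sum(observed.values())
    total = sum(ratio.values())
    stat = 0.0
    for group, o in observed.items():
        e = n * ratio[group] / total  # expected count for this group
        stat += (o - e) ** 2 / e
    return stat

def p_value_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

CRITICAL_VALUE = 7.82  # alpha = .05, df = 3, as quoted in the text

chi2 = chi_square_statistic(observed, ratio)
df = len(observed) - 1
reject_null = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_null}")
print(f"p-value (df = 3 only) = {p_value_df3(chi2):.2e}")
```

For other degrees of freedom, a statistics library would be the more practical way to obtain the critical value or p-value.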

When investigating the relationship between two or more numeric variables, it is important to know the difference between correlation and regression. The similarities/differences and advantages/disadvantages of these tools are discussed here along with examples of each.

Correlation quantifies the direction and strength of the relationship between two numeric variables, X and Y, and always lies between -1.0 and 1.0. Simple linear regression relates X to Y through an equation of the form Y = a + bX.


Key similarities 

  • Both quantify the direction and strength of the relationship between two numeric variables.
  • When the correlation (r) is negative, the regression slope (b) will be negative. 
  • When the correlation is positive, the regression slope will be positive. 
  • The correlation squared (r2 or R2) has special meaning in simple linear regression. It represents the proportion of variation in Y explained by X.
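The link between the sign of r and the sign of b, and the meaning of r squared, follow from two standard least-squares identities. They are written out here as a sketch; the symbols s_X and s_Y (sample standard deviations) and \hat{Y}_i (fitted values from Y = a + bX) are notation introduced for this illustration, not taken from the original.

```latex
% Slope of the least-squares line in terms of the correlation.
% s_X and s_Y are positive, so b always has the same sign as r.
b = r \, \frac{s_Y}{s_X}

% Proportion of variation in Y explained by X.
r^2 = 1 - \frac{\sum_i (Y_i - \hat{Y}_i)^2}{\sum_i (Y_i - \bar{Y})^2}
```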

Key differences 

  • Regression attempts to establish how X causes Y to change and the results of the analysis will change if X and Y are swapped. With correlation, the X and Y variables are interchangeable.
  • Regression assumes X is fixed with no error, such as a dose amount or temperature setting. With correlation, X and Y are typically both random variables*, such as height and weight or blood pressure and heart rate. 
  • Correlation is a single statistic, whereas regression produces an entire equation.
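The first difference can be seen numerically: swapping X and Y leaves the correlation unchanged but gives a different regression line. The sketch below uses made-up numbers purely for illustration, and the helpers `pearson_r` and `fit_line` are hypothetical names written for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Hypothetical data, invented for this illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)          # same value: correlation is symmetric
_, slope_y_on_x = fit_line(x, y)
_, slope_x_on_y = fit_line(y, x)  # a different line, not the same slope

print(f"r(X, Y) = {r_xy:.4f}, r(Y, X) = {r_yx:.4f}")
print(f"slope of Y on X = {slope_y_on_x:.4f}")
print(f"slope of X on Y = {slope_x_on_y:.4f}")
```

The two slopes are related only through the correlation: their product equals r squared.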

 


 

*The X variable can be fixed with correlation, but confidence intervals and statistical tests are no longer appropriate. Typically, regression is used when X is fixed.


Key advantage of correlation

  • Correlation is a more concise (single value) summary of the relationship between two variables than regression. As a result, many pairwise correlations can be viewed together in one table.

Key advantage of regression

  • Regression provides a more detailed analysis, including an equation that can be used for prediction and/or optimization.

 

Correlation Example

As an example, let’s go through the Prism correlation matrix tutorial, which uses an automotive dataset with Cost in USD, MPG, Horsepower, and Weight in Pounds as the variables. Instead of just looking at the correlation between one X and one Y, we can generate all pairwise correlations using Prism’s correlation matrix. These are the steps in Prism:

  1. Open Prism and select Multiple Variables from the left side panel.
  2. Choose Start with sample data to follow a tutorial and select Correlation matrix.
  3. Click Create. 
  4. Click Analyze.
  5. Select Multiple variable analyses > Correlation matrix.
  6. Click OK twice.
  7. On the left side panel, double click on the graph titled Pearson r: Correlation of Data 1. 


The Prism correlation matrix displays all the pairwise correlations for this set of variables.

  • The red boxes represent variables that have a negative relationship.
  • The blue boxes represent variables that have a positive relationship.
  • The darker the box, the closer the correlation is to -1 or +1.
  • Ignore the dark blue diagonal boxes since they will always have a correlation of 1.00. 

Key findings: 

  • Horsepower and MPG have a strong negative relationship (r = -0.74): higher horsepower cars have lower MPG.
  • Horsepower and cost have a strong positive relationship (r = 0.88): higher horsepower cars cost more.

Note that the matrix is symmetric. For example, the correlation between “weight in pounds” and “cost in USD” in the lower left corner (0.52) is the same as the correlation between “cost in USD” and “weight in pounds” in the upper right corner (0.52). This reinforces the fact that X and Y are interchangeable with regard to correlation. The correlations along the diagonal will always be 1.00 because a variable is always perfectly correlated with itself.
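Outside Prism, the same kind of matrix can be sketched in a few lines of code. The values below are invented toy numbers, not the tutorial dataset, so the correlations they produce will not match the figures quoted above. The point is only to show the symmetry and the diagonal of 1.00.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy values, NOT the Prism tutorial data.
data = {
    "cost": [20, 25, 32, 41, 55],
    "mpg": [34, 30, 27, 22, 18],
    "horsepower": [110, 140, 180, 240, 310],
    "weight": [2600, 2900, 3100, 3600, 3500],
}

names = list(data)
# matrix[i][j] holds the correlation between variable i and variable j.
matrix = [[pearson_r(data[a], data[b]) for b in names] for a in names]

print(" " * 12 + "".join(f"{n:>12}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:>12}" + "".join(f"{v:12.2f}" for v in row))
```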

When interpreting correlations, you should be aware of the four possible explanations for a strong correlation:

  • Changes in the X variable cause a change in the value of the Y variable.
  • Changes in the Y variable cause a change in the value of the X variable.
  • Changes in another variable influence both X and Y.
  • X and Y don’t really correlate at all, and you just happened to observe such a strong correlation by chance. The P value quantifies the likelihood that this could occur.

Regression Example

The strength of UV rays varies by latitude. The higher the latitude, the less exposure to the sun, which corresponds to a lower skin cancer risk. So where you live can have an impact on your skin cancer risk. Two variables, cancer mortality rate and latitude, were entered into Prism’s XY table. The Prism graph shows the relationship between skin cancer mortality rate (Y) and latitude at the center of a state (X). It makes sense to compute the correlation between these variables, but taking it a step further, let’s perform a regression analysis and get a predictive equation.


The relationship between X and Y is summarized by the fitted regression line on the graph with equation: mortality rate = 389.2 - 5.98*latitude.  Based on the slope of -5.98, each 1 degree increase in latitude decreases deaths due to skin cancer by approximately 6 per 10 million people.

Since regression analysis produces an equation, unlike correlation, it can be used for prediction. For example, a city at latitude 40 would be expected to have 389.2 - 5.98*40 = 150 deaths per 10 million due to skin cancer each year.

Regression also allows for the interpretation of the model coefficients:

  • Slope: every one degree increase in latitude decreases mortality by 5.98 deaths per 10 million. 
  • Intercept: at 0 degrees latitude (Equator), the model predicts 389.2 deaths per 10 million. However, since there are no data near the intercept, this prediction relies heavily on the relationship remaining linear all the way to 0.
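The fitted equation can be wrapped in a small function to make predictions. This is a minimal sketch using the intercept and slope reported above; the function name `predict_mortality` is hypothetical.

```python
# Coefficients of the fitted line reported in the text:
# mortality rate = 389.2 - 5.98 * latitude (deaths per 10 million).
INTERCEPT = 389.2
SLOPE = -5.98

def predict_mortality(latitude_deg):
    """Predicted skin cancer deaths per 10 million at a given latitude."""
    return INTERCEPT + SLOPE * latitude_deg

# Prediction for the latitude-40 example from the text.
print(round(predict_mortality(40.0), 1))

# The slope is the change in the prediction per one-degree increase.
print(round(predict_mortality(41.0) - predict_mortality(40.0), 2))
```

Predictions are most trustworthy for latitudes inside the range of the observed data, which is the same caution that applies to the intercept.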

 


 

Summary and Additional Information

In summary, correlation and regression have many similarities and some important differences. Regression is primarily used to build models/equations to predict a key response, Y, from a set of predictor (X) variables. Correlation is primarily used to quickly and concisely summarize the direction and strength of the relationships between a set of 2 or more numeric variables. 

The table below summarizes the key similarities and differences between correlation and regression.

Topic | Correlation | Regression
When to use | For a quick and simple summary of the direction and strength of pairwise relationships between two or more numeric variables. | To predict, optimize, or explain a numeric response Y from X, a numeric variable thought to influence Y.

Which of the following statements is true for correlation analysis?

The correct answer is d. The correlation value between the given two variables denotes the strength and direction of the linear relationship between them. Its value always lies between -1 and 1.

Is correlation analysis a univariate analysis?

Correlation analysis is a simple and useful method to test whether two variables are related. Because it always examines a pair of variables, it is a bivariate method rather than a univariate one.

What is true about correlation in statistics?

Correlation is a statistical measure (expressed as a number) that describes the size and direction of a relationship between two or more variables. A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable.

What is correlation Mcq?

Correlation is a statistical tool that shows the association between two variables. Regression, on the other hand, evaluates the relationship between an independent and a dependent variable.