Module 7 Discussion:
For your initial post, choose one of the following two prompts to respond to. Then, in your two follow-up posts, respond at least once to each option. Use the discussion topic as a place to ask questions, speculate about answers, and share insights. Be sure to embed and cite your references for any supporting images.
Option 1:
Think of a problem dealing with two possibly related variables (Y and X) that you may be interested in. Share your problem and discuss why a regression analysis could be appropriate for this problem.
Specifically, what statistical questions are you asking? Why would you want to predict the value of Y? What if you wanted to predict a value of Y for a value of X beyond the highest X in your data (for example, if X is time and you want to forecast Y in the future)?
You should describe the data collection process that you are proposing but you do not need to collect any data.
Option 2:
Give an example of a problem dealing with two possibly related variables (Y and X) for which a linear regression model would not be appropriate. For example, the relationship could be curved instead of linear, or there may be no significant correlation at all.
What is the impact of using a linear regression model in this case? What options, other than linear regression, can you see? You do not need to collect any data.
For your response to a classmate (two responses required, one in each option), examine your classmate’s problem to assess the appropriateness and accuracy of using a linear regression model. Discuss the meaning of the standard error of the estimate and how it affects the predicted values of Y for that analysis.
Study Material for this Discussion:
Requirements for a Regression Analysis
The calculation of the linear regression line Y = mX + b in Module Six is strictly a mathematical one. The statistical methods that analyze the significance of the correlation between X and Y, however, require specific statistical assumptions.
Even when X and Y have a linear relationship, individual values of Y will not be exactly equal to the predicted values mX + b. The difference is called the error, written as ε (the Greek letter epsilon). This error is a random quantity that can be different for every data point. If (x1, y1) is the first data value, then the first error ε1 is: ε1 = y1 − (mx1 + b).
There are two main requirements that must be met in order to perform a statistical analysis:
1. X and Y are linearly related with a random deviation (error) affecting each measurement. In regression analysis theory, the population parameter for the slope is usually written as β1 (the Greek letter beta, instead of m) and the population parameter for the intercept is usually written as β0 (instead of b). The linear model is thus: Y = β1X + β0 + ε
2. The errors ε (one for each data point) are independent of each other and normally distributed with mean 0 and the same standard deviation.
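As a minimal sketch of the error definition above, the following Python code fits the line Y = mX + b by least squares and computes one error (residual) per data point. The data values and the names fit_line and residuals are illustrative assumptions, not part of the course material. The last step estimates the common standard deviation of the errors, usually called the standard error of the estimate.

```python
import math

# Illustrative data only (an assumption; not from the course material).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def fit_line(xs, ys):
    """Return (m, b) for the least-squares line Y = mX + b."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

def residuals(xs, ys, m, b):
    """Return the errors e_i = y_i - (m * x_i + b), one per data point."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

m, b = fit_line(xs, ys)
errors = residuals(xs, ys, m, b)

# Standard error of the estimate: sqrt(SSE / (n - 2)).
sse = sum(e ** 2 for e in errors)
std_error_estimate = math.sqrt(sse / (len(xs) - 2))

print(f"m = {m:.3f}, b = {b:.3f}")
print("errors:", [round(e, 3) for e in errors])
print(f"standard error of the estimate = {std_error_estimate:.3f}")
```

For a least-squares line with an intercept, the errors always sum to zero (up to rounding), so their sum says nothing about whether the requirements hold. Plotting the errors against X is a more useful informal check.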
With those two requirements, techniques for hypothesis testing can be used.
MAT 240 Module Seven
Hypothesis Tests of the Slope
In Module Six, the significance of the correlation coefficient r was tested; however, it is much more common to test the significance of the slope. The formal statement of a hypothesis test for the slope is:
H0: β1 = 0
H1: β1 ≠ 0
The alternative hypothesis is usually two-tailed. When H0 is true, the test statistic, which is the slope estimate b1 (also written as m) divided by its standard error, has a t distribution with n − 2 degrees of freedom (n is the number of data values). The calculation of the standard error of b1, the estimate of β1, can be lengthy; the use of software (StatCrunch or Excel) for the computations is strongly encouraged. As with previous hypothesis tests, this test can use either the classical method (critical value) or the p-value method. Rejecting the null hypothesis indicates that there is a significant correlation between X and Y. Not rejecting the null hypothesis indicates that the data is inconclusive; it does not show that X and Y are unrelated.
Hypothesis Tests of the Intercept
In addition to hypothesis tests of the slope, hypothesis tests of the intercept can be performed. The parameter β0 is the mean value of Y when X = 0. The value β0 = 0 may be of interest, but other values could be meaningful too. For example, in an economic model where Y is the total cost and X is the number of units produced, β0 can be interpreted as the fixed cost. The formal statement of a hypothesis test for the intercept is:
H0: β0 = some number
H1: β0 ≠ some number
where the number is chosen to be appropriate for the application under study. The calculation of the standard error of b0, the estimate of β0, can be lengthy, and students are strongly encouraged to use software (StatCrunch or Excel) for the computations.
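The slope test can be sketched by hand in Python using the classical (critical value) method. This is only an illustration under stated assumptions: the data are made up, the name slope_t_test is hypothetical, and the critical value 3.182 is the two-tailed t value for α = 0.05 with 3 degrees of freedom, taken from a standard t table. In practice StatCrunch or Excel reports these quantities directly.

```python
import math

# Illustrative data only (an assumption; not from the course material).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

def slope_t_test(xs, ys):
    """Return (b1, se_b1, t_stat, df) for testing H0: beta1 = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    sse = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))
    df = n - 2                          # degrees of freedom
    s = math.sqrt(sse / df)             # standard error of the estimate
    se_b1 = s / math.sqrt(sxx)          # standard error of the slope
    return b1, se_b1, b1 / se_b1, df

b1, se_b1, t_stat, df = slope_t_test(xs, ys)

# Assumed critical value: two-tailed, alpha = 0.05, df = 3 (t table).
T_CRITICAL = 3.182
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, t = {t_stat:.3f}, df = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0 (inconclusive)")
```

With these made-up values the test statistic is about 2.12, which is below 3.182, so H0 is not rejected. The sample slope of 0.6 is not zero, but with only five points the data is inconclusive.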
The Multiple Linear Regression Model
The linear regression models studied so far use only one independent variable, X. In real life, there are often multiple independent or explanatory variables that should be considered. A multiple linear regression model is one that uses multiple independent variables (X1, X2, X3, …, Xk) to model one dependent variable (Y). A model that uses only one independent variable is called a simple linear regression. The equation for the multiple linear regression line is:
Y = β1X1 + β2X2 + … + βkXk + β0
where X1, X2, …, Xk are k independent variables, β1, β2, …, βk are their coefficients (i.e., slopes), and β0 is the intercept. As for simple linear regression, the errors ε (one for each data point) need to be independent of each other and normally distributed with mean 0 and the same standard deviation.
Interpreting Multiple Linear Regression Coefficients
Analyzing multiple linear regression models is complicated because the variables X1, X2, …, Xk are usually correlated among themselves. The coefficient β1 measures the change in Y for a one-unit increase in X1, but only when the values of X2, X3, …, Xk do not change. Rejecting the null hypothesis of β1 = β2 = … = βk = 0 (using either the p-value method or the classical method with a critical value of the F distribution) means that the variables X1, X2, …, Xk, taken as a whole, are significantly correlated with Y. This test alone gives no conclusion about any specific variable.
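To make the interpretation of the coefficients concrete, here is a minimal Python sketch that fits a multiple linear regression with two independent variables by solving the least-squares normal equations. Everything in it is an assumption for illustration: the data are made up and deliberately noise-free (generated from Y = 2X1 + 3X2 + 1), and the names solve_linear_system, fit_multiple, and predict are hypothetical. Real analyses would use StatCrunch, Excel, or a statistics library, which also report the F test.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x

def fit_multiple(rows, ys):
    """Least-squares fit; returns [b0, b1, ..., bk] (intercept first)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coefs, row):
    """Predicted Y = b0 + b1*x1 + ... + bk*xk."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))

# Illustrative, noise-free data generated from Y = 2*X1 + 3*X2 + 1.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
ys = [9.0, 8.0, 19.0, 18.0, 29.0]

coefs = fit_multiple(rows, ys)

# Raising X1 by one unit while holding X2 fixed changes the prediction by b1.
effect_of_x1 = predict(coefs, [3.0, 4.0]) - predict(coefs, [2.0, 4.0])

print("b0, b1, b2 =", [round(c, 6) for c in coefs])
print("effect of +1 in X1 with X2 fixed =", round(effect_of_x1, 6))
```

Because the data contain no random error, the fit recovers the generating coefficients up to rounding. With real data the estimates would differ from the population values, and correlated X variables would make the individual coefficients harder to interpret.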