Shalabh ([email protected]), Department of Mathematics & Statistics, Indian Institute of Technology Kanpur, Kanpur - 208016 (India)
MTH 416 : Regression Analysis
Syllabus: Simple and multiple linear regression, polynomial regression and orthogonal polynomials, tests of significance and confidence intervals for parameters. Residuals and their analysis for testing departures from the assumptions, such as fitness of the model, normality, homogeneity of variances, detection of outliers, influential observations, power transformations of dependent and independent variables. The problem of multicollinearity, ridge regression and principal component regression, subset selection of explanatory variables, Mallows' Cp statistic. Nonlinear regression, different methods of estimation (least squares and maximum likelihood), asymptotic properties of estimators. Generalised linear models (GLIM), analysis of binary and grouped data using logistic and log-linear models.
Grading Scheme : Quizzes: 20%, Mid semester exam: 30%, End semester exam: 50%
Books: 1. Introduction to Linear Regression Analysis by Douglas C. Montgomery, Elizabeth A. Peck, G. Geoffrey Vining (Wiley). A low-price Indian edition is available.
2. Applied Regression Analysis by Norman R. Draper, Harry Smith (Wiley). A low-price Indian edition is available.
3. Linear Models and Generalizations - Least Squares and Alternatives by C.R. Rao, H. Toutenburg, Shalabh, and C. Heumann (Springer, 2008)
4. A Primer on Linear Models by John F. Monahan (CRC Press, 2008)
5. Linear Model Methodology by Andre I. Khuri (CRC Press, 2010)
Assignments:
Assignment 1
Assignment 2
Assignment 3
Assignment 4
Assignment 5
Assignment 6
Assignment 7
Assignment 8
Lecture notes for your reference (if you find any typos, please let me know)
Lecture Notes 1 : Introduction
Lecture Notes 2 : Simple Linear Regression Analysis
Lecture Notes 3 : Multiple Linear Regression Model
Lecture Notes 4 : Model Adequacy Checking
Lecture Notes 5 : Transformation and Weighting to Correct Model Inadequacies
Lecture Notes 6 : Diagnostic for Leverage and Influence
Lecture Notes 7 : Generalized and Weighted Least Squares Estimation
Lecture Notes 8 : Indicator Variables
Lecture Notes 9 : Multicollinearity
Lecture Notes 10 : Heteroskedasticity
Lecture Notes 11 : Autocorrelation
Lecture Notes 12 : Polynomial Regression Models
Lecture Notes 13 : Variable Selection and Model Building
Lecture Notes 14 : Logistic Regression Models
Lecture Notes 15 : Poisson Regression Models
Lecture Notes 16 : Generalized Linear Models
Course Info
- Instructor: Prof. Dimitris Bertsimas
- Department: Sloan School of Management
- Topics: Operations Management; Probability and Statistics

The Analytics Edge: 2 Linear Regression
2.1 Welcome to Unit 2
- 2.1.1 Welcome to Unit 2
2.2 The Statistical Sommelier: An Introduction to Linear Regression
- 2.2.1 Video 1: Predicting the Quality of Wine
- 2.2.2 Quick Question
- 2.2.3 Video 2: One-Variable Linear Regression
- 2.2.4 Quick Question
- 2.2.5 Video 3: Multiple Linear Regression
- 2.2.6 Quick Question
- 2.2.7 Video 4: Linear Regression in R
- 2.2.8 Quick Question
- 2.2.9 Video 5: Understanding the Model
- 2.2.10 Quick Question
- 2.2.11 Video 6: Correlation and Multicollinearity
- 2.2.12 Quick Question
- 2.2.13 Video 7: Making Predictions
- 2.2.14 Quick Question
- 2.2.15 Video 8: Comparing the Model to the Experts
2.3 Moneyball: The Power of Sports Analytics
- 2.3.1 A Quick Introduction to Baseball
- 2.3.2 Video 1: The Story of Moneyball
- 2.3.3 Video 2: Making it to the Playoffs
- 2.3.4 Quick Question
- 2.3.5 Video 3: Predicting Runs
- 2.3.6 Quick Question
- 2.3.7 Video 4: Using the Models to Make Predictions
- 2.3.8 Quick Question
- 2.3.9 Video 5: Winning the World Series
- 2.3.10 Quick Question
- 2.3.11 Video 6: The Analytics Edge in Sports
- 2.3.12 Quick Question
2.4 Playing Moneyball in the NBA (Recitation)
- 2.4.1 Welcome to Recitation 2
- 2.4.2 Video 1: The Data
- 2.4.3 Video 2: Playoffs and Wins
- 2.4.4 Video 3: Points Scored
- 2.4.5 Video 4: Making Predictions
2.5 Assignment 2
- 2.5.1 Climate Change
- 2.5.2 Reading Test Scores
- 2.5.3 Detecting Flu Epidemics via Search Engine Query Data
- 2.5.4 State Data
Welcome to Unit 2
Video 1: Predicting the Quality of Wine
The slides from all videos in this Lecture Sequence can be downloaded here: Introduction to Linear Regression (PDF - 1.3MB) .
Introduction to Baseball Video
If you are unfamiliar with the game of baseball, please watch this short video clip for a quick introduction to the game. You don’t need to be a baseball expert to understand this lecture, but basic knowledge of the game will be helpful to you.
TruScribe. “Baseball Rules of Engagement.” March 27, 2012. YouTube. This video is from TrueScribeVideos and is not covered by our Creative Commons license .
Welcome to Recitation 2
Climate Change
There have been many studies documenting that the average global temperature has been increasing over the last century. The consequences of a continued rise in global temperature will be dire. Rising sea levels and an increased frequency of extreme weather events will affect billions of people.
In this problem, we will attempt to study the relationship between average global temperature and several other factors.
The file climate_change (CSV) contains climate data from May 1983 to December 2008. The available variables include:
- Year : the observation year.
- Month : the observation month.
- Temp : the difference in degrees Celsius between the average global temperature in that period and a reference value. This data comes from the Climatic Research Unit at the University of East Anglia .
- CO2 , N2O , CH4 , CFC.11 , CFC.12 : atmospheric concentrations of carbon dioxide (CO2), nitrous oxide (N2O), methane (CH4), trichlorofluoromethane (CCl3F; commonly referred to as CFC-11) and dichlorodifluoromethane (CCl2F2; commonly referred to as CFC-12), respectively. This data comes from the ESRL/NOAA Global Monitoring Division .
- CO2, N2O and CH4 are expressed in ppmv (parts per million by volume; i.e., 397 ppmv of CO2 means that CO2 constitutes 397 millionths of the total volume of the atmosphere).
- CFC.11 and CFC.12 are expressed in ppbv (parts per billion by volume).
- Aerosols : the mean stratospheric aerosol optical depth at 550 nm. This variable is linked to volcanoes, as volcanic eruptions result in new particles being added to the atmosphere, which affect how much of the sun’s energy is reflected back into space. This data is from the Goddard Institute for Space Studies at NASA .
- TSI : the total solar irradiance (TSI) in W/m2 (the rate at which the sun’s energy is deposited per unit area). Due to sunspots and other solar phenomena, the amount of energy that is given off by the sun varies substantially with time. This data is from the SOLARIS-HEPPA project website .
- MEI : multivariate El Niño Southern Oscillation index (MEI), a measure of the strength of the El Niño/La Niña-Southern Oscillation (a weather effect in the Pacific Ocean that affects global temperatures). This data comes from the ESRL/NOAA Physical Sciences Division .
Problem 1.1 - Creating Our First Model
We are interested in how changes in these variables affect future temperatures, as well as how well these variables explain temperature changes so far. To do this, first read the dataset climate_change.csv into R.
Then, split the data into a training set , consisting of all the observations up to and including 2006, and a testing set consisting of the remaining years (hint: use subset). A training set refers to the data that will be used to build the model (this is the data we give to the lm() function), and a testing set refers to the data we will use to test our predictive ability.
Next, build a linear regression model to predict the dependent variable Temp, using MEI, CO2, CH4, N2O, CFC.11, CFC.12, TSI, and Aerosols as independent variables ( Year and Month should NOT be used in the model). Use the training set to build the model.
Enter the model R2 (the “Multiple R-squared” value):
Explanation
First, read in the data and split it using the subset command:
climate = read.csv("climate_change.csv")
train = subset(climate, Year <= 2006)
test = subset(climate, Year > 2006)
Then, you can create the model using the command:
climatelm = lm(Temp ~ MEI + CO2 + CH4 + N2O + CFC.11 + CFC.12 + TSI + Aerosols, data=train)
Lastly, look at the model using summary(climatelm). The Multiple R-squared value is 0.7509.
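The point of the train/test split above is out-of-sample evaluation. A minimal sketch of computing test-set R² with predict() follows; it uses synthetic data as a stand-in, since the climate_change.csv file is not reproduced here (the column x and frame d are illustrative), but the same pattern applies directly to climatelm, train, and test:

```r
set.seed(1)
# Synthetic stand-in for the climate data (the real CSV is not included here)
n <- 300
d <- data.frame(Year = rep(1983:2008, length.out = n), x = rnorm(n))
d$Temp <- 0.1 + 0.4 * d$x + rnorm(n, sd = 0.1)

train <- subset(d, Year <= 2006)   # same split rule as the problem
test  <- subset(d, Year >  2006)

fit  <- lm(Temp ~ x, data = train)
pred <- predict(fit, newdata = test)

# Out-of-sample R^2: compare model errors to a baseline that always
# predicts the training-set mean of Temp
SSE <- sum((test$Temp - pred)^2)
SST <- sum((test$Temp - mean(train$Temp))^2)
R2  <- 1 - SSE / SST
```

Unlike the in-sample "Multiple R-squared" reported by summary(), this quantity can be negative if the model predicts worse than the baseline mean.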
Problem 1.2 - Creating Our First Model
Which variables are significant in the model? We will consider a variable significant only if the p-value is below 0.05. (Select all that apply.)
If you look at the model we created in the previous problem using summary(climatelm), all of the variables have at least one star except for CH4 and N2O. So MEI, CO2, CFC.11, CFC.12, TSI, and Aerosols are all significant.
Problem 2.1 - Understanding the Model
Current scientific opinion is that nitrous oxide and CFC-11 are greenhouse gases: gases that are able to trap heat from the sun and contribute to the heating of the Earth. However, the regression coefficients of both the N2O and CFC-11 variables are negative , indicating that increasing atmospheric concentrations of either of these two compounds is associated with lower global temperatures.
Which of the following is the simplest correct explanation for this contradiction?
- Climate scientists are wrong that N2O and CFC-11 are greenhouse gases - this regression analysis constitutes part of a disproof.
- There is not enough data, so the regression coefficients being estimated are not accurate.
- All of the gas concentration variables reflect human development - N2O and CFC.11 are correlated with other variables in the data set.
The linear correlations of N2O and CFC.11 with other variables in the data set are quite large. The first explanation does not seem correct, as the warming effects of nitrous oxide and CFC-11 are well documented, and our regression analysis is not enough to disprove them. The second explanation is unlikely, as we have estimated eight coefficients and the intercept from 284 observations.
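This instability can be demonstrated directly: when two predictors are nearly collinear, their individual coefficients become erratic even though their combined effect is well estimated. A minimal synthetic sketch (not the climate data; the names x1, x2, and y are illustrative):

```r
set.seed(42)
n  <- 284                      # same sample size as the climate training set
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05) # x2 is almost a copy of x1 (correlation ~0.999)
y  <- 1 + x1 + x2 + rnorm(n)   # both true effects are +1

coef(lm(y ~ x1))["x1"]         # marginal slope is near 2: x1 absorbs x2's effect
coef(lm(y ~ x1 + x2))          # individual slopes are unstable and can even
                               # differ in sign, while their sum stays near 2
```

This is the same mechanism behind the negative N2O and CFC.11 coefficients: the regression cannot cleanly attribute the shared signal to either correlated predictor.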
Problem 2.2 - Understanding the Model
Compute the correlations between all the variables in the training set. Which of the following independent variables is N2O highly correlated with (absolute correlation greater than 0.7)? Select all that apply.
Which of the following independent variables is CFC.11 highly correlated with? Select all that apply.
You can calculate all correlations at once using cor(train) where train is the name of the training data set.
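Scanning a full correlation matrix by eye is error-prone, so the threshold check can be automated. A sketch using R's built-in mtcars data as a stand-in (the identical pattern applies to cor(train) with target "N2O" or "CFC.11"):

```r
C <- cor(mtcars)                 # full correlation matrix, as with cor(train)
target <- "mpg"                  # stand-in for "N2O"
# Variables whose absolute correlation with the target exceeds 0.7,
# excluding the target's (trivial) correlation with itself
high <- names(which(abs(C[, target]) > 0.7 & rownames(C) != target))
high
```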
Problem 3 - Simplifying the Model
Given that the correlations are so high, let us focus on the N2O variable and build a model with only MEI, TSI, Aerosols and N2O as independent variables. Remember to use the training set to build the model.
Enter the coefficient of N2O in this reduced model:
(How does this compare to the coefficient in the previous model with all of the variables?)
Enter the model R2:
We can create this simplified model with the command:
LinReg = lm(Temp ~ MEI + N2O + TSI + Aerosols, data=train)
You can get the coefficient for N2O and the model R-squared by typing summary(LinReg).
We have observed that, for this problem, when we remove many variables the sign of N2O flips. The model has not lost a lot of explanatory power (the model R2 is 0.7261 compared to 0.7509 previously) despite removing many variables. As discussed in lecture, this type of behavior is typical when building a model where many of the independent variables are highly correlated with each other. In this particular problem many of the variables (CO2, CH4, N2O, CFC.11 and CFC.12) are highly correlated, since they are all driven by human industrial development.
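The small loss of explanatory power when correlated predictors are dropped can be reproduced on synthetic data: a near-duplicate predictor adds almost nothing once its twin is in the model. A hedged sketch (illustrative names, not the climate variables):

```r
set.seed(7)
n  <- 284
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # highly correlated with x1
y  <- 1 + x1 + x2 + rnorm(n)

full    <- summary(lm(y ~ x1 + x2))$r.squared
reduced <- summary(lm(y ~ x1))$r.squared
full - reduced                   # tiny drop in R^2, as with 0.7509 vs 0.7261
```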
- Mathematics
- Regression Analysis (Video)
- Coordinated by: IIT Kharagpur
- Available from: 2012-07-11
- Intro Video
- Simple Linear Regression
- Simple Linear Regression (Contd.)
- Simple Linear Regression (Contd.)
- Simple Linear Regression (Contd.)
- Simple Linear Regression (Contd.)
- Multiple Linear Regression
- Multiple Linear Regression (Contd.)
- Multiple Linear Regression (Contd.)
- Multiple Linear Regression (Contd.)
- Selecting the BEST Regression Model
- Selecting the BEST Regression Model (Contd.)
- Selecting the BEST Regression Model (Contd.)
- Selecting the BEST Regression Model (Contd.)
- Multicollinearity
- Multicollinearity (Contd.)
- Multicollinearity (Contd.)
- Model Adequacy Checking
- Model Adequacy Checking (Contd.)
- Model Adequacy Checking (Contd.)
- Test for Influential Observations
- Transformation and Weighting to correct model inadequacies
- Transformation and Weighting to correct model inadequacies (Contd.)
- Transformation and Weighting to correct model inadequacies (Contd.)
- Dummy Variables
- Dummy Variables (Contd.)
- Dummy Variables (Contd.)
- Polynomial Regression Models
- Polynomial Regression Models (Contd.)
- Polynomial Regression Models (Contd.)
- Generalized Linear Models
- Generalized Linear Models (Contd.)
- Non-Linear Estimation
- Regression Models with Autocorrelated Errors
- Regression Models with Autocorrelated Errors (Contd.)
- Measurement Errors and Calibration Problem
- Tutorial - I
- Tutorial - II
- Tutorial - III
- Tutorial - IV
- Tutorial - V
- Watch on YouTube
- Assignments
- Download Videos
- Transcripts
- Books
- Self Evaluation (3)
Module Name | Download | Description | Download Size
---|---|---|---
Simple Linear Regression | Self Evaluation | Please see all questions attached with the last module. | 24
Tutorial - V | Self Evaluation | This is a questionnaire with answers that covers all the modules and could be attempted after listening to the full course. | 140
Tutorial - V | Self Evaluation | This is a questionnaire with answers that covers all the modules and could be attempted after listening to the full course. | 5120
Sl.No | Chapter Name | MP4 Download
---|---|---
1 | Simple Linear Regression |
2 | Simple Linear Regression (Contd.) |
3 | Simple Linear Regression (Contd.) |
4 | Simple Linear Regression (Contd.) |
5 | Simple Linear Regression (Contd.) |
6 | Multiple Linear Regression |
7 | Multiple Linear Regression (Contd.) |
8 | Multiple Linear Regression (Contd.) |
9 | Multiple Linear Regression (Contd.) |
10 | Selecting the BEST Regression Model |
11 | Selecting the BEST Regression Model (Contd.) |
12 | Selecting the BEST Regression Model (Contd.) |
13 | Selecting the BEST Regression Model (Contd.) |
14 | Multicollinearity |
15 | Multicollinearity (Contd.) |
16 | Multicollinearity (Contd.) |
17 | Model Adequacy Checking |
18 | Model Adequacy Checking (Contd.) |
19 | Model Adequacy Checking (Contd.) |
20 | Test for Influential Observations |
21 | Transformation and Weighting to correct model inadequacies |
22 | Transformation and Weighting to correct model inadequacies (Contd.) |
23 | Transformation and Weighting to correct model inadequacies (Contd.) |
24 | Dummy Variables |
25 | Dummy Variables (Contd.) |
26 | Dummy Variables (Contd.) |
27 | Polynomial Regression Models |
28 | Polynomial Regression Models (Contd.) |
29 | Polynomial Regression Models (Contd.) |
30 | Generalized Linear Models |
31 | Generalized Linear Models (Contd.) |
32 | Non-Linear Estimation |
33 | Regression Models with Autocorrelated Errors |
34 | Regression Models with Autocorrelated Errors (Contd.) |
35 | Measurement Errors and Calibration Problem |
36 | Tutorial - I |
37 | Tutorial - II |
38 | Tutorial - III |
39 | Tutorial - IV |
40 | Tutorial - V |
Sl.No | Chapter Name | English
---|---|---
1 | Simple Linear Regression |
2 | Simple Linear Regression (Contd.) |
3 | Simple Linear Regression (Contd.) |
4 | Simple Linear Regression (Contd.) |
5 | Simple Linear Regression (Contd.) |
6 | Multiple Linear Regression |
7 | Multiple Linear Regression (Contd.) |
8 | Multiple Linear Regression (Contd.) |
9 | Multiple Linear Regression (Contd.) | PDF unavailable
10 | Selecting the BEST Regression Model | PDF unavailable
11 | Selecting the BEST Regression Model (Contd.) | PDF unavailable
12 | Selecting the BEST Regression Model (Contd.) | PDF unavailable
13 | Selecting the BEST Regression Model (Contd.) | PDF unavailable
14 | Multicollinearity | PDF unavailable
15 | Multicollinearity (Contd.) | PDF unavailable
16 | Multicollinearity (Contd.) | PDF unavailable
17 | Model Adequacy Checking | PDF unavailable
18 | Model Adequacy Checking (Contd.) | PDF unavailable
19 | Model Adequacy Checking (Contd.) | PDF unavailable
20 | Test for Influential Observations | PDF unavailable
21 | Transformation and Weighting to correct model inadequacies | PDF unavailable
22 | Transformation and Weighting to correct model inadequacies (Contd.) | PDF unavailable
23 | Transformation and Weighting to correct model inadequacies (Contd.) | PDF unavailable
24 | Dummy Variables | PDF unavailable
25 | Dummy Variables (Contd.) | PDF unavailable
26 | Dummy Variables (Contd.) | PDF unavailable
27 | Polynomial Regression Models | PDF unavailable
28 | Polynomial Regression Models (Contd.) | PDF unavailable
29 | Polynomial Regression Models (Contd.) | PDF unavailable
30 | Generalized Linear Models | PDF unavailable
31 | Generalized Linear Models (Contd.) | PDF unavailable
32 | Non-Linear Estimation | PDF unavailable
33 | Regression Models with Autocorrelated Errors | PDF unavailable
34 | Regression Models with Autocorrelated Errors (Contd.) | PDF unavailable
35 | Measurement Errors and Calibration Problem | PDF unavailable
36 | Tutorial - I | PDF unavailable
37 | Tutorial - II | PDF unavailable
38 | Tutorial - III | PDF unavailable
39 | Tutorial - IV | PDF unavailable
40 | Tutorial - V | PDF unavailable
Sl.No | Language | Book link
---|---|---
1 | English | Not Available
2 | Bengali | Not Available
3 | Gujarati | Not Available
4 | Hindi | Not Available
5 | Kannada | Not Available
6 | Malayalam | Not Available
7 | Marathi | Not Available
8 | Tamil | Not Available
9 | Telugu | Not Available