R Programming for Multiple Linear Regression Analysis

In multiple linear regression with R, we step up from simple linear regression by using more than one independent variable. With the lm function, we include several variables to predict the dependent variable. With the summary function, we can check the significance of each coefficient, and a correlation matrix shows how the variables relate to one another. We’ll show even more complex regression in future videos. Stay tuned! 📈📊 #DataScience #RProgramming

In this video, we will discuss multiple linear regression with R. While simple linear regression involves only one explanatory variable, multiple linear regression is used when there is more than one. We have previously discussed polynomial regression, which is a special case of multiple linear regression. In R, we can use the lm function (lowercase, since R is case-sensitive) to perform multiple linear regression. The dependent variable goes on the left side of the tilde symbol (~), and the independent variables go on the right side, separated by plus signs.
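As a minimal sketch of the formula syntax described above, the following uses R's built-in women dataset. That dataset is an assumption chosen for illustration and is not the data used in the video. It fits a simple regression first, then a polynomial regression written as a multiple regression with two terms.

```r
# The built-in 'women' dataset (height, weight) is used purely for illustration.

# Simple linear regression: one explanatory variable to the right of the tilde.
fit_simple <- lm(weight ~ height, data = women)

# Polynomial regression as a special case of multiple linear regression:
# height and height^2 enter as two separate terms joined by '+'.
# I() makes R read ^ as arithmetic rather than as a formula operator.
fit_poly <- lm(weight ~ height + I(height^2), data = women)

print(coef(fit_simple))  # intercept and one slope
print(coef(fit_poly))    # intercept and two slopes
```

Because the second model only adds a term to the first, its R-squared cannot be lower than that of the simple fit.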

Performing Multiple Linear Regression in R 🧮

To demonstrate how to perform multiple linear regression, we will create a data frame in which each row is a state in the USA and the five columns are the variables murder, population, illiteracy, income, and FL. Before carrying out the linear regression, we first look at the correlations between the variables using a correlation matrix and a plot matrix.

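| Variable   | Murder  | Population | Illiteracy | Income | FL     |
|------------|---------|------------|------------|--------|--------|
| Murder     | 1       | -0.2301    | 0.704      | -0.413 | 0.199  |
| Population | -0.2301 | 1          | -0.106     | -0.264 | -0.051 |
| Illiteracy | 0.704   | -0.106     | 1          | -0.448 | 0.318  |
| Income     | -0.413  | -0.264     | -0.448     | 1      | -0.117 |
| FL         | 0.199   | -0.051     | 0.318      | -0.117 | 1      |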

From the correlation matrix, we can see that murder and illiteracy have a fairly strong positive correlation of 0.704, while income and murder have a moderate negative correlation of -0.413. The plot matrix shows the density curve of each variable and the simple linear regression fit line for each pair of variables.
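The correlation matrix and plot matrix could be produced along the following lines. This is a sketch under stated assumptions: it takes the data from R's built-in state.x77 matrix, which the text does not confirm, and uses that matrix's Frost column as a stand-in for FL. The numbers it prints should therefore not be expected to match the table above. The two panel functions are hypothetical helpers written for this example, using only base R.

```r
# ASSUMPTION: the built-in state.x77 matrix stands in for the video's data,
# and its "Frost" column stands in for the column called FL in the text.
states <- as.data.frame(
  state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
)

# Pairwise Pearson correlations between all five variables.
cor_mat <- cor(states)
print(round(cor_mat, 3))

# Off-diagonal panels: scatter plot plus a simple linear regression line.
panel_fit <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), col = "red")
}

# Diagonal panels: kernel density curve of the variable.
panel_density <- function(x, ...) {
  d <- density(x)
  # Rescale the vertical axis to the density, then restore it on exit.
  old <- par(usr = c(par("usr")[1:2], 0, 1.2 * max(d$y)))
  on.exit(par(old))
  lines(d$x, d$y)
}

pairs(states, panel = panel_fit, diag.panel = panel_density)
```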

Next, we include the four independent variables in the multiple linear regression using the lm function: fit <- lm(Murder ~ Population + Illiteracy + Income + FL), with the data frame supplied through the data argument. We then use the summary function to view the statistics of the fitted model.
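A runnable version of this step might look as follows. It again assumes the built-in state.x77 data with Frost standing in for FL, so the estimates it prints may differ from those quoted in this post.

```r
# ASSUMPTION: state.x77 stands in for the video's data; Frost stands in for FL.
states <- as.data.frame(
  state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
)

# Dependent variable on the left of '~', four predictors joined by '+'.
# The data argument tells lm() where to look up the variable names.
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)

# Prints the call, residual summary, coefficient table and fit statistics.
print(summary(fit))
```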

Viewing the Results with Summary Function 📊

The summary function first echoes the model call that we ran, followed by the minimum, first quartile, median, third quartile, and maximum of the residuals. The coefficient estimates are then listed, including the intercept and the coefficient for each independent variable, together with their standard errors, t values, and p-values. Each estimate represents the expected change in the dependent variable for a one-unit increase in that independent variable, holding all other independent variables constant.

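| Variable   | Coefficient Estimate | Standard Error | t Value | p-Value  |
|------------|----------------------|----------------|---------|----------|
| Intercept  | 0.828                | 1.029          | 0.805   | 0.424    |
| Population | -0.005               | 0.002          | -2.238  | 0.029*   |
| Illiteracy | 4.146                | 0.610          | 6.791   | 0.000*** |
| Income     | -0.069               | 0.021          | -3.303  | 0.002**  |
| FL         | -0.291               | 0.117          | -2.485  | 0.016*   |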
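The parts of the summary can also be pulled out individually, and the "holding all other variables constant" reading of a coefficient can be checked directly. The sketch below does both, under the same assumption as before: state.x77 data, with Frost as a stand-in for FL. The names bumped and delta are illustrative.

```r
# ASSUMPTION: state.x77 stands in for the video's data; Frost stands in for FL.
states <- as.data.frame(
  state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
s <- summary(fit)

print(s$call)                    # the model call echoed at the top
print(quantile(residuals(fit)))  # min, 1st quartile, median, 3rd quartile, max

# Coefficient table: estimate, standard error, t value and p-value per term.
coef_table <- coef(s)
print(round(coef_table, 4))

# Interpretation check: raise Illiteracy by one unit for a single state,
# leave every other predictor unchanged, and compare the two predictions.
base <- states[1, ]
bumped <- base
bumped$Illiteracy <- bumped$Illiteracy + 1
delta <- predict(fit, newdata = bumped) - predict(fit, newdata = base)

# The change in the prediction equals the Illiteracy coefficient.
print(all.equal(unname(delta), unname(coef(fit)["Illiteracy"])))
```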

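From the coefficient estimates, we can see that illiteracy has the largest coefficient: a one-unit increase in illiteracy is associated with an increase of 4.146 units in the murder rate, holding all other variables constant. Coefficient sizes depend on each variable's units, so they are not directly comparable across variables. According to the p-values in the table, all four independent variables are significant at the 5% level (population p = 0.029, illiteracy p < 0.001, income p = 0.002, FL p = 0.016); only the intercept is not (p = 0.424). Population, income, and FL have negative coefficients, indicating that an increase in each is associated with a decrease in the murder rate. The results also show the residual standard error, the multiple R-squared, and the F-statistic.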
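The overall fit statistics and the significance check can be read from the summary object rather than from the printed output. A minimal sketch, again assuming state.x77 data with Frost in place of FL, so which predictors it reports as significant may differ from the table above:

```r
# ASSUMPTION: state.x77 stands in for the video's data; Frost stands in for FL.
states <- as.data.frame(
  state.x77[, c("Murder", "Population", "Illiteracy", "Income", "Frost")]
)
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
s <- summary(fit)

print(s$sigma)          # residual standard error
print(s$r.squared)      # multiple R-squared
print(s$adj.r.squared)  # adjusted R-squared

# F statistic with its numerator and denominator degrees of freedom.
f_stat <- s$fstatistic
print(f_stat)

# Overall p-value of the model from the upper tail of the F distribution.
p_model <- pf(f_stat[["value"]], f_stat[["numdf"]], f_stat[["dendf"]],
              lower.tail = FALSE)
print(p_model)

# Predictors (intercept excluded) whose p-value falls below 0.05.
p_values <- coef(s)[-1, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)
```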
