# Regression analysis

The objective of this study is to build a regression model for predicting the winning percentage of a basketball team, using data on many teams in a single basketball season.

Regression analysis is a method for predicting the value of one (dependent) variable from the values of one or more other (independent) variables. The fitted model is then examined to assess how reliable its predictions are. In this analysis, we build a multiple regression model and refine it until we arrive at the best model for the task.
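As a minimal sketch of what fitting a multiple regression involves, the Python code below estimates the coefficients by ordinary least squares using NumPy. It is not the Minitab procedure used in this study; the helper names `fit_ols` and `predict` are hypothetical, and the demonstration data are made up so that the true coefficients are known in advance.

```python
import numpy as np

def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: n rows of k predictor values; y: n responses.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, x_new):
    """Predicted response for one new observation x_new (length k)."""
    return coef[0] + np.dot(coef[1:], x_new)

# Made-up illustration: y is built exactly as 2 + 3*x1 - 1*x2,
# so the least-squares fit should recover those coefficients.
X_demo = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 7]]
y_demo = [2 + 3 * a - b for a, b in X_demo]
coef = fit_ols(X_demo, y_demo)
print(np.round(coef, 6))  # approximately [2, 3, -1]
```

With real data the fit will not be exact, which is why the model must then be examined for reliability.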

We are motivated by the fact that fans frequently argue (and even bet) about a particular team's chances of winning. We believe that winning a game is not purely a matter of chance.

We therefore want to investigate which factors determine a team's chance of winning. We do not expect a perfect model at the first attempt; instead, we expect to modify the model until its predictive ability has improved substantially.

The importance of this work lies in the fact that, without accurate knowledge of the factors that most influence an outcome, one may spend considerable resources (time, energy and money) on a factor that matters little, at the expense of the factors that really matter.

This leads to a great deal of input with little corresponding output, and hence to frustration, which is especially true in sports and related activities. This work is our small contribution to more efficient planning and game preparation for a basketball team.

The data we have used are taken from ……… They present statistics for sixty-eight (68) teams in one season. Because the data describe a single season rather than observations collected over an extended period, we do not need time-series methods or other techniques for data gathered over time.

Each of the 68 teams played a number of games during the season. For each team, the spreadsheet records its winning percentage along with key statistics from the games it played in that season.

In this work, we are going to designate a dependent variable (Y) and seven independent variables (X1, X2, X3, X4, X5, X6 and X7). The variables are defined as follows:

Y = Winning percentage

X1 = Opponent’s 3-pointers per game

X2 = Team’s 3-pointers per game

X3 = Team’s free throws per game

X4 = Team’s turnovers per game

X5 = Opponent’s turnovers per game

X6 = Team’s rebounds per game

X7 = Opponent’s rebounds per game

Using these variables, we formulate a regression model for a team's winning percentage in this data set.
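In symbols, and assuming the usual first-order linear form with no interaction or squared terms, the starting model can be written as below, where $\beta_0$ is the intercept, $\beta_j$ is the coefficient of $X_j$, and $\varepsilon$ is a random error term assumed to have mean zero and constant variance. Replacing each $\beta_j$ by its least-squares estimate $\hat\beta_j$ gives the fitted prediction equation $\hat Y$.

$$
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \varepsilon
$$

$$
\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2 + \cdots + \hat\beta_7 X_7
$$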

## 6.1 Preliminaries

Our first task, having obtained the data, is to examine the descriptive statistics for each independent variable; the Minitab output is presented in Appendix I. For each variable the mean and median are close, which indicates a roughly symmetric distribution and is consistent with, though not proof of, normality. To check this further, we look at the box plot of each variable.
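The mean-versus-median comparison can be made concrete with a small Python sketch using only the standard library. The helper `describe` is hypothetical and only mirrors part of the Minitab output in Appendix I; expressing the mean-median gap in standard-deviation units is our own illustrative choice, and the sample values are made up.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for one variable, plus a rough
    symmetry check: the mean-median gap in standard-deviation units."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "n": len(values),
        "mean": mean,
        "median": median,
        "stdev": sd,
        # Near 0 for a symmetric sample; clearly positive for right skew.
        "gap_sd": (mean - median) / sd,
    }

# Made-up values, for illustration only.
symmetric = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 8.0]
right_skewed = [1, 1, 2, 2, 3, 10]
print(describe(symmetric)["gap_sd"])     # 0.0
print(describe(right_skewed)["gap_sd"])  # positive
```

A small gap only indicates symmetry; a normal probability plot or a formal normality test would be needed to assess normality itself.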

The box plots (Appendix II) are broadly consistent with symmetric, approximately normal distributions, except that Team’s turnovers per game and Opponent’s turnovers per game each show one outlier, and Team’s rebounds per game shows three.
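Box plots usually flag a point as an outlier when it falls outside the Tukey fences, 1.5 interquartile ranges beyond the first or third quartile. We assume this is the rule behind the outliers seen in Appendix II. The hypothetical helper below applies it to a made-up sample; quartile conventions differ between packages, so borderline points may be classified differently than in Minitab.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional box-plot outlier rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Made-up sample with one obvious outlier.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 30]
print(tukey_outliers(sample))  # [30]
```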

To understand the data further, we also examine a scatter plot of each variable against winning percentage. These plots show how strongly each variable, taken on its own, is associated with winning percentage. They are not the final regression model; rather, they give the marginal (one-variable-at-a-time) regression relationship between each variable and winning percentage. Details are presented in Appendix III.

The marginal regressions show that some variables are more strongly associated with winning percentage than others, although this is not yet the final model. On closer examination, Opponent’s 3-pointers per game (X1) explains very little of the variation in winning percentage, and is in fact negatively correlated with it.

A similar pattern holds for Team’s turnovers per game (X4), though the relationship is even weaker, and likewise for Team’s rebounds per game (X6). The remaining variables are positively correlated with winning percentage. The strongest correlation in the scatter plots is for Team’s free throws per game (X3), and the weakest positive correlation is for Opponent’s turnovers per game (X5).