Definition

“Regression is a process by which we estimate one of the variables, the dependent variable, on the basis of another variable, the independent variable.”

Dependent Variable

The variable which is to be estimated or predicted is referred to as the dependent variable. It is also known as the regressand, predicted variable, response variable or explained variable. Since its values are determined on the basis of the independent variable, the dependent variable is treated as a random variable.

Independent Variable

The variable on the basis of which the dependent variable is predicted or estimated is called the independent variable. It is also called the regressor, predictor, regression variable or explanatory variable. The values of the independent variable are chosen by the experimenter and are assumed to be fixed.

For Example

If we want to estimate the heights of children on the basis of their ages, then height will be the dependent variable and age will be the independent variable. Similarly, if we want to estimate the yield of a crop on the basis of the amount of fertilizer used, then the yield will be the dependent variable and the amount of fertilizer used will be the independent variable.

Explanation

If X is the independent variable and Y is the dependent variable, then the relation between the two variables can be described by a straight line. Mathematically, the straight line is represented as follows:

Y = a + bX

The above equation is called a linear equation. It is the usual practice to express the relation between the variables in the form of such an equation, which connects the two variables. To fit such an equation, we first need to collect data showing corresponding values of the variables. Suppose X and Y denote respectively the heights and weights of students. Then a sample of n individuals would give the heights X₁, X₂, …, Xₙ and the corresponding weights Y₁, Y₂, …, Yₙ.

If Y is to be estimated on the basis of X by means of a linear equation, or any other equation, then we call that equation the regression equation of Y on X. Whereas, if we intend to estimate X on the basis of Y, then the equation is called the regression equation of X on Y. When two variables are involved in the process, it is referred to as simple regression. If three or more variables are involved, the process is called multiple regression.

Properties of Relationship between Two Variables:

The relationship between two variables falls into four categories, which are as follows:

i. Positive Linear Relationship

Graph 1: [scatter diagram showing a positive linear relationship]

ii. Negative Linear Relationship

Graph 2: [scatter diagram showing a negative linear relationship]

iii. Non-linear Relationship

Graph 3: [scatter diagram showing a non-linear relationship]

iv. No Relationship

Graph 4: [scatter diagram showing no relationship]

Calculation of Regression

The problem of regression is solved by the method of least squares, which is defined as follows:

“Of all the possible curves or lines which can be drawn to approximate a given set of data, the best fitting curve or line is the one for which the sum of squares of the deviations, i.e. the vertical distances between the observed values and the corresponding estimated values, is least.”

It is the requirement of the least squares method that the sum of squares of the vertical deviations (distances) from the points to the curve or line is minimum. Therefore, if D₁, D₂, …, Dₙ represent the vertical deviations from the points to the line A, as shown in the following Graph 5, then the line has the property that D₁² + D₂² + … + Dₙ² is minimum. A line having this property is called the least squares line. A curve having this property fits the data best in the least squares sense and is called a least squares curve.

Graph 5: [least squares line A with the vertical deviations D₁, D₂, …, Dₙ from the observed points]

In the definition of least squares provided above, Y is the dependent variable and X the independent variable. Conversely, if X is used as the dependent variable and Y as the independent variable, the definition is modified and we consider horizontal deviations instead of vertical deviations.
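To make the quantity being minimized concrete, here is a minimal Python sketch (the function name sum_squared_deviations is ours, and the data values are those of Table 20 in the first worked example below) that computes D₁² + D₂² + … + Dₙ² for a candidate line Y = a + bX:

```python
def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical deviations D1^2 + ... + Dn^2
    from the points (x, y) to the line Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Data from Table 20 of the first worked example below.
X = [1, 2, 3, 4, 5]
Y = [2, 5, 6, 8, 9]

print(sum_squared_deviations(X, Y, 0.9, 1.7))  # least squares line: about 1.10
print(sum_squared_deviations(X, Y, 1.0, 1.5))  # any other line gives more: 2.75
```

Trying other values of a and b will always give a sum of squares at least as large as the one produced by the least squares line.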

Since the regression is represented by a linear equation, the least squares equation is also linear, i.e. Y = a + bX. The normal equations for this regression are as follows:

∑Y = na + b∑X ………. Eq. (i)

∑XY = a∑X + b∑X² ……. Eq. (ii)

where a and b are constants. The first equation is obtained by summing both sides of the linear equation over all n observations.

i.e.

Y = a + bX

∑Y = na + b∑X

The second equation is obtained by multiplying the linear equation by X and then summing both sides of the equation.

i.e.

XY = aX + bX²

∑XY = a∑X + b∑X²
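As a cross-check of this derivation, the following SymPy sketch (assuming SymPy is installed; the symbol names are ours) differentiates the sum of squared deviations with respect to a and b; setting the derivatives to zero reproduces Eq. (i) and Eq. (ii):

```python
import sympy as sp

n = 4                                  # any small number of observations
a, b = sp.symbols('a b')
X = sp.symbols('X1:%d' % (n + 1))      # X1, ..., Xn
Y = sp.symbols('Y1:%d' % (n + 1))      # Y1, ..., Yn

# Sum of squared vertical deviations for the line Y = a + bX.
S = sum((y - (a + b * x)) ** 2 for x, y in zip(X, Y))

# dS/da = 0  gives  sum(Y)  = n*a + b*sum(X)        ... Eq. (i)
# dS/db = 0  gives  sum(XY) = a*sum(X) + b*sum(X^2) ... Eq. (ii)
print(sp.expand(sp.diff(S, a) / (-2)))   # sum(Y) - n*a - b*sum(X)
print(sp.expand(sp.diff(S, b) / (-2)))   # sum(XY) - a*sum(X) - b*sum(X^2)
```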

By solving Eq. (i) and (ii) simultaneously we get the equations for the constants a and b, which are provided below;

a = (∑Y ∑X² − ∑X ∑XY) / (n∑X² − (∑X)²)

b = (n∑XY − ∑X ∑Y) / (n∑X² − (∑X)²)
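The two formulas translate directly into code. The following is a minimal Python sketch (the function name fit_line is ours) that computes a and b from the raw sums exactly as the formulas above prescribe:

```python
def fit_line(xs, ys):
    """Least squares line of ys on xs: returns (a, b) for Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b
```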

Conversely, if the variable X is taken as the dependent variable and Y as the independent variable, the above equations are modified. The linear equation, or least squares equation, becomes:

X = c + dY

The normal equations become:

∑X = nc + d∑Y

∑XY = c∑Y + d∑Y²

And the equations for the constants c and d become:

c = (∑X ∑Y² − ∑Y ∑XY) / (n∑Y² − (∑Y)²)

d = (n∑XY − ∑X ∑Y) / (n∑Y² − (∑Y)²)
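Note that the formulas for c and d have exactly the same form as those for a and b with the roles of X and Y interchanged, so the fit_line sketch above can be reused by simply swapping its arguments:

```python
# Regression of X on Y: reuse fit_line (defined in the sketch above)
# with the data lists swapped; it returns the constants of X = c + dY.
c, d = fit_line(Y, X)
```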

For Example:

Find the linear equation Y = a + bX for the following values of X and Y. Find the estimated values Y′ and show that ∑(Y − Y′) = 0, using the least squares method.

X 1 2 3 4 5
Y 2 5 6 8 9

The following Table 20 will be used in solving the problem.

Table 20

X Y XY X² Y′ D = Y − Y′
1 2 2 1 2.6 −0.6
2 5 10 4 4.3 0.7
3 6 18 9 6.0 0
4 8 32 16 7.7 0.3
5 9 45 25 9.4 −0.4
∑X = 15 ∑Y = 30 ∑XY = 107 ∑X² = 55 ∑Y′ = 30 ∑D = 0

n = 5

Using the normal equations;

∑Y = na + b∑X

∑XY = a∑X + b∑X²

Putting the values from Table 20 into the above equations, we get the following;

30 = 5a + 15b ………. Eq. 1

107 = 15a + 55b ……. Eq. 2

To solve Eq. 1 and Eq. 2 simultaneously, we multiply Eq. 1 by 3 and subtract it from Eq. 2;

Eq. 1 × 3:   90 = 15a + 45b

Eq. 2:       107 = 15a + 55b

Subtracting: 17 = 10b

b = 1.7

Substituting the value of b in Eq. 2;

107 = 15a + 55(1.7)

107 = 15a + 93.5

15a = 107 − 93.5

15a = 13.5

a = 0.9

By solving the equations simultaneously we get a = 0.9 and b = 1.7

Thus the equation of the least squares line is Y = 0.9 + 1.7X

To obtain Y′ we put the values of X given in the table into the least squares equation. For example, for X = 1, Y′ = 0.9 + 1.7(1) = 2.6, and so on. D stands for the deviations. The deviations of Y from the estimated values Y′ are shown in Table 20 and add up to 0.
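As a quick check, the fit_line sketch given earlier reproduces this result for the Table 20 data and confirms that the deviations sum to zero:

```python
X = [1, 2, 3, 4, 5]
Y = [2, 5, 6, 8, 9]

a, b = fit_line(X, Y)                        # a = 0.9, b = 1.7
Y_est = [a + b * x for x in X]               # 2.6, 4.3, 6.0, 7.7, 9.4
D = [y - y_hat for y, y_hat in zip(Y, Y_est)]
print(sum(D))                                # 0 (up to floating point rounding)
```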

For Example:

Find the least squares line of regression for the following values of X and Y, taking

i. X as independent variable

ii. Y as independent variable

X 1 3 5 6 7 9 10 13
Y 1 2 5 5 6 7 7 8

For obtaining the least squares lines of regression we will use the following Table 21

Table 21

X Y XY X² Y²
1 1 1 1 1
3 2 6 9 4
5 5 25 25 25
6 5 30 36 25
7 6 42 49 36
9 7 63 81 49
10 7 70 100 49
13 8 104 169 64
∑X = 54 ∑Y = 41 ∑XY = 341 ∑X² = 470 ∑Y² = 253

n = 8

We will solve the problem by using the formulas for the constants a, b, c and d.

i.

a = (∑Y ∑X² − ∑X ∑XY) / (n∑X² − (∑X)²)

a = ((41)(470) − (54)(341)) / ((8)(470) − (54)²)

a = (19270 − 18414) / (3760 − 2916)

a = 856 / 844

a = 1.014

b = (n∑XY − ∑X ∑Y) / (n∑X² − (∑X)²)

b = ((8)(341) − (54)(41)) / ((8)(470) − (54)²)

b = (2728 − 2214) / (3760 − 2916)

b = 514 / 844

b = 0.609

For X as the independent variable, the least squares line of regression is Y = 1.014 + 0.609X.

ii.

c = (∑X ∑Y² − ∑Y ∑XY) / (n∑Y² − (∑Y)²)

c = ((54)(253) − (41)(341)) / ((8)(253) − (41)²)

c = (13662 − 13981) / (2024 − 1681)

c = −319 / 343

c = −0.930

d = (n∑XY − ∑X ∑Y) / (n∑Y² − (∑Y)²)

d = ((8)(341) − (54)(41)) / ((8)(253) − (41)²)

d = (2728 − 2214) / (2024 − 1681)

d = 514 / 343

d = 1.499

For Y as the independent variable, the least squares line of regression is X = −0.930 + 1.499Y.
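Both results can be checked with the fit_line sketch given earlier; swapping the arguments gives the second line:

```python
X = [1, 3, 5, 6, 7, 9, 10, 13]
Y = [1, 2, 5, 5, 6, 7, 7, 8]

a, b = fit_line(X, Y)   # a ≈ 1.014, b ≈ 0.609  ->  Y = 1.014 + 0.609X
c, d = fit_line(Y, X)   # c ≈ -0.930, d ≈ 1.499  ->  X = -0.930 + 1.499Y
```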