# The F Statistic - Managerial Economics

Both the coefficient of determination, $R^2$, and the corrected coefficient of determination, $\bar{R}^2$, provide evidence on whether the proportion of explained variation is relatively “high” or “low.” However, neither tells whether the independent variables as a group explain a statistically significant share of variation in the dependent Y variable. The F statistic provides that evidence. Like $\bar{R}^2$, the F statistic is adjusted for degrees of freedom and is defined as

$$
F_{k-1,\,n-k} = \frac{\text{Explained variation}/(k-1)}{\text{Unexplained variation}/(n-k)}
$$

Once again, n is the number of observations (data points) and k is the number of estimated coefficients (intercept plus the number of slope coefficients). Also like $\bar{R}^2$, the F statistic can be calculated in terms of the coefficient of determination, where

$$
F_{k-1,\,n-k} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
$$
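A minimal Python sketch of this calculation, assuming the standard definitions: explained variation divided by k − 1, over unexplained variation divided by n − k, which is equivalent to R²/(k − 1) over (1 − R²)/(n − k). The function names and the sample figures are hypothetical and serve only to show that the two forms agree.

```python
def _check_dimensions(n, k):
    """Both degrees-of-freedom terms, k - 1 and n - k, must be positive."""
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 coefficients and n > k observations")


def f_from_variation(explained, unexplained, n, k):
    """F statistic from explained and unexplained variation (sums of squares)."""
    _check_dimensions(n, k)
    return (explained / (k - 1)) / (unexplained / (n - k))


def f_from_r2(r2, n, k):
    """F statistic from the coefficient of determination R^2."""
    _check_dimensions(n, k)
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Hypothetical regression: 25 observations, intercept plus 4 slopes (k = 5).
total, explained = 500.0, 400.0
unexplained = total - explained
r2 = explained / total

f_a = f_from_variation(explained, unexplained, n=25, k=5)
f_b = f_from_r2(r2, n=25, k=5)
print(round(f_a, 2), round(f_b, 2))  # the two forms agree up to rounding
```

The second form is convenient when only R², n, and k are reported, which is common in published regression output.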

The F statistic is used to indicate whether or not a significant share of the variation in the dependent variable is explained by the regression model. The hypothesis actually tested is that the dependent Y variable is unrelated to all of the independent X variables included in the model. If this hypothesis cannot be rejected, the total explained variation in the regression will be quite small. At the extreme, if $R^2 = 0$, then $F = 0$, and the regression equation provides absolutely no explanation of the variation in the dependent Y variable. As the F statistic increases from zero, the hypothesis that the dependent Y variable is not statistically related to one or more of the regression’s independent X variables becomes easier to reject. At some point, the F statistic becomes sufficiently large to reject the independence hypothesis and warrants the conclusion that at least some of the model’s X variables are significant factors in explaining variation in the dependent Y variable.
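To make the link between R² and F concrete, the short sketch below tabulates F for several values of R² while holding the sample dimensions fixed. The R² grid and the choice of n = 12 and k = 2 are illustrative assumptions, and the function name is hypothetical; the formula assumed is F = [R²/(k − 1)] / [(1 − R²)/(n − k)].

```python
def f_statistic(r2, n, k):
    """F statistic implied by R^2 with n observations and k coefficients."""
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


N_OBS, K_COEF = 12, 2                    # illustrative sample size and coefficient count
r2_grid = [0.0, 0.25, 0.50, 0.75, 0.90]  # hypothetical R^2 values
f_values = [f_statistic(r2, N_OBS, K_COEF) for r2 in r2_grid]

for r2, f in zip(r2_grid, f_values):
    print(f"R^2 = {r2:.2f}  ->  F = {f:6.2f}")
```

F equals zero when R² is zero and grows without bound as R² approaches one, which is why a sufficiently large F eventually crosses any fixed critical value.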

The F test is used to determine whether a given F statistic is statistically significant.

Performing F tests involves comparing F statistics with critical values from a table of the F distribution. If a given F statistic exceeds the critical value from the F distribution table, the hypothesis of no relation between the dependent Y variable and the set of independent X variables can be rejected. Taken as a whole, the regression equation can then be seen as explaining significant variation in the dependent Y variable. Critical values for the F distribution are provided at the 10 percent, 5 percent, and 1 percent significance levels in Appendix C. If the F statistic for a given regression equation exceeds the F value in the table, there can be 90 percent, 95 percent, or 99 percent confidence, respectively, that the regression model explains a significant share of variation in the dependent Y variable. The 90 percent, 95 percent, and 99 percent confidence levels are popular for hypothesis rejection because they imply that a true hypothesis will be rejected only 1 out of 10, 1 out of 20, or 1 out of 100 times, respectively. Such error rates are quite small and typically quite acceptable.

Critical F values depend on the degrees of freedom of both the numerator and the denominator of the F statistic. In the numerator, the degrees of freedom equal one less than the number of coefficients estimated in the regression equation ($k - 1$). The degrees of freedom for the denominator equal the number of data observations minus the number of estimated coefficients ($n - k$). The critical value for F can be denoted as $F_{f_1,f_2}$, where $f_1$, the degrees of freedom for the numerator, equals $k - 1$, and $f_2$, the degrees of freedom for the denominator, equals $n - k$.

For example, the F statistic for the First National Bank example involves $f_1 = k - 1 = 2 - 1 = 1$ and $f_2 = n - k = 12 - 2 = 10$ degrees of freedom. The calculated $F_{1,10} = 120.86$ exceeds 10.04, the critical F value for the α = 0.01 significance level (99 percent confidence). This means there is less than a 1 percent chance of observing such a high F statistic when there is in fact no variation in the dependent Y variable explained by the regression model. Alternatively, the hypothesis of no link between the dependent Y variable and the entire group of X variables can be rejected with 99 percent confidence. Given the ability to reject the hypothesis of no relation at the 99 percent confidence level, it will always be possible to reject this hypothesis at the lower 95 percent and 90 percent confidence levels. Because the significance with which the no-relation hypothesis can be rejected is an important indicator of overall model fit, rejection should always take place at the highest possible confidence level.
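The comparison in the First National Bank example can be sketched in Python. The critical values below are entries for 1 and 10 degrees of freedom: the 1 percent value, 10.04, is the one quoted in the text, while the 10 and 5 percent values are taken from standard F tables rather than from this section. The function names are hypothetical.

```python
# Critical values of the F distribution for f1 = 1, f2 = 10 degrees of freedom,
# keyed by significance level (alpha). Only 10.04 is quoted in the text; the
# other two entries are assumed from standard F tables.
CRITICAL_F_1_10 = {0.10: 3.29, 0.05: 4.96, 0.01: 10.04}


def degrees_of_freedom(n, k):
    """Return (numerator, denominator) degrees of freedom: (k - 1, n - k)."""
    return k - 1, n - k


def strictest_rejection(f_stat, critical_values):
    """Smallest alpha at which f_stat exceeds the critical value, else None."""
    passed = [alpha for alpha, crit in critical_values.items() if f_stat > crit]
    return min(passed) if passed else None


f1, f2 = degrees_of_freedom(n=12, k=2)
alpha = strictest_rejection(120.86, CRITICAL_F_1_10)

if alpha is None:
    print(f"F({f1},{f2}): cannot reject the no-relation hypothesis")
else:
    print(f"F({f1},{f2}): reject at alpha = {alpha} ({1 - alpha:.0%} confidence)")
```

Returning the smallest significance level that still permits rejection mirrors the advice to report rejection at the highest possible confidence level.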

As a rough rule of thumb, and assuming a typical regression model including four or five independent X variables plus an intercept term, a calculated F statistic greater than three permits rejection of the hypothesis that there is no relation between the dependent Y variable and the X variables at the α = 0.05 significance level (with 95 percent confidence). As the accompanying figure shows, a calculated F statistic greater than five typically permits rejection of the same hypothesis at the α = 0.01 significance level (with 99 percent confidence). However, as seen in the earlier discussion, critical F values are adjusted upward when sample size is small in relation to the number of coefficients included in the regression model. In such instances, precise critical F values must be obtained from an F table, such as that found in Appendix C.
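The limits of the rule of thumb can be checked against tabulated critical values. The sketch below hard-codes a few entries from standard F tables for four and five numerator degrees of freedom (an assumption: they are not reproduced from Appendix C, which is not part of this section) and tests whether the thresholds of three and five are large enough. All names are hypothetical.

```python
# Standard F-table critical values, keyed by (f1, f2) and then by alpha.
# Assumed from published F tables, not taken from the source text.
CRITICAL_F = {
    (4, 10): {0.05: 3.48, 0.01: 5.99},
    (5, 10): {0.05: 3.33, 0.01: 5.64},
    (4, 20): {0.05: 2.87, 0.01: 4.43},
    (5, 20): {0.05: 2.71, 0.01: 4.10},
}

# Rule-of-thumb thresholds from the text: F > 3 at alpha = 0.05, F > 5 at 0.01.
RULE_OF_THUMB = {0.05: 3.0, 0.01: 5.0}


def rule_is_safe(f1, f2, alpha):
    """True if exceeding the rule-of-thumb threshold guarantees rejection."""
    return RULE_OF_THUMB[alpha] >= CRITICAL_F[(f1, f2)][alpha]


results = {
    (dof, alpha): rule_is_safe(dof[0], dof[1], alpha)
    for dof in CRITICAL_F
    for alpha in RULE_OF_THUMB
}

for (dof, alpha), safe in results.items():
    verdict = "rule holds" if safe else "consult the F table"
    print(f"f1 = {dof[0]}, f2 = {dof[1]}, alpha = {alpha}: {verdict}")
```

With 20 denominator degrees of freedom the thresholds are sufficient, but with only 10 they fall short of the tabulated values, which is the small-sample case the text warns about.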