Understanding The Log-Linear Regression Model

The linear regression model is one of the most widely used statistical/machine learning models because of how easily it can be interpreted. This post discusses how to interpret a linear regression model after a transformation. Let's first consider the simple linear regression model:

y = \beta x + \alpha + \epsilon,  \quad  (1)

where the coefficient \beta directly gives the change in y for a one-unit change in x. No interpretation beyond the estimate of \beta is required.
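As a quick illustration (not part of the original derivation), the sketch below simulates data from model (1) with an assumed slope of 2 and intercept of 1, fits ordinary least squares, and reads off the estimated \beta:

```python
# A minimal sketch: simulate y = beta*x + alpha + eps and recover beta with OLS.
# The true beta = 2.0 and alpha = 1.0 are assumed values chosen for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=500)

X = sm.add_constant(x)              # add the intercept column
fit = sm.OLS(y, X).fit()
print(fit.params)                   # [alpha_hat, beta_hat], roughly [1.0, 2.0]
# beta_hat is the expected change in y for a one-unit increase in x.
```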

However, in certain scenarios a transformation is necessary to make the model meaningful. For example, one important assumption of the linear regression model is the normality of the error term \epsilon in (1). If the observed response y is skewed (e.g., house prices or travel expenses), a log transformation is often applied to bring it closer to normal. Another reason to apply a log transformation is that the inherent relationship between x and y is simply not linear. The figure below shows how the log transformation converts a highly skewed distribution into one that is close to normal.

Left: histogram of y, which is highly skewed; right: histogram of log(y), which is close to normal
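A short sketch of this idea, using an assumed log-normal response as a stand-in for something like house prices:

```python
# A minimal sketch with assumed data: a right-skewed (log-normal) response
# and its natural-log transform, which is much closer to normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.lognormal(mean=3.0, sigma=0.8, size=5000)

print(stats.skew(y))           # strongly positive -> right-skewed
print(stats.skew(np.log(y)))   # near 0 -> roughly symmetric, close to normal
```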

When the response y is log-transformed, (1) becomes the so-called log-linear model:

log(y) = \beta x + \alpha + \epsilon.   \quad (2)

Note that log(\cdot) above denotes the natural log. In this case, what does the value of \beta mean? Consider two observations (x_1, y_1) and (x_1+1, y_2), and ignore the error term (equivalently, compare the fitted values):

log(y_2)-log(y_1)=\beta (x_1+1) + \alpha - (\beta x_1 + \alpha) = \beta \\[8pt] \Rightarrow log\left(\frac{y_2}{y_1}\right) = \beta \\[8pt] \Rightarrow \frac{y_2}{y_1} = e^{\beta} \\[8pt] \Rightarrow \frac{y_2-y_1}{y_1}=e^{\beta}-1 \approx \beta, \quad \text{when } \beta \rightarrow 0.  \quad (3)
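The algebra in (3) can be checked numerically. The sketch below (with an assumed true \beta of 0.3, not a value from the post) fits model (2) to simulated data and confirms that the ratio of predicted responses at x and x+1 equals e^{\hat\beta}:

```python
# A minimal sketch: simulate log(y) = beta*x + alpha + eps with an assumed
# beta = 0.3, fit the log-linear model, and verify y2/y1 = exp(beta_hat).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 5, size=1000)
log_y = 0.3 * x + 1.0 + rng.normal(0, 0.2, size=1000)

X = sm.add_constant(x)
fit = sm.OLS(log_y, X).fit()
alpha_hat, beta_hat = fit.params

# Predicted y at x0 and x0 + 1 (error term ignored, as in (3)).
x0 = 2.0
y1 = np.exp(alpha_hat + beta_hat * x0)
y2 = np.exp(alpha_hat + beta_hat * (x0 + 1))
print(y2 / y1, np.exp(beta_hat))     # the two numbers agree
```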

Every one-unit change in x results in a (e^{\beta}-1)\cdot 100\% change in the response y. In addition, when the estimated coefficient \beta is small (close to zero), the percent change in y is approximately \beta \cdot 100\% . For example, if \beta=0.05, then e^{0.05} \approx 1.05 , so a one-unit change in x results in roughly a 5% increase in y.
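To see how good the small-\beta approximation is, one can tabulate the exact relative change e^{\beta}-1 against \beta for a few values (the particular values below are chosen only for illustration):

```python
# A minimal sketch: compare the exact relative change exp(beta) - 1 with the
# small-coefficient approximation beta.
import numpy as np

for beta in [0.01, 0.05, 0.10, 0.50]:
    exact = np.exp(beta) - 1
    print(f"beta={beta:.2f}  exact={exact:.4f}  approx={beta:.4f}")
# For beta = 0.05 the exact change is ~5.13%, close to the 5% approximation;
# for beta = 0.50 it is ~64.9%, so the approximation no longer holds.
```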

In conclusion, the coefficient \beta in the log-linear model represents a relative (percentage) change in the response y for every one-unit change in x.