Understanding The Log-Linear Regression Model

The linear regression model is one of the most widely used models in statistics and machine learning, largely because it is simple to interpret. This post discusses how to interpret a linear regression model after the response has been log-transformed. Let’s consider the simple linear regression model first:

$y = \beta x + \alpha + \epsilon, \quad (1)$

where the coefficient $\beta$ directly gives the expected change in y for a one-unit change in x. No additional interpretation is required beyond the estimation of $\beta$.

However, in some scenarios a transformation is necessary to make the model meaningful. For example, one important assumption of the linear regression model is that the error term $\epsilon$ in (1) is normally distributed. If the observations of the response y are skewed (e.g., house prices or travel expenses), a log transformation is often applied to bring the distribution closer to normal. Another reason to apply a log transformation is that the inherent relationship between x and y may simply not be linear. The figure below shows how the log transformation converts a highly skewed distribution into a nearly normal one.
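As a rough numerical companion to the skewness point, the sketch below draws synthetic right-skewed "prices" and compares the sample skewness before and after taking logs. The data are simulated from a log-normal distribution purely for illustration (the parameters 12.0 and 0.8 are arbitrary assumptions, not real house prices), and `sample_skewness` is a helper written for this example.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Moment-based sample skewness: mean of cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Synthetic, right-skewed "prices" (assumed log-normal, for illustration only).
rng = random.Random(42)
prices = [rng.lognormvariate(12.0, 0.8) for _ in range(5000)]
log_prices = [math.log(p) for p in prices]

skew_raw = sample_skewness(prices)
skew_log = sample_skewness(log_prices)

print(f"skewness of y:      {skew_raw:.2f}")
print(f"skewness of log(y): {skew_log:.2f}")
```

With data like these, the raw values should show a clearly positive skewness, while the logged values should have a skewness near zero, which is what a roughly symmetric, normal-looking distribution would give.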

When the response y is log-transformed, (1) becomes the so-called log-linear model:

$\log(y) = \beta x + \alpha + \epsilon. \quad (2)$
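To make (2) concrete, here is a minimal sketch that simulates data from a log-linear model and recovers the coefficients by ordinary least squares on $\log(y)$. The true values (0.05 and 1.0), the noise level, and the helper `fit_simple_ols` are all assumptions chosen for this illustration; in practice a statistics library would normally do the fitting.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Closed-form least squares for ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Simulate from log(y) = beta * x + alpha + epsilon (assumed parameter values).
rng = random.Random(0)
true_beta, true_alpha = 0.05, 1.0
xs = [rng.uniform(0.0, 10.0) for _ in range(2000)]
ys = [math.exp(true_beta * x + true_alpha + rng.gauss(0.0, 0.1)) for x in xs]

# Fit the log-linear model: regress log(y) on x.
beta_hat, alpha_hat = fit_simple_ols(xs, [math.log(y) for y in ys])

print(f"estimated beta:  {beta_hat:.4f}")
print(f"estimated alpha: {alpha_hat:.4f}")
```

The estimates should land close to the values used in the simulation, since the regression is linear once the response is on the log scale.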

Note that $\log(\cdot)$ above denotes the natural logarithm. In this case, what does the value of $\beta$ mean? Let’s consider two observations $(x_1, y_1)$ and $(x_1+1, y_2)$, ignoring the error term $\epsilon$:

$\log(y_2)-\log(y_1)=\beta (x_1+1) - \beta x_1 + \alpha - \alpha \\[8pt] \Rightarrow \log\left(\frac{y_2}{y_1}\right) = \beta \\[8pt] \Rightarrow \frac{y_2}{y_1} = e^{\beta} \\[8pt] \Rightarrow \frac{y_2-y_1}{y_1}=e^{\beta}-1 \approx \beta, \quad \text{when} \quad \beta \rightarrow 0. \quad (3)$

Every one-unit increase in x results in a $(e^{\beta}-1)\cdot 100\%$ change in the response y. In addition, when the estimated coefficient $\beta$ is small (close to zero), the percent change in y is approximately $\beta \cdot 100\%$. For example, if $\beta=0.05$, then $e^{0.05} \approx 1.051$, so a one-unit increase in x results in about a 5.1% increase in y, close to the approximation of 5%.
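The gap between the exact percent change and the small-$\beta$ approximation can be checked numerically. The short sketch below computes both for a few illustrative coefficient values; the function name `percent_change` and the chosen values of $\beta$ are assumptions made for this example.

```python
import math


def percent_change(beta):
    """Return (exact, approximate) percent change in y per one-unit increase in x."""
    exact = (math.exp(beta) - 1.0) * 100.0  # (e^beta - 1) * 100%
    approx = beta * 100.0                   # beta * 100%, valid for small beta
    return exact, approx


for beta in (0.01, 0.05, 0.2, 0.5):
    exact, approx = percent_change(beta)
    print(f"beta={beta:<5} exact={exact:6.2f}%  approx={approx:6.2f}%")
```

The two columns agree closely for coefficients near zero and drift apart as $\beta$ grows, so for larger coefficients it is safer to report $(e^{\beta}-1)\cdot 100\%$ rather than $\beta \cdot 100\%$.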

In conclusion, the coefficient $\beta$ in the log-linear model represents the relative change in the response y for every one-unit change in x.