The Granger causality test is a statistical hypothesis test used to determine whether one time series is useful for forecasting another. It assesses whether past values of one variable (the Granger cause) provide statistically significant information about future values of another variable, beyond the information already contained in that variable's own past values. In the context of YouTube analysis, we can apply the Granger causality test to determine whether past comment counts (the independent variable) help predict the future number of views (the dependent variable).
Assumptions
- The data are time series (observations ordered in time)
- Both series are stationary
- Errors are normally distributed
- Errors are independent (no autocorrelation)
Restricted Model (No Granger Causality)
$Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \varepsilon_t$
Unrestricted Model (With Granger Causality)
$Y_t = \alpha + \sum_{i=1}^{p} \beta_i Y_{t-i} + \sum_{j=1}^{q} \gamma_j X_{t-j} + \varepsilon_t$
Hypothesis
- H₀: γⱼ = 0 for all j (X does not Granger-cause Y)
- Hₐ: At least one γⱼ ≠ 0 (X does Granger-cause Y)
Steps
- Choose the number of past observations (lags) for each variable to include in the model
- Fit two regression models: a restricted model using only the past values of the dependent variable, and an unrestricted model using the past values of both the dependent and the independent variable
- Compare how well each model explains the dependent variable, to see whether including the independent variable's past values improves prediction
- Conduct an F-test on the added lag coefficients to determine whether that improvement is statistically significant; if it is, reject H₀
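The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. It assumes the same number of lags for both variables, and it uses the standard F statistic for nested regressions, $F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n-k)}$, where $RSS_R$ and $RSS_U$ are the residual sums of squares of the restricted and unrestricted models, $q$ is the number of added lag coefficients, $n$ is the number of usable observations, and $k$ is the number of parameters in the unrestricted model. The helper names `ols_rss` and `granger_f` and the synthetic `comments` and `views` series are hypothetical, made up for this example; they are not real YouTube data.

```python
import random


def ols_rss(X, y):
    """Fit y = X b by ordinary least squares; return the residual sum of squares."""
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented k x (k+1) matrix.
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def granger_f(x, y, lags):
    """F statistic for H0: past values of x do not help predict y."""
    times = range(lags, len(y))
    target = [y[t] for t in times]
    # Restricted model: intercept plus past values of y only.
    restricted = [[1.0] + [y[t - i] for i in range(1, lags + 1)] for t in times]
    # Unrestricted model: the same columns plus past values of x.
    unrestricted = [row + [x[t - j] for j in range(1, lags + 1)]
                    for row, t in zip(restricted, times)]
    rss_r = ols_rss(restricted, target)
    rss_u = ols_rss(unrestricted, target)
    n, k = len(target), 1 + 2 * lags
    return ((rss_r - rss_u) / lags) / (rss_u / (n - k))


# Synthetic data (an assumption for illustration): views depend on their own
# previous value and on the previous period's comments, plus noise.
random.seed(0)
comments = [random.gauss(0, 1) for _ in range(200)]
views = [0.0]
for t in range(1, 200):
    views.append(0.5 * views[t - 1] + 0.8 * comments[t - 1] + random.gauss(0, 1))

f_forward = granger_f(comments, views, lags=2)  # do comments help predict views?
f_reverse = granger_f(views, comments, lags=2)  # do views help predict comments?
print("F (comments -> views):", round(f_forward, 2))
print("F (views -> comments):", round(f_reverse, 2))
```

Each F value is compared with the critical value of the F distribution with (lags, n - k) degrees of freedom, which is roughly 3.0 at the 5% level for the sizes used here. In practice, a library routine such as `grangercausalitytests` in `statsmodels.tsa.stattools` would normally be used instead, since it also reports p-values.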