Sales Forecast Part 2

Author: Charlie Cao

This is the second in a series of two Notebooks on the topic of Sales Forecast. Through this series, we want to showcase one of the many ways that one can explore and forecast time series data. We encourage you to create your own Jupyter Notebook and follow along. You can also download this notebook along with any accompanying data in the Notebooks and Data GitHub Repository. Alternatively, if you do not have Python or Jupyter Notebook installed yet, you may experiment with a virtual Notebook by launching Binder or Syzygy below (learn more about these two tools in the Resource tab).

Launch Syzygy (UBC)

Launch Syzygy (Google)

Launch Binder

Background


The previous Notebook provides a detailed guide on the exploration and manipulation of data, while this Notebook will be centered around the modeling process itself. Understanding the data is important for analytics, and we recommend that you read the first Notebook (Part 1) prior to diving into modeling in order to gain a better grasp of our large and messy data. The datasets that we are using consist of sales records for a retailer with 45 stores, each containing several departments. They are already included in the GitHub Repository where this Jupyter Notebook is located (please see the “Data” folder), but you can also find them on this Kaggle page.

Let’s first start by importing the libraries we need and loading our data:

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import STL
from statsmodels.graphics.tsaplots import plot_acf
# Prophet is a forecasting package developed by Facebook
from fbprophet import Prophet
# ipywidgets are used to make interactive contents in Jupyter notebooks
from ipywidgets import interact

# There are multiple SettingWithCopyWarnings in this notebook
# Read more here:
# https://www.dataquest.io/blog/settingwithcopywarning/
# https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

# import data
df_sales = pd.read_csv('nb0008_data/sales.csv')
df_features = pd.read_csv('nb0008_data/features.csv')
df_stores = pd.read_csv('nb0008_data/stores.csv')

# converting the date column (initially stored as strings) to dates
df_sales['Date'] = pd.to_datetime(df_sales['Date'], format='%d/%m/%Y')
df_features['Date'] = pd.to_datetime(df_features['Date'], format='%d/%m/%Y')

# converting categorical-numeric columns to the 'category' dtype, which saves memory and speeds up operations
df_sales['Store'] = df_sales['Store'].astype('category')
df_sales['Dept'] = df_sales['Dept'].astype('category')
df_features['Store'] = df_features['Store'].astype('category')
df_stores['Store'] = df_stores['Store'].astype('category')
df_stores['Type'] = df_stores['Type'].astype('category')

Modeling


In this Notebook, we will be using the Prophet forecasting package developed by Facebook. However, there are many other forecasting methods and packages, such as ARIMA models, ETS models, or a combination of explanatory models (linear regression) and time series models; some machine learning models can also be used to predict time series. After reading this Notebook, we encourage you to experiment with different forecasting packages in Python (or R) so that you can find the best fit for your data.
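For instance, here is a minimal sketch (our own, not part of the main analysis) of fitting a Holt-Winters ETS model from statsmodels to a single store-department series; note that seasonal_periods=52 is only an approximation, since a year does not contain a whole number of weeks:

# Minimal sketch of an alternative model: Holt-Winters exponential smoothing
# Note: seasonal_periods=52 is an approximation (a year is ~52.14 weeks)
from statsmodels.tsa.holtwinters import ExponentialSmoothing

sales_ts = (df_sales[(df_sales.Store==1)&(df_sales.Dept==1)]
            .set_index('Date')['Weekly_Sales'])

ets_model = ExponentialSmoothing(sales_ts, trend='add', seasonal='add',
                                 seasonal_periods=52).fit()
print(ets_model.forecast(52).head())  # forecast the next 52 weeks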

STL Decomposition

Time series data can usually be described with Trend, Cycle, and Seasonality. The book, “Forecasting: Principles and Practice”, does a great job defining these three components in Chapter 2.3. Oftentimes, identifying these components in our data (and understanding our data in general) can help us select the most appropriate model(s). There are many different ways to decompose time series data. STL decomposition (short for Seasonal and Trend decomposition using Loess) is helpful in our case because we have weekly data, while a lot of other decomposition methods are only applicable to monthly or quarterly data. Here, we utilize ipywidgets again (please read its documentation for more information or for tutorials), together with the STL function from statsmodels, to inspect the different trends and seasonalities across stores and departments:

def stl_viz(store_num, dept_num):
    try:
        # Subsetting data according to store and department number
        sales_temp = df_sales[(df_sales.Store==store_num)&(df_sales.Dept==dept_num)]
        # Transform dataframe into time series
        sales_temp_ts = sales_temp.set_index('Date')['Weekly_Sales']
        # Decompose the data
        stl_temp = STL(sales_temp_ts).fit()
        # Set up and show the plot
        plt.rc('figure', figsize=(12,8))
        stl_temp.plot()
        plt.show()
    except Exception:
        # Error message for when the store-department combination doesn't exist
        print("ERROR: There is no Department {} in Store {}".format(dept_num, store_num))

interact(stl_viz, store_num={n:n for n in range(1,46)}, dept_num={n:n for n in range(1,100)});

Prophet

Prophet is a package developed by Facebook. It is fast and fully automatic, which is not always ideal, but is certainly convenient. Prophet is also compatible with weekly data, which is one of the main reasons why we are using it in this case; many other models, such as ARIMA and ETS models, only accept integer seasonal periods (for example, 12), whereas the number of weeks in a year is not an integer (approximately 52.14). Prophet also allows users to specify holidays in the model, so that unusual observations caused by holidays can be modeled as well. Let’s start by building a model for Department 1 in Store 1.

First, we will create a subset of the sales dataframe for rows where the Store and the Dept columns are both equal to 1:

sales_s1d1_df = df_sales[(df_sales.Store==1)&(df_sales.Dept==1)]

# Taking the log of `Weekly_Sales` so that it is less right skewed
sales_s1d1_df.loc[:,'Log_Weekly_Sales'] = np.log(sales_s1d1_df.loc[:,'Weekly_Sales'])

sales_s1d1_df.head()

   Store Dept       Date  Weekly_Sales  IsHoliday  Log_Weekly_Sales
0      1    1 2010-02-05      24924.50      False         10.123607
1      1    1 2010-02-12      46039.49       True         10.737255
2      1    1 2010-02-19      41595.55      False         10.635748
3      1    1 2010-02-26      19403.54      False          9.873211
4      1    1 2010-03-05      21827.90      False          9.990944

Train/Test Split

As with many machine learning workflows, we want to split our data into training and testing sets (commonly 80% and 20%, respectively) before building our models so that we can evaluate their performance. In our case, even though we do not have a lot of data to work with (under three years of data with yearly seasonality), we still want at least an entire year of data in our test set so that we can evaluate the performance of our model across seasons. From the first Notebook, we know that the last date in our dataset is ‘2012-10-26’, so we will set the cutoff exactly a year before that date. Note that you must use a date that exists in the data to keep the weekly intervals consistent. Prophet is very particular about the data that it uses: the date column has to be named ds and the dependent variable y, and we will format our dataframe accordingly.

# Define the threshold date and create a threshold filter
threshold_date = pd.to_datetime('2011-10-28')
threshold = sales_s1d1_df.Date <= threshold_date

# Splitting the original dataframe into training and testing sets for residual diagnosis later
sales_s1d1_df_train = sales_s1d1_df[threshold]
sales_s1d1_df_test = sales_s1d1_df[~threshold]


sales_s1d1_train = (sales_s1d1_df_train[['Date', 'Log_Weekly_Sales']]
                    .rename(columns={'Date':'ds', 'Log_Weekly_Sales': 'y'}))
sales_s1d1_test = (sales_s1d1_df_test[['Date', 'Log_Weekly_Sales']]
                   .rename(columns={'Date':'ds', 'Log_Weekly_Sales': 'y'}))

display(sales_s1d1_train.head())
print('ATTENTION: Training set has {} rows'.format(sales_s1d1_train.shape[0]))
display(sales_s1d1_test.head())
print('ATTENTION: Testing set has {} rows'.format(sales_s1d1_test.shape[0]))

          ds          y
0 2010-02-05  10.123607
1 2010-02-12  10.737255
2 2010-02-19  10.635748
3 2010-02-26   9.873211
4 2010-03-05   9.990944

ATTENTION: Training set has 91 rows

           ds          y
91 2011-11-04  10.593782
92 2011-11-11   9.835719
93 2011-11-18   9.854857
94 2011-11-25   9.948043
95 2011-12-02  10.138302

ATTENTION: Testing set has 52 rows

# Visualize train/test split
plt.rc('figure',figsize=(15,5))
fig, ax = plt.subplots()

sns.lineplot(x='ds', y='y', label='train', data=sales_s1d1_train, ax=ax)
sns.lineplot(x='ds', y='y', label='test', data=sales_s1d1_test, ax=ax)

# A vertical grey dashed line separating the training and testing sets
ax.axvline(threshold_date, color='grey', linestyle='--')

ax.legend(loc='upper right')
ax.set(title='Weekly Sales', ylabel='');

[Figure: weekly sales over time, with the train/test split marked by a grey dashed line]

Holidays

To model holidays in Prophet, we have to create a similar time series dataframe, with the names of the holidays in the holiday column and the date, again, in the ds column. Here, we create this dataframe accordingly:

# Subsetting data where the `IsHoliday` column indicates that the observation is a holiday
holidays_s1d1_df = sales_s1d1_df[sales_s1d1_df.IsHoliday==True]
# Create new dataframe with only `ds` and `holiday` columns
holidays_s1d1 = pd.DataFrame({'ds':holidays_s1d1_df['Date'], 'holiday':'holiday'})
display(holidays_s1d1.head())

            ds  holiday
1   2010-02-12  holiday
31  2010-09-10  holiday
42  2010-11-26  holiday
47  2010-12-31  holiday
53  2011-02-11  holiday
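As an aside, recent versions of Prophet can also attach a built-in holiday calendar for a given country instead of (or in addition to) a custom dataframe. A minimal sketch, assuming the generic US calendar is an acceptable proxy for this retailer's holiday weeks:

# Alternative sketch: Prophet's built-in US holiday calendar
model_alt = Prophet(yearly_seasonality=True,
                    weekly_seasonality=False,
                    daily_seasonality=False)
# must be called before fitting; adds US holiday effects to the model
model_alt.add_country_holidays(country_name='US')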

Fitting

Now that we have our datasets ready, we can finally fit the model. One of the parameters of the model is called mcmc_samples, which specifies the number of samples to draw using the Markov Chain Monte Carlo method; this parameter affects the confidence intervals of our model, but for simplicity, we will not go into further detail in this Notebook.

model = Prophet(yearly_seasonality=True,
                weekly_seasonality=False,
                daily_seasonality=False,
                holidays=holidays_s1d1,
                interval_width=0.95,
                mcmc_samples=100)

model.fit(sales_s1d1_train)

# Warning messages might appear in our case because of limited data
WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:10 of 200 iterations saturated the maximum tree depth of 10 (5 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation





<fbprophet.forecaster.Prophet at 0x1e5ad4e1d88>

Testing the Model


Now with the model fitted using our training set, we can forecast our test set and compare it with the observed data. In this step, we will be able to analyze the errors of our forecasts and calculate some useful metrics for the model, which can be later used to compare with other models. Note that Prophet also generates retrospective predictions for past data, but we will only be using ‘future’ data, which is our test set.

# Create a new dataframe containing current dates and future dates
# Make sure to use "7D" instead of "W" as the frequency so the intervals stay the same
future = model.make_future_dataframe(periods=sales_s1d1_test.shape[0], freq='7D')
# Generate predictions.
forecast = model.predict(df=future)

# Back-transform the predictions and confidence intervals
forecast.loc[:,'exp_yhat'] = np.exp(forecast.loc[:,'yhat'])
forecast.loc[:,'exp_yhat_upper'] = np.exp(forecast.loc[:,'yhat_upper'])
forecast.loc[:,'exp_yhat_lower'] = np.exp(forecast.loc[:,'yhat_lower'])

# Create a threshold filter and subset the 'future' forecasted data
threshold_forecast = forecast.ds <= threshold_date
forecast_test = forecast[~threshold_forecast]

display(forecast_test.head())
# Scroll to the right for back-transformed forecasts

            ds     trend  yhat_lower  yhat_upper       yhat      exp_yhat  exp_yhat_upper  exp_yhat_lower
91  2011-11-04  9.827337    9.721947   10.865659  10.238867  27969.413170    52347.464653    16679.687463
92  2011-11-11  9.820663    9.357288   10.587088   9.949291  20937.382581    39619.965889    11582.927667
93  2011-11-18  9.813988    9.087429   10.221261   9.655937  15614.214987    27481.312068     8843.420350
94  2011-11-25  9.807314    8.974859   10.184661   9.604754  14835.139555    26493.660239     7901.904417
95  2011-12-02  9.800639    9.207331   10.462551   9.863973  19225.119960    34980.678991     9969.955755

5 rows × 25 columns (trend-interval, seasonality, and holiday component columns omitted above for readability)

plt.rc('figure',figsize=(20,5))
fig, ax = plt.subplots()

sns.lineplot(x='Date', y='Weekly_Sales', label='train', data=sales_s1d1_df_train, ax=ax)
sns.lineplot(x='Date', y='Weekly_Sales', label='test', data=sales_s1d1_df_test, ax=ax)
sns.lineplot(x='ds', y='exp_yhat', label='prediction', data=forecast_test, ax=ax)

# visualizing the confidence interval
ax.fill_between(
    x=forecast_test['ds'],
    y1=forecast_test['exp_yhat_lower'],
    y2=forecast_test['exp_yhat_upper'],
    color='green',
    alpha=0.1,
    label='95% CI'
)

plt.ylim(-5000,80000)

ax.axvline(threshold_date, color='grey', linestyle='--')
ax.legend(loc='upper left')
ax.set(title='Weekly Sales', ylabel='');

[Figure: weekly sales with test-set predictions and the 95% confidence interval]

We can see right away that our forecast (green line) is consistently underestimating the observations (orange line), even though the observations are all within the 95% confidence interval of predictions. Combined with the component plots below (log scale), we can also see that the model includes a decreasing trend, and there is a large amount of uncertainty in the trend component as time goes on, as illustrated by the expanding confidence interval (shaded area). Holidays, on the other hand, do not seem to affect weekly sales for Department 1 in Store 1.

model.plot_components(forecast, figsize=(13,7));

[Figure: Prophet component plots (trend, holidays, yearly seasonality) on the log scale]
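As a quick check on the claim that the observations fall within the 95% confidence interval, here is a small sketch that computes the interval's empirical coverage on the test set:

# What fraction of test observations fall inside the back-transformed 95% CI?
obs = sales_s1d1_df_test['Weekly_Sales'].values
inside = ((obs >= forecast_test['exp_yhat_lower'].values) &
          (obs <= forecast_test['exp_yhat_upper'].values))
print('Empirical coverage of the 95% CI: {:.0%}'.format(inside.mean()))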

Residual Diagnosis

Residuals are the differences between the predicted values and the observed values (here computed on the back-transformed, dollar scale), and they are useful for checking whether the model has captured a sufficient amount of information from the data. In a good model, the residuals should be uncorrelated and have a mean close to 0. Correlated residuals signal that there is information left in the errors that could be modeled further, while a mean far from 0 suggests that the predictions are biased.

forecast_test['errors'] = forecast_test['exp_yhat']-sales_s1d1_df_test['Weekly_Sales']

# Information on the SettingWithCopywarning that might occur:
# https://www.dataquest.io/blog/settingwithcopywarning/

errors_mean = forecast_test['errors'].mean()
errors_std = forecast_test['errors'].std()

print('Residual mean: {:.2f}'.format(errors_mean))
print('Residual standard deviation: {:.2f}'.format(errors_std))
Residual mean: -6369.44
Residual standard deviation: 5974.84
fig, axes = plt.subplots(2,2,figsize=(15,10))

# Plot residual distribution
sns.distplot(a=forecast_test['errors'], ax=axes[0,0], bins=15, rug=True)
axes[0,0].axvline(x=errors_mean, color='green', linestyle='--', label=r'$\mu$')
axes[0,0].axvline(x=errors_mean + 2*errors_std, color='red', linestyle='--', label=r'$\mu \pm 2\sigma$')
axes[0,0].axvline(x=errors_mean - 2*errors_std, color='red', linestyle='--')
axes[0,0].legend()
axes[0,0].set(title='Model Errors (Test Set)')

# Plot residuals against time
sns.scatterplot(x='ds', y='errors', data=forecast_test, ax=axes[0,1])
axes[0,1].axhline(y=errors_mean, color='green', linestyle='--', label=r'$\mu$ ')
axes[0,1].axhline(y=errors_mean + 2*errors_std, color='red', linestyle='--', label=r'$\mu \pm 2\sigma$')
axes[0,1].axhline(y=errors_mean - 2*errors_std, color='red', linestyle='--')
axes[0,1].legend()
axes[0,1].set(title='Model Errors (Test Set)')

# Plot prediction against observations
sns.regplot(x=sales_s1d1_test['y'], y=forecast_test['yhat'], color='orange', label='test', ax=axes[1,0])
# Generate diagonal line to plot.
d_x = np.linspace(start=sales_s1d1_test['y'].min()-0.1, stop=sales_s1d1_test['y'].max()+0.1, num=100)
axes[1,0].plot(d_x, d_x, color='red', label='yhat=y', ls=':')
axes[1,0].legend()
axes[1,0].set(title='Test Data vs Predictions')

# Plot autocorrelation of residuals
plot_acf(x=forecast_test['errors'], ax=axes[1,1])

plt.show()

[Figure: residual distribution, residuals over time, test data vs. predictions, and residual autocorrelation]
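The ACF plot above is a visual check; for a more formal check of residual autocorrelation, one could run a Ljung-Box test with statsmodels. A minimal sketch (large p-values are consistent with uncorrelated residuals):

# Ljung-Box test: jointly tests whether autocorrelations up to lag 10 are zero
from statsmodels.stats.diagnostic import acorr_ljungbox

print(acorr_ljungbox(forecast_test['errors'], lags=[10]))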

# Calculate key error metrics
mean_abs_err = np.mean(abs(forecast_test['errors']))
mean_sqrd_err = np.mean(np.square(forecast_test['errors']))
rt_mean_sqrd_err = np.sqrt(mean_sqrd_err)
# MAPE divides the dollar-scale errors by the dollar-scale observations
# (`Weekly_Sales`), not by the log-scale `y`
mean_abs_pct_err = np.mean(abs(forecast_test['errors'].values / sales_s1d1_df_test['Weekly_Sales'].values))

display(pd.DataFrame({'MAE': mean_abs_err,
                      'MSE': mean_sqrd_err,
                      'RMSE': rt_mean_sqrd_err,
                      'MAPE': mean_abs_pct_err}, index=['Measurements']))

                      MAE           MSE         RMSE        MAPE
Measurements  6698.448687  7.558195e+07  8693.788244  659.131386

We can see that the mean of our residuals is smaller than 0, and our predictions were frequently underestimating the observed data. The distribution of the residuals is also skewed by some extremely negative forecasting errors (underestimation). At the same time, the plots above show that the variance of our residuals is not constant over time; it decreases from the beginning to the end of the year. Luckily, there is not much significant autocorrelation in the residuals.

In short, our forecast is noticeably biased, but the model does capture most of the patterns in the data, as the residuals are largely uncorrelated. However, this is not to say that we cannot find a better model (or, more importantly, collect more data). In our case, we employ Prophet because it is compatible with weekly data; perhaps there are other parameters that we can fine-tune, or even other models that should be fitted to our data, before we make the final decision on which model to use. Again, for the sake of simplicity, we will go ahead and forecast future sales with the current model.
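If you do explore fine-tuning, Prophet ships with rolling-origin cross-validation utilities that refit the model over a series of cutoffs and score the forecasts. A minimal sketch; the initial, period, and horizon windows below are illustrative choices for this short weekly series:

# Sketch: rolling-origin cross-validation with Prophet's built-in utilities
from fbprophet.diagnostics import cross_validation, performance_metrics

df_cv = cross_validation(model,
                         initial='365 days',   # first training window
                         period='90 days',     # spacing between cutoffs
                         horizon='180 days')   # forecast horizon per cutoff
print(performance_metrics(df_cv)[['horizon', 'mae', 'rmse', 'mape']].head())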

Forecasting


Now that we have an understanding of the prediction errors in our model, we can move on to building a full model using all of the existing data. However, be careful not to evaluate this model on the same data used to fit it.

sales_s1d1 = (sales_s1d1_df[['Date', 'Log_Weekly_Sales']]
              .rename(columns={'Date':'ds', 'Log_Weekly_Sales': 'y'}))

display(sales_s1d1.tail())

             ds          y
138  2012-09-28   9.849444
139  2012-10-05   9.994446
140  2012-10-12  10.032936
141  2012-10-19  10.093499
142  2012-10-26  10.217963

model = Prophet(yearly_seasonality=True,
                weekly_seasonality=False,
                daily_seasonality=False,
                holidays=holidays_s1d1,
                interval_width=0.95,
                mcmc_samples=100)

model.fit(sales_s1d1)

# Extend the dataframe one year (52 weekly periods, the size of our former test set) into the future
future = model.make_future_dataframe(periods=sales_s1d1_test.shape[0], freq='7D')
# Generate predictions.
forecast = model.predict(df=future)

# Back-transform the predictions and confidence intervals
forecast.loc[:,'exp_yhat'] = np.exp(forecast.loc[:,'yhat'])
forecast.loc[:,'exp_yhat_upper'] = np.exp(forecast.loc[:,'yhat_upper'])
forecast.loc[:,'exp_yhat_lower'] = np.exp(forecast.loc[:,'yhat_lower'])

display(forecast.tail())
# Scroll to the right for back-transformed forecasts
WARNING:pystan:n_eff / iter below 0.001 indicates that the effective sample size has likely been overestimated
WARNING:pystan:Rhat above 1.1 or below 0.9 indicates that the chains very likely have not mixed
WARNING:pystan:8 of 200 iterations saturated the maximum tree depth of 10 (4 %)
WARNING:pystan:Run again with max_treedepth larger than 10 to avoid saturation

             ds     trend  yhat_lower  yhat_upper       yhat      exp_yhat  exp_yhat_upper  exp_yhat_lower
190  2013-09-27  9.949436    8.878983   10.863036   9.880030  19536.299549    52210.357567     7179.482755
191  2013-10-04  9.949049    8.864927   10.906662   9.882252  19579.767680    54538.503668     7079.277725
192  2013-10-11  9.948662    8.951491   11.034013   9.950722  20967.349025    61945.667035     7719.392897
193  2013-10-18  9.948275    9.034649   11.254050  10.128293  25041.585266    77191.909834     8388.771142
194  2013-10-25  9.947888    9.218415   11.475638  10.324217  30461.435565    96339.867815    10081.072583

5 rows × 25 columns (trend-interval, seasonality, and holiday component columns omitted above for readability)

plt.rc('figure',figsize=(15,7))
fig, ax = plt.subplots()

sns.scatterplot(x='Date', y='Weekly_Sales', label='observations', color='orange', data=sales_s1d1_df, ax=ax)
sns.lineplot(x='ds', y='exp_yhat', label='predictions', color='green', data=forecast, ax=ax)

# visualizing the confidence interval
ax.fill_between(
    x=forecast['ds'],
    y1=forecast['exp_yhat_lower'],
    y2=forecast['exp_yhat_upper'],
    color='green',
    alpha=0.1,
    label='95% CI'
)

# plt.ylim(-5000,80000)

ax.axvline(max(sales_s1d1_df['Date']), color='grey', linestyle='--')
ax.legend(loc='upper left')
ax.set(title='Weekly Sales', ylabel='');

[Figure: observed weekly sales with the full model's predictions and 95% confidence interval extending one year past the data]

Next Steps


Now that you have a basic understanding of our data, as well as forecasting for a single time series in Python using Prophet, we encourage you to:

  1. Read through the documentation for Prophet and fine-tune any additional parameters that you think could improve our model.
  2. Create your own Prophet model with a different dataset. Try to find a time series that is much longer than what we used in this Notebook so that you have more data to work with.
  3. Come up with a strategy to efficiently forecast for multiple stores and departments (a minimal starting point is sketched below).
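As a starting point for the third item, one could loop over store-department groups and fit one model per series. The sketch below (the helper forecast_all_series is our own invention) is illustrative only: it skips the log transform and holiday terms used above, and a real pipeline would add error handling and parallelism.

def forecast_all_series(df_sales, periods=52):
    # Fit one Prophet model per store-department series and forecast ahead
    forecasts = {}
    for (store, dept), group in df_sales.groupby(['Store', 'Dept']):
        # Skip combinations that do not exist or have under two years of data
        if len(group) < 104:
            continue
        train = (group[['Date', 'Weekly_Sales']]
                 .rename(columns={'Date': 'ds', 'Weekly_Sales': 'y'}))
        m = Prophet(yearly_seasonality=True,
                    weekly_seasonality=False,
                    daily_seasonality=False)
        m.fit(train)
        future = m.make_future_dataframe(periods=periods, freq='7D')
        forecasts[(store, dept)] = m.predict(future)
    return forecasts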