Sales and Demand Forecast Analysis

Sales and Demand Anomaly Detection

As discussed and explained in my medium post, considering the unprecedented situation of Covid-19, where the entire world's equilibrium has been disrupted, finding the “new normal” will be a major goal for all organizations. So, anomaly detection will be an important skill for all Data Scientists and Data Analysts to have. In my medium post, I discussed in depth what a time series anomaly actually is, why we need to keep track of time series anomalies, and what the effective ways of dealing with them are. In this article, I present a code walkthrough of three effective ways of doing time-series anomaly detection, which anyone can follow and start implementing.

In the medium post, I talked about broadly three ways to deal with time series anomalies, so here I will explain each of them with code examples in Python, along with practical pros and cons from my perspective.

Time Series Anomaly Detection By Predictive Confidence Level Approach

The first approach that we are going to talk about uses a predictive confidence interval. Now, what exactly do we mean by a predictive confidence interval? As discussed in the book “Business Forecasting: Practical Problems and Solutions” by Michael Gilliland and co-authors, a time series forecast is not a single point estimate but rather a probabilistic forecast, which should lie within a range of probable values covering both the best-case and the worst-case scenario. So, when we do time series forecasting using any algorithm (SARIMAX, ARIMA, CNN, LSTM, etc.), calculating the error metric or the accuracy of the forecast is also important. Using this error metric, when we say our forecast might have +/- 10% error, this error rate can be treated as a confidence interval for the model: for example, if the point forecast is 100 units and the expected error is 10%, any actual value between 90 and 110 lies within the model's expectation.


Any time series data point lying outside this confidence zone is actually an anomaly, as it is something the model was not expecting! Now, let's see a simple way to do this in Python.

Let's say we have time series data like this:
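The dataset itself is not shared here, so as a hedged stand-in, here is a minimal sketch that builds a quarterly grouped_df series of normalized profit values (the synthetic values and the random seed are assumptions, purely for illustration):

import numpy as np
import pandas as pd

# Hypothetical stand-in for the original data: a quarterly series of
# normalized profit values with a mild trend, seasonality and noise
rng = np.random.default_rng(42)
periods = 24
index = pd.date_range(start='2015-01-01', periods=periods, freq='QS')
values = (0.05 * np.arange(periods)                       # trend
          + 0.3 * np.sin(np.arange(periods) * np.pi / 2)  # quarterly seasonality
          + rng.normal(0, 0.1, periods))                  # noise
grouped_df = pd.Series(values, index=index, name='Normalized Profit')
print(grouped_df.head())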

Now, we will use a simple ARIMA method to calculate the forecast. Again, I am not focusing on accuracy tuning here, just highlighting the idea using a simple method; for a real-life use case, model and algorithm selection will be very important.

# Using a simple Seasonal ARIMA model to highlight the idea; in the actual world, the model has to best fit the data
import pmdarima as pm
from pmdarima.model_selection import train_test_split

train, test = train_test_split(grouped_df, train_size=18)

# Fit a seasonal ARIMA on the training window (quarterly seasonality, m=4)
model = pm.auto_arima(train, seasonal=True, m=4)

forecasts = model.predict(test.shape[0])  # predict N steps into the future
mape = MAPE(test.values.reshape(1,-1)[0], forecasts)  # MAPE is defined just below

Here I am using MAPE as the error metric, but in a real-life case we can use any error metric, such as SMAPE, RMSE, etc.:

def MAPE(y_orig, y_pred):
    # Mean Absolute Percentage Error, expressed as a percentage
    return np.mean(np.abs(y_orig - y_pred) / np.abs(y_orig)) * 100.
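For reference, minimal sketches of SMAPE and RMSE as alternative error metrics (hypothetical helper functions, not used in the rest of this walkthrough):

def SMAPE(y_orig, y_pred):
    # Symmetric MAPE, bounded between 0 and 200 percent
    denom = (np.abs(y_orig) + np.abs(y_pred)) / 2.0
    return np.mean(np.abs(y_orig - y_pred) / denom) * 100.

def RMSE(y_orig, y_pred):
    # Root Mean Squared Error, in the units of the series
    return np.sqrt(np.mean((y_orig - y_pred) ** 2))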

Let’s say for the forecast time period, what we get is this:
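The original plot is not reproduced here, but a minimal sketch of how the actual-versus-forecast comparison can be drawn (assuming the train/test split preserved the pandas DatetimeIndex, so that date holds the test-period timestamps):

import matplotlib.pyplot as plt

date = test.index  # test-period timestamps (assumption: the split kept the DatetimeIndex)

plt.plot(date, test.values.reshape(1,-1)[0], color = 'black')
plt.plot(date, forecasts, color = 'blue')
plt.legend(['Actual Test', 'Forecast'])
plt.xlabel('Time Period')
plt.xticks(rotation=45)
plt.ylabel('Normalized Profit')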

We see that the model is good overall, but there is one area where there is a large difference between actual and predicted. Now let's build a confidence interval around the forecast.

# upper bound and lower bound using MAPE
predicted_ub = forecasts + (mape * 0.01 * forecasts)
predicted_lb = forecasts - (mape * 0.01 * forecasts)
plt.fill_between(date, predicted_lb, predicted_ub, alpha = 0.3, color = 'red')
plt.plot(date, test.values.reshape(1,-1)[0], color = 'black')
plt.legend(['Actual Test', 'Prediction Band'])
plt.xlabel('Time Period')
plt.xticks(rotation=45)
plt.ylabel('Normalized Profit')
Technically, in this case we are just taking a confidence band of +/- MAPE percent (roughly 10% here), but this number might vary based on the nature of the time series and the forecast error margin that we want to keep. Now, let's try to visualize how to capture the anomaly points:

So, anything which lies beyond the confidence band is an anomaly, and a simple piece of code can be written to capture that:

test_actuals = test.values.reshape(1,-1)[0]
anomaly_value = []
anomaly_date = []
for i in range(len(test_actuals)):
    # flag any actual value that falls outside the prediction band
    if (test_actuals[i] > predicted_ub[i] or test_actuals[i] < predicted_lb[i]):
        anomaly_value.append(test_actuals[i])
        anomaly_date.append(date[i])

print(anomaly_value, anomaly_date)

# upper bound and lower bound using MAPE
predicted_ub = forecasts + (mape * 0.01 * forecasts)
predicted_lb = forecasts - (mape * 0.01 * forecasts)
plt.fill_between(date, predicted_lb, predicted_ub, alpha = 0.3, color = 'red')
plt.plot(date, test.values.reshape(1,-1)[0], color = 'black')
plt.plot(anomaly_date, anomaly_value, 'r*')
plt.legend(['Actual Test', 'Anomaly Detected','Prediction Band'])
plt.xlabel('Time Period')
plt.xticks(rotation=45)
plt.ylabel('Normalized Profit')

This can thus be an effective way to detect anomalous data points, and its simplicity is one of its biggest advantages. It is also quite useful when the time series data is consistent and not very volatile, and when the forecast accuracy achieved is very high. But when do you think this approach might fail?

When we have a highly volatile time series, particularly in some sales and marketing scenarios where the data has constant ups and downs and no particular pattern, this approach might not work. Also, as the saying goes, “All models are wrong, but some are useful”: this approach is highly dependent on the algorithm and model used for forecasting the time series, so if the model is a poor fit, as is often the case, it may not be a reliable approach. Finding local outliers becomes easy with this method, but the choice of error metric and modeling technique is crucial for reducing false positives and false negatives, which is why this method is inefficient for volatile time series.


Time Series Anomaly Detection by Statistical Profiling Approach

As mentioned in the medium post, statistical models are the easiest to develop and work well with volatile time series. This is an extremely controlled method and the results are more explainable, which gives rich insights to business stakeholders and empowers them to streamline their business strategies. I would highly recommend this approach as the baseline, since it can be developed quickly, and we can then move to sophisticated models to get better results.


We will see some simple Python code to implement this concept.

import math

def coalesce(value):
    # Replace NaN (e.g. the standard deviation over a single value) with 0.0
    if math.isnan(value):
        return 0.0
    return value
    

def generate_statistical_profile(series):
    grouped_series = series.groupby(['Time Period'])['Normalized Profit'].sum().reset_index()
    grouped_series['Time Period'] = pd.to_datetime(grouped_series['Time Period'], errors='coerce')
    # Encode month/quarter/year grouping keys, e.g. Jan-2011 -> 1 * 10**4 + 2011 = 12011
    grouped_series['mm-yy'] =  grouped_series['Time Period'].dt.month * 10**4 + grouped_series['Time Period'].dt.year
    grouped_series['qq-yy'] =  grouped_series['Time Period'].dt.quarter * 10**4 + grouped_series['Time Period'].dt.year
    grouped_series['yy'] =  grouped_series['Time Period'].dt.year

    new_profile_m = pd.DataFrame()

    new_profile_m['max_monthly_qty'] = grouped_series.groupby(['mm-yy'])['Normalized Profit'].max().values
    new_profile_m['min_monthly_qty'] = grouped_series.groupby(['mm-yy'])['Normalized Profit'].min().values    
    new_profile_m['moving_avg_max'] = new_profile_m['max_monthly_qty'].rolling(2, min_periods=1).mean()
    new_profile_m['moving_avg_min'] = new_profile_m['min_monthly_qty'].rolling(2, min_periods=1).mean()

    new_profile_m['total_monthly_qty'] = grouped_series.groupby(['mm-yy'])['Normalized Profit'].sum().values
    new_profile_m['monthly_freq'] = grouped_series.groupby(['mm-yy'])['Normalized Profit'].count().values
    new_profile_m['moving_avg_total_m'] = new_profile_m['total_monthly_qty'].rolling(2, min_periods=1).mean()
    new_profile_m['moving_avg_freq_m'] = new_profile_m['monthly_freq'].rolling(2, min_periods=1).mean()

    new_profile_q = pd.DataFrame()
    new_profile_q['total_quarterly_qty'] = grouped_series.groupby(['qq-yy'])['Normalized Profit'].sum().values
    new_profile_q['quarterly_freq'] = grouped_series.groupby(['qq-yy'])['Normalized Profit'].count().values
    new_profile_q['moving_avg_total_q'] = new_profile_q['total_quarterly_qty'].rolling(2, min_periods=1).mean()
    new_profile_q['moving_avg_freq_q'] = new_profile_q['quarterly_freq'].rolling(2, min_periods=1).mean()

    new_profile_y = pd.DataFrame()
    new_profile_y['total_yearly_qty'] = grouped_series.groupby(['yy'])['Normalized Profit'].sum().values
    new_profile_y['yearly_freq'] = grouped_series.groupby(['yy'])['Normalized Profit'].count().values
    new_profile_y['moving_avg_total_y'] = new_profile_y['total_yearly_qty'].rolling(2, min_periods=1).mean()
    new_profile_y['moving_avg_freq_y'] = new_profile_y['yearly_freq'].rolling(2, min_periods=1).mean()

    
    # Each entry below maps to a named field in format_statistical_profile()
    statistical_profile = [
        new_profile_m['max_monthly_qty'].median(),
        new_profile_m['moving_avg_max'].mean(),
        new_profile_m['moving_avg_max'].mean() + 0.5 * coalesce(new_profile_m['moving_avg_max'].std()),
        
        new_profile_m['min_monthly_qty'].median(),
        new_profile_m['moving_avg_min'].mean(),
        new_profile_m['moving_avg_min'].mean() - 0.5 * coalesce(new_profile_m['moving_avg_min'].std()),
        
        new_profile_m['moving_avg_total_m'].mean() + 0.5 * coalesce(new_profile_m['moving_avg_total_m'].std()),
        new_profile_m['moving_avg_freq_m'].mean() + 0.5 * coalesce(new_profile_m['moving_avg_freq_m'].std()),
        
        new_profile_m['moving_avg_total_m'].mean() - 0.5 * coalesce(new_profile_m['moving_avg_total_m'].std()),
        new_profile_m['moving_avg_freq_m'].mean() - 0.5 * coalesce(new_profile_m['moving_avg_freq_m'].std()),
        
        new_profile_q['moving_avg_total_q'].mean() + 0.5 * coalesce(new_profile_q['moving_avg_total_q'].std()),
        new_profile_q['moving_avg_freq_q'].mean() + 0.5 * coalesce(new_profile_q['moving_avg_freq_q'].std()),
        
        new_profile_q['moving_avg_total_q'].mean() - 0.5 * coalesce(new_profile_q['moving_avg_total_q'].std()),
        new_profile_q['moving_avg_freq_q'].mean() - 0.5 * coalesce(new_profile_q['moving_avg_freq_q'].std()),
        
        new_profile_y['moving_avg_total_y'].mean() + 0.5 * coalesce(new_profile_y['moving_avg_total_y'].std()),
        new_profile_y['moving_avg_freq_y'].mean() + 0.5 * coalesce(new_profile_y['moving_avg_freq_y'].std()),
        
        new_profile_y['moving_avg_total_y'].mean() - 0.5 * coalesce(new_profile_y['moving_avg_total_y'].std()),
        new_profile_y['moving_avg_freq_y'].mean() - 0.5 * coalesce(new_profile_y['moving_avg_freq_y'].std()),
    ]
    
    statistical_profile = [round(e,2) for e in statistical_profile]

    return statistical_profile

So, this is how we can construct a statistical profile of the time series data. Once the profile is created, we can format it if required, which gives us a nice, compact JSON or DataFrame for the time series:

def format_statistical_profile(profile_value):
    individual_profile = {}
    individual_profile["UpperThreshold"] = max(profile_value[0], profile_value[1], profile_value[2])
    individual_profile["LowerThreshold"] = min(profile_value[4], profile_value[5], profile_value[6])
    individual_profile["UTotalMonth"] = profile_value[6]
    individual_profile["UFreqMonth"] = profile_value[7]
    individual_profile["LTotalMonth"] = profile_value[8]
    individual_profile["LFreqMonth"] = profile_value[9]    
    individual_profile["UTotalQuarter"] = profile_value[10]
    individual_profile["UFreqQuarter"] = profile_value[11]
    individual_profile["LTotalQuarter"] = profile_value[12]
    individual_profile["LFreqQuarter"] = profile_value[13]     
    individual_profile["UTotalYear"] = profile_value[14]
    individual_profile["UFreqYear"] = profile_value[15]
    individual_profile["LTotalYear"] = profile_value[16]
    individual_profile["LFreqYear"] = profile_value[17]
    
    return individual_profile
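To tie the two helpers together, here is a hedged usage sketch (assuming, as in the rest of this walkthrough, that ts_dataframe has 'Time Period' and 'Normalized Profit' columns):

# Build the raw profile values and then format them into a named dictionary
profile_value = generate_statistical_profile(ts_dataframe)
statistical_profile = format_statistical_profile(profile_value)
print(statistical_profile)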

Example statistical profile:

{'LFreqMonth': 10.89,
 'LFreqQuarter': 10.89,
 'LFreqYear': 10.89,
 'LTotalMonth': -0.35,
 'LTotalQuarter': -0.35,
 'LTotalYear': -0.35,
 'LowerThreshold': -0.71,
 'UFreqMonth': 12.22,
 'UFreqQuarter': 12.22,
 'UFreqYear': 12.22,
 'UTotalMonth': 5.51,
 'UTotalQuarter': 5.51,
 'UTotalYear': 5.51,
 'UpperThreshold': 1.15}

And finally, we can use a simple rule based comparison mechanism to detect anomalies.

anomaly_value = []
anomaly_date = []
date = pd.date_range(start ='1-1-2011', periods = len(test_actuals) , freq ='MS')
for i in range(len(test_actuals)):
    # flag anything that escapes the statistical upper/lower thresholds
    if (test_actuals[i] > statistical_profile['UpperThreshold'] or test_actuals[i] < statistical_profile['LowerThreshold']):
        anomaly_value.append(test_actuals[i])
        anomaly_date.append(date[i])

print(anomaly_value, anomaly_date)
plt.fill_between(date, statistical_profile['LowerThreshold'], statistical_profile['UpperThreshold'], alpha = 0.3, color = 'yellow')
plt.plot(date, test_actuals, color = 'black')
plt.plot(anomaly_date, anomaly_value, 'r*')
plt.legend(['Actual Test', 'Anomaly Detected','Threshold Band'])
plt.xlabel('Time Period')
plt.xticks(rotation=45)
plt.ylabel('Normalized Profit')

Now, as we said, this is a much more controlled mechanism for detecting outliers; the main challenge is coming up with the profiling metrics that capture the properties of the time series data. Having said that, standard statistical measures can easily be used to compute these properties, and this method can be tuned well enough that, even for highly volatile series, false positives can be kept low.

The other drawback I have seen so far is in detecting local outliers. A window-based statistical profiling approach can solve this problem as well, as sketched below.
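Here is a minimal sketch of what such a window-based profile could look like, flagging points that escape a rolling mean +/- k standard deviations band (the window length and the multiplier k are assumptions to be tuned per dataset):

def rolling_window_anomalies(series, window=12, k=1.0):
    # Rolling statistics computed over a trailing window of observations
    rolling_mean = series.rolling(window, min_periods=3).mean()
    rolling_std = series.rolling(window, min_periods=3).std()

    upper = rolling_mean + k * rolling_std
    lower = rolling_mean - k * rolling_std

    # A point is a local anomaly if it escapes its own window's band
    return series[(series > upper) | (series < lower)]

# Example usage on the normalized profit series
local_anomalies = rolling_window_anomalies(ts_dataframe['Normalized Profit'])
print(local_anomalies)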

Time Series Anomaly Detection By Clustering Based Unsupervised Approach

Now, as mentioned in the medium post, supervised anomaly detection is not a feasible option for time series data, as coming up with a labeled time series anomaly dataset for a practical scenario is difficult and expensive. In such cases, we should look for unsupervised approaches, typically clustering techniques which do not require us to pre-define the number of clusters, as that is also not possible to do here.

Hence, my preferred approach is to use DBSCAN, an algorithm which works very well for outlier detection.


DBSCAN is very easy to perform and the algorithm is very fast as well. Let's see a simple Python code example:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN

# Fit DBSCAN on the 1-D profit values; points labelled -1 are noise, i.e. anomalies
clustering1 = DBSCAN(eps=0.09, min_samples=6).fit(np.array(ts_dataframe['Normalized Profit']).reshape(-1,1))

labels = clustering1.labels_

outlier_pos = np.where(labels == -1)[0]

x = []; y = [];
for pos in outlier_pos:
    x.append(np.array(ts_dataframe['Normalized Profit'])[pos])
    y.append(ts_dataframe['Normalized Profit'].index[pos])

plt.plot(ts_dataframe['Normalized Profit'], 'k-')
plt.plot(y,x,'r*', markersize=8)
plt.legend(['Actual', 'Anomaly Detected'])
plt.xlabel('Time Period')
plt.xticks([0, 20, 40, 60, 80, 99],[ts_dataframe.index[0],ts_dataframe.index[20], ts_dataframe.index[40], ts_dataframe.index[60], ts_dataframe.index[80], ts_dataframe.index[99]] ,rotation=45)
plt.ylabel('Normalized Profit')

DBSCAN requires us to tune just two hyper-parameters (eps and min_samples), thereby making it simple to use. But is DBSCAN always accurate?

Maybe not. In many cases, I have observed a high rate of local false positives and false negatives. To deal with weak local outlier detection, the best step we can take is to implement a window-based approach, which resolves most of the spurious local outliers; a sketch follows below.
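As an illustration, here is a minimal sketch of such a window-based variant, running DBSCAN separately on consecutive temporal blocks of the series (the block size and the DBSCAN parameters are assumptions to be tuned per dataset):

def windowed_dbscan_anomalies(series, window=24, eps=0.09, min_samples=6):
    # Run DBSCAN independently on each temporal block so that local
    # deviations are judged against their own neighbourhood rather than
    # against the whole history of the series
    anomaly_index = []
    for start in range(0, len(series), window):
        block = series.iloc[start:start + window]
        if len(block) < min_samples:
            continue
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(block.values.reshape(-1, 1)).labels_
        anomaly_index.extend(block.index[labels == -1])
    return anomaly_index

# Example usage on the normalized profit series
print(windowed_dbscan_anomalies(ts_dataframe['Normalized Profit']))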

SUMMARY

With this we come to the end of this coding tutorial. The main purpose of this post is to give code-based intuition for how to perform the effective time series anomaly detection approaches mentioned in my medium post. Again, there is no universal right approach to follow; it mostly depends on the scenario and the time series data. But in general, I have found these methods to be really helpful, particularly for sales and demand anomaly detection!

If you want to look at my other posts about Demand Planning Segmentation, or drop a note, comment, or feedback, I would be happy to hear your thoughts! You can also take a look at my GitHub account for the consolidated code presented in a Jupyter notebook.


14 Responses

  1. arihant says:

    You've not uploaded either the correct code or the output, because if you run the notebook the outputs and plots vary largely. Hence, it is not understandable. Could you please update the same?
    Thanks

    • Aditya says:

      Hello Arihant,

      The purpose was not to upload the notebook so it could be run as-is on the same problem, but to present this approach so that you can apply it to your own problem and dataset. So, you would have to be very specific about which part you are not able to understand, and share more details about your problem and where you have tried to apply it. Only then will I be able to help you in the best possible way.

      Thanks,
      Aditya

  2. sri says:

    Hi Aditya,
    Great stuff. Really appreciate your work. I was following along and I am getting one error; can you please let me know what I need to fix?

    # Using a simple Seasonal ARIMA model to highlight the idea, in the actual world, the model has to best fit the data
    train, test = train_test_split(ts_dataframe, train_size=84)

    mape = MAPE(test.values.reshape(1,-1)[0], forecast)

    NameError: name ‘forecast’ is not defined

    thanks
    Sri

    • Aditya says:

      Hey, thanks for reaching out. The forecast variable holds the time series forecast, using SARIMA as the model. But the purpose of the tutorial was to focus on time series anomaly detection methods, so the forecasting method can be anything. Please use this tutorial as a reference for the TS anomaly detection methods and implement them on your own data!

  3. Noveenaa says:

    Hi,

    I am more curious about the unsupervised clustering approach. The DBSCAN algorithm is quite straightforward, choosing the noise points (-1) as anomalies, but what happens with other algorithms like k-means or GMM? How would you validate a particular threshold value? E.g. for GMM, in most cases a threshold of 0.95 is used.

    Thanks

    • Aditya says:

      Hi!

      Thanks for reaching out. Usually for K-Means and GMM, you would have to pre-define the number of clusters. For time-series data that is not always feasible. One possible way is to just consider two different clusters (anomalous and not anomalous), but I am not sure how effective that would be; it may depend on the dataset. I would be happy to discuss if you have conducted similar experiments along this line.

      Thanks,
      Aditya

  4. RAFIA AKHTER says:

    Hello,
    I have to do a project where I have to detect bad data points from good data points, in time series data. I want to share more by email. Can you please give me your email address? Mine is: akhter.rafia1@gmail.com

  5. Mohammad says:

    Hello Aditya,
    First of all, thank you for your good post.

    In the code under the statistical profiling approach section, could you tell me what ”grouped_series[‘mm-yy’]” means? What does it imply?

    Also, you calculated (10000*month + year) and assigned it to the series mentioned above. Why did you choose 10000? What does this specific number mean?

    I would appreciate it if you could refer me to another reference explaining this approach, covering both theory and code.

    Best regards,
    Mohammad

    • Aditya says:

      Hi Mohammad,

      Thank you for your comment and questions. To answer your questions:
      1. This step completely depends on the data. The data I was working with had multiple values for each date. Grouping by the date and taking the sum ensures that there is one unique value for a specific date. Say for 1-Jan-2020 I have two values, 3 and 6; after grouping by, I will have one value for each unique date, i.e. 9 for 1-Jan-2020 in this case.
      2. Again, the data that I was using had values given in aggregated units, which I wanted to expand into complete values. Say the values are given in thousands, like 10K; instead of using the value as 10, I preferred taking it as 10*1000 = 10,000.
      3. Off the top of my head I don't think there is any article other than mine, but the idea is to map outliers considering trend and seasonal effects and to detect statistical upper and lower bounds based on both magnitude and frequency, so that anything beyond these bounds is classified as an anomaly. Please feel free to contact me through the various options mentioned here: https://aditya-bhattacharya.net/contact-me/
      I will be happy to discuss further!

      Thanks,
      Aditya

  6. Varad says:

    Hi Aditya, Great article.
    Just had a query regarding the statistical profiling approach.
    For threshold we are considering maximums and minimums between
    a. Median of max monthly values
    b. Mean of Rolling Averages of previous 2 months
    c. Mean of Rolling Avg * 0.5 SD

    In ‘c’ above, is there any specific reason behind taking 0.5 times the SD?

    Usually we consider 1, 2, or 3 SD.

    Can you provide some references to books or articles regarding this approach?

    Thanks

    • Aditya says:

      Hi Varad,

      Thanks for reaching out. To answer your question: usually, considering Nelson's rules, any point 3 standard deviations apart is considered an outlier. But in practice, I have seen that for datasets which are less volatile and have consistent outcomes, 3 std is too much! Considering any value between 0.5 and 1 std is more appropriate, and that's the intuition behind 0.5 std.
      As for books, I am not sure of any book that mentions this approach as such, but Bollinger Bands follow a similar idea.

      Hope this helps!

      Thanks,
      Aditya

  7. Austin says:

    Hi Aditya, thank you for sharing this knowledge!

    Can you elaborate further on the window based approach? Is this subsetting the time series data into n different blocks and performing outlier detection on each individual block? If so, how does one decide to partition the time series data?

    • Aditya says:

      Hi Austin,

      Yes, it is like considering a smaller temporal segment of the time series data and estimating anomalies for that segment. For deciding the window period, you would need domain knowledge about the seasonal variation. For example, in most countries the financial year is considered to run from April to March, so you would select the window to map the timeframe of April to March. Hope this helps!

      Best Regards,
      Aditya
