Sales and Demand Forecast Analysis
Demand Planning Segmentation
As mentioned in my Medium post, in highly volatile situations like Covid-19, where the entire world is disrupted and the established equilibrium is dismantled in almost every domain, including manufacturing, supply chain, sales, finance and even marketing, organizations need to be smarter and apply selective segmentation as part of their business strategy to maximize profit with the limited resources they have. Since both resources and opportunities are limited in every segment, it is extremely important that demand planning strategies are directed towards ensuring maximum revenue and maximum utilization of available resources. This gives a focused ability to convert big opportunities into bigger ones and to mitigate losses due to reduced demand or reduced supply in the business process!
In this section, I am going to present a coding walk-through of a simple yet powerful selective segmentation technique that works even at a granular level of the demand planning and sales forecasting process: ABC-XYZ Analysis.
The entire concept is based on the Pareto principle, where 80% of the impact is made by just 20% of the total population. The segmentation process has two parts: Segmentation by Importance and Segmentation by Forecastability.
In the first part, we will talk about Segmentation by Importance, or ABC analysis. In the ABC matrix, we divide the entire forecast state space (all the customers, all the products, any other forecast dimension, or even a combination) into three classes, namely A, B and C, based on revenue share or profit margin contribution (i.e. their "importance"). As discussed in the Medium post, we rank them by importance: the top 20% will be class A, the next 30% class B and the remaining 50% class C. Below is the Python code example for ABC analysis.
ABC Analysis – Importance Segmentation
## Imports used across the snippets below
## (assuming pmdarima's non-shuffling train_test_split, which is suitable for time series)
import numpy as np
import pandas as pd
import pmdarima as pm
import seaborn as sns
from pmdarima.model_selection import train_test_split

def ABC_analysis(df):
    # Aggregate total revenue per customer
    grouped_df = (
        df.loc[:, ['CustomerID', 'Revenue']]
          .groupby('CustomerID')
          .sum()
    )
    grouped_df = grouped_df.sort_values(by=['Revenue'], ascending=False)

    ## Ranking by importance
    grouped_df["Rank"] = grouped_df['Revenue'].rank(ascending=False)
    grouped_df["Importance"] = ' '
    grouped_df = grouped_df.reset_index()

    ## Categorising the customers into classes A, B, C based on the 20-30-50 split
    n = grouped_df['Rank'].max()
    importance_col = grouped_df.columns.get_loc('Importance')
    grouped_df.iloc[0: int(0.2 * n), importance_col] = 'A'
    grouped_df.iloc[int(0.2 * n): int(0.5 * n), importance_col] = 'B'
    grouped_df.iloc[int(0.5 * n):, importance_col] = 'C'
    return grouped_df
ABC_groups = ABC_analysis(data)
ABC_groups.head()
Next, we will segment each of these time series groups based on predictability, or forecastability. Here too, we rank all the group elements by a forecastability metric (such as MAPE, SMAPE, COV or RMSE) and select the top 20% as X, the next 30% as Y and the last 50% as Z. Below is the Python code example for XYZ analysis.
XYZ Analysis – Forecastability Segmentation
def XYZ_analysis(df):
    XYZ_list = []
    for group, value in df.groupby('CustomerID'):
        grouped_df = (
            value.loc[:, ['Timeperiod', 'Quantity']].set_index('Timeperiod')
        )
        # Using a simple Seasonal ARIMA model to highlight the idea;
        # in the real world, the model should be the one that best fits the data
        train, test = train_test_split(grouped_df, train_size=18)
        model = pm.auto_arima(train, seasonal=True, m=4)
        forecasts = model.predict(test.shape[0])  # predict N steps into the future
        mape = MAPE(test.values.reshape(1, -1)[0], forecasts)
        XYZ_list.append([group, mape])

    XYZ_group = pd.DataFrame(XYZ_list, columns=['CustomerID', 'MAPE'])
    XYZ_group = XYZ_group.sort_values(by=['MAPE'], ascending=True)

    ## Ranking by forecastability
    XYZ_group["Rank"] = XYZ_group['MAPE'].rank(ascending=True)
    XYZ_group["Forecastability"] = ' '

    ## Categorising the customers into classes X, Y, Z based on the 20-30-50 split
    n = XYZ_group['Rank'].max()
    forecastability_col = XYZ_group.columns.get_loc('Forecastability')
    XYZ_group.iloc[0: int(0.2 * n), forecastability_col] = 'X'
    XYZ_group.iloc[int(0.2 * n): int(0.5 * n), forecastability_col] = 'Y'
    XYZ_group.iloc[int(0.5 * n):, forecastability_col] = 'Z'
    return XYZ_group
def MAPE(y_orig, y_pred):
    # Mean Absolute Percentage Error between the actuals and the forecasts
    return np.mean((abs(y_orig - y_pred) / y_orig) * 100.)
XYZ_groups = XYZ_analysis(data)
XYZ_groups.head()
Finally, when we combine both of these segmentation techniques, we get ABC-XYZ analysis, which is Importance-Forecastability Segmentation. The Python code snippet for the same is as follows:
ABC-XYZ Analysis – Importance Forecastability Segmentation
## Merge the two segmentations, keeping the CustomerID, Forecastability and Importance columns
abc_xyz = pd.merge(XYZ_groups, ABC_groups, on='CustomerID', how='inner').iloc[:, [0, 3, 6]]

## Count the customers falling into each Importance-Forecastability bucket and plot as a heatmap
grouped_ABC_XYZ = abc_xyz.groupby(['Importance', 'Forecastability']).count()
result = grouped_ABC_XYZ.pivot_table(index='Importance', columns='Forecastability', values='CustomerID')
sns.heatmap(result, annot=True, fmt='', cmap='Blues')
To summarize, a relatively simple yet powerful method like ABC-XYZ analysis helps us group items in terms of their importance and predictability, so that the business leadership team can take appropriate steps and plan different strategies to increase opportunities and ensure the maximum outcome in spite of having limited opportunities and resources.
In some cases, like sales forecasting, producing accurate forecasts at a granular level is often difficult, as the prediction accuracy can be severely impacted by the volatility of the granular data. ABC-XYZ analysis enables us to filter and extract the most important and most predictable items from the global pool, so that more focused effort can go into improving forecast accuracy for the items that matter the most.
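For example, a minimal sketch of this filtering step, reusing the abc_xyz dataframe built above, could extract the A-X segment (high importance, high forecastability); which segment you prioritise will of course depend on your business goal.

## A minimal sketch: extract the A-X segment from the merged ABC-XYZ result
priority_segment = abc_xyz[
    (abc_xyz['Importance'] == 'A') & (abc_xyz['Forecastability'] == 'X')
]
print(priority_segment['CustomerID'].tolist())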
As a matter of fact, this powerful approach can work with any time series data in any domain. For a detailed conceptual overview, please visit the Medium post. For the full code notebook, please take a look at my GitHub.
14 Responses
You’ve not uploaded either the correct code or the correct output, because if you run the notebook the outputs and plots vary largely and are hence not understandable. Could you please update the same?
Thanks
Hello Arihant,
The purpose was not to upload the notebook so that it can be run as-is on the same problem, but to share this approach so that you can apply it to your own problem and dataset. So, you would have to be very specific about which part you are not able to understand, and share more details about your problem and where you have tried to apply it. Only then will I be able to help you in the best possible way.
Thanks,
Aditya
Hi Aditya,
Great stuff. Really appreciate your work. I was following along and I am getting one error. Can you please let me know what I need to fix?
# Using a simple Seasonal ARIMA model to highlight the idea, in the actual world, the model has to best fit the data
train, test = train_test_split(ts_dataframe, train_size=84)
mape = MAPE(test.values.reshape(1,-1)[0], forecast)
NameError: name 'forecast' is not defined
thanks
Sri
Hey, thanks for reaching out. The forecast variable holds the time series forecast, using SARIMA as the model. But the purpose of that tutorial was to focus on time series anomaly detection methods, so the forecasting method can be anything. Please use the tutorial as a reference for the TS anomaly detection methods and implement the same on your own data!
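For reference, the forecast variable has to be created before the MAPE call, roughly along these lines (a minimal sketch that reuses your train_test_split line and the MAPE helper from the post; pmdarima's auto_arima stands in for the SARIMA model, and the seasonal period m=12 is just an assumption to tune for your data):

import pmdarima as pm
from pmdarima.model_selection import train_test_split

train, test = train_test_split(ts_dataframe, train_size=84)  # ts_dataframe is your own time series dataframe
model = pm.auto_arima(train, seasonal=True, m=12)             # assumed seasonal period; tune for your data
forecast = model.predict(test.shape[0])                       # this defines the 'forecast' variable
mape = MAPE(test.values.reshape(1, -1)[0], forecast)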
Hi,
I am more curious to know about the unsupervised clustering approach. The DBSCAN algorithm is quite straightforward, choosing the noise points (-1) as anomalies, but what happens with the other algorithms like k-means and GMM, how could you validate a particular threshold value? E.g. for GMM, most of the cases will use a threshold of 0.95.
Thanks
Hi!
Thanks for reaching out. Usually for K-Means and GMM, you would have to pre-define the number of clusters, which is not always feasible for time-series data. One possible way is to just consider two different clusters (anomalous and not anomalous), but I am not sure how effective that will be; it may depend on the dataset. I would be happy to discuss if you have conducted similar experiments along this line.
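Just to make the GMM flavour concrete, a rough sketch could look like the following (scikit-learn's GaussianMixture on your own values; the two components and the 0.95/0.05 likelihood cut-off are assumptions you would have to validate on your dataset):

import numpy as np
from sklearn.mixture import GaussianMixture

values = ts_values.reshape(-1, 1)          # ts_values: your own 1-D numpy array of observations

gmm = GaussianMixture(n_components=2, random_state=42).fit(values)
scores = gmm.score_samples(values)         # per-point log-likelihood under the fitted mixture

# Flag the least likely 5% of points (i.e. the 0.95 threshold mentioned above) as anomalies;
# this cut-off is an assumption and should be validated on your own data
threshold = np.quantile(scores, 0.05)
anomalies = values[scores < threshold]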
Thanks,
Aditya
Hello,
I have to do a project where I have to separate bad data points from good data points in time series data. I want to share more by email. Can you please give me your email address? Mine: akhter.rafia1@gmail.com
Hi there!
Thanks for reaching out. Feel free to contact me through the various options mentioned here: https://aditya-bhattacharya.net/contact-me/
I will be happy to discuss further!
Thanks,
Aditya
Hello Aditya,
First of all, thank you for your good post.
In the code under the statistical profiling approach section, could you tell me what grouped_series['mm-yy'] means? What does it imply?
Also, you calculated (10000*month + year) and assigned it to the series I mentioned above. Why did you choose 10000? What does this specific number mean?
I would appreciate it if you could refer me to another reference explaining this approach, covering both theory and code.
Best regards,
Mohammad
Hi Mohammad,
Thank you for your comment and questions. To answer your questions:
1. This step completely depends on the data. The data I was working with had multiple values for each date. Grouping by the date and taking the sum ensures that there is one unique value for each specific date. Say for 1-Jan-2020 I have two values, 3 and 6; after grouping, I will have one value for each unique date, i.e. 9 for 1-Jan-2020 in this case (a short pandas sketch of this appears after these points).
2. Again, the data that I was using had values given in aggregated units, which I wanted to expand into complete values. Say the values are given in thousands, like 10K: instead of using the value as 10, I preferred taking it as 10*1000 = 10,000.
3. Off the top of my head I don't think there is any article other than mine, but the idea is to flag outliers while accounting for trend and seasonal effects, and to detect the statistical upper and lower bounds based on both magnitude and frequency, so that anything beyond these bounds is classified as an anomaly. Please feel free to contact me through the various options mentioned here: https://aditya-bhattacharya.net/contact-me/
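As a short illustration of point 1, a minimal pandas sketch (the column names Date and Value are just placeholders) would be:

import pandas as pd

# Illustrative data with repeated dates; column names are hypothetical
df = pd.DataFrame({
    'Date':  ['2020-01-01', '2020-01-01', '2020-01-02'],
    'Value': [3, 6, 4],
})
grouped_series = df.groupby('Date')['Value'].sum()   # one value per unique date: 2020-01-01 -> 9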
I will be happy to discuss further!
Thanks,
Aditya
Hi Aditya, Great article.
Just had a query regarding the statistical profiling approach.
For the threshold, we are considering the maximums and minimums among:
a. Median of max monthly values
b. Mean of Rolling Averages of previous 2 months
c. Mean of Rolling Avg * 0.5 SD
In 'c' above, is there any specific reason behind taking 0.5 times the SD?
Usually we consider 1, 2 or 3 SD.
Can you provide some references to books or articles regarding this approach?
Thanks
Hi Varad,
Thanks for reaching out. To answer your question: usually, following Nelson's rules, any point 3 standard deviations away is considered an outlier. But in practice, I have seen that for datasets which are less volatile and have consistent outcomes, 3 std is too much! Instead, considering any value between 0.5 and 1 std is more appropriate. That is the intuition behind 0.5 std.
As for books, I am not aware of any book describing this exact approach, but Bollinger Bands follow a similar idea.
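To make the intuition concrete, a minimal sketch of such rolling bounds could look like this (a 2-period rolling mean with a +/- 0.5 standard deviation band, in the spirit of Bollinger Bands; the window size and multiplier are assumptions to tune on your own data):

import pandas as pd

# series: your own time series as a pandas Series
rolling_mean = series.rolling(window=2).mean()
rolling_std = series.rolling(window=2).std()

upper_bound = rolling_mean + 0.5 * rolling_std
lower_bound = rolling_mean - 0.5 * rolling_std

# Points falling outside the band are candidate anomalies
anomalies = series[(series > upper_bound) | (series < lower_bound)]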
Hope this helps!
Thanks,
Aditya
Hi Aditya, thank you for sharing this knowledge!
Can you elaborate further on the window-based approach? Is it subsetting the time series data into n different blocks and performing outlier detection on each individual block? If so, how does one decide how to partition the time series data?
Hi Austin,
Yes, it is like considering a smaller temporal segment of the time series data and estimating anomalies within that segment. For deciding the window period, you would need domain knowledge about the seasonal variation. For example, in most countries the financial year runs from April to March, so you would select the window to map to that April-to-March timeframe. Hope this helps!
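As a rough illustration, assuming a monthly series with a DatetimeIndex, the windowing could look like the sketch below (the April-to-March fiscal-year grouping and the 0.5-std rule discussed in an earlier comment are just illustrative choices):

import pandas as pd

# series: your own time series with a DatetimeIndex
fiscal_year = series.index.to_period('A-MAR')   # windows running April to March

window_anomalies = []
for window, values in series.groupby(fiscal_year):
    mean, std = values.mean(), values.std()
    # Flag points far from the window's own mean
    window_anomalies.append(values[(values - mean).abs() > 0.5 * std])

anomalies = pd.concat(window_anomalies)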
Best Regards,
Aditya