Sales and Demand Forecast Analysis

Business forecasting and analysis is one of the oldest industrial problems, one that has existed for a very long time. With advances in statistical and business-analytics approaches, organizations have seen great improvements in the overall forecasting process. With the availability of large volumes of curated data, data-driven business insights have become the new norm, and now, with advances in machine learning and deep learning, automated algorithms are producing better results and insights than human analysts.

In this series of articles I will discuss various techniques and methods that can be leveraged to improve your business forecasting process, each with a coding example implemented in Python or R. This series is targeted not only at business managers and executives, but also at data scientists, data analysts, and academic researchers, so that these methods can be leveraged at scale and the community can work together to solve these age-old problems!

14 Responses

  1. arihant says:

    You’ve not uploaded either the correct code or the output, because if you run the notebook, the outputs and plots vary largely from what is shown. Hence it is not understandable. Could you please update it?
    Thanks

    • Aditya says:

      Hello Arihant,

      The purpose was not to upload the notebook so that it can be tried as-is on the same problem, but to present the approach so you can apply it to your own problem and dataset. So, you would have to be very specific about which part you are not able to understand, and share more details about your problem and where you have tried to apply it. Only then will I be able to help you in the best possible way.

      Thanks,
      Aditya

  2. sri says:

    Hi Aditya,
    Great stuff. Really appreciate your work. I was following along and I am getting one error; can you please let me know what I need to fix?

    # Using a simple Seasonal ARIMA model to highlight the idea, in the actual world, the model has to best fit the data
    train, test = train_test_split(ts_dataframe, train_size=84)

    mape = MAPE(test.values.reshape(1,-1)[0], forecast)

    NameError: name 'forecast' is not defined

    thanks
    Sri

    • Aditya says:

      Hey, thanks for reaching out. The forecast variable holds the time-series forecast, with SARIMA as the model. But the purpose of the tutorial was to focus on time-series anomaly detection methods, so the forecasting method can be anything. Please use this tutorial only as a reference for the TS anomaly detection methods and implement the same on your own data!
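
      For illustration, here is a minimal sketch of how the missing forecast variable could be produced with statsmodels’ SARIMAX. The synthetic series, the chronological split, and the (1,1,1)x(1,1,1,12) order are all assumptions for the sketch, not the tutorial’s exact setup:

      import numpy as np
      import pandas as pd
      from statsmodels.tsa.statespace.sarimax import SARIMAX

      # Synthetic stand-in for ts_dataframe: 96 months of trend + seasonality
      idx = pd.date_range("2012-01-01", periods=96, freq="MS")
      t = np.arange(96)
      noise = np.random.default_rng(0).normal(0, 2, 96)
      ts_dataframe = pd.Series(100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + noise, index=idx)

      def MAPE(actual, predicted):
          # Mean absolute percentage error, in percent
          actual, predicted = np.asarray(actual), np.asarray(predicted)
          return np.mean(np.abs((actual - predicted) / actual)) * 100

      # Chronological split: first 84 points to train, the rest to test
      # (train_test_split would need shuffle=False for time series)
      train, test = ts_dataframe[:84], ts_dataframe[84:]

      # Illustrative SARIMA order; in practice the model has to best fit the data
      fitted = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)

      forecast = fitted.forecast(steps=len(test))
      mape = MAPE(test.values, forecast)
      print(f"MAPE: {mape:.2f}%")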

  3. Noveenaa says:

    Hi,

    I am more curious to know about the unsupervised clustering approach. The DBSCAN algorithm is quite straightforward, choosing the noise points (labelled -1) as anomalies, but what happens with the other algorithms like k-means and GMM? How would you validate a particular threshold value? E.g., for GMM, most cases use a threshold of 0.95.

    Thanks

    • Aditya says:

      Hi!

      Thanks for reaching out. Usually for K-Means and GMM you have to pre-define the number of clusters, which is not always feasible for time-series data. One possible way is to consider just two clusters (anomalous and not anomalous), but I am not sure how effective that would be; it may depend on the dataset. I would be happy to discuss if you have conducted similar experiments along this line.
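
      As a rough illustration of the density-based idea (a sketch on synthetic data; the 5th-percentile cutoff is an assumption, playing the role of the 0.95 threshold you mention):

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(42)
      # Synthetic 1-D series with a few injected spikes
      values = rng.normal(50, 5, 300)
      values[[30, 120, 250]] = [95, 5, 110]

      X = values.reshape(-1, 1)
      gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

      # Log-density of each point under the fitted mixture
      log_density = gmm.score_samples(X)

      # Flag the lowest-density 5% as anomalies; the cutoff is a judgment call
      threshold = np.percentile(log_density, 5)
      anomalies = np.where(log_density < threshold)[0]
      print(anomalies)  # the injected spikes should be among these indices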

      Thanks,
      Aditya

  4. RAFIA AKHTER says:

    Hello,
    I have to do a project where I have to detect bad data points from good data points, in time-series data. I want to share more by email. Can you please give me your email address? Mine is: akhter.rafia1@gmail.com

  5. Mohammad says:

    Hello Aditya,
    First of all, thank you for your good post.

    In the code under the statistical profiling approach section, could you tell me what grouped_series['mm-yy'] means? What does it imply?

    Also, you calculated (10000*month + year) and assigned it to the series I mentioned above. Why did you choose 10000? What does this specific number mean?

    I would appreciate it if you could refer me to another reference explaining this approach, covering both the theory and the code.

    Best regards,
    Mohammad

    • Aditya says:

      Hi Mohammad,

      Thank you for your comment and questions. To answer your questions:
      1. This step completely depends on the data. The data I was working with had multiple values for each date. Grouping by the date and taking the sum ensures that there is one unique value for each specific date. Say for 1-Jan-2020 I have two values, 3 and 6; after grouping, I will have one value for each unique date, i.e. 9 for 1-Jan-2020 in this case.
      2. Again, the data I was using had values given in aggregated units, which I wanted to expand into complete values. Say the values are given in thousands, like 10K: instead of using the value as 10, I preferred taking 10*1000 = 10,000.
      3. Off the top of my head I don’t think there is any article other than mine, but the idea is to map outliers considering trends and seasonal effects, detecting the statistical upper and lower bounds based on both magnitude and frequency, so that anything beyond these bounds is classified as an anomaly. Please feel free to contact me through the options mentioned here: https://aditya-bhattacharya.net/contact-me/
      I will be happy to discuss further!
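
      To make point 1 and the mm-yy key concrete, here is a small sketch with toy data (my reconstruction, not the article’s exact code). Multiplying the month by 10000 shifts it into higher digits, so each (month, year) pair maps to a unique integer key:

      import pandas as pd

      # Toy data with duplicate dates, standing in for the real dataset
      df = pd.DataFrame({
          "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-02-15"]),
          "sales": [3, 6, 4],
      })

      # One value per unique date: group duplicates and sum (1-Jan-2020 -> 9)
      daily = df.groupby("date", as_index=False)["sales"].sum()

      # Single numeric month-year key, e.g. Jan-2020 -> 1*10000 + 2020 = 12020
      daily["mm-yy"] = 10000 * daily["date"].dt.month + daily["date"].dt.year

      grouped_series = daily.groupby("mm-yy", as_index=False)["sales"].sum()
      print(grouped_series)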

      Thanks,
      Aditya

  6. Varad says:

    Hi Aditya, Great article.
    Just had a query regarding the statistical profiling approach.
    For the threshold we are considering maximums and minimums among:
    a. Median of max monthly values
    b. Mean of Rolling Averages of previous 2 months
    c. Mean of Rolling Avg * 0.5 SD

    In ‘c’ above, is there any specific reason behind taking 0.5 times the SD?

    Usually we consider 1, 2, or 3 SD.

    Can you provide some references to books or articles regarding this approach?

    Thanks

    • Aditya says:

      Hi Varad,

      Thanks for reaching out. To answer your question: usually, following Nelson’s rules, any point 3 standard deviations away is considered an outlier. But in practice I have seen that for datasets which are less volatile and have consistent outcomes, 3 std is too much! Considering any value between 0.5 and 1 std is more appropriate, and that’s the intuition behind the 0.5 std.
      For books, I am not sure of any that mentions this exact approach, but Bollinger Bands follow a similar idea.
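
      As a rough illustration of such a band (my sketch; the 12-point window and the 0.5 multiplier are assumptions, and note that Bollinger Bands typically use 2 std):

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(7)
      # Low-volatility synthetic series with one injected spike
      s = pd.Series(100 + rng.normal(0, 1.5, 120))
      s.iloc[60] = 112

      window = 12  # in practice, match the window to the seasonality
      mean = s.rolling(window).mean()
      std = s.rolling(window).std()

      # Narrow 0.5-std band: on noisy data this flags many points, so it
      # suits stable, low-volatility series as discussed above
      upper, lower = mean + 0.5 * std, mean - 0.5 * std
      anomalies = s[(s > upper) | (s < lower)]
      print(anomalies.head())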

      Hope this helps!

      Thanks,
      Aditya

  7. Austin says:

    Hi Aditya, thank you for sharing this knowledge!

    Can you elaborate further on the window-based approach? Is this subsetting the time-series data into n different blocks and performing outlier detection on each individual block? If so, how does one decide how to partition the time-series data?

    • Aditya says:

      Hi Austin,

      Yes, it is like considering a smaller temporal segment of the time-series data and estimating anomalies within that segment. For deciding the window period, you would need domain knowledge about the seasonal variation. For example, in most countries the financial year runs from April to March, so you would select the window to map to the April-to-March timeframe. Hope this helps!
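
      A minimal sketch of this idea (my illustration: yearly April-to-March windows on synthetic monthly data, with a simple 2-std rule inside each window):

      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(1)
      idx = pd.date_range("2017-04-01", periods=48, freq="MS")
      s = pd.Series(100 + rng.normal(0, 3, 48), index=idx)
      s.iloc[20] = 140  # injected anomaly

      # April..March windows: shifting months back by 3 makes each
      # fiscal year fall into a single calendar-year label
      fiscal_year = (s.index - pd.DateOffset(months=3)).year

      # Detect outliers separately within each window
      flagged = []
      for year, segment in s.groupby(fiscal_year):
          z = (segment - segment.mean()) / segment.std()
          flagged.append(segment[z.abs() > 2.0])
      print(pd.concat(flagged))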

      Best Regards,
      Aditya
