The concept of Machine Learning is not all that new, though better tools and technology have opened the door to new use cases and applications. Getting started can be daunting, but the best place to start is with an abundance of data. Even if some data does not seem particularly important, it is better to include it in your (or your customer’s) initial data gathering as long as it is easily accessible.
For the purpose of this blog post, all data was scrambled.
In a recent project with one of our Dynamics AX Plastics customers, we began with the following data:
- Invoice quantity and date
- Base material
- Customer information
- Product information
The goal was then to build a sales forecast for each base material, 12 months ahead. We had used Azure Machine Learning Studio before, and how quickly it lets you get straight into action sets it apart from other available tools, so as a Microsoft Partner it was our natural choice. From the outset of the project, we kept an open mind about what we might find. Given the custom nature of the finished products, we never really expected to be able to predict anything.
The first step in our overall approach was to create a time series forecasting model. This meant that, initially, we looked only at the historical data, without taking any external influences into account. From this model and a visual inspection alone, we were able to determine that our data had a trend and exhibited seasonality.
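From there, we decomposed the series into three components: trend, seasonal, and random (the residual left once the trend and seasonal components are removed).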
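The decomposition into three components can be sketched as a classical additive decomposition. The stdlib Python below is a minimal sketch under stated assumptions: monthly data with a 12-month season and an additive model (observation = trend + seasonal + random). The post does not say which decomposition method or settings were used, and the names `centered_moving_average` and `additive_decompose` are hypothetical.

```python
from statistics import mean


def centered_moving_average(y, period=12):
    """Trend estimate: a centered moving average spanning one full season.

    Points within period // 2 of either end have no complete window and are None.
    """
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period (e.g. 12 months): a "2x12" average that gives half weight
            # to the two endpoints so the window stays centered on t.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend[t] = mean(window)
    return trend


def additive_decompose(y, period=12):
    """Split y into trend, seasonal and random components (assumed additive)."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    trend = centered_moving_average(y, period)
    # Group the detrended values by position within the season (calendar month).
    by_position = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            by_position[t % period].append(obs - tr)
    raw = [mean(vals) for vals in by_position]
    # Center the seasonal indices so they sum to zero over one season.
    offset = mean(raw)
    indices = [r - offset for r in raw]
    seasonal = [indices[t % period] for t in range(len(y))]
    # Whatever trend and seasonality do not explain is the random component.
    random_component = [None if tr is None else obs - tr - s
                        for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, random_component


# Hypothetical usage: four years of synthetic monthly data.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
sales = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, random_component = additive_decompose(sales)
```

With real data you would pass the monthly quantities for one base material. The `None` values at both ends come from the moving-average window, which is why classical decomposition loses half a season of trend estimates at each end of the series.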
While the data clearly showed seasonality, we still needed to determine whether the random component was large enough to warrant further analysis. To do this, we compared the random component directly against the seasonal component.
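One common way to make that comparison quantitative, not necessarily the one used in this project, is a seasonal-strength measure from the forecasting literature: F_S = max(0, 1 - Var(random) / Var(seasonal + random)). The sketch below assumes the seasonal and random components are already available as equal-length lists, with `None` where the random component is undefined (as at the ends of a moving-average decomposition); `seasonal_strength` is a hypothetical helper name.

```python
from statistics import pvariance


def seasonal_strength(seasonal, random_component):
    """F_S = max(0, 1 - Var(R) / Var(S + R)).

    Close to 1: the seasonal swing dominates the noise.
    Close to 0: the noise is about as large as the seasonal swing.
    """
    # Keep only the time points where both components are defined.
    pairs = [(s, r) for s, r in zip(seasonal, random_component) if r is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two points where both components are defined")
    resid = [r for _, r in pairs]
    combined = [s + r for s, r in pairs]
    total_var = pvariance(combined)
    if total_var == 0:
        return 0.0  # no variation at all, so no seasonality to speak of
    return max(0.0, 1.0 - pvariance(resid) / total_var)


# Hypothetical usage: a large seasonal swing with small noise.
print(seasonal_strength([10, -10] * 12, [0.5, -0.5, -0.5, 0.5] * 6))
```

A value near 1 corresponds to a seasonal component that clearly outweighs the random one; a value near 0 would suggest the noise needs further analysis before moving on to modelling.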
Because the seasonal component far outweighed the random one, we moved on to two different time series models, which yielded similar results:
- ARIMA model
- Holt-Winters seasonal model
Other models exist, but these two performed best in this case.
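The Holt-Winters model keeps a running level, trend and set of seasonal indices, updates them as each new observation arrives, and projects them forward. Below is a minimal additive Holt-Winters sketch in stdlib Python, assuming monthly data (period 12), smoothing parameters set by hand rather than fitted, and a simple heuristic initialisation from the first two seasons. The post does not say which Holt-Winters variant or settings were used, and `holt_winters_additive` is a hypothetical name. ARIMA is not sketched here, since fitting it properly calls for a statistics library.

```python
def holt_winters_additive(y, period, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast of the next `horizon` points after y."""
    if len(y) < 2 * period:
        raise ValueError("need at least two full seasons to initialise")
    if not all(0 <= p <= 1 for p in (alpha, beta, gamma)):
        raise ValueError("smoothing parameters must be in [0, 1]")
    # Heuristic initialisation from the first two seasons.
    first, second = y[:period], y[period:2 * period]
    mean_first = sum(first) / period
    trend = (sum(second) - sum(first)) / period ** 2  # average change per step
    centre = (period - 1) / 2                          # time index of mean_first
    level = mean_first + trend * centre                # level at end of season one
    season = [y[i] - (mean_first + trend * (i - centre)) for i in range(period)]

    # Smoothing recursions; season[t % period] holds the latest index for that month.
    for t in range(period, len(y)):
        s_old = season[t % period]
        prev_level = level
        level = alpha * (y[t] - s_old) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % period] = gamma * (y[t] - level) + (1 - gamma) * s_old

    # h steps ahead: level + h * trend + the latest seasonal index for that month.
    n = len(y)
    return [level + h * trend + season[(n - 1 + h) % period]
            for h in range(1, horizon + 1)]


# Hypothetical usage: five years of synthetic monthly data with trend and seasonality.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
history = [100 + 2 * t + pattern[t % 12] for t in range(60)]
print(holt_winters_additive(history, 12, alpha=0.3, beta=0.1, gamma=0.2, horizon=12))
```

Because this synthetic series is exactly a straight line plus a fixed seasonal pattern, the forecast continues it almost exactly; real sales data will not behave so neatly. In practice alpha, beta and gamma are usually chosen by minimising forecast error on the training data rather than fixed by hand.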
Cross-validation is a necessary step in predictive analytics: it lets us assess how well a model actually predicts, rather than how well it simply fits the data it has already seen (see the problem of “overfitting”). To perform cross-validation, the data is split into two subsets: Training and Testing. We used the last 12 months of the data set as the Testing set and the rest as the Training set.
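A time-ordered holdout split like this one can be sketched as follows. It assumes the history is a list of monthly values in chronological order; `split_last_n` is a hypothetical helper name. Unlike a random split, it never lets the model train on months that come after the ones it is tested on.

```python
def split_last_n(series, test_size=12):
    """Return (training, testing): everything except the last `test_size`
    observations, and those last observations, both in their original order."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    return series[:-test_size], series[-test_size:]


# Hypothetical example: 48 months of history, indexed 0..47.
months = list(range(48))
training, testing = split_last_n(months)
print(len(training), len(testing))  # 36 12
```

When more history is available, a rolling-origin evaluation, which repeats this split at several successive cut-off dates and averages the errors, gives a less noisy estimate than a single holdout.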
The graphs visualize how the predicted values (blue line) fit the actual data. The results gave us a MAPE (Mean Absolute Percentage Error) for each model, where lower is better:
- Mean (baseline) = 8.52%
- ARIMA = 3.9%
- Holt-Winters = 6.5%
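MAPE averages the absolute percentage error over the test months: MAPE = (100 / n) * sum(|actual - forecast| / |actual|). The sketch below computes it alongside a mean baseline. It assumes that "Mean (baseline)" means forecasting every test month as the average of the training data, which the post does not spell out, and both function names are hypothetical.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (lower is better)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def mean_baseline(training, horizon):
    """Naive benchmark: predict the training-set average for every future period."""
    average = sum(training) / len(training)
    return [average] * horizon


# Hypothetical numbers, only to show the call pattern.
training = [120, 135, 128, 140]
testing = [130, 150]
print(mape(testing, mean_baseline(training, len(testing))))
```

A model only earns its extra complexity if it beats a baseline like this on the same test months, which is why the baseline is listed next to ARIMA and Holt-Winters. Note that MAPE cannot be computed for months with zero actual usage.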
We started with the notion that the custom nature of the products made resource planning unpredictable, yet our best model, ARIMA, predicted the base materials used with a MAPE of just 3.9%. While 3.9% is very good in this case, how much further you refine a model will vary from situation to situation. When we began this project, there was no specific, pressing problem that needed to be solved. In other scenarios, such as the Flint, Michigan water crisis, there is.
To learn more about Machine Learning and the tools we used on this project, check out our session in Nashville next month at AXUG Summit 2017!