How to Get Started with Machine Learning in FP&A?

Much has been written about how artificial intelligence and machine learning (AI/ML) can change the finance function, but far less about how to actually get started. This series of articles covers the best practices and prerequisites for using AI/ML in finance.

This first article covers the financial forecasting process in companies and how AI/ML can help with it.

Why Do Organizations Need Machine Learning?

To start with, in this context we should use the term machine learning (ML) rather than artificial intelligence (AI). Machine learning is a subset of artificial intelligence and one of the techniques used to realize AI.

Finance data shows significant seasonality and trends, which makes it well suited to machine learning. Machine learning algorithms use statistics to analyse historical data, detect patterns and predict future outcomes.
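To make "seasonality and trends" concrete, here is a minimal sketch, assuming one account's monthly amounts are held in a plain Python list. It computes a seasonal index per calendar month and year-over-year growth as a simple trend signal. The function names are hypothetical, and this is a basic descriptive technique, not a machine learning model in itself.

```python
def seasonal_indices(values: list[float], period: int = 12) -> list[float]:
    """Average of each month-of-cycle divided by the overall average.
    An index above 1.0 marks a seasonally high month, below 1.0 a low one.
    Assumes whole cycles of strictly positive amounts and ignores trend."""
    if not values or len(values) % period:
        raise ValueError("expects one or more whole seasonal cycles")
    overall = sum(values) / len(values)
    indices = []
    for m in range(period):
        same_month = values[m::period]  # e.g. every January in the history
        indices.append(sum(same_month) / len(same_month) / overall)
    return indices


def year_over_year_growth(values: list[float], period: int = 12) -> list[float]:
    """Growth of each month versus the same month one cycle earlier.
    Comparing like-for-like months gives a trend signal free of seasonality.
    Assumes no zero values in the earlier cycle."""
    return [values[i] / values[i - period] - 1 for i in range(period, len(values))]
```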

In large companies, financial forecasting is a bottom-up monthly process that involves the work of hundreds of analysts. This laborious process consists of refining assumptions in Excel, generating base data, aggregating results, aligning with the budget, and so on. The accuracy of human-generated forecasts suffers from the decentralized way data is analysed and from the unknowns and uncertainty of the business environment. Machine learning enables users to analyse large data sets to find patterns and correlations.
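As a small illustration of "finding correlations", the sketch below computes the Pearson correlation between account series and lists the strongly correlated pairs. The helper names, the 0.8 threshold and the idea of screening accounts this way are assumptions for illustration, not a prescribed part of the forecasting process.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equally long series (nan if either is constant)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length, at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan  # correlation is undefined for a constant series
    return cov / (sx * sy)


def correlated_pairs(accounts: dict[str, list[float]], threshold: float = 0.8) -> list[tuple[str, str, float]]:
    """Account pairs whose absolute correlation is at least `threshold`."""
    names = sorted(accounts)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(accounts[a], accounts[b])
            if abs(r) >= threshold:  # nan never passes this comparison
                pairs.append((a, b, r))
    return pairs
```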

It is important to note that data clean-up is a crucial task in any machine learning project. Fortunately, finance data in large companies is managed in enterprise resource planning (ERP) software, which keeps it highly structured and clean. ERPs hold detailed transactional data, whereas consolidation systems hold aggregated balances and income-statement-level data.
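Even structured ERP data is worth checking before modeling. One simple clean-up check is to look for months with no value, which would break the seasonal patterns a model relies on. A minimal sketch, assuming one account's data is already keyed by hypothetical `"YYYY-MM"` period strings; it only detects gaps and does not decide how to fill them.

```python
from typing import Optional


def missing_months(series: dict[str, Optional[float]], start: str, end: str) -> list[str]:
    """Return the 'YYYY-MM' periods from start to end (inclusive) with no value."""
    year, month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    gaps = []
    while (year, month) <= (end_year, end_month):
        key = f"{year:04d}-{month:02d}"
        if series.get(key) is None:  # absent or explicitly empty
            gaps.append(key)
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return gaps
```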

Data is the basis of any forecasting process, and several questions should be answered before starting such a project:

  • What granularity of data is needed?
  • What is an optimal forecast horizon?
  • What models can be used for prediction?
  • How accurate can the results be?

So, let’s explore these questions in more detail.

What granularity of data is needed?

Transactional data, which covers the full chart of accounts (COA), is more detailed but takes a lot of memory and processing power, so forecasts take longer to generate. It’s better to start with aggregated (P&L) data, which keeps only the critical COA segments, such as accounts and legal entities, and still contains enough detail to reveal seasonality and trends.
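Moving from transactional to aggregated data amounts to summing amounts over the COA segments you keep. A minimal sketch, assuming each transaction is a dictionary with hypothetical field names such as `account`, `entity`, `period` and `amount`; segments not listed in `keys` (a cost center, for instance) are summed away.

```python
from collections import defaultdict


def roll_up(transactions: list[dict], keys: tuple[str, ...] = ("account", "entity", "period")) -> dict[tuple, float]:
    """Sum transaction amounts over the chosen COA segments; any other segment
    on the transaction (cost center, project, ...) is aggregated away."""
    totals = defaultdict(float)
    for txn in transactions:
        totals[tuple(txn[k] for k in keys)] += txn["amount"]
    return dict(totals)
```

Passing a shorter `keys` tuple, for example `("account", "period")`, rolls the data up further to one series per account.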

For example, take the data for each account in the income statement and run it through the machine learning model to predict next month’s value. You can start with a single account, such as airline expense, and then run the same prediction process for every account to generate a full income statement.
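The account-by-account workflow can be sketched as a loop that applies one forecasting function to every account. Here a seasonal naive forecast (same month last year) stands in for the machine learning model purely to keep the example self-contained; the function and account names are hypothetical, and any model with the same signature could be passed in.

```python
from typing import Callable

Model = Callable[[list[float]], float]


def seasonal_naive(history: list[float], period: int = 12) -> float:
    """Stand-in model: next month equals the same month one year earlier."""
    if len(history) < period:
        raise ValueError("need at least one full seasonal cycle")
    return history[-period]


def forecast_income_statement(accounts: dict[str, list[float]], model: Model = seasonal_naive) -> dict[str, float]:
    """Apply the same model to every account to build a next-month income statement."""
    return {name: model(history) for name, history in accounts.items()}
```

Starting with one account means calling it with a single entry, such as `{"airline_expense": history}`; the full income statement is the same call with every account's history.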

Since an income statement has on average around 200 revenue and expense accounts, generating a full income statement might take 4–8 hours. Cloud services and faster processing power can reduce this time significantly. As a next step, add detailed transactional data and compare the results.
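The 4–8 hour figure implies roughly 1.2–2.4 minutes per account (240–480 minutes spread over 200 accounts). A back-of-the-envelope sketch of how parallel processing could shorten that, assuming accounts are forecast independently and workers scale perfectly with no overhead, so real run times will be longer:

```python
import math


def estimated_runtime_minutes(n_accounts: int, minutes_per_account: float, workers: int = 1) -> float:
    """Wall-clock estimate when accounts are split evenly across parallel workers.
    Assumes perfect scaling and no overhead, so treat the result as a lower bound."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    batches = math.ceil(n_accounts / workers)  # rounds of work each worker must run
    return batches * minutes_per_account
```

With one worker, 200 accounts at 1.2 and 2.4 minutes give 240 and 480 minutes (4–8 hours); with eight workers the same inputs give 25 batches, or 30 and 60 minutes.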

How much data is needed? 

For machine learning to analyse seasonality and trend changes meaningfully, the data needs to cover at least four seasonal cycles. For financial data with an annual cycle, that means at least four years of monthly data, i.e. 48 values for each account. Going to more granular data, such as daily or weekly values, might not help, because most financial data is posted toward the end of the month during the sub-ledger and ledger close.
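Both points can be expressed in a few lines: collapse postings to monthly totals, then check that at least four annual cycles (48 months) are available. A minimal sketch, assuming postings arrive as hypothetical `(date, amount)` pairs; the check counts distinct months only and does not verify that they are contiguous.

```python
from collections import defaultdict
from datetime import date

PERIOD = 12      # months per annual cycle
MIN_CYCLES = 4   # seasonal cycles required, per the rule of thumb above


def monthly_totals(postings: list[tuple[date, float]]) -> dict[str, float]:
    """Collapse daily postings into one total per calendar month ('YYYY-MM')."""
    totals = defaultdict(float)
    for day, amount in postings:
        totals[f"{day.year:04d}-{day.month:02d}"] += amount
    return dict(totals)


def enough_history(monthly: dict[str, float], period: int = PERIOD, min_cycles: int = MIN_CYCLES) -> bool:
    """True if at least period * min_cycles distinct months are present (48 by default)."""
    return len(monthly) >= period * min_cycles
```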

What is an optimal forecast horizon?

The forecast horizon is the future period for which a forecast is generated. FP&A teams typically forecast the next month or the next three months (a quarter). The error rate increases sharply with each additional month of horizon, so it is advisable to start with next-month forecasts and gradually examine the accuracy for longer horizons. Accuracy is measured with a simple error formula, (forecast – actuals) / actuals: the closer the result is to zero, the more accurate the forecast.
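The error formula translates directly into code. Expressing accuracy as 100% minus the absolute error is an assumed convention here; it is consistent with how accuracy and error rates are quoted in the accuracy section of this article, but the function names are hypothetical.

```python
def percentage_error(forecast: float, actual: float) -> float:
    """Signed error relative to actuals: (forecast - actual) / actual."""
    if actual == 0:
        raise ValueError("percentage error is undefined when actuals are zero")
    return (forecast - actual) / actual


def accuracy(forecast: float, actual: float) -> float:
    """Assumed convention: accuracy = 1 - |percentage error|."""
    return 1 - abs(percentage_error(forecast, actual))
```

For example, a forecast of 105 against actuals of 100 gives an error of 0.05 (5%) and an accuracy of 0.95 (95%); a forecast of 95 gives an error of -0.05 and the same accuracy.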

Which models can be used?

Model selection is an important step in a machine learning project, and it can be automated with various tools available on the market. The following models generally give good results for financial time-series data:

  1. Regression: a predictive model of the linear relationship between the output and time.
  2. SARIMAX: a seasonal ARIMA model with exogenous variables that forecasts future values from past values (lags) and lagged forecast errors.
  3. LightGBM: a gradient boosting framework that uses tree-based learning algorithms.
  4. Recurrent neural networks (RNNs): a class of neural networks well known for processing time-series and other sequential data.
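Of the four models, the regression approach is simple enough to sketch without any library. The function below fits output = a + b·t by ordinary least squares, with t counting months from 0, and extrapolates the next few months. It illustrates item 1 only, ignores seasonality, and the name is hypothetical; SARIMAX, LightGBM and RNNs would normally come from dedicated libraries.

```python
def linear_trend_forecast(history: list[float], horizon: int = 1) -> list[float]:
    """Fit y = a + b * t by ordinary least squares (t = 0, 1, ...) and
    extrapolate the next `horizon` periods."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2                      # mean of 0, 1, ..., n - 1
    y_mean = sum(history) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx                         # b: change per month
    intercept = y_mean - slope * t_mean       # a: fitted value at t = 0
    return [intercept + slope * (n + k) for k in range(horizon)]
```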

A model is considered suitable if it delivers consistent accuracy in tests across several different periods.
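Testing across several periods is commonly done with a rolling-origin backtest: the model sees only data before a cutoff, forecasts the next month, and the errors are collected over many cutoffs. The sketch below compares two deliberately simple stand-in candidates and scores each by mean error plus its standard deviation, so that both accuracy and consistency count. The candidates, the scoring rule and all names are assumptions for illustration, not the behaviour of any particular tool.

```python
import statistics
from typing import Callable

Model = Callable[[list[float]], float]


def last_value(history: list[float]) -> float:
    """Naive candidate: next month equals the latest month."""
    return history[-1]


def same_month_last_year(history: list[float]) -> float:
    """Seasonal naive candidate: next month equals the same month a year earlier."""
    return history[-12]


def backtest(history: list[float], model: Model, n_tests: int) -> list[float]:
    """Absolute percentage errors of one-month-ahead forecasts made at each of
    the last n_tests cutoffs, using only data before each cutoff.
    Assumes no actual value in the test window is zero."""
    errors = []
    for cutoff in range(len(history) - n_tests, len(history)):
        forecast = model(history[:cutoff])
        actual = history[cutoff]
        errors.append(abs(forecast - actual) / abs(actual))
    return errors


def select_model(history: list[float], models: dict[str, Model], n_tests: int = 6) -> tuple[str, dict[str, float]]:
    """Score = mean error + standard deviation of error; the lowest score wins."""
    scores = {}
    for name, model in models.items():
        errs = backtest(history, model, n_tests)
        scores[name] = statistics.mean(errs) + statistics.pstdev(errs)
    best = min(scores, key=scores.get)
    return best, scores
```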

How accurate can the results be?

Revenue and expense accounts show different trends, seasonality and periodicity. An income statement generally has more expense accounts than revenue accounts. Expenses vary less from month to month than revenues and can be predicted with higher accuracy (around 95%).

Revenue accounts, on the other hand, vary a lot from month to month and year to year, resulting in much higher error rates (10–15%). Based on these estimates, the accuracy of an ML-generated income statement might range from 85% to 95%.
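The article does not say how account-level results combine into one income-statement figure. One common option, assumed here, is to weight each account by the size of its actuals, which is 1 minus the weighted absolute percentage error (WAPE). The function name and all figures below are hypothetical.

```python
def weighted_accuracy(results: dict[str, tuple[float, float]]) -> float:
    """results maps account -> (forecast, actual).
    Returns 1 - sum(|forecast - actual|) / sum(|actual|), i.e. accuracy
    weighted by the magnitude of each account's actuals."""
    total_abs_error = sum(abs(f - a) for f, a in results.values())
    total_actual = sum(abs(a) for _, a in results.values())
    if total_actual == 0:
        raise ValueError("actuals sum to zero")
    return 1 - total_abs_error / total_actual
```

With hypothetical figures of revenue 88 against 100, travel 21 against 20 and rent 50 against 50, the function returns 1 - 13/170, about 92%: the large revenue line with a 12% error pulls the overall figure well below the near-perfect expense lines.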

I hope these pointers help you start your machine learning journey in finance. In the next article, I will cover various use cases in more detail.