February 16, 2023

How DeepFeatTimeGPT is disrupting financial forecasting

Technical report
Use case
Decision Augmentation

Introducing DeepFeatTimeGPT

The field of financial forecasting has long been dominated by traditional methods: linear regression models, spreadsheets, and manual work. However, recent advancements in deep learning have opened up new possibilities for more accurate and efficient forecasting. One such innovation is DeepFeatTime, a multimodal, probabilistic forecasting system based on a neural network architecture.

What sets DeepFeatTimeGPT apart from other forecasting models is its ability to incorporate multiple data sources and modalities. The neural network can analyze both structured and unstructured data to generate more accurate predictions. Furthermore, the system is probabilistic: rather than a single point estimate, it provides a range of possible outcomes and their associated probabilities, thereby accounting for uncertainty.

The training data

DeepFeatTimeGPT is built on a massive dataset consisting of nearly 1 billion months of financial history from 638,000 companies in 47 sectors and 744 industries across 100 countries. This data encompasses 78 different measures, from revenue and EBITDA to gross margin and enterprise value.

The scale of the training data used for DeepFeatTime is staggering. To put it into perspective, a human reading at an average speed (ca. 250 words/min) would need around 636,900 hours, or over 72 years of nonstop reading 24/7, to go through all the data.

Assuming an average of 1,716 hours of work per full-time equivalent (FTE) per year (OECD average), reading through the massive dataset used to train DeepFeatTime would take the equivalent of around 371 working years. At a conservative hourly rate of $250, the cost of hiring a strategy consultant or investment analyst to analyze this data would amount to approximately $159,225,000.
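
The back-of-the-envelope arithmetic above can be reproduced directly. The inputs are the assumptions stated in the text (reading time, OECD working hours, hourly rate), not measured quantities:

```python
# Cost of reading the training data manually, using the assumptions above.
READING_HOURS = 636_900      # hours to read the dataset at ~250 words/min
HOURS_PER_FTE_YEAR = 1_716   # OECD average annual working hours per FTE
HOURLY_RATE_USD = 250        # conservative consultant/analyst hourly rate

working_years = READING_HOURS / HOURS_PER_FTE_YEAR   # ~371 FTE-years
reading_cost_usd = READING_HOURS * HOURLY_RATE_USD   # ~$159.2M

print(f"{working_years:.0f} working years, ${reading_cost_usd:,.0f}")
```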

It is important to note that this cost only accounts for the time spent reading the data and does not include the cost of understanding and interpreting the information stored within it.

The state-of-the-art benchmark

To assess the effectiveness of DeepFeatTimeGPT in comparison to traditional methods used by business strategy consultants or investment banks, we conducted a benchmarking study using a dataset of published forecasts by investment banks. The dataset comprises forecasts for three central financial time series commonly used for corporate finance and valuation purposes, which are relevant to our company.

  • Revenue is a key financial metric that represents the total amount of money earned by a company from the sale of its products or services. It is one of the top line items in a company's income statement and is calculated by multiplying the number of goods or services sold by their respective prices. Revenue is an important indicator of a company's performance as it reflects the company's ability to generate income and grow its business.
  • EBITDA (Earnings Before Interest, Taxes, Depreciation, and Amortization) is a financial metric used to measure a company's operating performance. It is calculated by taking the company's revenue and subtracting its operating expenses, excluding interest, taxes, depreciation, and amortization.
  • EBT (Earnings Before Taxes) is a financial metric that measures a company's profitability before tax. It is calculated by subtracting all expenses from revenue, including operating expenses, depreciation and amortization, and interest, but before deducting any taxes.
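
The three metrics above can be illustrated with a minimal sketch. The simplified income statement and its line items here are hypothetical; real statements contain many more items, and sign conventions vary by data vendor:

```python
# Hypothetical, simplified income statement for illustration only.
income_statement = {
    "revenue": 1_000.0,
    "operating_expenses": 700.0,          # excl. depreciation & amortization
    "depreciation_amortization": 80.0,
    "interest_expense": 20.0,
}

revenue = income_statement["revenue"]
# EBITDA: revenue minus operating expenses, before D&A and interest.
ebitda = revenue - income_statement["operating_expenses"]
# EBIT: EBITDA minus depreciation and amortization.
ebit = ebitda - income_statement["depreciation_amortization"]
# EBT: EBIT minus interest expense, i.e. profit before taxes.
ebt = ebit - income_statement["interest_expense"]
```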

The dataset consists of 407,500 data points for 11,443 public companies across 70 countries, 47 sectors, and 209 industries. The number of investment banks included in the dataset ranges from 2 to 49, with a median of 12. Each forecast is provided for four periods (quarters) into the future, and the average length of a forecasting series is 35.6 periods, with a median of 23 and a maximum of 100. The forecasting data ranges from Q1/1992 up to Q4/2022.

The forecasts provided by analysts are constructed as prediction intervals with an expected (median), lower (5% quantile), and upper (95% quantile) value for every point in time. To evaluate the performance of DeepFeatTime, we created several subsets for analysis: the full dataset of all 11,443 companies and a subset comprising only S&P 500 firms. For each subset, we created a set of all available measures (EBITDA, EBT, and revenue) as well as a revenue-only set. The investment banks included in the benchmarking data are displayed in Table 1.

The benchmarking study provides a comprehensive comparison of DeepFeatTime's forecasting accuracy against traditional methods. The dataset's large size and broad scope allow for a robust evaluation of DeepFeatTime's performance across various sectors, industries, and geographies. The results of the study will help validate the effectiveness of DeepFeatTimeGPT and guide its application in real-world financial forecasting scenarios.

Evaluation metrics

We evaluated the performance of DeepFeatTimeGPT using two central performance measures for forecasting: sMAPE and coverage.


The symmetric mean absolute percentage error (sMAPE) is an accuracy measure based on percentage errors. For each point in time, the absolute difference between the actual and predicted value is divided by the sum of their absolute values; the mean of these ratios is then multiplied by 200%. The median forecast is used for calculating the sMAPE.
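
As a sketch, the sMAPE described above can be computed as follows; the function and variable names are illustrative, not part of our pipeline:

```python
def smape(actual, predicted):
    """Symmetric mean absolute percentage error, on a 0-200% scale."""
    terms = [
        abs(a - p) / (abs(a) + abs(p))
        for a, p in zip(actual, predicted)
        if (abs(a) + abs(p)) > 0  # skip points where both values are zero
    ]
    return 200.0 * sum(terms) / len(terms)
```

A perfect forecast yields 0, and the measure is bounded above by 200%, which makes it comparable across series of very different magnitudes.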


Coverage, on the other hand, refers to the proportion of times that a prediction interval contains the actual outcome. A prediction interval is a range of values that an event or outcome is expected to fall within with a certain degree of probability. In probabilistic forecasting, coverage is an important metric for assessing the accuracy and reliability of the forecast model.
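
Coverage as defined above admits an equally short sketch (again with illustrative names):

```python
def coverage(actuals, lower, upper):
    """Fraction of actual outcomes falling inside their prediction interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lower, upper))
    return hits / len(actuals)
```

For intervals built from the 5% and 95% quantiles, a well-calibrated model should achieve empirical coverage close to the nominal level of the interval.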

Evaluation setting

DeepFeatTime is evaluated through zero-shot testing, meaning that the model is tested on a data series it has not encountered previously. During the benchmark study, the model sees the data from the companies for the first time. The purpose of this approach is to evaluate the model's ability to generalize to new data, rather than relying on training the model on the same data that it is evaluated on. This evaluation method helps to determine the model's generalization capability and robustness, which are essential for real-world forecasting applications, where the future data distribution is unknown.


The superior performance of our DeepFeatTime model compared to the collective intelligence of top investment banks can be attributed to its multimodal character and extensive training. Learning from a vast amount of data during training allowed the model to outperform across all datasets and metrics. The results of the comparison are displayed in Table 2.

DeepFeatTimeGPT's ability for long-horizon forecasting

We conducted our benchmark against 4-period forecasts, as four periods is the maximum horizon available in the analyst data. In addition, we also evaluated DeepFeatTimeGPT's performance in predicting longer time horizons, a critical task for use cases such as business planning, strategy, and M&A valuations. To assess its ability to forecast several years into the future, we backtested the model's performance on time horizons of 4, 8, 12, 18, and 24 periods. Our model delivered outstanding results even for time horizons of several years into the future. Table 3 displays the results, while the detailed distributions are shown in the following figures.
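
A multi-horizon backtest of this kind can be sketched as a holdout evaluation per horizon. Here `forecast_fn` is a placeholder standing in for the model under test, and `error_fn` for an error measure such as sMAPE; neither is the actual DeepFeatTime implementation:

```python
def backtest_horizons(series, horizons, forecast_fn, error_fn):
    """Backtest a forecaster at several horizons on a single series.

    For each horizon h, the forecast origin is set h periods before the
    end of the series; forecast_fn(history, h) predicts the h held-out
    points, which error_fn then scores against the actuals.
    """
    results = {}
    for h in horizons:
        origin = len(series) - h
        history, held_out = series[:origin], series[origin:]
        predicted = forecast_fn(history, h)
        results[h] = error_fn(held_out, predicted)
    return results

# Example with a naive last-value forecaster and total absolute error.
res = backtest_horizons(
    list(range(1, 9)),
    horizons=[2, 4],
    forecast_fn=lambda hist, h: [hist[-1]] * h,
    error_fn=lambda a, p: sum(abs(x - y) for x, y in zip(a, p)),
)
```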

Efficiency and speed benchmarking

Apart from assessing the forecasting accuracy of DeepFeatTime, we also evaluated its speed and efficiency, which are crucial factors in dynamic and uncertain business environments. DeepFeatTimeGPT is integrated into an end-to-end forecasting pipeline that streamlines data collection, processing, and forecasting, enabling the entire financial forecasting process to be executed with just one click. We measured both the inference time required to generate a forecast and the end-to-end time required to collect, clean, process, forecast, and visualize the data. Our results are displayed in Table 4.

To conduct our efficiency benchmarking, we ran the model on a GPU instance (g4dn.xlarge) on AWS, which costs $0.526 per hour. To estimate the cost of an analyst, we used the average hourly cost of $77 for an investment analyst in the US, according to Glassdoor (2023). For the time requirements of analysts, we made assumptions based on our own experience working with clients and conducting interviews. We used optimistic estimates of 10 minutes for a single forecast of one financial measure and 60 minutes for the end-to-end process of collecting, preparing, processing, analyzing, and visualizing financial data. It is important to note that although our benchmarking was limited to forecasting 3 metrics, our model is optimized to work out of the box for 78 financial performance measures.
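
The cost comparison implied by these figures can be sketched directly. The GPU price and analyst rate are the assumptions stated above; the one-minute model runtime is purely illustrative (the measured times appear in Table 4):

```python
# Per-run cost comparison for one end-to-end forecasting pass.
GPU_COST_PER_HOUR = 0.526         # USD, AWS g4dn.xlarge (as stated above)
ANALYST_COST_PER_HOUR = 77.0      # USD, US investment analyst (Glassdoor, 2023)
ANALYST_MINUTES_END_TO_END = 60   # optimistic estimate from the text
MODEL_MINUTES_END_TO_END = 1      # illustrative assumption, not a measurement

analyst_cost = ANALYST_COST_PER_HOUR * ANALYST_MINUTES_END_TO_END / 60
model_cost = GPU_COST_PER_HOUR * MODEL_MINUTES_END_TO_END / 60

print(f"analyst: ${analyst_cost:.2f}, model: ${model_cost:.4f} per run")
```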

The future of financial forecasting is hybrid

Our DeepFeatTimeGPT model is a powerful tool for financial forecasting, but it is not intended to replace human analysts. Rather, it is designed to complement and enhance their capabilities, enabling them to focus their time and expertise where it is most needed. The model provides accurate and reliable predictions by quickly analyzing vast amounts of financial data, identifying trends and patterns, and making predictions in seconds. This frees up human analysts' time and allows them to interpret the results, apply their business knowledge, and make strategic decisions.

By combining the strengths of our model with the expertise of human analysts, we can achieve superior results in financial forecasting. Our model enables faster, more accurate, and cost-effective predictions, while human analysts can provide the insight and context necessary to apply those predictions to real-world scenarios. With Hybrid Intelligence®, we can drive greater success and growth for businesses and investors, powered by the synergy of human intuition and AI augmentation.