Are AI language models capable of doing financial forecasting?

Since LLMs are essentially very large statistical machines, it seems logical to assume they are well suited to predicting how the economy and the stock market will develop. Let’s run a small test to see whether that holds for today’s largest general-purpose LLMs.

Let’s ask multiple models this: The USD to CAD exchange ratio is as of now 1.3917. Considering everything you already know about what affects the price of USD and CAD, and how it is likely to develop, what do you predict the exchange rate to be in 7, 30 and 90 days? Don’t use ranges in your reply, but give your best estimate with three decimal accuracy.

Additionally, let’s ask all the models that have the capability to do lookups online this follow-up question so they are given a chance to revise their prediction with the most recent data they have access to:

When you include all additional information you can access online, how would you revise the prediction? Give your best estimate with three decimal accuracy.

Also, to test whether the answers are based on any kind of rational and consistent reasoning, let’s repeat the prompts in three separate chat sessions to see if the LLM arrives at the same prediction, or if it is just hallucinating and giving random replies.

Chat session	Model	7-day forecast	30-day forecast	90-day forecast
1	ChatGPT 4o	1.394	1.392	1.386
	+ online	1.391	1.396	1.405
2	ChatGPT 4o	1.394	1.398	1.405
	+ online	1.390	1.400	1.410
3	ChatGPT 4o	1.395	1.400	1.410
	+ online	1.396	1.400	1.410
4	Gemini	1.395	1.402	1.410
	+ online	1.395	1.405	1.415
5	Gemini	1.395	1.402	1.410
	+ online	1.400	1.410	1.420
6	Gemini	1.395	1.402	1.410
	+ online	1.398	1.405	1.415
7	Perplexity	1.389	1.395	1.378
	+ revised	1.393	1.398	1.402
8	Perplexity	1.389	1.395	1.378
	+ revised	1.388	1.392	1.378
9	Perplexity	1.389	1.395	1.378
	+ revised	1.392	1.399	1.404
10	Grok 2	1.405	1.412	1.427
	+ revised	1.398	1.405	1.415
11	Grok 2	1.403	1.418	1.428
	+ revised	1.403	1.423	1.435
12	Grok 2	1.405	1.420	1.435
	+ revised	1.403	1.415	1.428
13	Meta AI	1.395	1.388	1.382
	+ revised	1.395	1.393	1.386
14	Meta AI	1.395	1.393	1.386
	+ revised	1.395	1.393	1.385
15	Yi-Lightning	1.390	1.395	1.402
16	Yi-Lightning	1.393	1.397	1.401
17	Yi-Lightning	1.390	1.386	1.385

Analysis

As we can see, ChatGPT, Gemini and Grok 2 predicted a rising trend, while Perplexity and Yi-Lightning were more mixed. Meta AI was the only model with a consistent downward prediction. Almost all the models were surprisingly consistent in their 7-day predictions across chats. Meta AI had by far the highest consistency and kept giving the same replies when asked repeatedly, while ChatGPT and Gemini appeared to converge on a set of predictions and eventually kept repeating the same numbers. The chat sessions took place within a couple of hours of each other, so as an external tester it is hard to say whether the models actually became consistent, or whether the systems were caching the replies and thus ended up giving the same results repeatedly.
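
One way to put a number on that consistency is to look at the spread (highest minus lowest value) of each model's first answers across its chat sessions. The snippet below is only a rough sketch: the figures are copied from the table above, while the names `BASELINE` and `spread_per_horizon` and the choice of spread as the measure are my own assumptions for illustration. Note that a spread of zero cannot distinguish stable reasoning from cached replies.

```python
# First (non-revised) forecasts copied from the table above:
# one (7-day, 30-day, 90-day) tuple per chat session.
BASELINE = {
    "ChatGPT 4o": [(1.394, 1.392, 1.386), (1.394, 1.398, 1.405), (1.395, 1.400, 1.410)],
    "Gemini": [(1.395, 1.402, 1.410), (1.395, 1.402, 1.410), (1.395, 1.402, 1.410)],
    "Perplexity": [(1.389, 1.395, 1.378), (1.389, 1.395, 1.378), (1.389, 1.395, 1.378)],
    "Grok 2": [(1.405, 1.412, 1.427), (1.403, 1.418, 1.428), (1.405, 1.420, 1.435)],
    "Meta AI": [(1.395, 1.388, 1.382), (1.395, 1.393, 1.386)],
    "Yi-Lightning": [(1.390, 1.395, 1.402), (1.393, 1.397, 1.401), (1.390, 1.386, 1.385)],
}


def spread_per_horizon(sessions):
    """Return (max - min) across sessions for each horizon, rounded to 3 decimals."""
    # zip(*sessions) regroups the numbers by horizon instead of by session.
    return tuple(round(max(values) - min(values), 3) for values in zip(*sessions))


if __name__ == "__main__":
    for model, sessions in BASELINE.items():
        print(model, spread_per_horizon(sessions))
```

A smaller spread means the model repeated itself more closely; it says nothing about whether the repeated number was any good.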

I also tested Claude 3.5 Sonnet, but it refused to give any predictions at all, and instead offered to discuss which factors affect the exchange rate. Such a response is probably the best answer any LLM can give at the moment.

Actual results

Assuming there are no new wars or pandemics, and the existing trends in interest rates, employment rate, oil price and such continue, at least one of the predictions above should turn out to be correct.

Date	USD-CAD	Closest estimate
Nov 11th, 2024	1.3917	ChatGPT (1.391)
Nov 18th, 2024	1.402	Gemini (1.402)
Dec 11th, 2024	1.417	Grok 2 (1.418)
Feb 11th, 2025	1.429	Grok 2 (1.428)
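
For readers who want to score the forecasts themselves, here is a minimal sketch that compares a forecast with the realized rates and reports the mean absolute error. It assumes each forecast is matched only against the rate on its own horizon date (7-day against Nov 18th, 30-day against Dec 11th, 90-day against Feb 11th), and it uses just the first chat session of each model. The names `ACTUAL`, `FIRST_SESSION` and `mean_abs_error`, and the choice of mean absolute error, are illustrative assumptions and not the method used to fill in the table above.

```python
# Realized USD-CAD rates from the table above: Nov 18th, Dec 11th, Feb 11th.
ACTUAL = (1.402, 1.417, 1.429)

# First (non-revised) forecast of each model's first chat session,
# as (7-day, 30-day, 90-day), copied from the forecast table.
FIRST_SESSION = {
    "ChatGPT 4o": (1.394, 1.392, 1.386),
    "Gemini": (1.395, 1.402, 1.410),
    "Perplexity": (1.389, 1.395, 1.378),
    "Grok 2": (1.405, 1.412, 1.427),
    "Meta AI": (1.395, 1.388, 1.382),
    "Yi-Lightning": (1.390, 1.395, 1.402),
}


def mean_abs_error(forecast, actual=ACTUAL):
    """Average absolute gap between forecast and actual, rounded to 4 decimals."""
    errors = [abs(f - a) for f, a in zip(forecast, actual)]
    return round(sum(errors) / len(errors), 4)


if __name__ == "__main__":
    # Print models from smallest to largest error.
    for model, forecast in sorted(FIRST_SESSION.items(), key=lambda kv: mean_abs_error(kv[1])):
        print(model, mean_abs_error(forecast))
```

With only three data points per session, a low error here is weak evidence at best, so the ranking should be read as a curiosity rather than a verdict.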

Conclusion

This is of course just a quick test to get a sense of how the LLMs predict, and by no means a reliable study on how well LLMs can be used to predict financial markets.

It is however enough data to show that:

The same model tends to give different predictions when prompted several times. This alone indicates that the models probably do not have any latent internal understanding of financial markets, and thus generic LLMs do not possess hidden superpowers to predict them.
No model was consistently more accurate than the others. ChatGPT, Gemini and Grok 2 each came closest on at least one of the four dates, which can be attributed to pure luck.

However, there is nothing preventing a modern-day Jim Simons from building an AI that specializes in financial data and gives consistent and reliable predictions. Undoubtedly many are already working on this, as the financial incentives are high. It is actually beneficial for the greater good too. How well a market economy works depends largely on how efficiently price arbitrage and allocation of resources happen. If the majority of the world’s capital just goes mechanically into index-tracking ETFs, it could lead to massive self-reinforcing asset bubbles. More intelligence is needed for the “invisible hand of markets” to play out properly.

But current generic LLMs, despite being massive, do not seem to possess this capability. As with most other LLM applications, they seem to be good at generating convincing-looking content, helping humans find information and assisting with simple tasks. LLMs might occasionally outperform stupid or lazy people, but to have true progress, we still need humans with original ideas and good judgement. If nothing else, then at least the good judgement to choose which of the LLM-generated results are accepted and acted upon.
