Machine Learning, Algorithmic Trading, and Manipulation

Trading in financial markets is increasingly dominated by algorithms. They enable trading at speeds and levels of adaptiveness that are impossible for human beings. A key question for the legal system is whether these algorithms will disrupt the efficiency and integrity of markets and, if so, whether existing regulation is well suited to deterring misconduct. A key question for finance is which market structures are most robust to manipulation by new algorithmic trading agents.

In our new working paper, we study the potential consequences of advanced algorithms trading in a financial market. Specifically, we analyze experimentally how algorithms trained through deep reinforcement learning behave when they trade. As our setting, we explore a market in which an algorithm trades directly, but also holds a contract whose price is based on a benchmark calculated from market transactions (such as a contract to sell a large number of corporate shares to an acquirer at the stock’s closing price on a specific day).[1] This is an economically and legally important setting, as trillions of dollars in real-world contracts are based on benchmarks calculated from market trades. We find that algorithms whose reward function is designed only to maximize profits (with no other human-designed objective) develop profitable trading strategies that, if engaged in intentionally by a human trader, would likely constitute manipulation. In effect, they learn to “manipulate” without being given any direct instructions to do so. The algorithms trade heavily and unprofitably in the market but materially affect the benchmark’s price, producing a profit from their benchmark positions that exceeds their trading losses.

These results are worth noting for policymakers. Several commentators have now observed that because the law prohibiting manipulation turns on deliberate misconduct, it may be a poor fit for artificially intelligent trading algorithms.[2] Indeed, manipulation’s two core requirements are scienter (intent) and a “manipulative act,” but proving either of these elements for certain artificially intelligent algorithms may be difficult, both in theory and practice.

It has been unclear, however, whether this concern is speculative or reflects a practical, imminent threat. If the threat is imminent, it is important to begin exploring how “autonomously” manipulative algorithms actually work and how the legal system should respond to them. The latter effort will surely benefit from a more granular understanding of how sophisticated algorithms behave and what effects their behavior has on markets.

We describe our experimental setup in a little more depth before turning to policy implications. To investigate what trading strategies algorithms may develop, we use an agent-based simulation to study the behavior of algorithmic trading agents trained using deep reinforcement learning techniques. In our experimental setting, an agent trades directly in a market and also holds a portfolio of assets benchmarked to prices in that market. The market is modeled as a limit order book in which traders strategically interact. Besides the manipulator, the market includes other agents with private reasons to trade and, in a number of scenarios, a market-making intermediary that profits by connecting traders across time. The benchmark is based on the “volume-weighted average price” (VWAP) of transactions in an asset, that is, the total value of all trades divided by the total number of shares traded. This is a popular benchmark design with various merits explored in the finance literature. Because it is based on actual transactions, it can also be directly affected by market participants’ trading behavior.
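To illustrate why a transaction-based benchmark is exposed to a trader's own activity, the following is a minimal sketch of a VWAP calculation. The function name `vwap`, the trade lists, and all prices and volumes are hypothetical illustrations, not parameters of the simulation; the sketch also ignores order book dynamics and simply assumes the manipulator's buys execute at an elevated price.

```python
from typing import Iterable, List, Tuple

Trade = Tuple[float, float]  # (price, volume) of one transaction

def vwap(trades: Iterable[Trade]) -> float:
    """Volume-weighted average price: sum(price * volume) / sum(volume)."""
    total_value = 0.0
    total_volume = 0.0
    for price, volume in trades:
        total_value += price * volume
        total_volume += volume
    if total_volume == 0:
        raise ValueError("VWAP is undefined when no volume has traded")
    return total_value / total_volume

# Hypothetical trades by other market participants.
background: List[Trade] = [(100.0, 10.0), (102.0, 30.0)]
# Hypothetical aggressive buy by a manipulator at an elevated price.
manipulator_buys: List[Trade] = [(103.0, 40.0)]

baseline = vwap(background)                        # 101.5
manipulated = vwap(background + manipulator_buys)  # 102.25
print(baseline, manipulated)
```

Because every executed trade enters the average in proportion to its volume, a participant willing to trade heavily at unfavorable prices can move the benchmark, which is the channel the simulated manipulator exploits.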

We develop the trading algorithms in a number of ways, including two algorithms trained independently using qualitatively different deep reinforcement learning techniques. The first is a deep Q-network (DQN), which observes a continuous environment, selects a discrete action, and uses a deep neural network to learn a value function over state-action pairs. The second is deep deterministic policy gradient (DDPG), an algorithm that also observes a continuous environment but selects actions from a continuous range; it uses an actor-critic method in which the critic learns a value function that informs the actor’s parameter selection.
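The core update rules of the two techniques, as described in the standard deep Q-network and deep deterministic policy gradient literature, can be sketched in a few lines. This is a minimal sketch under strong simplifying assumptions: states are scalars, and hand-written toy functions (`toy_q`, `toy_dq_da`) stand in for the deep neural networks, replay buffers, target networks, and exploration noise that real implementations use. All function names are hypothetical, and this is not the authors' training code.

```python
from typing import Callable, Sequence

QFunction = Callable[[float, float], float]  # Q(state, action) -> estimated value

def dqn_target(reward: float, next_state: float, q_target: QFunction,
               actions: Sequence[float], gamma: float = 0.99,
               done: bool = False) -> float:
    """DQN regression target: r + gamma * max over discrete a' of Q_target(s', a')."""
    if done:
        return reward
    return reward + gamma * max(q_target(next_state, a) for a in actions)

def ddpg_critic_target(reward: float, next_state: float, q_target: QFunction,
                       mu_target: Callable[[float], float], gamma: float = 0.99,
                       done: bool = False) -> float:
    """DDPG critic target: r + gamma * Q_target(s', mu_target(s')), a continuous action."""
    if done:
        return reward
    return reward + gamma * q_target(next_state, mu_target(next_state))

def ddpg_actor_step(theta: float, states: Sequence[float],
                    dq_da: Callable[[float, float], float], lr: float = 0.1) -> float:
    """One deterministic policy gradient step for a linear actor mu(s) = theta * s.

    The critic's gradient dQ/da, evaluated at the actor's own action, is chained
    with dmu/dtheta = s, moving theta toward actions the critic values more.
    """
    grad = sum(dq_da(s, theta * s) * s for s in states) / len(states)
    return theta + lr * grad

# Hypothetical toy critic: Q(s, a) = s*a - 0.5*a^2, which is maximized at a = s.
def toy_q(s: float, a: float) -> float:
    return s * a - 0.5 * a * a

def toy_dq_da(s: float, a: float) -> float:
    return s - a  # derivative of toy_q with respect to the action

# DQN: maximize over a small discrete action set (e.g. sell / hold / buy one unit).
print(dqn_target(1.0, 2.0, toy_q, actions=[-1.0, 0.0, 1.0], gamma=0.5))  # 1.75

# DDPG: a linear actor repeatedly follows the critic's gradient.
theta = 0.0
for _ in range(100):
    theta = ddpg_actor_step(theta, states=[1.0, 2.0], dq_da=toy_dq_da)
print(round(theta, 6))  # 1.0, i.e. the actor learns mu(s) = s
```

The contrast mirrors the distinction in the text: the DQN target takes a maximum over a finite list of actions, while the DDPG actor outputs a single continuous action and improves by following the critic's gradient rather than enumerating alternatives.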

These algorithms, trained through deep reinforcement learning techniques, autonomously develop trading strategies that are plausibly manipulative. They trade heavily and unprofitably in the market but affect the benchmark’s price, producing profits on their benchmark positions that more than offset their trading losses. An individual who engaged in such trading intentionally would plausibly have committed unlawful securities manipulation, but as noted, the algorithms were not designed to artificially affect prices, only to maximize profits. We also quantify the impact on market welfare of manipulating transaction-based benchmarks in a simulated environment. The total surplus of trading participants in the market actually increases with manipulation, as the manipulator is willing to incur trading losses to affect the benchmark. One consequence is that the other traders lack an incentive to report the manipulator (unless they also hold a benchmarked position). The manipulator’s surplus from the benchmark, on the other hand, increases significantly, which implies that the principal victims of the manipulation are the third parties invested in the benchmark.
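The accounting behind this result can be made concrete with a stylized, back-of-the-envelope sketch. It is not the welfare analysis performed in the study. It assumes, hypothetically, that the manipulator's benchmark position gains `benchmark_units` dollars for each dollar the VWAP rises, that shares it buys are valued at a fixed fundamental value, and that its trading losses are gained one-for-one by its trading counterparties. The function `manipulation_accounting` and all numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    trading_pnl: float                 # manipulator's profit (negative = loss) on direct trades
    benchmark_pnl: float               # manipulator's gain on its benchmark-linked position
    net_pnl: float                     # trading_pnl + benchmark_pnl
    counterparty_gain: float           # other traders' gain from the manipulator's trading losses
    benchmark_third_party_loss: float  # loss to the other side of the benchmarked contract

def manipulation_accounting(shares_bought: float, avg_buy_price: float,
                            fundamental_value: float, benchmark_units: float,
                            vwap_baseline: float, vwap_manipulated: float) -> Outcome:
    # Assumption: bought shares are worth fundamental_value, so overpaying is a loss.
    trading_pnl = shares_bought * (fundamental_value - avg_buy_price)
    # Assumption: the benchmark position pays benchmark_units per dollar of VWAP.
    benchmark_pnl = benchmark_units * (vwap_manipulated - vwap_baseline)
    return Outcome(
        trading_pnl=trading_pnl,
        benchmark_pnl=benchmark_pnl,
        net_pnl=trading_pnl + benchmark_pnl,
        # Assumption: the manipulator's trading loss is a transfer to its counterparties.
        counterparty_gain=-trading_pnl,
        # Assumption: the benchmark gain is paid by whoever holds the other side.
        benchmark_third_party_loss=benchmark_pnl,
    )

result = manipulation_accounting(
    shares_bought=40, avg_buy_price=103.0, fundamental_value=100.0,
    benchmark_units=1000, vwap_baseline=101.5, vwap_manipulated=102.25,
)
print(result)
```

With these illustrative inputs the manipulator loses 120 on its trades but gains 750 on its benchmark position, for a net profit of 630. Its trading counterparties are 120 better off, so they have little reason to complain, while the 750 loss falls on third parties holding the other side of the benchmarked contract.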

This is the first demonstration of an algorithm automatically learning to manipulate a financial benchmark. In a sense, we thus offer “proof of concept” that algorithms can autonomously develop manipulation-like trading strategies. It is worth emphasizing, however, that the deep Q-network and deep deterministic policy gradient algorithms were developed independently and constitute distinct demonstrations of this possibility. The fact that two successful manipulators were developed using qualitatively different reinforcement learning algorithms is evidence of the robustness of our finding that algorithms can autonomously learn to manipulate.

The International Organization of Securities Commissions and the European Union have both developed policies to address the role of benchmarks as financial infrastructure in their Principles for Financial Benchmarks and Benchmarks Regulation, respectively. Both documents stress the importance of best practices in benchmark design. International regulators should thus have substantial interest in results along the lines studied here, as they develop guidance for benchmark design and policies for regulators in determining whether an algorithm has manipulated a benchmark. More concretely, the experimental study of algorithms can help pinpoint which features of an algorithm’s design constitute the least costly and intrusive ways to reduce the possibility of manipulation ex ante or to identify an algorithm’s causal impact ex post. Likewise, it would be worthwhile to explore how successfully algorithms can manipulate other financial benchmarks that are derived from transactions through far more complex calculations than VWAP.

Our approach also provides a complement to prior work. Scholars have employed theoretical models and historical data to study benchmark manipulation in financial markets. Using a simulated market allows us to incorporate complex details of market microstructure, representing the actual mechanics of trade, interactions among market participants, and the structure of the market. We can also consider the response of strategic agents to the presence of a benchmark manipulator and consider a wide range of market settings, benchmark designs, and trading strategy options.

A longer-term ambition for this research agenda is to explore which market structures and benchmark designs are most robust to manipulation by studying how they perform when a sophisticated algorithm attempts manipulation under (simulated) market conditions. This experimental approach can provide an important complement to theoretical work in finance that often has to abstract away from much of the detail of market structure to explore these questions. More broadly, we hope to illustrate the usefulness of an interdisciplinary approach that combines state-of-the-art computer science, finance, and law so as to inform policy.


[1] Financial benchmarks estimate market values or reference rates used in a variety of contexts.

[2] See, e.g., Alessio Azzutti, Wolf-Georg Ringe & H. Siegfried Stiehl, Machine Learning, Market Manipulation and Collusion on Capital Markets: Why the ‘Black Box’ Matters, 43 U. Pa. J. Int’l L. 79 (2021); Gina-Gail S. Fletcher, Deterring Algorithmic Manipulation, 74 Vand. L. Rev. 259 (2021); Tom C.W. Lin, The New Market Manipulation, 66 Emory L.J. 1253 (2017); Yesha Yadav, The Failure of Liability in Modern Markets, 102 Va. L. Rev. 1031, 1073-74 (2016).

This post comes to us from Megan Shearer, Gabriel Rauterberg, and Michael Wellman at the University of Michigan. It is based on their recent paper, “Learning to Manipulate a Financial Benchmark,” available here.