Alternatively, experimenting with further layers to learn such policies autonomously may ultimately yield greater benefits, as may simply altering the number of layers and neurons, or the loss functions, in the current architecture. Maximum drawdown is the largest loss of portfolio value incurred between any two points within a full day of trading. The performance results for the 30 days of testing of the two Alpha-AS models against the three baseline models are shown in Tables 2–5. All ratios are computed from Close P&L returns (Section 4.1.6), except P&L-to-MAP, for which the open P&L is used.
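As a minimal sketch of the maximum drawdown metric just defined, the following computes the largest peak-to-trough loss over a sequence of portfolio values sampled during a trading day (the function name and sample values are illustrative):

```python
def max_drawdown(values):
    """Largest peak-to-trough loss of portfolio value over the sequence."""
    peak = float("-inf")
    worst = 0.0
    for v in values:
        peak = max(peak, v)            # running high-water mark
        worst = max(worst, peak - v)   # deepest fall from any earlier peak
    return worst

# Example: portfolio values during one day; the worst fall is 107 -> 98.
print(max_drawdown([100, 104, 101, 107, 98, 103]))  # 9
```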


  • DRL is widely used in the algorithmic trading world, primarily to determine the best action to take in trading by candles, by predicting what the market is going to do.
  • However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances.
  • Inventory management is therefore central to market making strategies, and particularly important in high-frequency algorithmic trading.
  • It must explore actions in different states and record how the environment responds in each case.


A notable example is Google’s AlphaGo project, in which a deep reinforcement learning algorithm was given the rules of the game of Go, and it then taught itself to play so well that it defeated the human world champion. AlphaGo learned by playing against itself many times, registering the moves that were more likely to lead to victory in any given situation, thus gradually improving its overall strategies. The same concept has been applied to train a machine to play Atari video games competently, feeding a convolutional neural network with the pixel values of successive screen stills from the games. Gen-AS performs better than the baseline models, as expected from a model that is designed to place bid and ask prices that optimally minimize inventory risk, given a set of parameter values that are themselves optimized periodically from market data using a genetic algorithm. The AS algorithm is static in its reliance on analytical formulas to generate bid and ask quotes based on the real-time input values for the market mid-price of the security and the current stock inventory held by the market maker. These formulas have fixed parameters to model the market maker’s aversion to risk and the statistical properties of market orders.






At this point the trained neural network model had 10,000 rows of experiences and was ready to be tested out-of-sample against the baseline AS models. The data for the first use of the genetic algorithm was the full day of trading on 8th December 2020. We performed the genetic search at the beginning of the experiment, aiming to obtain the values of the AS model parameters that yield the highest Sharpe ratio on the same orderbook data. The data on which the metrics for our market features were calculated correspond to one full day of trading. The selection of features based on these three metrics reduced their number from 112 to 22.
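The genetic search over AS parameters can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: `backtest_sharpe` is a stand-in for replaying Avellaneda-Stoikov quotes on recorded orderbook data and returning the Sharpe ratio, and the parameter names and ranges are assumptions:

```python
import random

def backtest_sharpe(gamma, kappa):
    # Placeholder fitness. A real implementation would replay the AS
    # quotes on a full day of orderbook data and return the Sharpe ratio.
    return -(gamma - 0.3) ** 2 - (kappa - 1.5) ** 2

def genetic_search(generations=10, pop_size=45, seed=0):
    rng = random.Random(seed)
    # Initial population: random (gamma, kappa) candidates.
    pop = [(rng.uniform(0.01, 1.0), rng.uniform(0.1, 5.0))
           for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda p: backtest_sharpe(*p), reverse=True)
        parents = ranked[: pop_size // 2]          # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Crossover by averaging, plus multiplicative mutation.
            children.append(((a[0] + b[0]) / 2 * rng.uniform(0.9, 1.1),
                             (a[1] + b[1]) / 2 * rng.uniform(0.9, 1.1)))
        pop = parents + children
    return max(pop, key=lambda p: backtest_sharpe(*p))

best_gamma, best_kappa = genetic_search()
```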



The Alpha-AS agent receives an orderbook update every time a market tick occurs. It records the new tick information by updating the corresponding market features in its state representation, and it places one bid and one ask order in response to every tick.
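The per-tick loop just described can be sketched as follows. The class and field names (`Agent`, `on_tick`, the orderbook dictionary keys, and the fixed quote offsets) are illustrative placeholders, not identifiers from the paper:

```python
class Agent:
    def __init__(self):
        self.features = {}   # market features kept as part of the state
        self.orders = []     # quotes placed in response to the last tick

    def on_tick(self, book):
        # Update state features from the new orderbook snapshot.
        mid = (book["best_bid"] + book["best_ask"]) / 2
        self.features["mid_price"] = mid
        self.features["spread"] = book["best_ask"] - book["best_bid"]
        # Place one bid and one ask per tick (offsets are placeholders).
        self.orders = [("bid", mid - 0.01), ("ask", mid + 0.01)]

agent = Agent()
agent.on_tick({"best_bid": 99.99, "best_ask": 100.01})
```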



Performance of the algo-trading system



More recently, Baldacci et al. have studied the optimal control problem for an option market maker under Heston model dynamics in the underlying asset, using the vega approximation for the portfolio. For further developments in the optimal market making literature, we refer the reader to Guéant, Ahuja et al., Cartea et al., Guéant and Lehalle, Nyström, and Guéant et al. Indeed, this result is particularly noteworthy, as the Avellaneda-Stoikov method sets as its goal precisely to minimize inventory risk.




Double DQN is a deep RL approach, more specifically deep Q-learning, that relies on two neural networks, as we shall see shortly (in Section 4.1.7). In this paper we present a double DQN applied to the market-making decision process. Typically, in the beginning the agent does not know the transition and reward functions. It must explore actions in different states and record how the environment responds in each case.
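The core of the double DQN target computation can be sketched as follows. The two "networks" here are stand-in linear functions rather than deep networks, but the decoupling is the same: the online network selects the greedy action, while the target network evaluates it (all names and dimensions are illustrative):

```python
import numpy as np

def q_values(weights, state):
    # Stand-in for a network forward pass: returns one Q-value per action.
    return state @ weights

def double_dqn_target(w_online, w_target, reward, next_state, gamma=0.99):
    # Online network selects the action; target network evaluates it.
    best_action = int(np.argmax(q_values(w_online, next_state)))
    return reward + gamma * q_values(w_target, next_state)[best_action]

rng = np.random.default_rng(0)
w_online = rng.normal(size=(4, 3))   # 4 state features, 3 actions
w_target = rng.normal(size=(4, 3))
s_next = rng.normal(size=4)
y = double_dqn_target(w_online, w_target, reward=1.0, next_state=s_next)
```

This separation of action selection from action evaluation is what reduces the overestimation bias of plain Q-learning.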



The figures represent the percentage of wins of each model in one group against all the models in the other group, for the corresponding performance indicator. The Asymmetric dampened P&L is obtained from the algorithm’s P&L by discounting the losses from speculative positions: speculative profits are not added, while speculative losses are subtracted, thus penalizing speculative positions.
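A minimal sketch of this asymmetric treatment, assuming the speculative component is the inventory times the mid-price change (a common formulation; the function name and signature are illustrative):

```python
def asym_dampened_pnl(spread_pnl, inventory, mid_price_change):
    # Speculative component: gain/loss from holding inventory while the
    # mid-price moves. Profits from it are discarded; losses are kept.
    speculative = inventory * mid_price_change
    return spread_pnl + min(0.0, speculative)

# Speculative gain is not credited; speculative loss is charged.
print(asym_dampened_pnl(0.5, 10, +0.02))  # 0.5
print(asym_dampened_pnl(0.5, 10, -0.02))  # 0.3
```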


For mature markets, such as the U.S. and Europe, the real-time LOB is event-based and updates at high speed, at intervals of milliseconds down to nanoseconds. The dataset from the Nasdaq Nordic stock market in Ntakaris et al. contains 100,000 events per stock per day, and the dataset from the London Stock Exchange in Zhang et al. contains 150,000. In contrast, exchanges in the Chinese A-share market publish level II data, essentially a 10-level LOB, every three seconds on average, with 4500–5000 daily ticks. This snapshot data provides us with the opportunity to leverage the longer tick-time interval and make profits using machine learning algorithms. The Avellaneda-Stoikov procedure underpinning the market-making actions in the models under discussion is explained in Section 2. Section 3 provides an overview of reinforcement learning and its uses in algorithmic trading.






Additionally, the strategy implements an order size adjustment algorithm and its order_amount_shape_factor parameter as described in Optimal High-Frequency Market Making. The strategy is implemented to be used either in fixed timeframes or to be run indefinitely. A second contribution is the setting of the initial parameters of the Avellaneda-Stoikov procedure by means of a genetic algorithm working with real backtest data. This is an efficient way of arriving at quasi-optimal values for these parameters given the market environment in which the agent begins to operate.
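One plausible sketch of eta-based order size adjustment, not necessarily the exact production formula: the order that would worsen the inventory imbalance is shrunk exponentially in the imbalance q (expressed in units of the base order size), while the order that reduces it keeps its full size:

```python
import math

def adjusted_sizes(base_amount, q, eta):
    # q > 0: long inventory, shrink the bid (which would add to the long).
    # q < 0: short inventory, shrink the ask. eta = shape factor.
    bid = base_amount * (math.exp(-eta * q) if q > 0 else 1.0)
    ask = base_amount * (math.exp(eta * q) if q < 0 else 1.0)
    return bid, ask

bid, ask = adjusted_sizes(1.0, q=2.0, eta=0.5)  # long: bid is shrunk
```

With eta = 0, sizes are left untouched; larger eta pushes the inventory back toward the target allocation more aggressively.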



Using the Avellaneda-Stoikov model as an example, we show how dealers can adjust quotes to predictions and thereby capture larger spreads at constant volume. Simulations on historical limit order book data illustrate that our model allows dealers both to increase market making revenues through trade flow-optimized positioning in the order book and to reduce adverse selection cost through preempted adverse price movements. The Avellaneda-Stoikov model seems far too simplistic to be practical in many products. For example, in products with a larger tick size, queue priority will matter significantly more than distance from the price in determining fill probability.



The ranges of possible values of the features that are defined in relation to the market mid-price are truncated to the interval [−1, 1] (i.e., if a value exceeds 1 in magnitude, it is set to 1 if it is positive or −1 if it is negative). Balancing exploration and exploitation advantageously is a central challenge in RL. Hasselt, Guez and Silver developed an algorithm they called double DQN.
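The truncation just described is a plain element-wise clip; a minimal sketch with placeholder feature values:

```python
import numpy as np

# Truncate mid-price-relative features to [-1, 1].
features = np.array([0.4, -1.7, 2.3, -0.2])
clipped = np.clip(features, -1.0, 1.0)
print(clipped.tolist())  # [0.4, -1.0, 1.0, -0.2]
```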




A continuous action space, such as the one used elsewhere to choose spread values, may possibly perform better, but the algorithm would be more complex and the training time greater. Following the approach in López de Prado, where random forests are applied to an automatic classification task, we performed a selection from among our market features based on a random forest classifier. We did not include the 10 private features in the feature selection process, as we want our algorithms always to take these agent-related (as opposed to environment-related) values into account.
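A hedged sketch of random-forest-based feature selection in the spirit of the approach referenced above: synthetic data stands in for the market features, the label for the classification target, and the cutoff of three selected features is arbitrary:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                    # 8 stand-in features
# Label driven mostly by feature 0, weakly by feature 3, plus noise.
y = (X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_        # impurity-based importances
selected = list(np.argsort(importances)[::-1][:3])  # keep the top 3
```

In practice the ranking would be computed on the real feature matrix, and the threshold chosen from the importance metrics rather than fixed in advance.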



We model the market-agent interplay as a Markov Decision Process with initially unknown state transition probabilities and rewards. If users choose to set the eta parameter, order sizes will be adjusted to further optimize the strategy’s behavior with regard to the current and desired portfolio allocation. This value is defined by the user, and it represents how much inventory risk they are willing to take.


  • This is generally achieved by applying various root-finding algorithms that can handle the complexity and high-dimensionality of the equation.
  • The 10 generations thus yield a total of 450 individuals, ranked by their Sharpe ratio.
  • Instead of investing the same proportion consistently, we devise an optimization scheme using the fractional Kelly growth criterion under risk control, which is enforced through the risk measure value at risk.
  • The results obtained suggest avenues to explore for further improvement.