Pratik's Portfolio
Pratik's Portfolio is what I myself invest in every month, so it directly reflects my own risk tolerance. For context, I am currently 26 years old, debt-free, and don't have much starting capital to lose. Hence, I want the highest possible return I can get, but I also don't want to lose any money. This is a tall order, so I use machine learning, the most common type of Artificial Intelligence (AI), to guide my decision-making.
Pratik's Portfolio seeks to invest in what I predict to be the best of all my portfolios every month -- the Conservative, Moderate, Aggressive, or sometimes something else entirely -- by using machine learning algorithms trained on reliable, diverse datasets updated weekly. Thus, the stock universe for Pratik's Portfolio covers both Sector ETFs from the Conservative Portfolio and 3x Leveraged Funds from the Aggressive Portfolio. Its volatility, or standard deviation, sits somewhere in between the Moderate and Aggressive Portfolios (think 2x Leveraged Funds). Pratik's Portfolio should get better over time as I continue to improve my models.
​
To demonstrate the goal of Pratik's Portfolio, see this comparison chart:
​
​
​
​
​
The chart shows how the Conservative, Moderate, and Aggressive portfolios have been ‘scaled’ in different ways to accommodate different investor preferences. The Conservative provides great drawdown protection; the Moderate provides great risk-adjusted long-term returns; the Aggressive provides great short-term returns. Each portfolio matches a specific risk/return metric ('Similar') with what is expected of the S&P 500 Index, and tries to outperform the SPY on the other metrics.
​
Pratik's Portfolio is unique in its mission: to outperform the SPY on every metric. Pratik's Portfolio's Compounded Annual Growth Rate is 16.03% versus the SPY's 9.95% since 2008, with a lower standard deviation (12.20% versus 16.11%). The graph below is provided by PortfolioVisualizer.com and is plotted month-to-month from January 2008 to January 2024, or 16 years of data (data limited by ETF availability).
Machine Learning
It has almost become an industry standard in the trading space that there are three drivers of market price action: fundamentals, technicals, and investor sentiment. Perhaps this is best encapsulated in the Wall Street classic, Market Wizards (1989). Michael Marcus, the first legendary trader in the book, says:
​
"The best trades are the ones in which you have all three things going for you: fundamentals, technicals, and market tone. First, the fundamentals should suggest that there is an imbalance of supply and demand, which could result in a major move. Second, the chart must show that the market is moving in the direction that the fundamentals suggest. Third, when news comes out, the market should act in a way that reflects the right psychological tone. For example, a bull market should shrug off bearish news and respond vigorously to bullish news. If you can restrict your activity to only those types of trades, you have to make money, in any market, under any circumstances. All my profits come from the trades that met the criteria."
​
When I was working as a quantitative analyst at Citibank, AI was still a nascent field. It had yet to be demonstrated that machine learning could reliably improve predictive performance in any one of these categories. But recently, AI development has absolutely exploded in prevalence and shown great promise in almost every aspect of financial modeling. Prominent voices from academia, hedge funds, and even the Federal Reserve have made a compelling case for the use of ML in modeling and forecasting financial data. And many of these voices have made their research fully accessible to retail traders and students of finance, allowing us to harness the latest in AI R&D to improve our investable portfolios over time.
​
The authors below make the case for the machine learning model (MLM) strategies that I use to create Pratik's Portfolio, i.e. to create a portfolio that seeks to outperform the SPY on all the risk and return metrics quoted above.
​
Article 1:
Aaron Smalter Hall, a researcher at the Federal Reserve Bank of Kansas City, shows how machine learning can be used to more accurately forecast the U.S. unemployment rate, a key coincident indicator of US recessions. Recessions tend to be accompanied by a significant stock market crash, so this macroeconomic, fundamentals-centered perspective should be important to all investors. The recession signal given by a sharp rise in the unemployment rate is called the Sahm Rule, and it is actively tracked by the Federal Reserve here.
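To make the Sahm Rule concrete, here is a minimal sketch of how I understand the indicator: the latest three-month average unemployment rate minus the lowest three-month average over the previous 12 months, with a reading of 0.50 percentage points or more treated as a recession signal. The function name and the sample unemployment rates are hypothetical and for illustration only; they are not real data and not the Federal Reserve's own code.

```python
def sahm_indicator(unemployment_rates):
    """Latest 3-month average minus the lowest 3-month average of the prior 12 months.

    unemployment_rates: monthly rates in percent, oldest first (hypothetical input).
    """
    if len(unemployment_rates) < 15:
        raise ValueError("need at least 15 monthly observations")
    # Three-month moving average ending at each month from the third one onward.
    averages = [
        sum(unemployment_rates[i - 2 : i + 1]) / 3.0
        for i in range(2, len(unemployment_rates))
    ]
    current = averages[-1]
    prior_low = min(averages[-13:-1])  # the 12 averages before the current one
    return current - prior_low


# Hypothetical series: a flat year at 4.0%, then a sharp three-month rise.
rates = [4.0] * 12 + [4.2, 4.6, 5.0]
gap = sahm_indicator(rates)
print(gap >= 0.5)  # True: the rise is large enough to trigger the rule
```

A flat series gives a reading of zero, so the rule only fires when unemployment climbs quickly relative to its recent low.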
​
Article 2:
Derek Snow is an alumnus of the Alan Turing Institute, the UK's Institute for Artificial Intelligence, and the Oxford-Man Institute of Quantitative Finance at Oxford University. He has led joint research projects with major banks and quantitative hedge funds, and has published open-source code, with explanations, showing how to take advantage of many different technical anomalies in the stock market. He also demonstrates how his MLMs can improve upon factor investing strategies; Sector Momentum is a type of factor investing, so his research has been invaluable.
​
Article 3:
Dr. Houlihan has founded, architected, and deployed a financial data analytics company, SentiQuant, and spent fifteen years in high technology roles. His work has appeared in Quantitative Finance, Computational Economics, and the Journal of Investing. Notably, Houlihan earned his doctorate in Financial Engineering from Stevens Institute of Technology, where his research focus was sentiment analysis through natural language processing and machine learning techniques. His MLM research is critical for modeling an option's or individual asset's true valuation as a function of the social media information about that asset, allowing us to incorporate even the trickiest of data sources into our complex analysis of stock and ETF returns.
​
Links to the research papers are on the left; direct quotes from the papers themselves are on the right. ​
Aaron Smalter Hall
Fundamentals
Forecasting is challenging, and the wealth of new and accessible data describing economic conditions presents an opportunity to explore more complex models that can capture more of the economic data. The field of machine learning provides a number of methods to address and capitalize on this complexity, both through increasingly complex models as well as methods to control and optimize that complexity.
I compare the performance of consensus, statistical, and machine learning methods for forecasting the monthly U.S. unemployment rate. My analysis shows that a more complex model, when properly controlled and provided with enough data from which to learn, can significantly outperform consensus and simpler statistical forecasting methods. The key to this result is the control of model complexity through regularization, a machine learning technique that yields a model complex enough to avoid underfitting the data but not so complex as to overfit it.
Derek Snow
Technicals
The trading strategy styles in the first three streams of financial machine learning research, Price, Event, and Value, can be split into unique trading themes depending on the data used and the outcome one is trying to predict. Price strategies include Technical, Systematic Global Macro, and Statistical Arbitrage, because of the central role price has to play in the input data and predicted outcomes. Event strategies include Trend, Soft-Event, and Hard-Event themes, because of the need to predict a change. Value strategies include Risk parity, Factor Investing and Fundamental themes, because these measures estimate intermediary values not directly related to the asset price.
Each trading theme can end up using different machine learning frameworks. For example, Technical and Statistical Arbitrage strategies can use a supervised or reinforcement learning approach or a combination of both, and Factor investing strategies can use a supervised or unsupervised learning approach.
Sentiment
Patrick Houlihan and Germán G. Creamer
This paper evaluates the question of whether sentiment extracted from social media and options volume anticipates future asset return. The research utilized both textual based data and a particular market data derived call-put ratio, collected between July 2009 and September 2012. It shows that: (1) features derived from market data and a call-put ratio can improve model performance, (2) sentiment derived from StockTwits, a social media platform for the financial community, further enhances model performance, (3) aggregating all features together also facilitates performance, and (4) sentiment from social media and market data can be used as risk factors in an asset pricing framework.
ETF Exposure
Pratik's Portfolio's standard deviation, a metric for volatility, is in between the Moderate and Aggressive Portfolios'. Pratik's Portfolio also has exposure to 2x Leveraged ETFs not found in any other portfolio. Both funds below are issued by ProShares, and come with disclaimers about their volatility risk.
​
The risks of owning leveraged ETFs are twofold. For any holding period other than a day, your return may be higher or lower than the Daily Target determined by the underlying index tracked. These differences may be significant. Smaller index gains/losses and higher index volatility contribute to returns worse than the Daily Target. Larger index gains/losses and lower index volatility contribute to returns better than the Daily Target.
ProShares Ultra QQQ seeks daily investment results, before fees and expenses, that correspond to two times (2x) the daily performance of the Nasdaq-100 Index.
ProShares UltraShort® S&P500 seeks daily investment results, before fees and expenses, that correspond to two times the inverse (-2x) of the daily performance of the S&P 500.
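The holding-period effect described above is easier to see with numbers. Below is a minimal sketch that compounds a daily-reset 2x fund over two hypothetical index paths. The paths and the `compound` helper are illustrative assumptions, and the sketch ignores fees, financing costs, and tracking error, so it is not a model of any actual ProShares fund.

```python
def compound(daily_returns, leverage=1.0):
    """Total return from compounding leverage * each daily return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + leverage * r
    return growth - 1.0


# Hypothetical choppy path: index up 10% one day, down 10% the next.
choppy = [0.10, -0.10]
print(round(compound(choppy), 4), round(compound(choppy, leverage=2), 4))
# -0.01 -0.04  (the 2x fund loses about 4%, worse than 2 x -1%)

# Hypothetical trending path: index up 5% two days in a row.
trending = [0.05, 0.05]
print(round(compound(trending), 4), round(compound(trending, leverage=2), 4))
# 0.1025 0.21  (the 2x fund gains 21%, better than 2 x 10.25%)
```

The choppy path is the high-volatility, small-net-move case that lands below twice the index return, while the trending path is the low-volatility, large-move case that lands above it.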
Stats Since Inception
Pratik's Portfolio finds patterns in a comprehensive training dataset that is updated weekly with the latest financial information. This means that backtests for Pratik's Portfolio are slightly different every time, given the slightly different month-to-month choices each time I run ML algos on a constantly expanding dataset. The more data, the better the output, and so Pratik's Portfolio should continue to find even more accurate patterns (and make better predictions) over time. As of now, the statistics below are some of the best from Jan 2008 to Jan 2024.
Compounded Annual Growth Rate (CAGR)
Higher is Better
The compounded annual growth rate (CAGR) is one of the most accurate ways to calculate and determine returns for any investment that can rise or fall in value over time. A higher CAGR means higher annual returns on average, which is preferable to most investors assuming all else is equal. Investors can compare the CAGR of two or more alternatives to evaluate how well one investment performed relative to another.
Pratik's Portfolio has a higher CAGR (15.98%) than the S&P 500's (9.48%), showing that over the last 16 years, Pratik's Portfolio vastly outperformed the SPY Market Index.
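For readers who want to see the arithmetic, here is a minimal sketch of the CAGR formula, (end value / start value)^(1 / years) - 1, applied to the two rates quoted above. The $10,000 starting balance and the function names are hypothetical, chosen only for illustration.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compounded annual growth rate between two portfolio values."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def grow(start_value: float, annual_rate: float, years: float) -> float:
    """Value of start_value compounded at annual_rate for the given years."""
    return start_value * (1.0 + annual_rate) ** years


# Hypothetical $10,000 starting balance compounded at the two quoted CAGRs.
portfolio_end = grow(10_000, 0.1598, 16)
spy_end = grow(10_000, 0.0948, 16)

# Round trip: recovering the rate from the end value returns the input CAGR.
print(round(cagr(10_000, portfolio_end, 16), 4))  # 0.1598
print(round(cagr(10_000, spy_end, 16), 4))        # 0.0948
```

Because the rate is compounded, a gap of a few percentage points per year becomes a much larger gap in ending balance over 16 years.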
Average Monthly Returns
Higher is Better
Average Monthly Return is the average percent change in a portfolio's value per month over the backtest period. A higher number is preferable to a lower number because it implies higher compounded earnings over time.
Pratik's Portfolio increased 1.31% per month on average, compared to the SPY's 0.86% monthly average over the last 16 years.
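One caveat worth illustrating: a simple (arithmetic) monthly average is not the same as the compounded (geometric) monthly rate, which is one reason a monthly average need not annualize to exactly the quoted CAGR. The sketch below uses hypothetical returns and function names of my own choosing.

```python
def average_monthly_return(monthly_returns):
    """Arithmetic mean of monthly returns (decimals, e.g. 0.01 = 1%)."""
    return sum(monthly_returns) / len(monthly_returns)


def compounded_monthly_return(monthly_returns):
    """Geometric mean: the constant monthly return giving the same end value."""
    growth = 1.0
    for r in monthly_returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(monthly_returns)) - 1.0


# Hypothetical two-month path: up 10%, then down 10%.
sample = [0.10, -0.10]
print(average_monthly_return(sample))         # 0.0
print(compounded_monthly_return(sample) < 0)  # True: the balance actually fell
```

The more the monthly returns swing, the further the compounded rate falls below the simple average.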
Maximum Drawdown
Lower is Better
A Maximum Drawdown (MDD) is the maximum observed loss from a peak to a trough of a portfolio, before a new peak is attained. Maximum drawdown is an indicator of downside risk over a specified time period.
​
A low maximum drawdown is preferred as this indicates that losses from investment were small. If an investment never lost a penny, the maximum drawdown would be zero. The worst possible maximum drawdown would be -100%, meaning the investment is completely worthless.
​
Pratik's Portfolio's Maximum Drawdown of -15.12% is significantly better than the S&P 500's Maximum Drawdown of -50.80%, and is even comparable to the Maximum Drawdown of the Conservative Portfolio (though Pratik's Portfolio only has 16 years of backtest data as opposed to the full 24).
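As a rough illustration of how a maximum drawdown is measured, the sketch below tracks the running peak of a series of balances and records the deepest fall from that peak. The month-end balances are hypothetical, not backtest data, and the function is my own simplified version rather than PortfolioVisualizer's implementation.

```python
def max_drawdown(values):
    """Largest peak-to-trough decline of a value series, as a negative decimal."""
    if not values:
        raise ValueError("values must not be empty")
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                 # highest balance seen so far
        worst = min(worst, v / peak - 1.0)  # deepest fall from that peak
    return worst


# Hypothetical month-end balances.
balances = [100.0, 120.0, 90.0, 110.0, 130.0, 117.0]
print(round(max_drawdown(balances), 4))  # -0.25 (the fall from 120 to 90)
```

The later dip from 130 to 117 is only a 10% decline, so it does not replace the earlier 25% drawdown as the maximum.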
Worst Year
Lower is Better
Worst Year is the calendar year between 2008 and 2024 with the worst performance. Most investors would prefer a smaller Worst Year loss (in absolute terms) to a larger one, since a shallower trough means the portfolio declined less over a 12-month period than another portfolio would have.
​
Pratik's Portfolio's Worst Year annihilates the S&P 500's: only -0.42% compared to -36.81%.
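Here is a minimal sketch of how a Worst Year figure can be derived from monthly returns: compound the months within each calendar year, then pick the lowest yearly result. The record layout (year, month, return) and the sample data are hypothetical.

```python
def calendar_year_returns(monthly_records):
    """Compound (year, month, return) records into one return per calendar year."""
    growth_by_year = {}
    for year, _month, r in monthly_records:
        growth_by_year[year] = growth_by_year.get(year, 1.0) * (1.0 + r)
    return {year: growth - 1.0 for year, growth in growth_by_year.items()}


def worst_year(monthly_records):
    """Return (year, return) for the calendar year with the lowest return."""
    yearly = calendar_year_returns(monthly_records)
    year = min(yearly, key=yearly.get)
    return year, yearly[year]


# Hypothetical data: -1% every month in 2008, +2% every month in 2009.
records = [(2008, m, -0.01) for m in range(1, 13)]
records += [(2009, m, 0.02) for m in range(1, 13)]
year, ret = worst_year(records)
print(year, round(ret, 4))  # 2008 -0.1136
```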
Sharpe Ratio
Higher is Better
The Sharpe ratio is a mathematical expression that helps investors compare the return of an investment with its risk. To calculate the Sharpe ratio, investors can subtract the risk-free rate of return from the expected rate of return, and then divide that result by the standard deviation (the asset's volatility). The greater a portfolio's Sharpe ratio, the better its risk-adjusted performance.
​
Pratik's Portfolio's Sharpe ratio of 1.19 eclipses the SPY's Sharpe Ratio of 0.58 over the 16-year time period.
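The calculation described above can be sketched in a few lines. I am assuming monthly returns as input and annualizing by the square root of 12, which is a common convention; the exact risk-free series and annualization used by any given backtesting tool may differ, and the sample returns are hypothetical.

```python
import math
import statistics


def sharpe_ratio(monthly_returns, monthly_risk_free=0.0, periods_per_year=12):
    """Annualized Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - monthly_risk_free for r in monthly_returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / volatility * math.sqrt(periods_per_year)


# Hypothetical monthly returns.
sample = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]
print(sharpe_ratio(sample) > 0)  # True: the average excess return is positive
```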
Sortino Ratio
Higher is Better
The Sortino ratio is a variation of the Sharpe ratio that differentiates harmful volatility from total overall volatility by using the asset's standard deviation of negative portfolio returns—downside deviation—instead of the total standard deviation of portfolio returns. Because the Sortino ratio focuses only on the negative deviation of a portfolio's returns from the mean, it is thought to give a better view of a portfolio's risk-adjusted performance, since positive volatility is a benefit.
​
Just like the Sharpe ratio, a higher Sortino ratio result is better. When looking at two similar investments, a rational investor would prefer the one with the higher Sortino ratio because it means that the investment is earning more return per unit of the bad risk that it takes on.
​
Pratik's Portfolio's Sortino Ratio of 2.06 obliterates the SPY's Sortino Ratio of 0.86 over the 16-year time period.
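To show how the Sortino ratio differs from the Sharpe ratio in practice, here is a minimal sketch that divides the mean excess return by the downside deviation only. I am assuming a monthly target return of zero and square-root-of-12 annualization; those choices, the function name, and the two sample series are hypothetical.

```python
import math


def sortino_ratio(monthly_returns, monthly_target=0.0, periods_per_year=12):
    """Annualized Sortino ratio: mean excess return over downside deviation."""
    excess = [r - monthly_target for r in monthly_returns]
    # Only shortfalls below the target count toward downside deviation.
    downside_sq = [min(e, 0.0) ** 2 for e in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(excess))
    if downside_dev == 0:
        raise ValueError("no returns below the target")
    mean_excess = sum(excess) / len(excess)
    return mean_excess / downside_dev * math.sqrt(periods_per_year)


# Two hypothetical series with the same 1.5% average monthly return.
steady = [0.04, -0.01, 0.04, -0.01]
choppy = [0.07, -0.04, 0.07, -0.04]
print(sortino_ratio(steady) > sortino_ratio(choppy))  # True
```

Both series earn the same average return, but the steadier one takes smaller losses along the way, so it earns more return per unit of downside risk.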