400 lines of code and 1.6 million data points later


And the data is finally in “response variable, explanatory variable 1, explanatory variable 2, …, explanatory variable n” format. My code (Java) is very unreadable, but it gets the job done. There were only about 18,000 earnings surprise data points, but my other explanatory variables right now, momentum and volatility, require historical price data, so I had to process 10 years’ worth of daily price data for the 500 stocks in the S&P 500.
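For readers wondering what turning daily prices into those two explanatory variables might involve, here is a minimal sketch. The definitions are assumptions, not necessarily what my code uses: momentum as the trailing simple return over a lookback window, and volatility as the annualized sample standard deviation of daily log returns. `PriceFeatures` and its methods are hypothetical names.

```java
import java.util.StringJoiner;

// Hypothetical helpers: the momentum and volatility definitions below are
// assumptions, not necessarily the ones used in the original Java code.
final class PriceFeatures {

    private PriceFeatures() {}

    /** Trailing simple return over `lookback` trading days ending at index t. */
    static double momentum(double[] closes, int t, int lookback) {
        if (lookback < 1 || t - lookback < 0 || t >= closes.length) {
            throw new IllegalArgumentException("not enough price history");
        }
        return closes[t] / closes[t - lookback] - 1.0;
    }

    /** Annualized sample std. dev. of daily log returns over `window` days ending at t. */
    static double volatility(double[] closes, int t, int window) {
        if (window < 2 || t - window < 0 || t >= closes.length) {
            throw new IllegalArgumentException("not enough price history");
        }
        double[] r = new double[window];
        double mean = 0.0;
        for (int i = 0; i < window; i++) {
            int day = t - window + 1 + i; // log return from day-1 to day
            r[i] = Math.log(closes[day] / closes[day - 1]);
            mean += r[i];
        }
        mean /= window;
        double sumSq = 0.0;
        for (double x : r) {
            sumSq += (x - mean) * (x - mean);
        }
        return Math.sqrt(sumSq / (window - 1)) * Math.sqrt(252.0); // ~252 trading days per year
    }

    /** Formats one observation as "response,x1,x2,...,xn". */
    static String toRow(double response, double... explanatory) {
        StringJoiner row = new StringJoiner(",");
        row.add(Double.toString(response));
        for (double x : explanatory) {
            row.add(Double.toString(x));
        }
        return row.toString();
    }
}
```

Each output row then lines up with the “response, explanatory 1, …, explanatory n” layout that R can read directly as a CSV.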

Analysis of the data in R is up next. I wonder how the initial results will compare with those in the Doyle et al. paper, since I only have 3 explanatory variables and am testing only a small subset of the stocks they tested.

starting on PEAD (post earnings announcement drift) analysis


I decided to start on my PEAD project, which is to semi-replicate the analysis in the research paper “The Extreme Future Stock Returns Following I/B/E/S Earnings Surprises”. One of the things the researchers do in that paper is run a multiple regression of 1-year, 2-year, and 3-year returns on earnings surprise %, beta, market cap, momentum, accruals, and a few other explanatory variables. They find that the coefficients on earnings surprise % and accruals are the most significant, followed (I believe) by market cap, beta, and momentum.
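Since earnings surprise % is the headline variable, it helps to pin down how it could be computed. The sketch below shows two common scalings, both assumptions here: surprise as a percentage of the absolute consensus estimate, and surprise scaled by share price (which stays defined when the estimate is near zero). The paper's exact definition should be checked against its text; `EarningsSurprise` is a hypothetical name.

```java
// Hypothetical earnings-surprise calculations. Which scaling matches the
// paper is an assumption to verify against its methodology section.
final class EarningsSurprise {

    private EarningsSurprise() {}

    /** Surprise as a percentage of the absolute consensus estimate. */
    static double percentOfEstimate(double actualEps, double consensusEps) {
        if (consensusEps == 0.0) {
            throw new IllegalArgumentException("zero estimate: percent surprise undefined");
        }
        // Dividing by |estimate| keeps a beat positive even when the estimate is negative.
        return 100.0 * (actualEps - consensusEps) / Math.abs(consensusEps);
    }

    /** Surprise scaled by share price instead of by the estimate. */
    static double scaledByPrice(double actualEps, double consensusEps, double price) {
        if (price <= 0.0) {
            throw new IllegalArgumentException("price must be positive");
        }
        return (actualEps - consensusEps) / price;
    }
}
```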

The plan is to construct a multiple regression similar to theirs: for now it is a regression of future intermediate-term returns (1 month? 6 months? 1 year? I haven’t decided yet) on surprise, historical volatility, and momentum. Earnings estimates will come from I/B/E/S and price data from CRSP (I am grateful to be a Wharton student…). Replicating a research paper really forces you to understand it, and it lets you see areas for improvement…
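To make the regression concrete: ordinary least squares picks coefficients b that minimize squared error, which means solving the normal equations (X'X) b = X'y, where each row of X is [1, surprise, volatility, momentum]. The sketch below is a bare-bones illustration in Java using Gaussian elimination; the real analysis would use R's `lm()`, which also reports standard errors and t-statistics. `Ols.fit` is a hypothetical helper.

```java
// Minimal OLS via the normal equations (X'X) b = X'y, solved with Gaussian
// elimination and partial pivoting. Illustration only; not numerically ideal.
final class Ols {

    private Ols() {}

    /** Returns [intercept, b1, ..., bk] for y (length n) on x (n rows of k values). */
    static double[] fit(double[] y, double[][] x) {
        int n = y.length;
        int p = x[0].length + 1;             // +1 for the intercept column
        double[][] a = new double[p][p + 1]; // augmented matrix [X'X | X'y]
        for (int i = 0; i < n; i++) {
            double[] row = new double[p];
            row[0] = 1.0;
            System.arraycopy(x[i], 0, row, 1, p - 1);
            for (int r = 0; r < p; r++) {
                for (int c = 0; c < p; c++) {
                    a[r][c] += row[r] * row[c];
                }
                a[r][p] += row[r] * y[i];
            }
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("X'X is singular (collinear variables?)");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < p; r++) {
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) {
                    a[r][c] -= f * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] b = new double[p];
        for (int r = p - 1; r >= 0; r--) {
            double s = a[r][p];
            for (int c = r + 1; c < p; c++) {
                s -= a[r][c] * b[c];
            }
            b[r] = s / a[r][r];
        }
        return b;
    }
}
```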

It’s funny how the stars kind of aligned on this one. For my statistics class we have to do a final project using something we’ve learned, and multiple regression is a big one. I’m running a small fund with a few friends; one of them introduced me to PEAD and wanted to learn more about it as a potential strategy. I proceeded to read research on it and found that in one paper the researchers use a multiple regression model. So now I’m doing this PEAD exploration as a stat class final project, as preliminary research for a potential investment strategy, and as a way to learn and practice R. Talk about killing multiple birds with one stone.

This is way over my self-imposed word count…

performance of the ETFRot strategy

ETFRot is an ETF rotation strategy that I’ve been working on for a while now (almost a year). Essentially, it uses a couple of momentum and volatility indicators to rank ETFs in a basket spanning multiple asset classes, and then trades the top-ranked ETF. When the top-ranked ETF changes, it “rotates” into the new one. Simple logic, simple to trade.
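The rank-then-rotate logic can be sketched in a few lines. The score here (momentum divided by volatility, a simple risk-adjusted momentum) is an assumed stand-in, not ETFRot's actual indicators, and `EtfRotation`, `Indicators`, and `rebalance` are hypothetical names.

```java
import java.util.Map;

// Hypothetical ranking/rotation step. Scoring each ETF as momentum divided by
// volatility is an assumed stand-in for ETFRot's actual indicators.
final class EtfRotation {

    private EtfRotation() {}

    /** Indicator values for one ETF on the ranking date. */
    static final class Indicators {
        final double momentum;
        final double volatility;

        Indicators(double momentum, double volatility) {
            if (volatility <= 0.0) {
                throw new IllegalArgumentException("volatility must be positive");
            }
            this.momentum = momentum;
            this.volatility = volatility;
        }

        double score() {
            return momentum / volatility; // risk-adjusted momentum (assumption)
        }
    }

    /** Returns the ticker with the highest score in the basket. */
    static String topRanked(Map<String, Indicators> basket) {
        if (basket.isEmpty()) {
            throw new IllegalArgumentException("empty basket");
        }
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Indicators> e : basket.entrySet()) {
            double s = e.getValue().score();
            if (s > bestScore) {
                bestScore = s;
                best = e.getKey();
            }
        }
        return best;
    }

    /** Keeps the current holding unless a different ETF is now top-ranked. */
    static String rebalance(String currentHolding, Map<String, Indicators> basket) {
        String top = topRanked(basket);
        // A changed top rank is the "rotation": sell the old ETF, buy the new one.
        return top.equals(currentHolding) ? currentHolding : top;
    }
}
```

Trading only when the top rank changes is what keeps turnover low, which fits the "simple to trade" goal.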

To test the influence of data mining bias, I ran a walk-forward optimization: optimize the parameters on in-sample data up to period X, then use those optimized parameters to trade and evaluate the strategy out of sample in period X+1. Repeat, now including X+1 in the in-sample period and testing out of sample on X+2. If data mining bias is rampant, we would expect the optimized parameters to perform poorly out of sample, because they would have little predictive power. After running a walk-forward optimization of ETFRot, the parameters seem to be intrinsically predictive rather than just curve-fit:
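The expanding-window procedure described above boils down to a simple loop. In this sketch the `optimize` and `evaluate` callbacks are hypothetical hooks where a real backtester would plug in; the point is that each out-of-sample period is traded only with parameters fitted on data that ends before it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch of an expanding-window walk-forward loop. The optimize/evaluate
// callbacks are hypothetical hooks for a real backtester.
final class WalkForward {

    private WalkForward() {}

    /** Result of one step: the in-sample range and the out-of-sample return. */
    static final class Step {
        final int inSampleStart;
        final int inSampleEnd;
        final int outOfSamplePeriod;
        final double outOfSampleReturn;

        Step(int inSampleStart, int inSampleEnd, int outOfSamplePeriod, double outOfSampleReturn) {
            this.inSampleStart = inSampleStart;
            this.inSampleEnd = inSampleEnd;
            this.outOfSamplePeriod = outOfSamplePeriod;
            this.outOfSampleReturn = outOfSampleReturn;
        }
    }

    /**
     * @param start      first period of data; every in-sample window begins here
     * @param firstEnd   last period of the initial in-sample window (X)
     * @param lastPeriod last period to test out of sample
     * @param optimize   (inSampleStart, inSampleEnd) -> best parameters on that window
     * @param evaluate   (parameters, period) -> strategy return in that unseen period
     */
    static <P> List<Step> run(int start, int firstEnd, int lastPeriod,
                              BiFunction<Integer, Integer, P> optimize,
                              BiFunction<P, Integer, Double> evaluate) {
        List<Step> steps = new ArrayList<>();
        for (int x = firstEnd; x < lastPeriod; x++) {
            P params = optimize.apply(start, x);      // fit on [start, X] only
            double r = evaluate.apply(params, x + 1); // trade X+1 with those parameters
            steps.add(new Step(start, x, x + 1, r));
        }
        return steps;
    }
}
```

Stitching the `outOfSampleReturn` values together gives an equity curve built entirely from out-of-sample trading.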



It starts in 2005 because I used 2003-2005 (before 2003 some ETFs in my basket didn’t exist) as my first in-sample period. The second chart compares ETFRot’s performance with SPY. It seems to have done very well since 2008, when the market tanked. Maybe ETFRot is capturing a market regime shift that happened in 2008…

TSLA, Covestor, and potential new project



I saw yet another article on Tesla Motors’ (TSLA) expansion. This time it was about Panasonic’s $30 million investment in Tesla. A few months ago Toyota invested $50 million; last month Toyota inked a partnership with Tesla to use Tesla’s all-electric powertrains in the new RAV4 EV. Daimler has also invested heavily in Tesla’s technology. Right now, Tesla is burning money. Tesla may not become the next big car manufacturer, but it does have innovative powertrain technology. If it keeps making these strategic alliances, a future where most electric vehicles on the streets use a Tesla powertrain is not that hard to imagine… (disclaimer: I own some TSLA shares).

I’ve switched brokers from Zecco to Interactive Brokers, so my tracking portfolio on Covestor will no longer be updated daily (I have to send Covestor my monthly Interactive Brokers statements). On Covestor I track one of my ETF rotation algorithms.

Although I’m moving on from the Faber project, I still intend to keep using and learning more about R. I’m currently reading the paper “The Extreme Future Stock Returns Following I/B/E/S Earnings Surprises”, which analyzes the phenomenon of post-earnings-announcement drift (PEAD). Essentially, stocks drift upward for months after a positive earnings surprise (a very anti-EMH phenomenon). Getting my hands on the data and replicating the research in R would be fun and educational.