Marc Gerstein
October 08, 2016
Portfolio Strategist specializing in quantitative fundamental equity modeling

Nouvelle Equity ETFs Are Not Delivering

THIS IS A REPRINT OF AN ARTICLE THAT INITIALLY APPEARED ON FORBES.COM  

We’ve been hearing a lot lately about ETFs based on smart beta, factors, models, and so forth. I desperately want to love this development in the industry. It’s a scaled-up version of what I do and allows me to dream of a Gerstein ETF with billions in assets and gobs of glory, and some management fees wouldn’t hurt. But the folks who’ve gone out into the world ahead of me have been blazing a pothole-plagued trail.

What the Chic Crowd Has Been Doing

It’s tough to assign simple labels because so many are being misused (especially the phrase “smart beta”), but essentially, the idea is to identify “factors” one believes to be associated with superior future stock performance (superior return, lesser risk, or some combination of return and risk that is supposedly better than what can be had in a generic index fund) and add that “tilt” to a portfolio. Tilt is a word people use if they want to appear smart (“skew” is another); in real-people-speak, tilt means emphasis. If I add a value tilt to a portfolio, it means I think Value stocks are better and more of your money is being invested on that basis than would be the case with, say, the S&P 500 SPDR ETF ($SPY).

Adding a value tilt to a portfolio requires three things.

First, I have to define factors. Sticking with Value as an example, I have to determine which ratio or set of ratios I use (P/E?). Do I use historical E or estimated E or both? Do I use P/Sales, P/Cash Flow, P/Book, Enterprise Value/Something, etc.? I also have to define how low I should go in terms of those ratios. Usually we think in terms of ranks rather than numerical thresholds. Graham and Dodd preferred P/Es under 8 but that was in the 1930s; today, under 15 would be considered good and who knows what we’ll think a year or so from now. Ranking is good for all seasons: Sort all P/Es or combinations of ratios from highest to lowest and pick the bottom 20%, the bottom 50% or whatever.
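To make the ranking idea concrete, here is a minimal Python sketch. The function name, the sample tickers and their P/E values are all hypothetical, and dropping stocks with zero or negative P/E is my own assumption for illustration, not something the approach above prescribes.

```python
def cheapest_fraction(pe_by_ticker, fraction=0.20):
    """Return the tickers whose P/E sits in the lowest `fraction` of the universe."""
    # Assumption: ignore missing or non-positive P/Es, since the ratio
    # is not meaningful when earnings are zero or negative.
    valid = {t: pe for t, pe in pe_by_ticker.items() if pe is not None and pe > 0}
    # Sort from highest to lowest P/E so the cheapest stocks end up at the bottom.
    ranked = sorted(valid, key=valid.get, reverse=True)
    n_pick = max(1, round(len(ranked) * fraction))
    return ranked[-n_pick:]


# Made-up universe, for illustration only.
sample = {
    "AAA": 8.0, "BBB": 25.0, "CCC": 14.0, "DDD": 40.0, "EEE": 11.0,
    "FFF": 19.0, "GGG": -5.0, "HHH": 30.0, "III": 16.0, "JJJ": 22.0,
    "KKK": 9.5,
}
bottom_20_pct = cheapest_fraction(sample, 0.20)
print(bottom_20_pct)
```

Because the cut is relative, the same call works whether the market’s typical P/E is 8 or 15; only the `fraction` argument expresses how picky I want to be.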

Second, I have to decide which factor or set of factors I want to use. Value-Quality-Momentum is a popular combination and some use it. Others use just one; still others pick different combinations. Growth is a popular factor in the marketplace although not so much in the research community (strong growth in the past, the only kind of growth we can definitively identify, is not associated in any systematic way with ongoing strong growth in the future).

Finally, I have to decide how to implement my choices. This is where misnamed smart beta often comes in. Rather than market cap weighting, I can give higher portfolio weights to stocks that rank favorably under my value-quality-momentum ranking system (overweighting). This appears to be the method of choice for many ETFs. The other approach, the one I use because I’m a small fry and need not allocate billions of dollars, is to use a model for inclusion/exclusion decisions. If a stock passes muster, it’s in. If it doesn’t meet my threshold, it’s out. Being small, I can equally weight those that get in. Larger funds and ETFs often have to make adjustments to accommodate liquidity. For example, they may take a generally cap-weighted portfolio but add a bit more to the weightings of favorably ranked stocks and underweight those with poor rankings.
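The two implementation styles can be sketched in a few lines of Python. Treat this as an illustration under stated assumptions: the linear multiplier and the `tilt` knob are hypothetical choices of mine, not the formula any particular ETF uses, and the tickers and market caps are made up.

```python
def equal_weights(passed):
    """Inclusion/exclusion style: every stock that passes gets the same weight."""
    return {t: 1.0 / len(passed) for t in passed} if passed else {}


def tilted_cap_weights(market_caps, rank_pct, tilt=0.5):
    """Cap weights nudged up for well-ranked stocks, down for poorly ranked ones.

    rank_pct maps ticker -> 0.0 (worst rank) .. 1.0 (best rank).
    tilt is a hypothetical strength setting; 0.0 gives plain cap weighting.
    """
    # The multiplier runs linearly from (1 - tilt) at the worst rank
    # to (1 + tilt) at the best rank.
    raw = {
        t: cap * (1.0 + tilt * (2.0 * rank_pct[t] - 1.0))
        for t, cap in market_caps.items()
    }
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


# Made-up numbers, for illustration only.
caps = {"AAA": 500.0, "BBB": 300.0, "CCC": 200.0}
ranks = {"AAA": 0.0, "BBB": 0.5, "CCC": 1.0}

print(equal_weights(["BBB", "CCC"]))    # small-fry approach
print(tilted_cap_weights(caps, ranks))  # big-fund approach
```

In the tilted version the largest stock still carries real weight even with the worst rank, which is the liquidity accommodation a big fund needs and a small account does not.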

The Scorecard

Running the kinds of tests I like to run hasn’t been easy because the “universe” has been a moving (and growing) target. Many such ETFs didn’t exist 10 years ago. In fact, many such ETFs didn’t exist one year ago. And of those that do exist, it appears to my unscientific estimation that a bigger portion of them now pursue the quality tilt (often labeled low-volatility or something like that), which is not surprising since the market is looking a lot more iffy than it has in a long time (given that the Fed’s 35-year market-pumping stomp-down on interest rates is obviously over). 

But I test as best I can, as I feel I have no choice but to do so since the funds want your money now, not five years from now after they’ve had the opportunity to complete an open out-of-sample beta test.

I’m going to test this universe over the last year, the last three years and the last ten years; obviously, the size of the universe is not constant but the smallest it’s been (at the start of the 10-year test) is 23 ETFs. Now, it’s 96. (Note: For today, I’m excluding model-based ETFs that focus on a specific sector.)

Table 1 compares the total return (stock appreciation or depreciation plus dividends) of this ETF universe to $SPY, not necessarily because I think $SPY is the greatest benchmark in the world but because it’s the psychological/emotional default that continues to be selected by many investors who just want decision-free equity exposure.

Table 1 – All Nouvelle ETFs

|  | Annual Return % | Standard Deviation % |
|---|---|---|
| ETFs – 10 Yrs. | 6.63 | 16.86 |
| SPY – 10 Yrs. | 7.16 | 15.86 |
| ETFs – 3 Yrs. | 8.70 | 10.71 |
| SPY – 3 Yrs. | 10.72 | 11.73 |
| ETFs – 1 Yr. | 9.42 | 12.37 |
| SPY – 1 Yr. | 12.18 | 12.16 |

Not impressive! Obviously, we need to find a way to pick and choose.
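For readers who want to reproduce this kind of comparison on their own data, here is one common way to get the two columns from a series of periodic total returns. This is a sketch under assumptions: I am not claiming it is the exact calculation behind the table, and the function names and sample returns are hypothetical.

```python
import math
import statistics


def annualized_return_pct(period_returns, periods_per_year):
    """Compound the periodic total returns, then express the result per year."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    years = len(period_returns) / periods_per_year
    return (growth ** (1.0 / years) - 1.0) * 100.0


def annualized_stdev_pct(period_returns, periods_per_year):
    """Sample standard deviation of periodic returns, scaled to a yearly figure.

    Assumption: square-root-of-time scaling, which treats the periodic
    returns as independent of one another.
    """
    return statistics.stdev(period_returns) * math.sqrt(periods_per_year) * 100.0


# Made-up monthly total returns (decimals, dividends included), illustration only.
monthly = [0.02, -0.01, 0.015, 0.03, -0.025, 0.01,
           0.005, -0.015, 0.02, 0.01, -0.005, 0.025]
print(annualized_return_pct(monthly, 12))
print(annualized_stdev_pct(monthly, 12))
```

Run the same two functions on the ETF portfolio and on $SPY over identical dates and you have a like-for-like row pair.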

Table 2 restricts the universe by eliminating the ETFs that trade least (those whose daily dollar volume averaged less than $250,000 over the past 60 days).
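A liquidity screen like this one is easy to sketch. The assumptions here are mine: I approximate daily dollar volume as closing price times shares traded, I read “60 days” as the 60 most recent trading days, and the tickers and figures are invented.

```python
def average_dollar_volume(closes, volumes, window=60):
    """Mean of (close x shares traded) over the most recent `window` trading days."""
    pairs = list(zip(closes, volumes))[-window:]
    if not pairs:
        return 0.0  # no history means no evidence of liquidity
    return sum(c * v for c, v in pairs) / len(pairs)


def liquid_enough(history, min_dollars=250_000, window=60):
    """history maps ticker -> (closes, volumes); keep tickers at or above the floor."""
    return [
        ticker
        for ticker, (closes, volumes) in history.items()
        if average_dollar_volume(closes, volumes, window) >= min_dollars
    ]


# Made-up data, for illustration only.
history = {
    "THIN": ([20.0] * 60, [5_000] * 60),   # averages $100,000 a day
    "BUSY": ([50.0] * 60, [40_000] * 60),  # averages $2,000,000 a day
}
print(liquid_enough(history))
```

Raising `min_dollars` is all it takes to tighten the screen.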

Table 2 – Nouvelle ETFs Eliminating Those Least Traded

|  | Annual Return % | Standard Deviation % |
|---|---|---|
| ETFs – 10 Yrs. | 7.43 | 17.35 |
| SPY – 10 Yrs. | 7.16 | 16.86 |
| ETFs – 3 Yrs. | 7.85 | 10.94 |
| SPY – 3 Yrs. | 10.72 | 11.73 |
| ETFs – 1 Yr. | 8.94 | 12.54 |
| SPY – 1 Yr. | 12.18 | 12.16 |

Still not impressed. The ETFs outperformed in the ten-year test but the amount of excess return is too trivial to take seriously.

Let’s further restrict the universe to those that are the most popular, the big names, the ones whose daily dollar volume averaged more than $1 million over the past 60 days.

Table 3 – Most Heavily Traded Nouvelle ETFs

|  | Annual Return % | Standard Deviation % |
|---|---|---|
| ETFs – 10 Yrs. | 6.15 | 16.82 |
| SPY – 10 Yrs. | 7.16 | 15.86 |
| ETFs – 3 Yrs. | 7.79 | 11.25 |
| SPY – 3 Yrs. | 10.72 | 11.73 |
| ETFs – 1 Yr. | 10.39 | 13.18 |
| SPY – 1 Yr. | 12.18 | 12.16 |

Scratch that. Popularity, at least in terms of trading, means nothing.

I’m going to stick with the highest-liquidity group (given that there appears to be no meaningful benefit to reducing liquidity, I see no reason to do it) but confine myself to a portfolio of 10 ETFs: those that rank highest in a very simple momentum model, one based on academic research that I’ve used successfully in stock selection models (six-month price change excluding the latest week).
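Here is a minimal sketch of that momentum ranking. The specifics are assumptions on my part: I treat six months as 126 trading days and a week as 5, and I read “excluding the latest week” as measuring the six-month change up to the close five days ago. The function names and sample prices are hypothetical.

```python
def momentum_score(closes, lookback=126, skip=5):
    """Price change over `lookback` days, ending `skip` days before the last close."""
    if len(closes) < lookback + skip + 1:
        return None  # not enough history to score
    end = closes[-1 - skip]
    start = closes[-1 - skip - lookback]
    return end / start - 1.0


def top_by_momentum(price_history, n=10, lookback=126, skip=5):
    """Rank ETFs by momentum score and return the best `n` tickers."""
    scores = {t: momentum_score(c, lookback, skip) for t, c in price_history.items()}
    scored = {t: s for t, s in scores.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Made-up daily closes, for illustration only.
prices = {
    "RISING": [100.0 + i for i in range(132)],
    "FLAT": [100.0] * 132,
    "FALLING": [200.0 - i for i in range(132)],
    "TOO_NEW": [100.0] * 50,  # skipped: too little history
}
print(top_by_momentum(prices, n=2))
```

Passing `n=5` instead of `n=10` gives a more concentrated portfolio from the same ranking.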

Table 4 – Top 10 (per Momentum) Heavily Traded Nouvelle ETFs

|  | Annual Return % | Standard Deviation % |
|---|---|---|
| ETFs – 10 Yrs. | 3.69 | 16.78 |
| SPY – 10 Yrs. | 7.16 | 15.86 |
| ETFs – 3 Yrs. | 7.05 | 10.83 |
| SPY – 3 Yrs. | 10.72 | 11.73 |
| ETFs – 1 Yr. | 15.01 | 10.64 |
| SPY – 1 Yr. | 12.18 | 12.16 |

Hooray! Finally, we got something. Over the one-year period, the top 10 momentum picks among the most heavily traded ETFs worked!

Are you wondering what would happen if I cut the portfolio size to 5 ETFs? (It does seem weird to hold 10 ETFs.) Let’s see.

Table 5 – Top 5 (per Momentum) Heavily Traded Nouvelle ETFs

|  | Annual Return % | Standard Deviation % |
|---|---|---|
| ETFs – 10 Yrs. | 0.89 | 17.36 |
| SPY – 10 Yrs. | 7.16 | 15.86 |
| ETFs – 3 Yrs. | 6.39 | 11.17 |
| SPY – 3 Yrs. | 10.72 | 11.73 |
| ETFs – 1 Yr. | 17.84 | 10.52 |
| SPY – 1 Yr. | 12.18 | 12.16 |

Ouch! Again, the one-year test was OK, but the others were dreadful. An admittedly quick glance suggests to me that the one-year success was related to favorable performance of low-volatility strategies in general (new ETFs favor that approach) and strength in electric utilities, which are prominent in equity-income ETFs.

These aren’t the only tests I ran. I tried changing the rebalancing interval, the method of ranking the ETFs, the number I’d hold. I’ll spare you the gory details. So far, nothing I’ve done has worked.

I could continue. Sooner or later, I’ll probably come up with something. But will it be legit?

It’s not supposed to be that way. I’m not supposed to be hammering away at every possible concoction I can conceive until, after thousands of trials or more, I finally hit on something. What I should be doing is what I actually did: Come up with some ideas that make sense and try them out. If they work, even with some fine-tuning to refine the expression of the idea, that would be great. But at some point, research has to admit that an idea isn’t happening legitimately and call it for what it is, a dud.
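A bit of arithmetic shows why hammering away is dangerous. Suppose, purely as an assumption for illustration, that each worthless idea has a 5% chance of looking good in a backtest by luck alone, and that trials are independent of one another. Neither number comes from my tests; the function name is hypothetical.

```python
def chance_of_false_winner(trials, p_single=0.05):
    """Probability that at least one of `trials` worthless ideas looks good by luck.

    Assumes independent trials, each with the same chance `p_single`
    of a lucky result. Both are simplifying assumptions.
    """
    return 1.0 - (1.0 - p_single) ** trials


for n in (1, 10, 100, 1000):
    print(n, round(chance_of_false_winner(n), 4))
```

Under those assumptions the odds of stumbling onto a “winner” that means nothing climb quickly as the number of trials grows, which is why a model found on the thousandth try deserves a lot less trust than one found on the third.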

Are The Tests Fair?

Yes.

Admittedly, I didn’t plug harder to separate different kinds of nouvelle ETFs from one another and seek out sub-groups that may be worthy (for what it’s worth, I do like the low-volatility and equity income ETFs), or try to analyze the underlying holdings which, in the case of ETFs, are released daily.

I didn’t do any of that because I shouldn’t have to. ETFs (and other kinds of funds) exist in order to spare investors the burden of heavy research and decision-making. I already know that the generic ETFs, $SPY and others like it, do what they say they do. So far, though, if I want to use ETFs for equity exposure, I have yet to see a case for bypassing the big generics and opting for something nouvelle. And I don’t think any investor should have to develop a case for such an ETF by trying to get a research grant from a university or foundation and launching a major study. That’s not why we buy ETFs.

I had not intended to make a pitch for my platform, portfolio123, here. I really wanted to love this group of ETFs. But from what I’m seeing, if you want to add some additional zip to your equity exposure, bypass the fancy ETFs that suggest they can do this and subscribe to portfolio123 on your own and build your own model or use one of the ones you can find on the site, or find a manager who can and will do that or something similar. At the very least, you’ll get perfect transparency and accountability.

Why Aren’t The Nouvelle ETFs Working?

This is a huge topic.

One possibility is that it can’t work on a scaled-up basis. Indexers all over the place love to talk about how, by definition, investors as a whole have to be average. Their marketers (Hi Vanguard!) use this to try to convince you that you can’t be good so you should throw up your hands (or maybe even just throw up) and park all of your assets with them. The flip side is that if you think you are or want to be good, you should work at it yourself or find a manager who is good.

So twisting the indexer’s mantra, by definition, some participants in the market have to be above average. There’s no law that says you and/or your manager can’t be among them (and with gridlock in D.C., the chances of getting such a law passed are slim to none).

Another concern is a bit geeky. A lot of quant work today is based on a Fama-French-type framework which, although quite famous (at least among geeks), may not really be all that good. I addressed it a bit here in terms of value ETFs and again in terms of dividends, and more generally here.