
R-Squared Selectivity

This strategy combines R² (the regression fit) with alpha to identify ETFs whose managers may have genuine skill. A low R² means returns are largely unexplained by common factors, a property the literature calls "selectivity", while a high alpha indicates outperformance relative to those factors. The research cited below finds that the two together help predict future performance.

Model: Carhart 4-Factor
Lookback: 12-36 Months
Holding: 1 Month
Selection: Low R², High α

Overview

The R-squared selectivity strategy identifies ETFs where returns are less explained by common factors (low R²) but still generate positive alpha. Low R² means high "selectivity"—the fund's returns diverge from passive factor exposure, indicating active management decisions.

The key finding of Amihud and Goyenko (2013), who studied mutual funds, is that a low R² predicts better future performance. Funds in the lowest R² quintile and the highest alpha quintile produced an annual alpha of 3.8%. The intuition is that skilled managers make idiosyncratic bets that deviate from benchmarks.

For ETFs, Garyn-Tal (2014) extended this methodology and found that the R² approach successfully predicts ETF alpha. The strategy sorts ETFs by R² into quintiles, then sub-sorts by alpha. Buy ETFs with lowest R² and highest alpha; avoid those with highest R² and lowest alpha.

Key Insight

  • Selectivity = 1 - R²: measures active management
  • 3.8% annual alpha: low R² + high α quintile
  • Double sort: R² quintiles × α sub-quintiles

R² vs Alpha Selection Matrix

[Figure: ETFs plotted by R² (x-axis, inverted) and alpha (y-axis). The buy zone is low R² (high selectivity) with high alpha. Legend: Buy (low R², high α), Neutral, Avoid (high R², low α). In the example shown, the buy zone contains 3 ETFs with an average selectivity (1 - R²) of 33% and an average annual alpha of +2.0%.]

Research

The R-squared approach to fund selection emerged from research showing that funds diverging from benchmark factors (low R²) while generating alpha are more likely to persist in outperformance. This captures genuine active management skill.

The Mathematics

In Plain English

The math behind this strategy is straightforward. Here's what you're actually doing:

  1. Run a regression of each ETF's returns on the Carhart four factors (MKT, SMB, HML, MOM)
  2. Extract two statistics: alpha (α) and R-squared (R²)
  3. Calculate selectivity as 1 - R² (higher means more active management)
  4. Double sort ETFs: first by R² quintiles, then by alpha within each quintile
  5. Buy ETFs in the lowest R² quintile with highest alpha
  6. Avoid/short ETFs in the highest R² quintile with lowest alpha

That's it. The formulas below just express this process precisely.

Technical Formulas

1. Carhart Four-Factor Regression

Formula
R_i(t) = \alpha_i + \beta_{1,i} MKT(t) + \beta_{2,i} SMB(t) + \beta_{3,i} HML(t) + \beta_{4,i} MOM(t) + \epsilon_i(t)

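Regress each ETF's excess return R_i(t) (its return minus the risk-free rate) on the market (MKT), size (SMB), value (HML), and momentum (MOM) factors. The intercept is alpha (α), the β terms are the factor loadings, and ε_i(t) are the residuals.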
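The regression can be sketched with plain least squares. This is a minimal illustration, assuming NumPy is available and that the factor columns are ordered MKT, SMB, HML, MOM; the function name `carhart_regression` and the synthetic data are hypothetical, not part of the original methodology.

```python
import numpy as np

def carhart_regression(excess_returns, factors):
    """OLS of one ETF's excess returns on the four factors.

    excess_returns: shape (T,), ETF return minus risk-free rate.
    factors: shape (T, 4), columns assumed to be MKT, SMB, HML, MOM.
    Returns (alpha, betas, residuals).
    """
    y = np.asarray(excess_returns, dtype=float)
    X = np.asarray(factors, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef[0], coef[1:], residuals

# Synthetic check: returns built from a known alpha and betas, with no noise.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.01, size=(250, 4))
true_alpha = 0.0002
true_betas = np.array([1.0, 0.3, -0.2, 0.1])
excess = true_alpha + factors @ true_betas
alpha, betas, resid = carhart_regression(excess, factors)
```

Because the synthetic series has no noise, the fit recovers the inputs and the residuals are numerically zero. With real data the alpha is per observation period (daily or weekly), not annualized.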

2. R-Squared Calculation

Formula
R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{t=1}^{N} \epsilon_i^2(t)}{\sum_{t=1}^{N} (R_i(t) - \bar{R}_i)^2}

R² measures what fraction of return variance is explained by the factors. Here \bar{R}_i is the ETF's mean excess return over the N observations in the estimation window. A lower R² means more idiosyncratic (unexplained) return variation, which is interpreted as higher selectivity.
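As a small worked example of the formula, the sketch below computes R² from a return series and its regression residuals, then derives selectivity as 1 - R². The numbers are made up for illustration and the function name `r_squared` is hypothetical.

```python
def r_squared(returns, residuals):
    """R^2 = 1 - SS_res / SS_tot for one ETF's estimation window."""
    mean_r = sum(returns) / len(returns)
    ss_tot = sum((r - mean_r) ** 2 for r in returns)
    ss_res = sum(e ** 2 for e in residuals)
    return 1.0 - ss_res / ss_tot

# Illustrative numbers only: four periods of excess returns and residuals.
returns = [0.02, -0.01, 0.03, 0.00]
residuals = [0.005, -0.005, 0.005, -0.005]

r2 = r_squared(returns, residuals)  # SS_tot = 0.001, SS_res = 0.0001
selectivity = 1.0 - r2
```

Here the factors explain 90% of the variance, so selectivity is 10%, which would place this ETF close to the passive end of the spectrum.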

3. Selectivity Measure

Formula
Selectivity = 1 - R^2

Selectivity is simply 1 minus R². Higher selectivity indicates the fund's returns diverge more from common factors, suggesting active management decisions.

4. Selection Rule

Formula
Buy: \text{Quintile}(R^2) = 1 \text{ AND } \text{SubQuintile}(\alpha) = 5

First sort into 5 groups by R² (lowest = 1). Within each R² quintile, sort by alpha (highest = 5). Select ETFs in lowest R² quintile with highest alpha sub-quintile.
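The double sort can be sketched in a few lines of plain Python. This is one possible way to assign quintiles (by rank, ties broken by sort order), not necessarily the exact breakpoints used in the cited papers; the tickers `ETF00`..`ETF24` and their statistics are hypothetical.

```python
def quintile_labels(values):
    """Map each key to a quintile 1..5 by ascending value (1 = lowest)."""
    ordered = sorted(values, key=values.get)
    n = len(ordered)
    return {name: (rank * 5) // n + 1 for rank, name in enumerate(ordered)}

def double_sort(stats):
    """stats: {ticker: (r2, alpha)} -> {ticker: (r2_quintile, alpha_subquintile)}."""
    r2_q = quintile_labels({t: s[0] for t, s in stats.items()})
    cells = {}
    for q in range(1, 6):
        members = {t: stats[t][1] for t in stats if r2_q[t] == q}
        for t, sub in quintile_labels(members).items():
            cells[t] = (q, sub)
    return cells

def cell_members(stats, r2_quintile, alpha_subquintile):
    """Tickers that fall in one cell of the 5 x 5 matrix."""
    target = (r2_quintile, alpha_subquintile)
    return sorted(t for t, cell in double_sort(stats).items() if cell == target)

# Hypothetical universe of 25 ETFs with made-up R^2 and alpha values.
stats = {f"ETF{i:02d}": (0.50 + 0.02 * i, ((i * 7) % 25 - 12) / 1000)
         for i in range(25)}
picks = cell_members(stats, 1, 5)  # lowest R^2, highest alpha
avoid = cell_members(stats, 5, 1)  # highest R^2, lowest alpha
```

With rank-based labels, a group needs at least five members for its top sub-quintile to be labelled 5, so a universe of fewer than 25 ETFs can leave the target cell empty.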

Note: Estimation Period

Typical estimation uses 12-36 months of daily or weekly returns. Longer periods give more stable R² estimates but slower adaptation to changes in fund management style.

Note: Factor Data

The Carhart four factors include Fama-French three plus momentum (MOM). All are available from Kenneth French's data library. Use the same frequency as your return data.

Strategy Rules

Universe & Data

  1. Use actively managed ETFs or sector/thematic ETFs with potential for alpha
  2. Collect at least 12 months of return data (daily or weekly)
  3. Download Carhart four-factor data (MKT, SMB, HML, MOM)
  4. Exclude pure index-tracking ETFs (they have R² ≈ 1 by design)
  5. Consider minimum AUM and liquidity filters

Regression Analysis

  1. Run OLS regression of excess returns on four factors
  2. Use rolling 12-36 month window for estimation
  3. Extract alpha (intercept) and R² from each regression
  4. Record t-statistics for significance assessment
  5. Update regressions monthly with new data

Double Sorting

  1. Sort all ETFs into 5 quintiles by R² (Q1 = lowest R²)
  2. Within each R² quintile, sort by alpha into 5 sub-quintiles
  3. This creates 25 groups (5 × 5 matrix)
  4. Select from Q1 R² × Q5 alpha cell (low R², high alpha)
  5. For long/short, also short Q5 R² × Q1 alpha cell

Risk Management

  1. Monitor idiosyncratic volatility (low R² means higher specific risk)
  2. Diversify across multiple ETFs in the target cell
  3. Set maximum position sizes to limit concentration
  4. Review for style drift in selected ETFs
  5. Consider volatility scaling for position sizing

Implementation Guide

Implementing the R-squared strategy requires running regressions and sorting ETFs into quintiles. This is more complex than simple momentum, but it aims to identify genuine active management skill rather than recent price strength.

1. Define ETF Universe

Start with a universe of ETFs that have potential for active management alpha. This could include active ETFs, smart-beta ETFs, sector ETFs, or thematic ETFs.

Tips
  • Active ETFs: ARK funds, Avantis, Dimensional
  • Smart-beta: multifactor, quality, low-vol ETFs
  • Sector/thematic: potential for manager skill in selection
  • Exclude pure passive index trackers (SPY, IVV, VTI)

2. Gather Return and Factor Data

Collect daily or weekly returns for each ETF going back 12-36 months. Download the Carhart four-factor data from Kenneth French's library.

Tips
  • Use adjusted close prices for total returns
  • Match frequency: daily returns with daily factors
  • Kenneth French library: mba.tuck.dartmouth.edu/pages/faculty/ken.french/
  • Factors: Mkt-RF, SMB, HML, and Mom (momentum)
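The data preparation can be sketched as follows: compute simple returns from adjusted closes, then subtract the risk-free rate. The sketch assumes the risk-free series is quoted in percent, as the French library files generally are (worth confirming in the file header), and the prices and rates shown are made up.

```python
def simple_returns(adj_close):
    """Period-over-period simple returns from adjusted close prices."""
    return [adj_close[i] / adj_close[i - 1] - 1.0
            for i in range(1, len(adj_close))]

def excess_returns(etf_returns, rf_percent):
    """Subtract the risk-free rate, converting it from percent to decimal."""
    if len(etf_returns) != len(rf_percent):
        raise ValueError("return and risk-free series must be aligned")
    return [r - rf / 100.0 for r, rf in zip(etf_returns, rf_percent)]

# Made-up adjusted closes and a risk-free rate of 0.02% per period.
adj_close = [100.0, 101.0, 99.99]
rets = simple_returns(adj_close)
excess = excess_returns(rets, [0.02, 0.02])
```

The factor columns need the same percent-to-decimal conversion so that both sides of the regression are in the same units, and the dates of the two series should be matched before subtracting.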

3. Run Factor Regressions

For each ETF, regress excess returns on the four factors. Extract alpha and R² from each regression. Store results for the sorting step.

Tips
  • Excess return = ETF return - risk-free rate
  • Python: statsmodels OLS (which reports t-statistics) or scikit-learn's LinearRegression (coefficients and R² only)
  • Save: alpha, R², t-stat for alpha, regression residuals
  • Check for sufficient observations (N > 30)

ETFs with very high R² (>0.95) are essentially passive and should be excluded from the active management analysis.
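The two screens mentioned in this step (enough observations, and R² not above 0.95) can be applied before sorting. A minimal sketch, assuming regression results are stored as dictionaries with hypothetical keys `r2` and `n_obs`:

```python
def eligible(results, min_obs=30, max_r2=0.95):
    """Keep ETFs with more than min_obs observations and R^2 <= max_r2."""
    return {t: r for t, r in results.items()
            if r["n_obs"] > min_obs and r["r2"] <= max_r2}

# Hypothetical regression output for three ETFs.
results = {
    "AAA": {"r2": 0.62, "n_obs": 252},
    "BBB": {"r2": 0.97, "n_obs": 252},  # effectively passive
    "CCC": {"r2": 0.55, "n_obs": 20},   # too few observations
}
kept = eligible(results)
```

Only `AAA` survives both screens in this example.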

4. Sort into Quintiles

First sort all ETFs by R² into 5 groups. Then within each R² group, sort by alpha into 5 sub-groups. This creates a 5×5 matrix of 25 cells.

Tips
  • Q1 R² = lowest 20% by R² (most active)
  • Q5 α = highest 20% by alpha within each R² group
  • Target cell: Q1 R² × Q5 α = low R², high alpha
  • Avoid cell: Q5 R² × Q1 α = high R², low alpha

5. Construct Portfolio & Rebalance

Equal-weight ETFs from the target cell (Q1 R² × Q5 α). Rebalance monthly by re-running the regressions and re-sorting.

Tips
  • Typical target cell may contain 2-5 ETFs
  • Equal-weight for simplicity
  • Monthly rebalancing on last trading day
  • Track turnover—some persistence expected
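Equal weighting and turnover tracking can be sketched as below. One-way turnover is defined here as half the sum of absolute weight changes, which is a common convention but an assumption of this sketch; the tickers are hypothetical.

```python
def equal_weights(tickers):
    """Equal-weight portfolio over the given tickers."""
    return {t: 1.0 / len(tickers) for t in tickers} if tickers else {}

def one_way_turnover(old, new):
    """Half the sum of absolute weight changes between two portfolios."""
    names = set(old) | set(new)
    return 0.5 * sum(abs(new.get(t, 0.0) - old.get(t, 0.0)) for t in names)

# Hypothetical target cells in two consecutive months.
last_month = equal_weights(["AAA", "BBB", "CCC"])
this_month = equal_weights(["AAA", "BBB", "DDD"])
turnover = one_way_turnover(last_month, this_month)
```

Replacing one of three holdings gives a one-way turnover of one third. Persistently high values would suggest the R² and alpha rankings are unstable from month to month.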

Technical Requirements

This strategy requires regression analysis capability (Python, R, or Excel). You'll need to automate the monthly sorting process. Execution is straightforward with any broker offering ETF trading.

Helpful Tools & Resources

Factor Data
Kenneth French Data Library (free)
Regression Analysis
Python (statsmodels), R, Excel
ETF Screening
ETF.com, Morningstar, VettaFi

Strategy Variations

Explore different ways to implement this strategy, each with its own trade-offs and benefits.

Active ETF Focus

Apply the strategy exclusively to actively managed ETFs (ARK, Avantis, etc.) where manager skill is explicitly expected. May yield stronger signals.

Smaller universe but cleaner signal.

Fama-French Five Factor

Use the Fama-French five-factor model instead of Carhart four-factor. Adds profitability and investment factors for potentially better fit.

May change R² rankings slightly.

Rolling vs. Expanding Window

Compare rolling (fixed lookback) vs. expanding (all available data) regression windows. Expanding may give more stable R² but less responsive to regime changes.

Trade-off between stability and adaptability.

Conditional R²

Estimate R² separately in up and down markets. Some managers show skill only in certain regimes. Select based on regime-specific R² and alpha.

More complex but potentially more predictive.
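A rough sketch of the conditional variant is below. It assumes NumPy, assumes the first factor column is MKT, and defines an "up" market as any period with a non-negative market factor; that definition, the function names, and the synthetic data are all assumptions of this sketch.

```python
import numpy as np

def fit(y, X):
    """OLS with intercept; returns (alpha, R^2)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef[0], r2

def conditional_stats(excess, factors):
    """Separate regressions for periods with MKT >= 0 and MKT < 0."""
    y = np.asarray(excess, dtype=float)
    X = np.asarray(factors, dtype=float)
    up = X[:, 0] >= 0
    return {"up": fit(y[up], X[up]), "down": fit(y[~up], X[~up])}

# Synthetic ETF that tracks the factors closely in up markets
# and has much noisier returns in down markets.
rng = np.random.default_rng(2)
factors = rng.normal(0.0, 0.01, size=(500, 4))
noise_sd = np.where(factors[:, 0] >= 0, 0.001, 0.02)
excess = (factors @ np.array([1.0, 0.3, -0.2, 0.1])
          + rng.normal(0.0, 1.0, size=500) * noise_sd)
stats = conditional_stats(excess, factors)
```

Splitting the sample roughly halves the observations per regression, so the minimum-observation check matters more here than in the unconditional version.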

Combined with Momentum

Add a momentum filter on top of the R²/alpha sort. Only buy ETFs that also have positive recent returns. Combines skill identification with trend following.

Multiple signals may improve consistency.
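The momentum overlay can be sketched as a simple filter on the target cell. The lookback length and the rule "trailing compounded return above zero" are assumptions for illustration, as are the tickers and return histories.

```python
def trailing_return(returns, lookback):
    """Compounded return over the last `lookback` periods."""
    growth = 1.0
    for r in returns[-lookback:]:
        growth *= 1.0 + r
    return growth - 1.0

def momentum_filter(candidates, return_history, lookback):
    """Keep candidates whose trailing compounded return is positive."""
    return [t for t in candidates
            if trailing_return(return_history[t], lookback) > 0.0]

# Hypothetical monthly returns for two target-cell ETFs.
history = {"AAA": [0.02, -0.01, 0.03], "BBB": [-0.02, -0.01, 0.01]}
kept = momentum_filter(["AAA", "BBB"], history, lookback=3)
```

Applying the filter after the double sort shrinks an already small target cell, so it may be worth checking how often it leaves the portfolio empty.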

Consider combining multiple variations or testing them against your specific investment goals and risk tolerance.

Risks & Limitations

Idiosyncratic Volatility (High)

Low R² means high idiosyncratic (non-systematic) risk. Selected ETFs may experience significant swings unrelated to market factors, leading to higher volatility.

Skill vs. Luck (High)

Low R² with high alpha could be skill or could be luck. With small samples, distinguishing genuine skill from random outperformance is statistically challenging.

Small Universe (Medium)

After filtering for low R² and high alpha, the target cell may contain very few ETFs. This creates concentration risk and may require looser selection criteria.

Style Drift (Medium)

ETF managers may change their approach over time. A fund that had low R² may become more benchmark-hugging, invalidating the selection signal.

Factor Model Limitations (Medium)

R² depends on which factors are included. An ETF may appear "active" (low R²) simply because it loads on factors not in the model, not because of true selectivity.

Data Mining Risk (Medium)

The double-sort creates 25 cells; picking one cell ex-post may overfit historical data. Out-of-sample testing is essential before implementation.

Understanding these risks is essential for proper position sizing and portfolio construction. Consider combining with other strategies to mitigate individual risk factors.

References

  • Amihud, Y. & Goyenko, R. (2013). Mutual Fund's R² as Predictor of Performance. The Review of Financial Studies, 26(3), 667-694.
  • Garyn-Tal, S. (2014). Explaining and Predicting ETFs Alphas: The R² Methodology. Journal of Index Investing, 4(4), 19-32.
  • Garyn-Tal, S. (2014). An Investment Strategy in Active ETFs. Journal of Index Investing, 4(1), 12-22.
  • Ferson, W. & Mo, H. (2016). Performance Measurement with Selectivity, Market and Volatility Timing. Journal of Financial Economics, 121(1), 93-110.

ETF trading involves risk of loss. The R-squared strategy involves statistical estimation which is subject to error. Low R² implies higher idiosyncratic volatility. Past performance and R² statistics do not guarantee future results. This is educational content, not investment advice.
