June 20, 2017 – mmr
Innovations in Research: Shapley Value Regression
I’m going to start by asking you to accept a statement at face value…
“We’re living in a time of rapid change in our industry!”
Change is affecting nearly ALL areas of research, including the sometimes less visible, highly technical regions of multivariate analysis.
In this particular arena, a key driver of change has been increased computing power, which has enabled approaches such as Choice-Based Conjoint with Hierarchical Bayes and MaxDiff analysis, among others.
What we also expect to see in Drivers analyses (cases where there is one dependent variable with multiple independent/predictor variables) is a steady decline in the use of traditional multiple regression analysis and bivariate correlation analysis. These approaches will be steadily replaced by Shapley Value Regression (SVR). MMR Research has spent the past year investigating and testing SVR on real data from real client studies, and our conclusion is that:
SVR should be your preferred approach for Drivers/Importance Analyses, whenever possible.
The reason is that SVR is consistently superior at handling the multicollinearity that we all know is present in most derived-importance analyses.
SVR is considered by most to be far superior to standard regression (OLS or PLS) or Correlation analysis …
The approach simply makes SENSE… it works through every possible combination of relationships between predictors and the dependent variable to derive the relative importance of each element, and it is not subject to order effects and other similar problems.
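That "every possible combination" description can be made concrete. Below is a minimal sketch of the idea in Python, using only NumPy: each predictor's importance is its average marginal contribution to model R², taken over all subsets of the other predictors (the classic Shapley weighting). The function names and the choice of R² as the fit measure are our illustrative assumptions, not a reference implementation.

```python
# A minimal sketch of Shapley Value Regression (illustrative, not production code).
import numpy as np
from itertools import combinations
from math import factorial

def r_squared(X, y):
    """R^2 of an OLS fit of y on X (with intercept)."""
    Xi = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xi, y, rcond=None)
    resid = y - Xi @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def shapley_importance(X, y):
    """Shapley decomposition of R^2 across the columns of X (2^p model fits)."""
    n, p = X.shape
    # Cache R^2 for every subset of predictors, keyed by sorted column tuples.
    r2 = {(): 0.0}
    for k in range(1, p + 1):
        for S in combinations(range(p), k):
            r2[S] = r_squared(X[:, list(S)], y)
    # Shapley value: weighted average marginal contribution of each predictor.
    phi = np.zeros(p)
    for j in range(p):
        others = [i for i in range(p) if i != j]
        for k in range(p):
            w = factorial(k) * factorial(p - k - 1) / factorial(p)
            for S in combinations(others, k):
                Sj = tuple(sorted(S + (j,)))
                phi[j] += w * (r2[Sj] - r2[S])
    return phi  # phi.sum() equals the full-model R^2
```

Note the cost: with p predictors this fits 2^p models, which is exactly why the approach had to wait for cheap computing power.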
…In Practice (in multiple head-to-head comparisons we’ve executed)
It provides more statistically reliable results – our analysis shows that if you run SVR and a standard best-practice approach like PLS (Partial Least Squares) regression side by side on a very large data set, pulling random sub-samples from that data set, SVR provides MUCH more stable, consistent results.
It provides more intuitive results – in EVERY case that we’ve tested to date, when showing comparative results (which would lead to differing recommendations) to clients, they have preferred the SVR results.
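The stability point has a simple mechanical explanation. The sketch below is not a reproduction of our head-to-head comparisons – it uses synthetic data and plain-NumPy OLS rather than PLS or SVR – but it shows the underlying problem: under severe multicollinearity, individual regression coefficients swing wildly from one random sub-sample to the next, even though the combined signal they carry is perfectly stable.

```python
# Toy illustration (synthetic data): coefficient instability under multicollinearity.
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # two nearly identical drivers
y = x1 + x2 + rng.normal(size=n)

def ols_coefs(idx):
    """OLS coefficients (excluding intercept) fit on the rows in idx."""
    X = np.column_stack([np.ones(len(idx)), x1[idx], x2[idx]])
    beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
    return beta[1:]

# Refit on 100 random sub-samples of 200 respondents each.
draws = np.array([ols_coefs(rng.choice(n, size=200, replace=False))
                  for _ in range(100)])

# The individual coefficients bounce around; their sum barely moves.
print("sd of b1 across sub-samples:     ", draws[:, 0].std())
print("sd of b1 + b2 across sub-samples:", draws.sum(axis=1).std())
```

Because SVR averages each predictor's contribution over all subsets rather than fighting over a single set of coefficients, it sidesteps this particular source of instability.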
Ok, but if it’s that great, why haven’t I heard about it before?
So, first, as we noted, SVR is relatively new…or more accurately, it’s actually an OLD approach evolved and made NEW again.
Originally developed in the 1950s by game theorists for very different applications, SVR has been reinvented and evolved over time by different people, including Kruskal in the ‘90s.1 It has slowly gained traction as it has been explored and refined by skeptical PhDs over the past 20(ish) years. So, SVR is much like the rock star who grinds it out for 20 years in honky-tonks and bars and then, suddenly, it’s their turn: they are “discovered.”
Second, SVR suffers from some flaws and barriers that have slowed its adoption.
1) SVR is HARD to set up and requires LOTS of computing power
As one PhD user describes it, “the software that performs SVR is user-hostile.” Today, SVR is performed mainly with the use of “R” (a language with statistical packages rather than a statistical software package). Many practitioners would naturally prefer to continue to use the software that they have been using effectively for 20+ years (SAS/SPSS). However, once set up, it is easy to execute and doesn’t cost any more to do.
2) It requires scalar data
ALL the predictor variables must be scalar… so regressions that include “yes/no” variables generally cannot use this approach. (Note: This is why SVR first penetrated the Customer Satisfaction arena, where multicollinearity is probably at its worst and well-ordered scalar data is the rule.)
3) It’s hard to gain agreement among PhDs and the Sales approaches to-date may have been sub-optimal
Researchers are VERY skeptical people already, but experts in multivariate analytics are probably the MOST skeptical and the least likely to come to clear decisions and agreement on new methods. The sales approaches we’ve seen have done a good job describing the theoretical underpinnings and statistical pros and cons, but have done less well describing what DIFFERENCE it makes to decision-making.
4) Other new approaches have also been emerging at the same time
SEM (Structural Equation Modeling) and other methodologies for dealing with multicollinearity and complex relationships have also evolved during the same period. Some of these, arguably, have even greater potential due to their greater flexibility.
What did we (MMR) learn from comparing SVR to OLS/PLS Regression?
First the GOOD News… regardless of which approach you use, you are very likely to get the same answer as to what are the STRONGEST and WEAKEST drivers. (Frankly, if we got completely different answers, it wouldn’t make sense.)
However, in the middle… the answers can definitely be VERY different. We have seen elements that are Tier 2 drivers in a PLS approach fall to the bottom of an SVR analysis, and vice versa. (Note: These differences emerge regardless of which specific measures you use for comparison.)
So, the BAD news is that, in most cases, our clients already know what is most and least important… what they are most often trying to learn is “What is NEXT?” or “How do I nuance communications?”
And in that area, SVR can definitely lead to different insights. That’s why we believe that it should be the preferred approach for Drivers Analysis… it will make a difference.