The story of Scotland Leman’s March Madness study begins in a Blacksburg, Va., bar.

That’s where the Virginia Tech statistics professor and a few colleagues went to watch then-coach Seth Greenberg’s press conference shortly after the Hokies (22-12, 9-7 Atlantic Coast Conference in 2010) had been snubbed for an at-large NCAA tournament bid.

Greenberg openly wondered whether North Carolina would have made the tournament with Virginia Tech’s resume. Leman, a college basketball fan, paid Greenberg a visit to tell him that the marquee bias he had described in his media appearances could be measured statistically.

It was all the ammo Greenberg needed. The next day, he issued a press release indicating Leman and his colleague Leanna House would analyze the phenomenon of a marquee effect in NCAA tournament selection.

Reporters and fans dialed Leman incessantly, wanting to know his conclusions. One problem: He hadn’t even started yet.

The result was “Life on the bubble: Who’s in and who’s out of March Madness,” recently published in the Journal of Quantitative Analysis in Sports, in which Leman and several co-authors examined more than 10 years of NCAA tournament selection. They focused specifically on bubble teams and found in many cases that a team’s success in recent years could serve as the final push in rounding out the bracket.

Leman spoke with the Daily Bruin’s Andrew Erickson about his study and the use of analytics in March Madness.

Daily Bruin: How did the study come about after Seth Greenberg’s comments in 2010?

Scotland Leman: To make a long story longer, a week later Greenberg sends me a FedEx envelope filled with 2,000 sheets of paper, and it’s got all kinds of statistics. It has the gory details broken down team by team, all kinds of game-by-game analyses, so on and so forth. Anyway, we got the data and found there is a measurable bias. This is something that’s lingering on (tournament selection committee members’) minds, and it seems to have an influence. Even given all the data you end up looking at, all the gory details, that’s not good enough to predict their decision-making.

DB: In the span you measured, UCLA has the end of the Steve Lavin era, Ben Howland’s Final Four years and then a few years where UCLA’s success tailed off a bit. Were you able to find any specific data on UCLA?

SL: So this is where it gets tricky. There are many different types of marquee effects. What do you call strong? I was thinking about this with UCLA. People would consider UCLA a marquee team, especially during the Wooden years. I mean UCLA has more championships than anybody. We don’t measure that. Would UCLA’s prestige influence the committee? I don’t even know how you could measure that. What I am confident in saying is there are these lingering human effects when it comes to people making decisions in committees. They don’t act like robots. They’ve got information and for whatever reason there are biases.

DB: Which human effects did you see as most prevalent?

SL: So they’re all kind of measures of “how strong are you?” And you have to be very strong. Since Greenberg described (North) Carolina, we just said, “Let’s look at teams like Carolina.” There’s a mathematical way to do it where you create a density plot and ask, “Who’s like Carolina?” for two of the last three years. That has to be imposed. So my metric is: if in two of the last three years you looked like this strong type of team, like a Duke or a Carolina, based on the last 20 years of data, then I might denote you as marquee.

We looked at another criterion: whether you were a No. 1 seed in the NCAA tournament. We ended up imposing something like that for two of the last three years (prior to a given NCAA tournament). You can imagine there aren’t many teams like that. If a team ended up having very, very, very strong characteristics over the last few years – two of the last three or three of the last five – would that help them out in a moment of crisis when they’re on the bubble? It turns out they get kind of a nudge that all the metrics and the gory details don’t explain. And a nudge can be quite a lot.
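(As a rough sketch of what a rule like that could look like in code: the function, the cutoff seed and the example seed history below are illustrative assumptions, not the paper’s actual criterion.)

```python
def is_marquee(seed_history, year, window=3, required=2, cutoff_seed=1):
    """True if the team earned a seed at or better than cutoff_seed in at
    least `required` of the `window` seasons before `year`."""
    recent = [seed_history.get(y) for y in range(year - window, year)]
    strong_years = sum(1 for s in recent if s is not None and s <= cutoff_seed)
    return strong_years >= required

# Hypothetical team with No. 1 seeds in two of the three seasons before 2010.
example_seeds = {2007: 1, 2008: 4, 2009: 1}
print(is_marquee(example_seeds, 2010))  # True
```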

DB: Was there a subset of teams nudged in the opposite direction as a result of the marquee bias, or because they lacked marquee status?

SL: Not off the top of my head, but you would basically say everyone else on the bubble vying for that last spot. So the non-marquee teams that found themselves on the bubble have a legitimate gripe. The only one that I actually focused on was Virginia Tech. From what I saw, there was never a marquee team that was nudged in the opposite direction.

DB: Either when you first published the study or now that it has been receiving more press, have you found yourself getting criticism from other scholars or from fans?

SL: Never scholars. You have to understand this went through peer review. There were a lot of academics who are highly skilled in quantitative methods. They gave me good criticism, and so I answered those questions in the paper. That all gets adapted.

But I’ll tell you, there are fans and reporters everywhere that love to debate this stuff. I’ve kind of always just said, “If you don’t like my analysis, tell me what it is in the model you dispute.” The way analysts have the conversation is a lot different than the way the general public has the conversation. I like sports just as much as anybody so I like to debate just as much as anybody, but if I’m going to go on record, I’m going to say very clearly what my assumptions are.

DB: Have you noticed other academics now looking at statistical models like this in college basketball, or are sports still catching on in the academic community?

SL: I noticed it last year with the whole billion-dollar prize. The question they want to answer is, “What is the probability team A beats team B?” And can you build a model to predict a very, very good bracket? When there’s betting involved – and NCAA March Madness is a cesspool for it – predictive modeling and analytical tools have a lot of utility. That’s where you see a lot of the research right now.
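(That head-to-head probability is the core quantity such bracket models estimate. A minimal illustration, assuming a generic Bradley-Terry-style logistic model and made-up ratings rather than anything from Leman’s work:)

```python
import math

def win_probability(rating_a, rating_b, scale=1.0):
    """Probability that team A beats team B under a logistic model of the
    difference in team-strength ratings."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))

# Made-up strength ratings for two hypothetical teams.
print(round(win_probability(1.8, 1.2), 3))  # ~0.646
```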

DB: As a watcher of college basketball and as a statistician, are you willing to make a prediction for this year’s winner, or is that too far off in the future?

SL: It’s way too far off. We did make some nice predictions last year. We did predict the Duke-Mercer upset (using a statistical model in 2014), which is something I never would’ve guessed with my heart. There was something in our model that told us it was a bad matchup for Duke. Seeing those successes, that’s a lot of fun. But for the most part, I just like to sit back and watch the games and not overanalyze them. A lot of these teams haven’t given us new information in the last couple of weeks.

For instance, Kentucky. Hottest team in the country. What do we know about them? Really not a whole lot. They haven’t played really hard teams in a while. And so who do I think is going to win? It’s irrelevant. I could tell you but it’s not an analytical decision.

Compiled by Andrew Erickson, Bruin Sports senior staff.
