“Marathon, Not a Sprint” and Small Sample Sizes
I am an ardent supporter of advanced baseball statistics. They do have some limitations. While a high FIP can sometimes unmask a low ERA and a guy who just keeps getting out of innings somehow, some people might say that the ability to get out of innings is the very hallmark of said pitcher. Then again, sometimes a pitcher with a high FIP really hasn’t been that bad. Take Bartolo Colon yesterday. He wasn’t great, but he allowed three solo shots, which made his FIP for the game swell to 11.17. At the end of the day, though, that was three pitches left over the plate out of 83 thrown. Was he really 11.17 bad? The problem with that analysis, of course, is those three words we sabermetrics fans love to bandy about: “small sample size”.
Sabermetrics supporters, and even those with a preference for traditional stats, will cite small sample size early and often. Albert Pujols is hitting .194? Well, he only has 114 plate appearances; it’ll all even out. -0.8 WAR? It’s just the first week of May. Every year there is a plethora of “the Orioles are in first, is this their year?” articles, or insert the name of some other surprise team off to a hot April. While small sample size is seldom used to discount Matt Kemp’s ability to keep up his .392/.474/.825 pace or his 12 home runs in just 116 plate appearances, there are those who will tell you to watch out for him too, and fairly so. But there is a problem with writing off what happens in small samples as well. These are statistics that do count. Pujols’ April and first week of May will go into his final tally come October.
Think about flipping a coin. You have 100 flips of that coin, and we expect the end result to be 50/50. But if you start with a run of 10 tails, that doesn’t mean that, as we prepare to flip it an eleventh time, we now expect heads 55.6% of the time going forward to make up for the 10 straight tails. We still expect a 50/50 chance of heads or tails on every remaining flip, meaning that we anticipate the remaining 90 flips splitting evenly and a final tally of 45 heads to 55 tails. Just because something needs to regress to the mean doesn’t mean it overcompensates or that what has happened suddenly doesn’t count.
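For anyone who wants to check the coin-flip arithmetic, here is a rough sketch in Python. The function names are my own, made up for illustration, and it assumes a perfectly fair coin with independent flips.

```python
from fractions import Fraction


def expected_final_tally(total_flips, tails_so_far, heads_so_far=0):
    """Expected (heads, tails) at the end, assuming a fair coin from here on.

    The flips already made are locked in; only the remaining flips
    are expected to split 50/50.
    """
    remaining = total_flips - tails_so_far - heads_so_far
    half = Fraction(1, 2)
    expected_heads = heads_so_far + remaining * half
    expected_tails = tails_so_far + remaining * half
    return expected_heads, expected_tails


def heads_rate_needed_to_even_out(total_flips, tails_so_far, heads_so_far=0):
    """Share of remaining flips that would have to be heads to finish 50/50.

    This is the "overcompensation" rate, which a fair coin has no reason to hit.
    """
    remaining = total_flips - tails_so_far - heads_so_far
    target_heads = Fraction(total_flips, 2)
    return (target_heads - heads_so_far) / remaining


# 100 total flips, opening with 10 straight tails.
heads, tails = expected_final_tally(100, tails_so_far=10)
rate = heads_rate_needed_to_even_out(100, tails_so_far=10)

print(f"Expected final tally: {heads} heads, {tails} tails")
print(f"Heads rate needed to get back to 50/50: {float(rate):.1%}")
```

The first number is what we should actually expect (45 heads to 55 tails); the second is the 55.6% the coin would have to hit to erase the opening run, which it has no obligation to do.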
I was reminded of this when I put up a post here and on Athletics Nation regarding whether or not the A’s should consider demoting Jemile Weeks. The basis of my argument was that his BABIP was decidedly out of sorts, so far outside the bounds of normal as to be statistically significant, and that perhaps it was a sign of a burgeoning problem. As I fully expected, my hypothesis and potential remedy were met with full-throated disagreement. I could understand the reaction to my potential remedy. There are many ways to deal with a slumping player, and one can argue that sending him back to the minors is a needlessly dramatic move. But I was a little surprised by the quite strong sentiment that the BABIP I cited essentially told us nothing, because BABIP doesn’t become reliable over anything even as high as 650 plate appearances, and that to a degree my argument was therefore without merit. I had started this discussion about his BABIP in a post written by my TarpTalk co-host Alan Torres about how we just have to wait Weeks out and all is OK. He cited swing %, contact rate, line drive rate, ground ball rate, etc., all things that become reliable between 50 and 150 plate appearances, and the post was written at a time when Weeks had 104.
But the thing is, baseball can’t afford to wait 650 plate appearances for my hypothesis to be proven faulty. For walk rate and ground ball rate, you need 200 plate appearances. On-base percentage, the holy grail of sabermetrics as a fundamental building block of a winning team, takes 500 plate appearances before it becomes useful. BB% for pitchers takes 550 batters faced. That is why there is scouting. Teams can’t wait until a player acquires 500 plate appearances to know if the on-base percentage they are seeing is real; only six Athletics hit that mark last season.
So far this year we have seen several players come and go with very small samples of work: Brandon Allen, Andrew Carignan, Fautino De Los Santos, Josh Donaldson, Graham Godfrey, Luke Hughes, Adam Rosales and Rich Thompson have all been at the very least demoted, with several even being thrown out there for any team to claim based on very little data. Baseball is indeed a marathon, not a sprint, but what we see in the short term shouldn’t be discounted either. Lengthy track records do matter (someone other than Pujols might have been benched by Mike Scioscia by now), but at the same time so do recent results. Sabermetrics isn’t a science where you can control variables and figure out results. The data set you get is incomplete and up to the whims of others; at best we can sort of figure it out from what we can cobble together. So small sample sizes do count for something: what has already happened can’t be erased and shouldn’t be ignored.
A nearly identical post has been cross-posted on Athletics Nation. I encourage all my readers to go there to join in on the discussion and debate.