Small Sample Sizes

Baseball, more than any other sport, produces an enormous amount of statistical data for us to analyze and study. This data helps us quantify how effective a player or team is, and puts their performance in perspective relative to the league average, league leaders, or historical records. With 25 active players on each team, 30 teams in the league, and 162 games in a season, these statistics are necessary if we are to come to any sort of objective conclusion about a player’s value.

Baseball statistics are facts. Unless someone is being purposefully dishonest, you can be certain that quoted MLB statistics are true. This is important, because personal prejudice and belief are weeded out when we look at statistics. If fan A really likes a player and fan B thinks he is terrible, but both view the same set of statistical data, they should be able to come to the same conclusion (if they are able to think rationally).

This system breaks down when individuals reverse the proper order of things. Instead of using statistics and facts as the basis for our beliefs, it is easy to begin with what we believe to be true and then seek out only those statistics that support the belief. This leads us to isolate certain components of the data and ignore the rest of the available statistics. That does a disservice to the audience: it shows only part of the story, and it gives neither an accurate depiction of a player’s performance nor a sound expectation of his future results.

For instance, if I believe that Cliff Lee was a great pitcher in 2011, I may choose to show you only his performance in the months of June, August, and September. His line for those months looks like this:

119.2 IP, 12-1, 0.68 ERA, 0.81 WHIP

These kinds of numbers would clearly show that Cliff Lee is one of the greatest pitchers in baseball. However, if I thought Cliff Lee was a disappointment last year, and wanted to prove it statistically, I may only show you his performance in April, May, and July:

113.0 IP, 5-7, 4.25 ERA, 1.31 WHIP

Is either of these statistical sub-segments an appropriate basis for analyzing Cliff Lee’s 2011 season? Overall, Lee finished with this line:

232.2 IP, 17-8, 2.40 ERA, 1.03 WHIP

This is an outstanding year, but you can see how, by picking and choosing certain time frames of Lee’s year, I could build an argument that he was either one of the best pitchers of 2011 or one of the most overpaid.
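To make the arithmetic concrete, here is a minimal Python sketch that starts from the full-season line and the hot-month line quoted above and backs out what the remaining months must have looked like. The helper names are made up for illustration, and the earned-run totals are estimated from the rounded ERA figures rather than taken from a box score, so the result only lands near the April, May, and July line; small differences come from rounding in the published numbers.

```python
def innings_to_outs(ip: str) -> int:
    """Convert innings notation ('119.2' means 119 and 2/3 innings) to outs."""
    whole, _, thirds = ip.partition(".")
    return int(whole) * 3 + int(thirds or 0)


def earned_runs(era: float, outs: int) -> float:
    """Earned runs implied by an ERA over a number of outs (27 outs = 9 innings)."""
    return era * outs / 27


# Figures quoted in the article.
season_outs = innings_to_outs("232.2")
hot_outs = innings_to_outs("119.2")      # June, August, September

# Whatever is not in the hot months is in the other three months.
rest_outs = season_outs - hot_outs
rest_er = earned_runs(2.40, season_outs) - earned_runs(0.68, hot_outs)
rest_era = 27 * rest_er / rest_outs

print(f"remaining innings: {rest_outs // 3}.{rest_outs % 3}")
print(f"implied ERA in remaining months: {rest_era:.2f}")
```

The point of the exercise is that the two cherry-picked lines are nothing more than a partition of the same season: the full-year ERA is the innings-weighted average of the two, so neither half can be quoted as if it stood for the whole.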

This is part of the reason why I am rarely swept up in the hot new player who comes into the league completely on fire, especially when it is unexpected. This is going on right now with Jeremy Lin in the NBA. It happened earlier in 2011 with Sam Fuld. It happened a few years ago with Jay Bruce. These young players enter their leagues and appear to be dominant game-changers for a moment in time. Then they seem to drop off drastically and disappear. In reality, all that is happening is that they are regressing to the mean (their true average level of performance). The initial 2-3 week sample is not large enough to assume that those players will be able to maintain that high level of performance, and it is unfair to place that expectation on them.
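A rough way to see why a 2-3 week sample says so little is to look at how much an observed batting average can wander from a hitter's true level by chance alone. The sketch below assumes a simple binomial model (every at-bat is independent with the same hit probability) and a hypothetical true average of .260; both are illustrative assumptions, not claims about any real player.

```python
import math


def batting_average_spread(true_avg: float, at_bats: int) -> float:
    """Standard deviation of observed batting average under a binomial model."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


TRUE_AVG = 0.260  # hypothetical true talent level, chosen for illustration

# Roughly two weeks of at-bats versus roughly a full season.
for at_bats in (50, 550):
    sd = batting_average_spread(TRUE_AVG, at_bats)
    low, high = TRUE_AVG - 2 * sd, TRUE_AVG + 2 * sd
    print(f"{at_bats:>3} AB: about 95% of outcomes fall between {low:.3f} and {high:.3f}")
```

Under these assumptions the two-week range is more than three times as wide as the full-season range, so the same .260 hitter can look like a star or a bust over 50 at-bats without his ability changing at all.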

This is why it is so important for teams to research players beyond the numbers. Scouting will always have a place in the game, because it is rare for there to be a large enough sample of data available to make meaningful projections of a player’s potential. Any such projection carries a significant risk of inaccuracy. Quality scouting and player development give teams a more complete understanding of what to expect from a player. Placing more confidence in a player’s tools and skills than in his numbers makes it easier for a team to endure brief slumps or disappointing debuts. However, it is worth noting that scouting has its own history of producing unrealistic projections, due to separate issues like human error and personal bias. The best solution, which most teams employ today and will likely continue to employ, is an appropriate blend of statistical analysis and scouting information.

When you evaluate a particular player’s value, be certain that you tell the whole story, and watch out for others who try to divert your attention from the full body of information in order to paint only the picture that supports their original belief. Above all, keep in mind that the front offices of all 30 MLB clubs have more information available to them than you do, and are composed of incredibly smart and talented individuals. As much fun as it is to play armchair GM, we would benefit more from having confidence in our team’s front office and paying attention to what its information says about certain players. We are given two ears and one mouth; in all things it is important to use them proportionately.

Peter Ellwood is a staff writer for ShutDownInning. You can reach him on Twitter @peter_ellwood.