What Does a Stat Say and What Should a Stat Say?

I mentioned in one of my introductory pieces that I would occasionally drop an explanatory article on the thinking behind the sabermetric stats commonly used on this site. Of course, sabermetrics is not so much about advanced statistics as it is about a way of thinking about baseball and many other subjects. In fact, while people may think proponents of sabermetrics believe they have all the answers, sabermetrics is really more about asking questions.

With that in mind, there are two questions I want you to consider whenever you use a statistic. They should guide how you use any stat, whether you see it on ESPN.com or on FanGraphs, and they matter any time you want to discuss a player or a team.

1) What question am I trying to answer?

2) What stat or observation answers that question? 

These may sound simple, but they are at the heart of any argument about almost anything, including our favorite national pastime. Let us consider an example.

Question: What player gets the most hits on average?

If this is the question you want answered, the answer should not be difficult to find. There happens to be a statistic, batting average, that answers this exact question, provided we define some of the question's parameters. Let us assume the question asks which player got the most hits per opportunity between 2009 and 2011. Among players with at least 1000 plate appearances (PA), here are your top three batting averages:

1) Joe Mauer (.333)
2) Miguel Cabrera (.332)
T3) Ryan Braun (.318)
T3) Joey Votto (.318)
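
The filter-then-rank procedure behind a list like this can be sketched in a few lines of Python. The player names and stat lines below are invented placeholders, not real 2009-2011 data, and `MIN_PA` simply mirrors the 1000 PA cutoff mentioned above.

```python
# Hypothetical stat lines (hits, at-bats, plate appearances); not real data.
players = {
    "Player A": {"H": 520, "AB": 1600, "PA": 1850},
    "Player B": {"H": 450, "AB": 1500, "PA": 1700},
    "Player C": {"H": 90, "AB": 250, "PA": 280},  # high average, too few PA
}

MIN_PA = 1000  # playing-time cutoff, matching the one used in the text


def batting_average(hits, at_bats):
    """Batting average is hits divided by at-bats."""
    return hits / at_bats if at_bats else 0.0


def rank_by_average(stats, min_pa=MIN_PA):
    """Return (name, AVG) pairs for qualified players, best first."""
    qualified = [
        (name, batting_average(s["H"], s["AB"]))
        for name, s in stats.items()
        if s["PA"] >= min_pa
    ]
    return sorted(qualified, key=lambda pair: pair[1], reverse=True)


for name, avg in rank_by_average(players):
    print(f"{name}: {avg:.3f}")
```

Note that the cutoff is part of the question's parameters: without it, a hitter with a hot couple of months would top the list.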

That is a simple question with a simple answer. Now, let us try a different question.

Question: What player makes the fewest outs on average?

This is a different question, and you would want a different stat to answer it. Can you use batting average? No, because while it does measure outs to an extent, it does not take into account the other ways players can avoid outs. Instead of batting average, you would want to use on-base percentage (OBP), which accounts for walks and hit-by-pitches, two other ways that players can avoid outs.

Using OBP as a way to answer this question, here are the players with the lowest out rates (measured as 1 - OBP) since 2009.

1) Miguel Cabrera (.579)
2) Joey Votto (.582)
3) Joe Mauer (.590)
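
As a rough illustration of the out-rate calculation, here is a minimal Python sketch. It uses the standard OBP definition (hits, walks, and hit-by-pitches over at-bats, walks, hit-by-pitches, and sacrifice flies); the stat line is a made-up example, not any of the players listed.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """Standard OBP: times on base over the opportunities OBP counts."""
    denominator = ab + bb + hbp + sf
    return (h + bb + hbp) / denominator if denominator else 0.0


def out_rate(h, bb, hbp, ab, sf):
    """Out rate as defined in the text: 1 - OBP."""
    return 1.0 - on_base_percentage(h, bb, hbp, ab, sf)


# Hypothetical hitter: 150 hits in 500 at-bats, 80 walks, 5 HBP, 5 sac flies.
line = dict(h=150, bb=80, hbp=5, ab=500, sf=5)

avg = line["h"] / line["ab"]
obp = on_base_percentage(**line)
print(f"AVG {avg:.3f}  OBP {obp:.3f}  out rate {out_rate(**line):.3f}")
```

The gap between AVG and OBP for the same hitter comes from the walks and hit-by-pitches that batting average leaves out of the calculation entirely.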

Again, because we wanted to answer a different question, we used a different statistic. Now let us answer a far more difficult question.

Question: Which player was the most productive offensively on average?

This is a significantly more difficult question. Can we use batting average to answer it? No, because we know that batting average does not account for certain ways to avoid outs, and avoiding outs is important to how productive a player is. Then can we use OBP? No, because OBP, like batting average, treats every hit in the same fashion, and we know that not all hits are worth the same. Can we use slugging percentage (SLG) then? No, because, as with batting average, SLG does not take into account walks and other ways to avoid outs.

But if each of these stats covers the others' faults, can we use all three at the same time? This is what we commonly refer to as the "triple slash" of AVG/OBP/SLG. It does give a better idea of which players have done well, but it is difficult to compare because we have no relative value between the numbers. How important is a point of OBP versus a point of SLG? Where does AVG fit into the picture? The triple slash is primarily descriptive -- that is, it answers the question of how a player produced his performance, but not necessarily how much that player produced.
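
To see why the triple slash is hard to compare directly, consider this small Python sketch. The `BattingLine` class and both hitters are hypothetical, built only for illustration; the AVG, OBP, and SLG formulas are the standard ones.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sf: int

    @property
    def hits(self):
        return self.singles + self.doubles + self.triples + self.hr

    def triple_slash(self):
        """Return (AVG, OBP, SLG) using the standard definitions."""
        avg = self.hits / self.ab
        obp = (self.hits + self.bb + self.hbp) / (
            self.ab + self.bb + self.hbp + self.sf
        )
        total_bases = (
            self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        )
        slg = total_bases / self.ab
        return avg, obp, slg


# Two hypothetical hitters: a patient contact hitter and a free-swinging slugger.
patient = BattingLine(ab=500, singles=110, doubles=25, triples=3, hr=12,
                      bb=90, hbp=5, sf=5)
slugger = BattingLine(ab=560, singles=80, doubles=30, triples=2, hr=38,
                      bb=35, hbp=3, sf=2)

for label, line in (("patient", patient), ("slugger", slugger)):
    avg, obp, slg = line.triple_slash()
    print(f"{label}: {avg:.3f}/{obp:.3f}/{slg:.3f}")
```

The patient hitter leads in AVG and OBP while the slugger leads in SLG, so the triple slash alone cannot say who produced more. That requires knowing what a point of each stat is worth.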

So what can we do to measure how much a player produced? For that, we have to answer a question about how we measure success in baseball.

Side Note: Success is measured in...?

How does a team measure success? Success is not measured in hits. It's not measured in walks. It isn't based on home runs. It is not based on strikeouts. However, those things all contribute to what ultimately determines a team's success: wins. Wins are the currency teams use to measure success, so why would we measure player success any differently? If we can find a way to turn player performance into a measure of wins, we can determine how much a player contributed to his team.

How are wins acquired? Well, teams generally need to score and prevent runs in order to get wins. Thus, if we can turn a player's offensive and defensive production into runs, we can use that as a proxy for wins.
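
One common way to make the link between runs and wins concrete, though it is not something this article describes, is Bill James's Pythagorean expectation, which estimates winning percentage from runs scored and runs allowed. The sketch below assumes the classic exponent of 2 and a hypothetical team; treat it as an illustration of runs as a proxy for wins rather than the method used on this site.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 750 runs scored, 700 allowed over a 162-game season.
base = pythagorean_win_pct(750, 700)
# The same team after adding ten runs of offense.
improved = pythagorean_win_pct(760, 700)

print(f"Expected wins: {162 * base:.1f} -> {162 * improved:.1f}")
```

Because the estimate moves whenever runs scored or runs allowed change, a player's contribution in runs can be translated, at least approximately, into wins.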

As a result, the takeaway here is the following: if you want to measure how productive a player is (and this is generally what people mean when they ask about how "good" a player is), you need to find a measure that converts all of his offensive and defensive performances into runs, which can then be converted into wins.

Back to the question

To answer the question of which player was the most productive offensively, we would need a statistic that measures this quality in runs. This is especially useful because, if we know how to measure a player's contribution in runs, we can actually measure how much better one player was than another.

How do we do that? We will get to that question next week, but for now the takeaway from this article should be the following:

1. Before discussing a player or a team, always ask yourself: what question am I trying to answer, and what stat can I use to answer it?

2. To measure the productivity of a player, you need a stat that measures wins or runs; this allows you to say how much a player produced instead of just how a player produced.