The Marlins, Justin Ruggiano, and Small Sample Sizes

ST. LOUIS, MO - JULY 7: Justin Ruggiano #20 of the Miami Marlins hits a two-run home run against the St. Louis Cardinals at Busch Stadium on July 7, 2012 in St. Louis, Missouri.  (Photo by Dilip Vishwanat/Getty Images)

One of the big questions of the 2012 season for the Miami Marlins is the curious case of Justin Ruggiano's success. Ruggiano was a midseason trade acquisition from the Houston Astros who was picked up primarily to provide outfield depth in the wake of the Emilio Bonifacio injury. At that stage of Ruggiano's career, it seemed he was a well-known quantity: he was a quintessential Quad-A player with All-Star caliber numbers in Triple-A but never enough skill to make it at the big-league level. He displayed good-not-great power in Triple-A (career .194 ISO) and kept his batting average afloat with good BABIP (career .379) rather than avoiding strikeouts (career 25 percent rate).

But this season, he seems to have been reborn as some sort of hitting machine, taking the Marlins by storm with his bat. He is hitting .385/.452/.736 (.489 wOBA) in 106 PA. He is posting the highest walk rate of his admittedly short major league career (11.6 percent) while also striking out at the lowest rate of his career (22.6 percent). His .352 ISO looks like a video game number, as 18 of his 35 hits have gone for extra bases.

But this raises the question: is this for real? Can Ruggiano continue to hit like this, or is this a small-sample-size flash in the pan? It is hard to say, but we can go a long way towards answering that by looking at some research done by a couple of my former Baseball Prospectus colleagues.

Small Sample Size and "Stabilizing" Stats

A few years ago, former BP colleague Russell Carleton composed one of the more intriguing sabermetrics studies in a long time. In his piece, he looked at how large a sample size it would take for various hitting and pitching statistics to "stabilize" and better predict future performance and true talent level.

The piece was a hit. In fact, it was so much of a hit that, when former BP colleague and my old boss Derek Carty of BP Fantasy re-did the study with improved methodology, he referenced Carleton's piece as:

perhaps the most-referenced study in our little corner of the internet.

He would probably be right. Stat geeks like myself love to try and predict the future, and the tools that Carleton gave us and that Carty and others have since improved have been tremendous for our attempts at guessing which small sample size miracles are "for real" and which are all sorts of fluky.

So how do we use the data that Carty has provided, and what does this have to do with Justin Ruggiano?

Carty's piece provides a table of the important statistics that help determine a hitter's skill. For each one, it also lists the sample size at which split-half studies of previous data showed a 0.5 correlation.

Hitters

| Stat | Denominator | Stabilizes | Years |
|---|---|---|---|
| K | PA-IBB-HBP | 100 | 0.2 |
| UIBB | PA-IBB-HBP | 168 | 0.3 |
| IBB | PA | 253 | 0.4 |
| HBP | PA-IBB | 501 | 0.8 |
| 1B | PA-HBP-K-BB-HR-ROE | 959 | 2.1 |
| 2B+3B | PA-HBP-K-BB-HR-ROE | 833 | 1.8 |
| 2B | 2B+3B | 48 | 1.5 |
| 3B | 2B+3B | 48 | 1.5 |
| 1B+2B+3B (BABIP) | PA-HBP-K-BB-HR-ROE | 1126 | 2.4 |
| HR | PA-K-BB-HBP | 143 | 0.3 |
| HR (HR/FB) | OF FB [MLBAM] | 62 | 0.5 |
| HR (HR/FB) | OF FB [RS] | 65 | 0.5 |
| GB [MLBAM] | GB+OF+IF+LD | 109 | 0.2 |
| GB [RS] | GB+OF+IF+LD | 116 | 0.2 |
| OF FB [MLBAM] | GB+OF+IF+LD | 182 | 0.4 |
| OF FB [RS] | GB+OF+IF+LD | 189 | 0.4 |
| IF FB [MLBAM] | GB+OF+IF+LD | 194 | 0.4 |
| IF FB [RS] | GB+OF+IF+LD | 233 | 0.5 |
| LD [MLBAM] | GB+OF+IF+LD | 795 | 1.7 |
| LD [RS] | GB+OF+IF+LD | 979 | 2.1 |
| SB% | SB+CS | Inconclusive* | |
| SBA% | 1B+UIBB+HBP+ROE+FC | 39 | 0.3 |

Keep in mind what those sample sizes mean. The third column of the table gives the sample at which you would regress a player's performance halfway to the mean, which we will take to be the league average for convenience. For example, if all you knew were Justin Ruggiano's strikeout rate (the first row of the table) and the league-average strikeout rate, and he had 100 PA not ending in intentional walks or hit-by-pitches, your best guess for his "true" strikeout rate would be halfway between his own rate and the league average.
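For readers who want to see the arithmetic, here is a minimal Python sketch of that regression. It assumes the usual weighted-average form, in which the "Stabilizes" number acts like that many extra plate appearances of league-average performance; Carty's piece may do something more refined. The function name and the .200 league strikeout rate are placeholders for illustration, not figures from the study.

```python
def regress_to_mean(observed, sample, stabilizes, league_mean):
    """Blend an observed rate with the league mean.

    `stabilizes` is treated as that many plate appearances of
    league-average performance added to the player's own sample
    (an assumed form, not necessarily the study's exact method).
    """
    return (observed * sample + league_mean * stabilizes) / (sample + stabilizes)


# Illustrative inputs only: the league rate here is an assumption.
LEAGUE_K = 0.200      # hypothetical league strikeout rate
OBSERVED_K = 0.230    # a hitter's observed strikeout rate
STABILIZES_K = 100    # stabilization point for K from the table

for sample in (25, 100, 400):
    estimate = regress_to_mean(OBSERVED_K, sample, STABILIZES_K, LEAGUE_K)
    print(f"{sample:4d} PA -> estimated true K rate {estimate:.3f}")
```

When the sample equals the stabilization point (100 PA here), the estimate lands exactly halfway between the observed rate and the league rate. Smaller samples sit closer to the league average, and larger ones closer to the player's own number.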

The fourth column shows approximately how many years of full-time play it would take for a player to reach that sample. Immediately, you can see how quickly certain numbers "stabilize," or reach the halfway point of regression or predictability, and how much more slowly others do. For example, strikeouts and walks become halfway predictable very quickly, in less than a third of a full season. That is why, for example, I was willing to discuss changes in approach by players like Jose Reyes and John Buck, who had changed their strikeout and walk rates significantly. Other things, such as singles, doubles, and BABIP in general, take much longer. When I say BABIP takes a long time to deem "not fluky," it is because it takes almost two and a half seasons for a player's batting average on balls in play to halfway predict his skill.

Ruggiano's Sample

Let's look at a table with Ruggiano's sample of the data provided above. To get more numbers into his sample, we will look at his 2011 and 2012 stats combined.

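| Ruggiano, Stat | Sample | 2011-2012 | Regressed True |
|---|---|---|---|
| K% | 217 | 23.0 | 22.0 |
| UIBB% | 217 | 7.4 | 7.7 |
| HBP | 217 | 0.0 | 0.055 |
| 1B% | 139 | 24.4 | 22.5 |
| 2B+3B% | 139 | 12.2 | 8.2 |
| 2B/(2B+3B) | 17 | 94.1 | 91.1 |
| 3B/(2B+3B) | 17 | 5.9 | 8.9 |
| BABIP (%) | 139 | 36.7 | 30.5 |
| HR% | 151 | 6.6 | 5.2 |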

What does this tell you? As expected, Ruggiano's strikeout and walk rates are already more or less where they are expected to be; over the last two seasons, he has been a bit below the league average in both those categories and is likely to remain that way. On the other hand, most of his hit statistics should be regressed a decent amount, in particular his rate of doubles and triples per ball in play. We have seen Ruggiano smack 12 doubles this season in short order, and that rate should definitely go down going forward. As a result, his hits on balls in play should also go down, as you can see his batting average on balls in play of .367 had to be regressed a ton down to a .305 mark, much closer to the 2012 average of .297.
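To make the difference between the rows concrete, the sketch below computes how much weight each estimate puts on Ruggiano's own numbers, using his sample sizes and the stabilization points from Carty's table. The weighting formula, sample / (sample + stabilization point), is an assumed standard form rather than something spelled out in the study, and the names are placeholders. It does reproduce the BABIP row when fed the .297 league average.

```python
# (Ruggiano's sample, stabilization point), both taken from the tables above.
ROWS = {
    "K%": (217, 100),
    "UIBB%": (217, 168),
    "1B%": (139, 959),
    "2B+3B%": (139, 833),
    "BABIP": (139, 1126),
    "HR%": (151, 143),
}


def own_weight(sample, stabilizes):
    """Share of the estimate that comes from the player's own rate (assumed form)."""
    return sample / (sample + stabilizes)


weights = {stat: own_weight(n, k) for stat, (n, k) in ROWS.items()}

for stat, w in weights.items():
    print(f"{stat:7s} weight on Ruggiano's own rate: {w:.2f}")

# Check against the BABIP row: observed .367, 2012 league average .297.
w = weights["BABIP"]
babip_estimate = w * 0.367 + (1 - w) * 0.297
print(f"Regressed BABIP: {babip_estimate:.3f}")
```

The strikeout estimate leans mostly on Ruggiano's own rate, while the BABIP estimate is roughly nine parts league average to one part Ruggiano. That is why the .367 gets pulled nearly all the way back to the league mark.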

Using the rates above, we can construct an estimate of Ruggiano's true talent level. If we take those rates and fill in a batting line for him, we would estimate that Ruggiano has a true-talent level of .261/.322/.445. To get an idea of what that batting line is worth, players with similar batting lines in 2012 include Justin Morneau (.257/.318/.450, .327 wOBA) and Nelson Cruz (.262/.318/.433, .324 wOBA). For a corner outfielder and occasional (and decent) center fielder, that sort of projection is not bad at all.

And guess what? ZiPS projects a .267/.325/.445 line (.336 wOBA) going forward the rest of the season. Our estimated line was an almost perfect match. ZiPS projects Ruggiano to be worth 0.9 Wins Above Replacement (WAR) in his next 161 PA, and my guess is he gets closer to 250 PA and 1.4 WAR the rest of the year. Sure, the Marlins are not likely to keep getting the ridiculous level of play they have received thus far this season, but Ruggiano has proven enough in 100 PA to show that he is no slouch or Quad-A guy anymore.