I had been wanting to write something on the Marcel projections for the Marlins, but I kept running into the problem of actually having to explain just what a Marcel projection entails, and in a fairly clear manner. If you've read... well, basically anything I've written here, you know clarity isn't necessarily my strong point.
But today, like a belated Christmas gift, Dave Studeman did it for me over at The Hardball Times. So before we begin, check out Studes' piece on regression analysis as it pertains to projections.
...welcome back. If you were a good little boy or girl, you read all the way to the end, where Studes even went to the trouble of explaining Marcel for me. You may recall a bit about Marcel showing a correlation of .66 last year, compared to .74 for BP's PECOTA system. What Studes doesn't mention (if I remember correctly) is that even the very best projection system can only expect to correlate at about r=.77 -- that is, even if you know players' true skill levels, and even if they face the same pitchers in the same parks, you can only expect a stat like OPS to show a correlation of about .77 from year to year. The very fact that we have the terms "fluke year" and "career year" should help you understand that random fluctuations in performance keep any projection system from reaching a perfect 1.0 correlation. I doubt you're interested in the math that puts the max r at .77, but just in case, it starts here and is cleaned up and updated here.
deep breath - as I said, clarity's not my strong point
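If the "why can't a perfect projection hit 1.0" part still feels abstract, here's a toy simulation. It's a sketch under made-up assumptions, not the math behind the .77 figure: each hypothetical player gets a true OPS drawn from a bell curve, and his observed season is that true OPS plus independent random noise. The "projection" is the true talent itself, i.e. a system that knows everything. The talent spread (0.060) and noise spread (0.050) are invented for illustration. When the noise is independent of talent, the correlation works out to talent spread divided by the square root of (talent spread squared + noise spread squared), and the simulation lands near that value.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def simulate_ceiling(n_players=20000, talent_sd=0.060, noise_sd=0.050, seed=2007):
    """Correlate a 'perfect' projection (true talent) with noisy observed OPS.

    ASSUMPTION: talent and noise spreads are made-up illustration values.
    """
    rng = random.Random(seed)
    true_ops = [rng.gauss(0.780, talent_sd) for _ in range(n_players)]
    # Observed season = true talent + random fluctuation (flukes, career years).
    observed = [t + rng.gauss(0.0, noise_sd) for t in true_ops]
    return pearson(true_ops, observed)


r_sim = simulate_ceiling()
r_theory = 0.060 / math.hypot(0.060, 0.050)  # talent_sd / sd of observed OPS
print(round(r_sim, 3), round(r_theory, 3))
```

Crank the noise down toward zero and the correlation climbs toward 1.0; crank it up and even the all-knowing projection looks worse.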
OK, on to the part you actually care about. What does simple regression say about the 2007 Florida Marlins? Let's find out. Actually (there's that lack of clarity again), I should point out two things first:
- The first thing you should look at is a player's projected PAs. These are regressed just like everything else (sometimes with amazing results) and serve as the basis for the counting stats. So if you think a guy's projected PA total is too high or too low (because of things we know that Marcel doesn't), mentally adjust his counting stats accordingly.
- Next to each guy's name, I'll put the "reliability" rating for that projection. For example, Olivo's rating is .78. This means that the projection is essentially 78% Olivo's own performance and 22% regression to the mean. A .00 would mean the projection is essentially a slightly age-adjusted league mean. The highest reliability in this year's projections is .88 -- guys like Tejada, Ichiro, Abreu, etc., who have piled up so many PAs over the last three years that their performance has only been regressed 12% toward the mean.
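For the curious, here's a rough sketch of how that reliability number turns into a projection for a single rate stat. Treat the specifics as assumptions on my part, not a definitive Marcel implementation: it assumes the three most recent seasons are weighted 5/4/3 and that the regression adds 1,200 PAs of league-average performance, and it skips the age adjustment entirely. The function names are made up for illustration.

```python
# Hypothetical sketch of Marcel-style regression to the mean for one rate stat.
# ASSUMPTIONS: seasons weighted 5/4/3 (most recent first), regression adds
# 1200 PA of league-average performance, and no age adjustment is applied.

WEIGHTS = (5, 4, 3)    # most recent season first
REGRESSION_PA = 1200   # league-average PAs mixed in (assumed constant)


def reliability(pas):
    """Share of the projection that comes from the player's own record."""
    weighted_pa = sum(w * pa for w, pa in zip(WEIGHTS, pas))
    return weighted_pa / (weighted_pa + REGRESSION_PA)


def project_rate(rates, pas, league_rate):
    """Blend the player's weighted rate with the league rate by reliability."""
    weighted_pa = sum(w * pa for w, pa in zip(WEIGHTS, pas))
    if weighted_pa == 0:
        return league_rate  # no track record: the projection is just the mean
    player_rate = sum(w * pa * r for w, pa, r in zip(WEIGHTS, pas, rates)) / weighted_pa
    rel = reliability(pas)
    # e.g. rel = .78 means 78% player, 22% league mean
    return rel * player_rate + (1 - rel) * league_rate


print(reliability((700, 700, 700)))
print(project_rate((0.400, 0.400, 0.400), (700, 700, 700), 0.330))
```

Under these assumptions, three straight 700-PA seasons give a reliability of 8400 / (8400 + 1200) = .875, which lines up nicely with the .88 ceiling mentioned above, and a guy with no PAs at all just gets the league mean.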
Olivo (0.78) 454 PAs .254/.295/.431 55 runs 16 HRs 57 RBI 6 SB
Jacobs (0.72) 471 PAs .274/.338/.494 56 runs 20 HRs 70 RBI 4 SB
Uggla (0.74) 542 PAs .284/.345/.476 81 runs 20 HRs 71 RBI 6 SB
Cabrera (0.87) 606 PAs .329/.406/.563 97 runs 27 HRs 103 RBI 6 SB
Ramirez (0.75) 550 PAs .298/.361/.482 90 runs 14 HRs 53 RBI 33 SB
Willingham (0.72) 489 PAs .277/.355/.472 56 runs 19 HRs 62 RBI 3 SB
Amezaga (0.65) 390 PAs .255/.324/.357 46 runs 6 HRs 31 RBI 15 SB
Abercrombie (0.54) 340 PAs .243/.309/.387 46 runs 8 HRs 35 RBI 6 SB
Hermida (0.62) 379 PAs .273/.353/.429 47 runs 10 HRs 42 RBI 6 SB
As you can see, I figured it was worth showing both Amezaga's and Reggie's projections. But no, yeah, we're totally fine with in-house CF options, Larry... le sigh. Also of note: Hermida is the obvious candidate for a PA adjustment, if we presume he's done with the whole cascading-injuries thing.
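Here's the "mentally adjust" step written out, as a hypothetical helper rather than anything Marcel itself does. It assumes counting stats scale linearly with PAs while the slash line stays put, which ignores lineup slot and the guys hitting around him. The 550-PA figure is just an example guess, not a projection.

```python
# Hypothetical helper: rescale projected counting stats to your own PA guess.
# ASSUMPTION: counting stats scale linearly with PAs; AVG/OBP/SLG are unchanged.


def rescale_counting_stats(stats, projected_pa, your_pa):
    """Multiply each counting stat by (your PA guess / Marcel's projected PAs)."""
    factor = your_pa / projected_pa
    return {name: value * factor for name, value in stats.items()}


# Hermida's Marcel line: 379 PAs, 47 runs, 10 HRs, 42 RBI, 6 SB.
hermida = {"runs": 47, "HR": 10, "RBI": 42, "SB": 6}
adjusted = rescale_counting_stats(hermida, projected_pa=379, your_pa=550)

for name, value in adjusted.items():
    print(f"{name}: {value:.1f}")
```

At 550 PAs, that works out to roughly 14.5 HRs and 61 RBI, still on a .273/.353/.429 line.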
I know what you're wondering right now. "But Dan, can Marcel project pitchers as well?" Why yes, yes it can. The thing to note here is that I'll give two ERAs. The first is good old ERA (he says, holding back a groan). The second is a component ERA which, if I know my sabermetrics, is technically DIPS 3.0. You can read all about it here.
Willis (0.83) 195 IP 3.65/3.97 145 Ks 62 BBs
Johnson (0.65) 130 IP 3.46/3.74 110 Ks 54 BBs
Olsen (0.69) 152 IP 4.03/3.97 137 Ks 60 BBs
Nolasco (0.61) 117 IP 4.62/4.62 87 Ks 37 BBs
Sanchez (0.56) 115 IP 3.44/3.60 81 Ks 44 BBs
Petit (0.23) 41 IP 5.49/5.05 32 Ks 14 BBs
Pinto (0.25) 40 IP 4.05/4.50 35 Ks 20 BBs
Mitre (0.53) 67 IP 5.10/4.97 49 Ks 27 BBs
Kensing (0.34) 45 IP 4.80/4.60 40 Ks 19 BBs
Messenger (0.49) 59 IP 5.03/4.88 45 Ks 26 BBs
Tankersley (0.32) 45 IP 3.80/4.20 40 Ks 20 BBs
Obviously you have to be wary of those bullpen projections, since so few innings of history lead to low reliability scores. Guys like Owens and Martinez weren't even worth posting, with reliability scores in the 0.0x range.
On the other hand, if you said I could lock in those starters' ERAs right now, I'd be all over it. Unfortunately, such a deal has not yet been presented to me.
To finish off, I want to go back to the idea that these are the most basic projections one can look at. PECOTA is, for all intents and purposes, the most advanced -- or at least, it has come closest to the limit of how much we can project. And rest assured, someone will do up a similar look at those when they come out in a few weeks. But there are many projection systems in between, which you can look at right now. Craig linked to the ZiPS projections at the end of this item. "Chone" Smith has worked out his own system, which you can get here. Among the paid options, you can pick up the Bill James Handbook to find his work (and a lot more), or Ron Shandler's projections, which happen to show the highest correlation of the systems not named after a utility infielder.
What I'm getting at is that these probably aren't the lines the 2007 Marlins will put up. But they are a start -- indeed, the start. And more importantly, they give us something to talk about besides trailer hitches (joke!).