
So you think somebody other than Barry Bonds should be the 2004 National League MVP? You must be a Dodgers fan. Or, more likely, you’re unaware that there’s been a revolution in the statistical analysis of baseball, which clearly puts Bonds head and shoulders over the rest of the NL field. (A third possibility is that you’re disturbed by the growing body of evidence that Bonds used steroids to put up his amazing numbers … but, hey, let’s stick to the story line here. It’s the numbers, not the ethical considerations, until further notice.)

Now we don’t expect Dodgers fans to be aware of this stuff, but most dedicated followers of baseball know that the new baseball statisticians – or “sabermetricians,” as they’re sometimes called (“stat geeks” is another, less flattering name) – have changed the way we evaluate players’ past performance and predict their future exploits. And here’s what we know about Bonds’ amazing 2004 season: He had an RC27 rate of 20.21, nearly twice that of his closest competitor, Colorado’s Todd Helton, at 11.04. His IsoP, .450, was more than 100 points higher than that of the second-place guy, St. Louis’ Jim Edmonds, at .341. OPS? 1.422 to 1.088 (Helton). SecA? 1.086 to .554 (Edmonds again). And that’s not even getting into Win Shares and VORP.

Huh? What’s this alphabet soup we’re peddling, you ask? We’ll get to that, but first a quick look back at how higher math replaced the “gut feeling” as the tool of choice in judging ballplayers.

There’s no place like home

Back in 1977, a fellow from Kansas named Bill James published a newsletter he called “The Bill James Baseball Abstract.” It was a stat freak’s dream, full of tables and charts detailing things like which pitcher/catcher combos gave up the most stolen bases. James’ specialty was taking an accepted “fact,” like the idea that attendance at ballgames rises when a star pitcher is on the mound, and exploding it with a careful examination of the available evidence. In the case of the star pitcher-attendance myth, he simply compared attendance figures (available at the bottom of a boxscore) on dates when top pitchers like Nolan Ryan pitched with dates when they didn’t, and found no correlation whatsoever between a star on the mound and increased attendance.

James was not a great writer, in the sense that readers of thrilling, mythologizing sports writing had become accustomed to since the days of Grantland Rice and Damon Runyon (though he got a lot better over the years). Nor was he the first to really attack the statistics with an open mind and a mathematician’s rigor. Earnshaw Cook’s influential “Percentage Baseball” came out in the 1960s. And Henry Chadwick, the granddaddy of the stat geeks, invented – way back in the 1860s – the box score that is still largely in use today, as well as batting average (BA) and earned run average (ERA).

But James, by plugging away at and expanding his “Baseball Abstracts” each year, somehow touched off a statistical revolution of the sort that hadn’t been seen since Chadwick himself had essentially invented sports statistics. Many of today’s sabermetricians (the name is derived from “SABR,” the Society for American Baseball Research) credit James for inspiring their own interest in the scientific study of baseball. Nor did James only inspire young people who loved sports first and math second. He also helped to legitimize the use of baseball statistics in probability studies in math departments at top universities like MIT, Harvard and Stanford.

It would not be much of a stretch to say that James played Albert Einstein to Chadwick’s Sir Isaac Newton. Chadwick’s statistics, like Newton’s model of the universe, were an inspired and mostly accurate picture of reality; James, like Einstein, built upon the older theories to create a more refined and accurate view.

By the late 1990s, stat geeks were in the ascendancy. They had publications devoted almost entirely to statistical analysis, like The Baseball Prospectus (www.baseballprospectus.com). They were taken seriously by mainstream sports media outlets like ESPN, which had hired James protégé Rob Neyer as a regular columnist. And they were even starting to make inroads into professional baseball itself – Oakland A’s General Manager Billy Beane was listening to his resident stat geek, Paul DePodesta, more than to his old-school advisers, and would soon be building winning teams on the cheap as a result.

The ‘Moneyball’ era

“To a great extent, Major League Baseball has been insulated from many of the competitive pressures that other businesses face every day,” write the editors of The Baseball Prospectus. “Modern management techniques have been slow to arrive in MLB front offices. The intense pressure that drove millions of businesses to invest and focus on improvement has been absent, or at least barely noticeable in baseball circles.

“Not anymore. The information revolution has finally arrived in baseball.”

If one man can be said to have ushered in this new information age, it’s Oakland’s Billy Beane. As described in Michael Lewis’ “Moneyball,” an account of the A’s remarkable success despite one of MLB’s lowest payrolls, Beane was the first GM to take the new stats to heart in making player-acquisition decisions. He was able, by favoring stats like on-base percentage over the traditionally valued batting average, to build a cheap roster of cast-offs from other teams that somehow defied the conventional predictions, and won. A lot.

Beane’s success, and the success of the best-selling “Moneyball,” turns out to have been a bit of a double-edged sword. Whereas he was once able to trade for undervalued players or pick them up off the scrap heap, Beane now finds himself competing with a number of MLB front offices that take sabermetrics just as seriously as he does. His own protégés – Paul DePodesta, now GM of the Dodgers, and J.P. Ricciardi, GM of the Toronto Blue Jays – now compete with him for the undervalued players he once had all to himself.

The statistical revolution has more than begun; it’s in full swing. Front offices from Boston (where Bill James himself is a special adviser) to San Diego are in on the game. The window of opportunity for a Billy Beane to build a champion on the cheap is closing fast – when the New York Yankees are able to throw their massive treasure chest behind a scientific study of the ballplayers they’d like to acquire, it seems inevitable that poorer teams like the A’s will be swept aside.

Not so fast. For one thing, despite the huge payroll disparities, teams still have to play the games. All 162 of them in a baseball season. Because the season is so long, even slight edges in hitting, pitching and defense can mean the difference between winning the division by three games and losing it by five. For ballclubs without the massive payrolls of the Yankees, Red Sox or Dodgers, the key to success will be in finding those edges by hook, crook and better statistical analysis.

Bonds for Superman

Which brings us back to that alphabet soup of statistics that show how much greater Barry Bonds is than any other player in the majors. Take his 2004 RC27 rate, the projected number of runs a batting lineup composed of nine Barry Bondses would score per game. It’s an astonishing 20.21 runs per game.

Just take a moment to reflect on that. Sure, a team of nine Bondses probably wouldn’t play shortstop very well, let alone pitch, but … 20 runs a game! How insane is that? Considering that Bonds plays half his games in a pitcher’s park, the distance between him and the second-best RC27 guy, Todd Helton, who plays at the launching pad that is Coors Field, is even more astonishing.

And if you don’t buy into RC27, there are a host of other stats that show Bonds is in a different league from his peers (for an explanation of some of those stats, see box).

In fact, there are probably too many stats, at least for the casual fan. Part of the challenge of understanding baseball in the information age is to be able to make an informed decision on which statistics to really value, because all have their proponents and all have their detractors. Some statistics seem to tell us a great deal but don’t (batting average and RBIs, for example), while others seem completely counterintuitive yet correlate very closely with the metastatistic – who won and who lost – we all really care about.

When it comes to a Barry Bonds, it really doesn’t matter which stats we look at. Whether it’s the old “Triple Crown” stats – BA, home runs and RBIs – or the new stats like RC27 and Win Shares, he’s going to stand out. But what about two utility infielders your team is trying to choose between in spring training? One might look superior according to one set of stats, while the other outshines his rival by another set of standards. Which set of stats informs your team’s decision?

If you’re a fan of a team without the payroll leeway to make a few mistakes and still win, you’d better hope it’s the right one.

DIPS and WHIP and VORP, oh my!

You probably know about batting average, RBIs and ERA. Here are a few of the newfangled statistics out there, with a brief explanation of their uses:

BATTING STATISTICS

On-base percentage (OBP)

(H + BB + HBP) divided by (AB + BB + HBP + SF)

This stat tells us the rate at which a hitter reaches base, arguably the most important thing he can do, because a lineup that reaches base 100 percent of the time never makes an out and scores an infinite number of runs.

Slugging percentage (SLG)

TB divided by AB

Is the hitter a dinky singles guy or a masher who knocks doubles into the alleys and homers into the cheap seats? Slugging percentage tells us at a glance.

On-base percentage plus slugging percentage (OPS)

OBP plus SLG

A simple, if imperfect, method of determining the value of a hitter by adding his two most important percentages together.
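
For readers who like to check the arithmetic, the three formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the stat line are made up, and the numbers don’t belong to any real player.

```python
def on_base_percentage(h, bb, hbp, ab, sf):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)


def slugging_percentage(tb, ab):
    """SLG = TB / AB."""
    return tb / ab


def on_base_plus_slugging(h, bb, hbp, ab, sf, tb):
    """OPS = OBP + SLG."""
    return on_base_percentage(h, bb, hbp, ab, sf) + slugging_percentage(tb, ab)


# Hypothetical season line, invented for illustration only.
season = {"h": 150, "bb": 60, "hbp": 5, "ab": 500, "sf": 5, "tb": 250}

obp = on_base_percentage(season["h"], season["bb"], season["hbp"],
                         season["ab"], season["sf"])
slg = slugging_percentage(season["tb"], season["ab"])
ops = on_base_plus_slugging(**season)

print(f"OBP {obp:.3f}  SLG {slg:.3f}  OPS {ops:.3f}")
```

Note that OBP and SLG don’t share a denominator, which is one reason OPS is called “imperfect”: it adds two rates measured against different totals.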

Runs created (RC)

[(H + BB + HBP – CS – GIDP) times (TB + .26[BB – IBB + HBP] + .52[SH + SF + SB])] divided by (AB + BB + HBP + SH + SF)

The theoretical number of runs a hitter “created,” or earned for his team, over the course of a season.
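
The runs created formula is easier to follow when it’s broken into its three pieces: times on base, bases advanced and total opportunities. Here is a rough Python sketch of the formula exactly as printed above; the names `runs_created`, `on_base`, `advancement` and `opportunities` are labels chosen for this example, and the stat line is invented.

```python
def runs_created(*, h, bb, hbp, cs, gidp, tb, ibb, sh, sf, sb, ab):
    """Runs created, following the formula printed in this box."""
    # Times the hitter reached base and stayed there.
    on_base = h + bb + hbp - cs - gidp
    # Bases gained, with partial credit for walks and "small ball."
    advancement = tb + 0.26 * (bb - ibb + hbp) + 0.52 * (sh + sf + sb)
    # Total chances at the plate.
    opportunities = ab + bb + hbp + sh + sf
    return on_base * advancement / opportunities


# Hypothetical season line, invented for illustration only.
rc = runs_created(h=150, bb=60, hbp=5, cs=3, gidp=10, tb=250,
                  ibb=5, sh=2, sf=5, sb=10, ab=500)
print(f"Runs created: {rc:.1f}")
```

For this made-up hitter the answer comes out a shade under 97 runs.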

Runs created per 27 outs (RC27)

A complicated formula that estimates the theoretical number of runs per game that a lineup composed entirely of a given hitter would score.
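
This box doesn’t spell out the RC27 formula. One common way to get there is to scale a hitter’s runs created to 27 outs, the number in a nine-inning game, counting his outs as AB – H + CS + GIDP + SH + SF. That definition of outs is an assumption made for this sketch, and it is not necessarily how the figures quoted in this article were calculated.

```python
def outs_made(*, ab, h, cs, gidp, sh, sf):
    """Outs charged to a hitter (assumed definition: AB - H + CS + GIDP + SH + SF)."""
    return ab - h + cs + gidp + sh + sf


def runs_created_per_27_outs(runs_created, outs):
    """Scale runs created to 27 outs, i.e. one nine-inning game."""
    return 27 * runs_created / outs


# Hypothetical hitter: 100 runs created while making 370 outs.
outs = outs_made(ab=500, h=150, cs=3, gidp=10, sh=2, sf=5)
rc27 = runs_created_per_27_outs(100.0, outs)
print(f"{outs} outs, RC27 of {rc27:.2f}")
```

The appeal of the per-27-outs version is that it puts part-time players and everyday players on the same scale, since outs, not games, are what a lineup runs out of.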

PITCHING STATISTICS

Walks plus Hits per Inning Pitched (WHIP)

(BB + H) divided by IP

Shows how many baserunners a pitcher allows per inning, a measure of his effectiveness that goes beyond his ERA or won-loss record.
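
One wrinkle in figuring WHIP by hand: innings pitched are usually printed in a shorthand where “200.1” means 200 and one-third innings, not 200.1. The Python sketch below converts that shorthand before dividing. The helper name `innings_from_notation` and the sample numbers are invented for this example.

```python
def innings_from_notation(ip_text):
    """Convert box-score innings such as '200.1' (200 and 1/3) to a number."""
    whole, _, thirds = ip_text.partition(".")
    thirds = int(thirds) if thirds else 0
    if thirds not in (0, 1, 2):
        raise ValueError("the digit after the point must be 0, 1 or 2")
    return int(whole) + thirds / 3


def whip(bb, h, innings):
    """WHIP = (BB + H) / IP."""
    return (bb + h) / innings


# Hypothetical pitcher: 50 walks and 150 hits in 200.1 innings.
innings = innings_from_notation("200.1")
print(f"WHIP: {whip(50, 150, innings):.3f}")
```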

Run support (RS)

Runs scored by a pitcher’s team (average, per 9 innings pitched) while he was pitcher of record. Very important for determining whether a good pitcher’s poor won-loss record was really the result of bad luck and/or a poor offense on the days he pitched.

Defense Independent Pitching Statistics ERA (DIPS ERA)

A pitcher’s ERA, independent of the defense behind him. This formula, based on essays by Voros McCracken, assumes that pitchers have no control over whether non-HR balls put in play against them fall for hits or outs. Probably the most controversial and counterintuitive bit of statistical analysis to come along in years, DIPS has nevertheless stood up against many attempts to discredit it.

OTHER STATS AND FACTORS TO CONSIDER

Sample Size

Stats are only valuable if you have enough to work with. For example, it would be foolish to think that a hitter who goes 2 for 4 in the first game of the season projects as a .500 hitter for the rest of the year.
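
To put a rough number on that, here is a small Python sketch that treats every at-bat as an independent coin flip with a fixed chance of a hit. That is a simplifying assumption (real at-bats aren’t independent or identical), but it shows how the uncertainty around a batting average shrinks as at-bats pile up. The function name is made up for this example.

```python
import math


def batting_average_standard_error(hits, at_bats):
    """Standard error of a batting average under a simple binomial model."""
    p = hits / at_bats
    return math.sqrt(p * (1 - p) / at_bats)


one_game = batting_average_standard_error(2, 4)        # 2 for 4 on opening day
full_year = batting_average_standard_error(150, 500)   # .300 over 500 at-bats

print(f"2-for-4:     .500, give or take {one_game:.3f}")
print(f"150-for-500: .300, give or take {full_year:.3f}")
```

Under this model the opening-day .500 carries an uncertainty of about 250 points of batting average, while the full-season .300 is good to within about 20 points.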

Value Over Replacement Player (VORP)

A complicated formula purporting to show how a given player compares to a “replacement player” at his position (e.g., a readily available shortstop picked up off the waiver wire during the course of a season). This stat is valuable because it indicates how much better (or worse) your team’s shortstop or closer is than a cheap journeyman picked to fill a roster spot; in other words, how much value your team is actually getting from the guy costing it $5 million a year compared with one making the league minimum.

Win Shares (WS)

A Bill James stat that tries to show how many wins a given player is worth to his team, with three Win Shares credited for each team win; 30 or more Win Shares in a season, the equivalent of 10-plus wins, is exceptional.

Park Factor (PF)

A stat that attempts to quantify ballpark effects on hitters, pitchers and defenses. Dodger Stadium is famously a very tough park for hitters, while Coors Field, home of the Colorado Rockies, is very tough on pitchers.

KEY – AB: At Bats; BB: Bases on Balls; CS: Caught Stealing; ERA: Earned Run Average; GIDP: Ground into Double Play; H: Hits; HBP: Hit by Pitch; IBB: Intentional Bases on Balls; IP: Innings Pitched; SB: Stolen Bases; SF: Sacrifice Flies; SH: Sacrifice Hits; TB: Total Bases
