2021 Mets play-by-play data opened as a data-frame in RStudio

Guide: Play-by-Play Data from Retrosheet (Windows)

Part 1: Set Up

January 30, 2022

If we are looking for MLB play-by-play data from any game in the last century, retrosheet.org is the place to look. The team there has cataloged data from every MLB game since 1901 and made it freely available to the public. Unfortunately, we cannot download the data we need as a .csv or another common data frame format. That’s where this guide comes in.

Step One: Tools

We will start by creating a dedicated directory for Retrosheet data and tools. It can live anywhere on our computer; for this guide I created a new folder in my Documents and named it “Retrosheet”.
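For anyone who prefers to script this step rather than click through File Explorer, here is a minimal Python sketch. The function name make_retrosheet_dir is made up for this guide, and the example call assumes the Documents folder sits directly under the user's home directory, which is not true on every Windows setup.

```python
from pathlib import Path


def make_retrosheet_dir(parent: Path, name: str = "Retrosheet") -> Path:
    """Create the working folder if it does not exist yet and return its path."""
    target = parent / name
    # exist_ok=True makes the call safe to repeat; parents=True creates
    # any missing parent folders along the way.
    target.mkdir(parents=True, exist_ok=True)
    return target


# Example call (assumption: Documents is located under the home directory):
# make_retrosheet_dir(Path.home() / "Documents")
```

Creating the folder by hand works just as well; the script only helps if we want to repeat the setup on another machine.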

With our directory set, let’s download the tools we’ll need later for reading the files and converting them to a common format. We navigate to retrosheet.org and choose Play-by-play files under Data downloads in the navigation bar. On that page, we click the Software Tools link.

On the Software Tools page, we click the link labeled BEVENT.EXE to download a zip archive called bevent.zip, which contains the executable file BEVENT.EXE. This program formats and saves play-by-play event logs. Two other programs on this page, BOX.EXE and BGAME.EXE, work similarly to create box scores and game info printouts, respectively. We will not use them in this guide.

Once the download finishes, we open the archive and drag BEVENT.EXE into the “Retrosheet” directory we set up. If we try to run BEVENT.EXE now, nothing happens, because it has to be run from the command prompt. I’ll get to that in Part 2.
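The drag-and-drop step can also be scripted. The sketch below uses Python's standard zipfile module to copy a single file out of an archive. The helper name extract_member is hypothetical, and the paths in the example call assume that bevent.zip landed in the Downloads folder and that the Retrosheet folder is in Documents.

```python
import zipfile
from pathlib import Path


def extract_member(archive: Path, member_name: str, dest_dir: Path) -> Path:
    """Copy one file out of a zip archive into dest_dir.

    The name is matched case-insensitively, since the archive may store
    it as BEVENT.EXE or bevent.exe.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            file_name = Path(info.filename).name
            if file_name.lower() == member_name.lower():
                dest_dir.mkdir(parents=True, exist_ok=True)
                out_path = dest_dir / file_name
                out_path.write_bytes(zf.read(info))
                return out_path
    raise FileNotFoundError(f"{member_name} not found in {archive}")


# Example call (both paths are assumptions about where the files live):
# extract_member(
#     Path.home() / "Downloads" / "bevent.zip",
#     "BEVENT.EXE",
#     Path.home() / "Documents" / "Retrosheet",
# )
```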

Step Two: Data

We must decide from the scope of our question what data we need. For the sake of example, let’s try to find all home runs hit by Mets batters on 3-0 counts, at home, in games with a margin of 5 or more runs in either direction, during the 2021 regular season. (Note: different databases suit different questions. For questions that only require season totals and not play-by-play data, the Sean Lahman baseball database may be a better alternative.)
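It can help to write the question down as an explicit filter before touching any data, because that shows exactly which fields each play needs to carry. The Python sketch below is only an illustration: the dictionary keys are hypothetical names invented for this guide, not the column names BEVENT.EXE produces, and it assumes "margin" means the score difference at the moment the batter came to the plate. The team code NYN is the one Retrosheet uses for the Mets.

```python
def is_target_play(play: dict) -> bool:
    """Return True for a Mets home run, at home, on a 3-0 count,
    with the score 5 or more runs apart in either direction.

    All keys are hypothetical field names used for illustration only.
    """
    # abs() covers "either direction": Mets ahead or Mets behind.
    margin = abs(play["home_score"] - play["visitor_score"])
    return bool(
        play["is_home_run"]
        and play["batting_team"] == "NYN"   # Mets are batting
        and play["home_team"] == "NYN"      # game is played at home
        and play["balls"] == 3
        and play["strikes"] == 0
        and margin >= 5
    )
```

Every condition here depends on information recorded for a single play, which is why season totals are not enough for this question.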

With the data we need in mind, we can return to Play-by-play files under Data downloads in the navigation bar on retrosheet.org and scroll down. We are presented with several headings. For play-by-play data, we are interested in Event Files, not Box Score Event Files. This leaves three options: Regular Season Event Files, All-Star Game Event Files, and Post-Season Event Files. Which one we choose depends on our question. Since we are looking for regular season Mets home games, we go to the heading for Regular Season Event Files. Under our subheading of choice, we download the zip archive with data from the season, all-star game, or postseason we are hoping to glean insight from. For our example, we download 2021eve.zip by clicking the 2021 link under the Regular Season Event Files subheading.

With our data downloaded, we extract it to our “Retrosheet” directory, where BEVENT.EXE already sits. Now, in addition to BEVENT.EXE, our Retrosheet folder should contain .ROS files and .EVA or .EVN files for each participating team.
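As a quick sanity check, the extraction and the "what is in the folder now" inspection can be sketched in Python. The function name extract_season is made up for this guide, and the paths in the example call are assumptions about where the download and the Retrosheet folder live.

```python
import zipfile
from collections import Counter
from pathlib import Path


def extract_season(archive: Path, dest_dir: Path) -> Counter:
    """Extract a season archive into dest_dir and count files by extension."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
    # Upper-case the suffix so .evn and .EVN are counted together.
    return Counter(p.suffix.upper() for p in dest_dir.iterdir() if p.is_file())


# Example call (both paths are assumptions):
# counts = extract_season(
#     Path.home() / "Downloads" / "2021eve.zip",
#     Path.home() / "Documents" / "Retrosheet",
# )
# print(counts)  # we would hope to see .ROS, .EVA, .EVN, and .EXE entries
```

If the count for .EXE is zero, BEVENT.EXE is not in the folder yet and Step One needs another look.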

Next Steps

Just like that, we are all set up. Stay tuned for Part 2, where we’ll learn how to use BEVENT.EXE in the command prompt to create the files we want.