STA 112, Data Science, Statcast

dc.contributor.author

Coleman, Jake

dc.contributor.author

Rundel, Colin

dc.date.accessioned

2016-12-12T15:20:53Z

dc.date.available

2016-12-12T15:20:53Z

dc.date.issued

2016-12-12

dc.description.abstract

In this Data Exploration, students were introduced to Statcast, a baseball dataset downloaded from baseballsavant.mlb.com that includes every pitch thrown in the first week of the 2016 season, with 21 characteristics recorded for each pitch. The students were tasked with using the R packages dplyr and ggplot2 to answer data exploration and summarization questions. The exercises challenged them to apply what they knew about the data as well as newly acquired computational skills. The Statcast data is owned by MLB Advanced Media, L.P. and was downloaded from a search performed on baseballsavant.mlb.com for all pitches from 4/1/16 to 4/7/16. Statcast is a relatively new dataset (introduced in 2015) that includes all pitch characteristics from its precursor, PitchF/X (such as pitch movement, pitch type, and start and end velocity). Statcast also added tracking of the ball for the entirety of each play, as well as tracking of all fielders. Full Statcast data is not yet available to the public, but Baseball Savant gives the public access to Statcast-added batted-ball variables such as launch angle and batted-ball speed. The dplyr package is an extremely powerful tool for exploring data, using a simple structure to perform complex data management tasks. Students were introduced to dplyr in a previous lecture and used the Statcast data to gain hands-on experience working with data. Their tasks ranged from simple summaries to sophisticated manipulation (as real data is rarely in the ideal form for the desired analysis). They also used the R package ggplot2 to visualize some of their findings and draw further conclusions.

dc.description.sponsorship

iID, Data Expeditions

dc.identifier.uri

https://hdl.handle.net/10161/13261

dc.subject

R

dc.subject

dplyr

dc.subject

ggplot2

dc.subject

statcast

dc.subject

Data science

dc.title

STA 112, Data Science, Statcast

dc.type

Dataset; Learning object; Report

Files

Original bundle

Name:
statcast.csv
Size:
1.05 MB
Format:
Description:
Statcast Data Set
Name:
Statcast_Data_Dictionary.pdf
Size:
118.28 KB
Format:
Adobe Portable Document Format
Description:
Data Dictionary
Name:
Expedition_Final.pdf
Size:
148.66 KB
Format:
Adobe Portable Document Format
Description:
Final Report
