The Effects of Spin Rate on an MLB Pitcher's Performance: A Case Study on the 2015 - 2023 Seasons | Andrew Cernosek
Introduction
I've always been a huge baseball fan, but over the last few years I've also become a big baseball stats fan. I'll spend hours digging through stats, often wondering about all the various factors that influence those numbers.
During my junior year in high school, my AP Research class assigned a year-long project that could be about anything I wanted as long as it would lead to a new understanding. It's no surprise that I decided to focus on something related to baseball analytics.
I started to do some research on various ideas and quickly saw that spin rate was a really hot topic, but with very little related research. I also noticed plenty of misconceptions, such as websites claiming that spin rate directly led to a pitch being "better". I wanted to find out whether there was any validity in that claim. The goal of this project is to see if there is any sort of correlation between the change in spin rate and the outcome of a pitcher's performance.
While I was really proud of the work I did on that project, I was also motivated to expand on what I had learned. I had originally relied on manually importing StatCast stats from Baseball Savant, manipulating data in Excel, and using StatCrunch to report my findings. After completing an online course by the University of Michigan titled "Foundations of Sports Analytics: Data, Representation, and Models in Sports" I realized there was a 'better' way.
Using many of the coding principles from that course along with pybaseball, a Python package for baseball data analysis, I was able to streamline my workflow as well as make it much easier to expand the analysis across multiple seasons.
Set Up
The first thing we need to do is to import Python packages which allow us to collect, analyze, and visualize data. I've learned that these packages are pretty standard within the data analytics community.
# normal imports
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import seaborn as sns
import warnings
# disable warnings
warnings.filterwarnings('ignore')
As I mentioned above, pybaseball is a really useful Python package. It allows you to import stats into a dataframe directly from popular sites, including Baseball Savant (which has the StatCast stats that I wanted to use for my analysis). Unfortunately, pybaseball didn't have a built-in function that grabbed all of the columns of data that I had previously used when importing things manually. However, by looking at how the existing pybaseball functions were defined, I was able to create a custom function that did exactly what I needed:
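import io
import requests
# sanitize_statcast_columns (used below) is one of pybaseball's own helper functions

def statcast_pitcher_pitch_spin(year: int, minP: int = 100) -> pd.DataFrame:
    # Build the Baseball Savant custom-leaderboard URL; `year` can also be a
    # comma-separated string of seasons, and `minP` is the minimum plate appearances
    url = (
        "https://baseballsavant.mlb.com/leaderboard/custom?"
        f"year={year}&type=pitcher&filter=&sort=4&sortDir=asc&min={minP}"
        "&selections=k_percent,bb_percent,p_era,batting_avg,exit_velocity_avg,whiff_percent,"
        "groundballs_percent,flyballs_percent,popups_percent,fastball_avg_spin,breaking_avg_spin,"
        "n_breaking_formatted,offspeed_avg_spin,n_offspeed_formatted&csv=true"
    )
    # Download the CSV export and load it into a dataframe
    res = requests.get(url, timeout=None).content
    data = pd.read_csv(io.StringIO(res.decode('utf-8')))
    data = sanitize_statcast_columns(data)
    return data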
After doing that, we can import this new function and use it to create a dataframe with the stats that we want. The years run from 2015 (the start of the StatCast Era) to 2023 (the current year as of publishing this) and are passed as a single comma-separated string, and the minimum number of plate appearances against the pitcher is set to 100. The columns returned by my custom function are listed below.
# pybaseball
from pybaseball import statcast_pitcher_pitch_spin
data_spin_all = statcast_pitcher_pitch_spin('2023,2022,2021,2020,2019,2018,2017,2016,2015', minP=100)
print(data_spin_all.columns.tolist())
data_spin_all
['last_name', 'first_name', 'player_id', 'year', 'k_percent', 'bb_percent', 'p_era', 'batting_avg', 'exit_velocity_avg', 'whiff_percent', 'groundballs_percent', 'flyballs_percent', 'popups_percent', 'fastball_avg_spin', 'breaking_avg_spin', 'n_breaking_formatted', 'offspeed_avg_spin', 'n_offspeed_formatted']
| | last_name | first_name | player_id | year | k_percent | bb_percent | p_era | batting_avg | exit_velocity_avg | whiff_percent | groundballs_percent | flyballs_percent | popups_percent | fastball_avg_spin | breaking_avg_spin | n_breaking_formatted | offspeed_avg_spin | n_offspeed_formatted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colon | Bartolo | 112526 | 2015 | 16.7 | 2.9 | 4.16 | 0.281 | 88.9 | 14.4 | 44.1 | 23.2 | 5.7 | 2161 | 2164.0 | 10.0 | 1727.0 | 7.4 |
| 1 | Hawkins | LaTroy | 115629 | 2015 | 21.0 | 4.3 | 3.26 | 0.286 | 89.7 | 17.0 | 55.4 | 18.2 | 5.8 | 2051 | 2072.0 | 16.6 | 1698.0 | 8.0 |
| 2 | Wolf | Randy | 150116 | 2015 | 17.4 | 9.3 | 6.23 | 0.319 | 89.0 | 16.2 | 46.2 | 17.9 | 5.1 | 2032 | 2176.0 | 40.2 | 1669.0 | 11.1 |
| 3 | Marquis | Jason | 150302 | 2015 | 17.1 | 6.5 | 6.46 | 0.330 | 90.4 | 21.3 | 48.8 | 17.1 | 3.7 | 1782 | 1977.0 | 19.4 | 1239.0 | 21.2 |
| 4 | Burnett | A.J. | 150359 | 2015 | 20.5 | 7.0 | 3.18 | 0.275 | 89.8 | 21.1 | 55.0 | 14.1 | 4.0 | 2009 | 2023.0 | 29.4 | 1678.0 | 8.8 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4309 | Woo | Bryan | 693433 | 2023 | 25.9 | 7.3 | 4.75 | 0.242 | 87.2 | 27.8 | 38.8 | 28.9 | 7.9 | 2179 | 2358.0 | 11.8 | 1713.0 | 3.6 |
| 4310 | Elder | Bryce | 693821 | 2023 | 17.6 | 7.9 | 3.46 | 0.238 | 89.5 | 23.7 | 53.0 | 20.6 | 4.8 | 1998 | 2396.0 | 36.7 | 2032.0 | 12.1 |
| 4311 | Pfaadt | Brandon | 694297 | 2023 | 20.5 | 6.6 | 6.91 | 0.303 | 90.4 | 24.6 | 34.1 | 30.1 | 8.5 | 2445 | 2667.0 | 32.9 | 1946.0 | 14.7 |
| 4312 | Shuster | Jared | 694363 | 2023 | 13.0 | 11.4 | 5.00 | 0.250 | 89.8 | 21.3 | 35.2 | 29.0 | 12.4 | 2136 | 2204.0 | 34.2 | 1492.0 | 21.7 |
| 4313 | Hartwig | Grant | 701643 | 2023 | 17.1 | 11.4 | 4.94 | 0.253 | 88.1 | 20.9 | 45.2 | 19.2 | 4.1 | 2156 | 2429.0 | 33.1 | 1792.0 | 4.5 |
4314 rows × 18 columns
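As an aside, the query string in the custom function above can also be assembled from a dictionary rather than typed out by hand. The following is only a sketch: `build_leaderboard_url` is a hypothetical helper (it is not part of pybaseball), the parameter names are copied from the URL in the custom function, and nothing is downloaded here.

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of pybaseball): builds the same kind of
# Baseball Savant custom-leaderboard URL as the custom function above.
BASE_URL = "https://baseballsavant.mlb.com/leaderboard/custom"

def build_leaderboard_url(years, min_pa, selections):
    params = {
        "year": ",".join(str(y) for y in years),
        "type": "pitcher",
        "filter": "",
        "sort": 4,
        "sortDir": "asc",
        "min": min_pa,
        "selections": ",".join(selections),
        "csv": "true",
    }
    # safe="," keeps the commas literal, matching the hand-written URL
    return f"{BASE_URL}?{urlencode(params, safe=',')}"

url = build_leaderboard_url(
    years=range(2023, 2014, -1),
    min_pa=100,
    selections=["k_percent", "bb_percent", "p_era", "fastball_avg_spin"],
)
print(url)
```

Keeping the stat names in a list makes it easier to add or remove columns without breaking the URL.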
A huge part of this study is isolating both the fastball's average spin rate and the secondary pitch's average spin rate. However, there are three types of pitches listed on StatCast: fastballs, breaking balls, and offspeed. To simplify things, I wanted to group breaking balls and offspeed pitches together. When originally looking at them separately, I noticed a high number of outliers where some pitchers threw a pitch labeled as "offspeed" with a lot more spin than is typically seen on anyone else's pitches of that type. Because of this, I used a weighted average formula (breaking % * breaking spin + offspeed % * offspeed spin, with the two percentages rescaled so that they add up to 100%) to determine the usage-weighted average spin rate of a secondary pitch thrown by a pitcher, now labeled as `pitch2_avg_spin`.
# massage data - change NaN values to zero, formula to calculate secondary pitch average spin based on pitch usage percentages
data_spin_all = data_spin_all.fillna(0)
data_spin_all["pitch2_avg_spin"] = (data_spin_all['n_breaking_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['breaking_avg_spin']) + \
(data_spin_all['n_offspeed_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['offspeed_avg_spin'])
# change NaN to zero again & drop pitchers without a secondary pitch
data_spin_all = data_spin_all.fillna(0)
data_spin_all = data_spin_all[data_spin_all['pitch2_avg_spin'] != 0 ]
data_spin_all["pitch2_avg_spin"] = data_spin_all["pitch2_avg_spin"].astype('int')
data_spin_all = data_spin_all.drop(columns=['player_id','breaking_avg_spin','offspeed_avg_spin','n_breaking_formatted','n_offspeed_formatted'])
data_spin_all
| | last_name | first_name | year | k_percent | bb_percent | p_era | batting_avg | exit_velocity_avg | whiff_percent | groundballs_percent | flyballs_percent | popups_percent | fastball_avg_spin | pitch2_avg_spin |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Colon | Bartolo | 2015 | 16.7 | 2.9 | 4.16 | 0.281 | 88.9 | 14.4 | 44.1 | 23.2 | 5.7 | 2161 | 1978 |
| 1 | Hawkins | LaTroy | 2015 | 21.0 | 4.3 | 3.26 | 0.286 | 89.7 | 17.0 | 55.4 | 18.2 | 5.8 | 2051 | 1950 |
| 2 | Wolf | Randy | 2015 | 17.4 | 9.3 | 6.23 | 0.319 | 89.0 | 16.2 | 46.2 | 17.9 | 5.1 | 2032 | 2066 |
| 3 | Marquis | Jason | 2015 | 17.1 | 6.5 | 6.46 | 0.330 | 90.4 | 21.3 | 48.8 | 17.1 | 3.7 | 1782 | 1591 |
| 4 | Burnett | A.J. | 2015 | 20.5 | 7.0 | 3.18 | 0.275 | 89.8 | 21.1 | 55.0 | 14.1 | 4.0 | 2009 | 1943 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4309 | Woo | Bryan | 2023 | 25.9 | 7.3 | 4.75 | 0.242 | 87.2 | 27.8 | 38.8 | 28.9 | 7.9 | 2179 | 2207 |
| 4310 | Elder | Bryce | 2023 | 17.6 | 7.9 | 3.46 | 0.238 | 89.5 | 23.7 | 53.0 | 20.6 | 4.8 | 1998 | 2305 |
| 4311 | Pfaadt | Brandon | 2023 | 20.5 | 6.6 | 6.91 | 0.303 | 90.4 | 24.6 | 34.1 | 30.1 | 8.5 | 2445 | 2444 |
| 4312 | Shuster | Jared | 2023 | 13.0 | 11.4 | 5.00 | 0.250 | 89.8 | 21.3 | 35.2 | 29.0 | 12.4 | 2136 | 1927 |
| 4313 | Hartwig | Grant | 2023 | 17.1 | 11.4 | 4.94 | 0.253 | 88.1 | 20.9 | 45.2 | 19.2 | 4.1 | 2156 | 2352 |
4309 rows × 14 columns
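As a quick sanity check on the weighted average, the calculation can be redone by hand for a single pitcher. This is a minimal sketch: `weighted_secondary_spin` is a hypothetical helper written just for this check, and the inputs are Bartolo Colon's 2015 breaking and offspeed values from the first table above.

```python
def weighted_secondary_spin(breaking_pct, breaking_spin, offspeed_pct, offspeed_spin):
    """Usage-weighted average spin of breaking and offspeed pitches."""
    total = breaking_pct + offspeed_pct
    if total == 0:
        return 0.0  # no secondary pitches recorded
    return (breaking_pct / total) * breaking_spin + (offspeed_pct / total) * offspeed_spin

# Bartolo Colon, 2015: 10.0% breaking at 2164 rpm, 7.4% offspeed at 1727 rpm
colon = weighted_secondary_spin(10.0, 2164.0, 7.4, 1727.0)
print(int(colon))  # 1978, matching pitch2_avg_spin in the table above
```

Because the two usage percentages are divided by their sum, a pitcher who throws mostly breaking balls ends up with a `pitch2_avg_spin` close to his breaking-ball spin.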
Linear Regression
After getting all of the data in the right format, I needed to run a linear regression analysis on the entire data set to view the correlation between both "Fastball Spin" and "Pitch 2 Spin" and all of the other variables listed above (K%, BB%, etc.). Linear regression is essentially a statistical method for comparing two variables to each other: an independent variable (e.g. average spin rate) and a dependent variable (e.g. any of the pitching and batting stats). One can derive a lot of components from the regression results, but the most important takeaway here is the p-value. The p-value describes how likely it is that a relationship at least this strong would have occurred by random chance if the two variables were actually unrelated. When the p-value is below 0.05, the correlation is conventionally considered statistically significant. With a single independent variable, this is the same p-value that comes with the Pearson correlation, which is what the code below computes.
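To illustrate that the p-value of a Pearson correlation and the p-value of the slope in a one-variable linear regression are the same number, here is a small sketch. The data is synthetic and made up purely for illustration (it is not StatCast data), and the variable names are my own.

```python
import numpy as np
from scipy.stats import linregress, pearsonr

# Synthetic data for illustration only (not StatCast values)
rng = np.random.default_rng(0)
spin = rng.normal(2200, 150, size=200)  # made-up spin rates
k_pct = 20 + 0.005 * (spin - 2200) + rng.normal(0, 5, size=200)

# Pearson correlation and its p-value
r, p_corr = pearsonr(spin, k_pct)

# Simple linear regression of k_pct on spin
fit = linregress(spin, k_pct)

print(f"Pearson r = {r:.4f}, p = {p_corr:.6f}")
print(f"Regression slope = {fit.slope:.5f}, p = {fit.pvalue:.6f}")
```

The two printed p-values should agree up to floating-point rounding, which is why a correlation test can stand in for the regression test when there is only one independent variable.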
As shown below, most of these statistics have p-values below 0.05, meaning that there is a statistically significant correlation between them and the average spin of both fastballs and secondary pitches.
# calculate p-value per year
from scipy.stats import pearsonr
# method to use for 'corr' function to return p-value
# https://stackoverflow.com/questions/52741236/how-to-calculate-p-values-for-pairwise-correlation-of-columns-in-pandas
def pearsonr_pval(x,y):
    return pearsonr(x,y)[1]
# change settings for prettier output of p-value correlations
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
# list of years to cycle through for finding specific year p-values
years = [2023,2022,2021,2020,2019,2018,2017,2016,2015]
# all years
with pd.option_context('display.float_format', '{:0.6f}'.format):
    data_spin_pv = data_spin_all.drop(columns=['last_name', 'first_name', 'year'])
    corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
    print("All Years\n{}\n".format(corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]))
# individual years
for y in years:
    with pd.option_context('display.float_format', '{:0.6f}'.format):
        data_spin_year = data_spin_all.loc[data_spin_all['year'] == y]
        data_spin_pv = data_spin_year.drop(columns=['last_name', 'first_name', 'year'])
        corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
        corr2 = corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]
        print("\n{}: P-Values\n{}\n".format(y,corr2))
        # just print the statistically significant correlations (< 0.05)
        corr2 = corr2[corr2 < .05].unstack().transpose()\
            .sort_values( ascending=True).dropna()
        print(corr2)
All Years
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.000000 0.000000 0.000000 0.014151 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000000 0.000000 0.000000 0.000000 0.008010 0.000000 0.271558 0.000524 0.026414
2023: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.017246 0.011853 0.000000 0.046336 0.000000 0.000002 0.000003 0.000000
pitch2_avg_spin 0.003373 0.256899 0.076596 0.018759 0.103078 0.077300 0.323860 0.211900 0.663560
whiff_percent fastball_avg_spin 0.000000
k_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000002
flyballs_percent fastball_avg_spin 0.000003
k_percent pitch2_avg_spin 0.003373
p_era fastball_avg_spin 0.011853
bb_percent fastball_avg_spin 0.017246
batting_avg pitch2_avg_spin 0.018759
exit_velocity_avg fastball_avg_spin 0.046336
dtype: float64
2022: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.040666 0.000008 0.000000 0.334567 0.000000 0.000000 0.000001 0.000000
pitch2_avg_spin 0.000019 0.001861 0.003260 0.000008 0.068946 0.000427 0.826508 0.863145 0.088943
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000001
p_era fastball_avg_spin 0.000008
batting_avg pitch2_avg_spin 0.000008
k_percent pitch2_avg_spin 0.000019
whiff_percent pitch2_avg_spin 0.000427
bb_percent pitch2_avg_spin 0.001861
p_era pitch2_avg_spin 0.003260
bb_percent fastball_avg_spin 0.040666
dtype: float64
2021: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.000238 0.000233 0.000000 0.246282 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000000 0.001348 0.073900 0.000000 0.053847 0.000000 0.350141 0.803763 0.002977
whiff_percent fastball_avg_spin 0.000000
k_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
k_percent pitch2_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
whiff_percent pitch2_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
batting_avg pitch2_avg_spin 0.000000
p_era fastball_avg_spin 0.000233
bb_percent fastball_avg_spin 0.000238
bb_percent pitch2_avg_spin 0.001348
popups_percent pitch2_avg_spin 0.002977
dtype: float64
2020: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.047310 0.010269 0.000000 0.266034 0.000000 0.000007 0.000281 0.000073
pitch2_avg_spin 0.000148 0.356114 0.000515 0.000172 0.285386 0.071479 0.310271 0.046058 0.542162
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000007
popups_percent fastball_avg_spin 0.000073
k_percent pitch2_avg_spin 0.000148
batting_avg pitch2_avg_spin 0.000172
flyballs_percent fastball_avg_spin 0.000281
p_era pitch2_avg_spin 0.000515
p_era fastball_avg_spin 0.010269
flyballs_percent pitch2_avg_spin 0.046058
bb_percent fastball_avg_spin 0.047310
dtype: float64
2019: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.016385 0.035657 0.000001 0.005820 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000000 0.001347 0.130905 0.002535 0.242583 0.000465 0.146435 0.061935 0.257985
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
k_percent pitch2_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000001
whiff_percent pitch2_avg_spin 0.000465
bb_percent pitch2_avg_spin 0.001347
batting_avg pitch2_avg_spin 0.002535
exit_velocity_avg fastball_avg_spin 0.005820
bb_percent fastball_avg_spin 0.016385
p_era fastball_avg_spin 0.035657
dtype: float64
2018: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.135721 0.001833 0.000000 0.110512 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000013 0.274898 0.052871 0.001341 0.530645 0.000180 0.382741 0.242414 0.114129
whiff_percent fastball_avg_spin 0.000000
k_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
k_percent pitch2_avg_spin 0.000013
whiff_percent pitch2_avg_spin 0.000180
batting_avg pitch2_avg_spin 0.001341
p_era fastball_avg_spin 0.001833
dtype: float64
2017: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.017546 0.000023 0.000000 0.664940 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000035 0.112367 0.000112 0.000114 0.026909 0.001338 0.583682 0.547332 0.716795
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
p_era fastball_avg_spin 0.000023
k_percent pitch2_avg_spin 0.000035
p_era pitch2_avg_spin 0.000112
batting_avg pitch2_avg_spin 0.000114
whiff_percent pitch2_avg_spin 0.001338
bb_percent fastball_avg_spin 0.017546
exit_velocity_avg pitch2_avg_spin 0.026909
dtype: float64
2016: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.008719 0.010021 0.000000 0.108696 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.000001 0.292678 0.007418 0.000043 0.065793 0.000203 0.002991 0.156506 0.107725
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
popups_percent fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
k_percent pitch2_avg_spin 0.000001
batting_avg pitch2_avg_spin 0.000043
whiff_percent pitch2_avg_spin 0.000203
groundballs_percent pitch2_avg_spin 0.002991
p_era pitch2_avg_spin 0.007418
bb_percent fastball_avg_spin 0.008719
p_era fastball_avg_spin 0.010021
dtype: float64
2015: P-Values
k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent
fastball_avg_spin 0.000000 0.415631 0.000970 0.000000 0.016411 0.000000 0.000000 0.000000 0.000000
pitch2_avg_spin 0.007602 0.225946 0.015076 0.004982 0.068573 0.454784 0.393288 0.266614 0.589300
popups_percent fastball_avg_spin 0.000000
groundballs_percent fastball_avg_spin 0.000000
k_percent fastball_avg_spin 0.000000
whiff_percent fastball_avg_spin 0.000000
flyballs_percent fastball_avg_spin 0.000000
batting_avg fastball_avg_spin 0.000000
p_era fastball_avg_spin 0.000970
batting_avg pitch2_avg_spin 0.004982
k_percent pitch2_avg_spin 0.007602
p_era pitch2_avg_spin 0.015076
exit_velocity_avg fastball_avg_spin 0.016411
dtype: float64
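The filtering step used in the loop above (mask, unstack, sort, drop the empty entries) may be easier to follow on a tiny table. The sketch below uses just two columns of the 2023 p-values printed above; the variable names `pvals` and `significant` are my own.

```python
import pandas as pd

# Two columns of the 2023 p-values printed above
pvals = pd.DataFrame(
    {"bb_percent": [0.017246, 0.256899], "p_era": [0.011853, 0.076596]},
    index=["fastball_avg_spin", "pitch2_avg_spin"],
)

significant = (
    pvals[pvals < 0.05]  # values at or above 0.05 become NaN
    .unstack()           # reshape to a Series indexed by (stat, spin column)
    .sort_values()       # smallest p-value first
    .dropna()            # drop the entries that were not significant
)
print(significant)
```

Only the two fastball entries survive, with `p_era` listed first because it has the smaller p-value.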
Pretty Pictures
After obtaining the p-values, I wanted to graph these relationships on scatterplots while also showing the regression line (or line of best fit). I overlaid each year on top of the others using the "hue" parameter, and then graphed each relationship using the pitch's spin as the x-value and the various statistics as the y-value. After that, I created individual, year-by-year graphs for the statistics I consider most important when looking for a correlation. This is all laid out below.
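For readers who have not worked with a line of best fit before, here is a minimal sketch of what such a line is. By default, seaborn's regression plots draw an ordinary least-squares line; the sketch fits the same kind of line with `np.polyfit`, using only the first five 2015 rows from the table above, so the numbers are for illustration and do not represent the full data set.

```python
import numpy as np

# First five 2015 rows from the table above (fastball spin, K%)
spin = np.array([2161, 2051, 2032, 1782, 2009], dtype=float)
k_pct = np.array([16.7, 21.0, 17.4, 17.1, 20.5])

# Degree-1 polynomial fit = ordinary least-squares straight line
slope, intercept = np.polyfit(spin, k_pct, deg=1)

# Predicted K% on the line, and the vertical distance of each point from it
fitted = slope * spin + intercept
residuals = k_pct - fitted

print(f"slope = {slope:.5f} K% per rpm, intercept = {intercept:.2f}")
```

The line is chosen so that the squared residuals are as small as possible, which also means the residuals sum to zero and the line passes through the point of averages.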
# Graph Pitching Stats
custom_palette=sns.color_palette("Paired",9)
sns.set_theme(style="white",palette=custom_palette)
plt.rc('legend',fontsize=16, title_fontsize=16,markerscale=3.0)
pp_pitching = sns.pairplot(data=data_spin_all,y_vars=["k_percent", "bb_percent", "p_era", "batting_avg"],\
x_vars=["fastball_avg_spin", "pitch2_avg_spin"], kind='reg',markers='.',hue='year',height=2,aspect=2)
pp_pitching = pp_pitching.map(plt.scatter)
# collect the axis labels, then show them (and the tick labels) on every subplot
xlabels,ylabels = [],[]
for ax in pp_pitching.axes[-1,:]:
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in pp_pitching.axes[:,0]:
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)
for i in range(len(xlabels)):
    for j in range(len(ylabels)):
        pp_pitching.axes[j,i].xaxis.set_label_text(xlabels[i],visible=True)
        pp_pitching.axes[j,i].yaxis.set_label_text(ylabels[j],visible=True)
for ax in pp_pitching.axes.flat:
    ax.tick_params(axis='both', labelleft=True, labelbottom=True)
pp_pitching.fig.subplots_adjust(top=.95)
pp_pitching.fig.suptitle("All Years, Pitching Stats")
plt.subplots_adjust(wspace=0.3, hspace=0.9)
plt.show()
# Graph Batting Stats
custom_palette=sns.color_palette("Paired",9)
sns.set_theme(style="white",palette=custom_palette)
plt.rc('legend',fontsize=16, title_fontsize=16,markerscale=3.0)
pp_batting = sns.pairplot(data=data_spin_all,y_vars=["exit_velocity_avg", "whiff_percent", "groundballs_percent", "flyballs_percent", "popups_percent"],\
x_vars=["fastball_avg_spin", "pitch2_avg_spin"], kind='reg',markers='.',hue='year',height=2,aspect=2)
pp_batting = pp_batting.map(plt.scatter)
# collect the axis labels, then show them (and the tick labels) on every subplot
xlabels,ylabels = [],[]
for ax in pp_batting.axes[-1,:]:
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in pp_batting.axes[:,0]:
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)
for i in range(len(xlabels)):
    for j in range(len(ylabels)):
        pp_batting.axes[j,i].xaxis.set_label_text(xlabels[i],visible=True)
        pp_batting.axes[j,i].yaxis.set_label_text(ylabels[j],visible=True)
for ax in pp_batting.axes.flat:
    ax.tick_params(axis='both', labelleft=True, labelbottom=True)
pp_batting.fig.subplots_adjust(top=.95)
pp_batting.fig.suptitle("All Years, Batting Stats")
plt.subplots_adjust(wspace=0.3, hspace=0.9)
plt.show()
# drill down into a relationship across multiple years; this one is 'fastball_avg_spin' & 'k_percent'
lm=sns.lmplot(x='fastball_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Add a title
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & K%", fontsize=14)
Text(0.5, 0.98, 'Fastball Spin & K%')
# 'pitch2_avg_spin' & 'k_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Add a title
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & K%", fontsize=14)
Text(0.5, 0.98, 'Pitch-2 Spin & K%')
# 'fastball_avg_spin' & 'p_era'
lm = sns.lmplot(x='fastball_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Format the y-axis ticks to two decimals and add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & ERA", fontsize=14)
Text(0.5, 0.98, 'Fastball Spin & ERA')
# 'pitch2_avg_spin' & 'p_era'
lm = sns.lmplot(x='pitch2_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Format the y-axis ticks to two decimals and add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & ERA", fontsize=14)
Text(0.5, 0.98, 'Pitch-2 Spin & ERA')
# 'fastball_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='fastball_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Format the y-axis ticks to three decimals and add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Batting Average Against", fontsize=14)
Text(0.5, 0.98, 'Fastball Spin & Batting Average Against')
# 'pitch2_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='pitch2_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Format the y-axis ticks to three decimals and add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Batting Average Against", fontsize=14)
Text(0.5, 0.98, 'Pitch-2 Spin & Batting Average Against')
# 'fastball_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='fastball_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Add a title
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Whiff %", fontsize=14)
Text(0.5, 0.98, 'Fastball Spin & Whiff %')
# 'pitch2_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Add a title
fig = lm.fig
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Whiff %", fontsize=14)
Text(0.5, 0.98, 'Pitch-2 Spin & Whiff %')
Conclusion
In conclusion, based on the p-values and subsequent graphs for the years 2015-2023, there is a statistically significant correlation between the change in both pitch types' spin and the change in opponent batting average, K%, BB%, ERA, whiff rate, flyball rate, and popup rate. With groundball rate, there is a correlation only with the fastball's average spin, and not with Pitch 2.
However, there is a gap here - the data that's available isn't as granular as I would like. Not much data exists for correlating spin rate and pitch-by-pitch analysis. The data shown here isn't linked to the pitch type, but instead linked to the pitcher. Moreover, the concept of spin rate is very new and will continue to develop and deepen over time, which will certainly help improve these findings.
While there are limitations here, the correlation still is valid and shows that there is a connection between the change in spin rate and the performance of a pitcher.