The Effects of Spin Rate on an MLB Pitcher's Performance: A Case Study on the 2015 - 2023 Seasons | Andrew Cernosek

  • Introduction
  • Set Up
  • Linear Regression
  • Pretty Pictures
  • Conclusion

Introduction

I’ve always been a huge baseball fan, but over the last few years I’ve also become a big baseball stats fan. I’ll spend hours digging through stats, often wondering about all the factors that influence those numbers.

During my junior year in high school, my AP Research class assigned a year-long project that could be about anything I wanted as long as it would lead to a new understanding. It’s no surprise that I decided to focus on something related to baseball analytics.

I started researching various ideas and quickly saw that spin rate was a really hot topic, but one with very little related research. I also noticed plenty of misconceptions, such as websites claiming that spin rate directly led to a pitch being “better”. I wanted to find out whether there was any validity to that claim. The goal of this project was, and still is, to see whether there is any correlation between the change in spin rate and a pitcher’s performance.

While I was really proud of the work I did on that project, I was also motivated to expand on what I had learned. I had originally relied on manually importing StatCast stats from Baseball Savant, manipulating data in Excel, and using StatCrunch to report my findings. After completing an online course by the University of Michigan titled "Foundations of Sports Analytics: Data, Representation, and Models in Sports", I realized there was a 'better' way.

Using many of the coding principles from that course along with pybaseball, a Python package for baseball data analysis, I was able to streamline my workflow and make it much easier to expand the analysis across multiple seasons.

Set Up

The first thing we need to do is to import Python packages which allow us to collect, analyze, and visualize data. I've learned that these packages are pretty standard within the data analytics community.

In [ ]:
# normal imports
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import seaborn as sns
import warnings

# disable warnings
warnings.filterwarnings('ignore')

As I mentioned above, pybaseball is a really useful Python package. It allows you to import stats into a dataframe directly from popular sites, including Baseball Savant (which has the StatCast stats that I wanted to use for my analysis). Unfortunately, pybaseball didn't have a built-in function that grabbed all of the columns of data that I had previously used when importing things manually. However, by looking at how the existing pybaseball functions were defined, I was able to create a custom function that did exactly what I needed:

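import io
from typing import Union
import pandas as pd
import requests
from pybaseball.utils import sanitize_statcast_columns  # assumed import path for this helper

def statcast_pitcher_pitch_spin(year: Union[int, str], minP: int = 100) -> pd.DataFrame:
    # Savant columns to request, sent as the comma-separated "selections" parameter
    selections = ("k_percent,bb_percent,p_era,batting_avg,exit_velocity_avg,whiff_percent,"
                  "groundballs_percent,flyballs_percent,popups_percent,fastball_avg_spin,breaking_avg_spin,"
                  "n_breaking_formatted,offspeed_avg_spin,n_offspeed_formatted")
    # Adjacent string literals keep the indentation whitespace out of the URL
    url = ("https://baseballsavant.mlb.com/leaderboard/custom"
           f"?year={year}&type=pitcher&filter=&sort=4&sortDir=asc&min={minP}"
           f"&selections={selections}&csv=true")
    res = requests.get(url, timeout=None).content
    data = pd.read_csv(io.StringIO(res.decode('utf-8')))
    data = sanitize_statcast_columns(data)
    return data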

After doing that, we can import this new function and use it to create a dataframe with the stats that we want. The years run from 2015 (the start of the StatCast Era) to 2023 (the current year as of publishing this), and the minimum number of plate appearances against the pitcher is set to 100. The columns returned by my custom function are listed below.
In [ ]:
# pybaseball
from pybaseball import statcast_pitcher_pitch_spin

data_spin_all = statcast_pitcher_pitch_spin('2023,2022,2021,2020,2019,2018,2017,2016,2015', minP=100)
print(data_spin_all.columns.tolist())
data_spin_all
['last_name', 'first_name', 'player_id', 'year', 'k_percent', 'bb_percent', 'p_era', 'batting_avg', 'exit_velocity_avg', 'whiff_percent', 'groundballs_percent', 'flyballs_percent', 'popups_percent', 'fastball_avg_spin', 'breaking_avg_spin', 'n_breaking_formatted', 'offspeed_avg_spin', 'n_offspeed_formatted']
Out[ ]:
last_name first_name player_id year k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent fastball_avg_spin breaking_avg_spin n_breaking_formatted offspeed_avg_spin n_offspeed_formatted
0 Colon Bartolo 112526 2015 16.7 2.9 4.16 0.281 88.9 14.4 44.1 23.2 5.7 2161 2164.0 10.0 1727.0 7.4
1 Hawkins LaTroy 115629 2015 21.0 4.3 3.26 0.286 89.7 17.0 55.4 18.2 5.8 2051 2072.0 16.6 1698.0 8.0
2 Wolf Randy 150116 2015 17.4 9.3 6.23 0.319 89.0 16.2 46.2 17.9 5.1 2032 2176.0 40.2 1669.0 11.1
3 Marquis Jason 150302 2015 17.1 6.5 6.46 0.330 90.4 21.3 48.8 17.1 3.7 1782 1977.0 19.4 1239.0 21.2
4 Burnett A.J. 150359 2015 20.5 7.0 3.18 0.275 89.8 21.1 55.0 14.1 4.0 2009 2023.0 29.4 1678.0 8.8
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4309 Woo Bryan 693433 2023 25.9 7.3 4.75 0.242 87.2 27.8 38.8 28.9 7.9 2179 2358.0 11.8 1713.0 3.6
4310 Elder Bryce 693821 2023 17.6 7.9 3.46 0.238 89.5 23.7 53.0 20.6 4.8 1998 2396.0 36.7 2032.0 12.1
4311 Pfaadt Brandon 694297 2023 20.5 6.6 6.91 0.303 90.4 24.6 34.1 30.1 8.5 2445 2667.0 32.9 1946.0 14.7
4312 Shuster Jared 694363 2023 13.0 11.4 5.00 0.250 89.8 21.3 35.2 29.0 12.4 2136 2204.0 34.2 1492.0 21.7
4313 Hartwig Grant 701643 2023 17.1 11.4 4.94 0.253 88.1 20.9 45.2 19.2 4.1 2156 2429.0 33.1 1792.0 4.5

4314 rows × 18 columns
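
The parsing step inside the custom function can be tried offline. The sketch below is only an illustration: `csv_text` is a made-up stand-in for the CSV body that Baseball Savant returns (the real response has more columns and may be laid out differently), with values borrowed from the first two rows shown above.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded CSV; values borrowed from the first two rows above
csv_text = (
    "last_name,first_name,year,k_percent,fastball_avg_spin\n"
    "Colon,Bartolo,2015,16.7,2161\n"
    "Hawkins,LaTroy,2015,21.0,2051\n"
)
raw_bytes = csv_text.encode("utf-8")  # requests' .content attribute is bytes like this

# Same decode + StringIO + read_csv chain that the custom function uses
sample = pd.read_csv(io.StringIO(raw_bytes.decode("utf-8")))
print(sample.dtypes)
print(sample)
```

Wrapping the decoded text in `io.StringIO` lets `pd.read_csv` treat it like a file, so nothing has to be written to disk.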


A huge part of this study is isolating both the fastball's average spin rate and the secondary pitch's average spin rate. However, StatCast lists three types of pitches: fastballs, breaking balls, and offspeed. To simplify things, I wanted to group breaking balls and offspeed pitches together. When I originally looked at them separately, I noticed a large number of outliers where some pitchers threw a pitch labeled as "offspeed" with a lot more spin than anyone else's pitches of that type. Because of this, I used a weighted average, (breaking % * breaking spin + offspeed % * offspeed spin) / (breaking % + offspeed %), to determine the average spin rate of a secondary pitch thrown by a pitcher, now labeled as `pitch2_avg_spin`.

In [ ]:
# massage data - change NaN values to zero, formula to calculate secondary pitch average spin based on pitch usage percentages
data_spin_all = data_spin_all.fillna(0)
data_spin_all["pitch2_avg_spin"] = (data_spin_all['n_breaking_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['breaking_avg_spin']) + \
                                   (data_spin_all['n_offspeed_formatted'] / (data_spin_all['n_breaking_formatted'] + data_spin_all['n_offspeed_formatted']) * data_spin_all['offspeed_avg_spin'])

# change NaN to zero again & drop pitchers without a secondary pitch
data_spin_all = data_spin_all.fillna(0)
data_spin_all = data_spin_all[data_spin_all['pitch2_avg_spin'] != 0 ]

data_spin_all["pitch2_avg_spin"] = data_spin_all["pitch2_avg_spin"].astype('int')
data_spin_all = data_spin_all.drop(columns=['player_id','breaking_avg_spin','offspeed_avg_spin','n_breaking_formatted','n_offspeed_formatted'])
data_spin_all
Out[ ]:
last_name first_name year k_percent bb_percent p_era batting_avg exit_velocity_avg whiff_percent groundballs_percent flyballs_percent popups_percent fastball_avg_spin pitch2_avg_spin
0 Colon Bartolo 2015 16.7 2.9 4.16 0.281 88.9 14.4 44.1 23.2 5.7 2161 1978
1 Hawkins LaTroy 2015 21.0 4.3 3.26 0.286 89.7 17.0 55.4 18.2 5.8 2051 1950
2 Wolf Randy 2015 17.4 9.3 6.23 0.319 89.0 16.2 46.2 17.9 5.1 2032 2066
3 Marquis Jason 2015 17.1 6.5 6.46 0.330 90.4 21.3 48.8 17.1 3.7 1782 1591
4 Burnett A.J. 2015 20.5 7.0 3.18 0.275 89.8 21.1 55.0 14.1 4.0 2009 1943
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4309 Woo Bryan 2023 25.9 7.3 4.75 0.242 87.2 27.8 38.8 28.9 7.9 2179 2207
4310 Elder Bryce 2023 17.6 7.9 3.46 0.238 89.5 23.7 53.0 20.6 4.8 1998 2305
4311 Pfaadt Brandon 2023 20.5 6.6 6.91 0.303 90.4 24.6 34.1 30.1 8.5 2445 2444
4312 Shuster Jared 2023 13.0 11.4 5.00 0.250 89.8 21.3 35.2 29.0 12.4 2136 1927
4313 Hartwig Grant 2023 17.1 11.4 4.94 0.253 88.1 20.9 45.2 19.2 4.1 2156 2352

4309 rows × 14 columns
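
As a quick sanity check on the weighted average, the sketch below recomputes `pitch2_avg_spin` by hand for the first row of the table (Bartolo Colon, 2015), using the breaking and offspeed values printed in the raw output earlier. The function name `weighted_secondary_spin` is made up for this illustration and is not part of pybaseball.

```python
def weighted_secondary_spin(n_breaking, breaking_spin, n_offspeed, offspeed_spin):
    """Usage-weighted average spin of breaking and offspeed pitches."""
    total_usage = n_breaking + n_offspeed
    return (n_breaking / total_usage) * breaking_spin + (n_offspeed / total_usage) * offspeed_spin

# Values copied from the Colon row: 10.0% breaking at 2164, 7.4% offspeed at 1727
colon_spin = weighted_secondary_spin(10.0, 2164.0, 7.4, 1727.0)
print(int(colon_spin))  # truncated the same way as .astype('int'); prints 1978
```

The result matches the 1978 shown for Colon in the processed table, so the usage percentages are being normalized by their sum as intended.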

Linear Regression

After getting all of the data in the right format, I then needed to run a linear regression analysis on the entire data set to view the correlation between both "Fastball Spin" & "Pitch 2 Spin" and all of the other variables listed above (K%, BB%, etc.). Linear regression is essentially a statistical method for relating two variables to each other: an independent variable (here, average spin rate) and a dependent variable (here, each of the various pitching & batting stats). One can derive a lot from the regression results, but the most important takeaway is the p-value. The p-value is the probability of seeing a relationship at least this strong purely by random chance if the two variables were actually unrelated. When the p-value is below 0.05, the correlation between the two variables is considered statistically significant.

As shown below, a large number of these statistics have p-values below 0.05, meaning that there is a statistically significant correlation between many of these statistics and the average spin of both fastballs and secondary pitches.
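
The cell that follows gets its p-values from `scipy.stats.pearsonr` rather than from a fitted regression model. With a single predictor, the two approaches give the same p-value, which the sketch below illustrates. The data here is randomly generated (it is not StatCast data), and the variable names are placeholders chosen for this example.

```python
import numpy as np
from scipy.stats import linregress, pearsonr

# Synthetic stand-ins for spin rate and K%; not real StatCast data
rng = np.random.default_rng(0)
spin = rng.normal(2250, 150, size=200)
k_pct = 0.01 * spin + rng.normal(0, 5, size=200)

r, p_corr = pearsonr(spin, k_pct)  # correlation coefficient and its p-value
fit = linregress(spin, k_pct)      # simple linear regression of K% on spin

print(f"pearsonr:   r = {r:.4f}, p = {p_corr:.3g}")
print(f"linregress: r = {fit.rvalue:.4f}, p = {fit.pvalue:.3g}, slope = {fit.slope:.4f}")
```

Both lines should report the same r and p, which is why a pairwise correlation table can stand in for running a separate one-variable regression on each stat.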

In [ ]:
# calculate p-value per year

from scipy.stats import pearsonr

# method to use for 'corr' function to return p-value
# https://stackoverflow.com/questions/52741236/how-to-calculate-p-values-for-pairwise-correlation-of-columns-in-pandas
def pearsonr_pval(x,y):
    return pearsonr(x,y)[1]

# change settings for prettier output of p-value correlations
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# list of years to cycle through for finding specific year p-values
years = [2023,2022,2021,2020,2019,2018,2017,2016,2015]

# all years
with pd.option_context('display.float_format', '{:0.6f}'.format):
    data_spin_pv = data_spin_all.drop(columns=['last_name', 'first_name', 'year'])
    corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
    print("All Years\n{}\n".format(corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]))

# individual years
for y in years:
    with pd.option_context('display.float_format', '{:0.6f}'.format):
        data_spin_year = data_spin_all.loc[data_spin_all['year'] == y]
        data_spin_pv = data_spin_year.drop(columns=['last_name', 'first_name', 'year'])
        corr = data_spin_pv.corr(method=pearsonr_pval,numeric_only=True)
        # keep the two spin rows and drop the spin-vs-spin columns
        corr2 = corr.loc[['fastball_avg_spin','pitch2_avg_spin'], ~corr.columns.isin(['fastball_avg_spin','pitch2_avg_spin'])]
        print("\n{}: P-Values\n{}\n".format(y,corr2))
        # just print the statistically significant correlations (< 0.05)
        corr2 = corr2[corr2 < .05].unstack().sort_values(ascending=True).dropna()
        print(corr2)
All Years
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.000000 0.000000     0.000000           0.014151       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.000000 0.000000     0.000000           0.008010       0.000000             0.271558          0.000524        0.026414


2023: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.017246 0.011853     0.000000           0.046336       0.000000             0.000002          0.000003        0.000000
pitch2_avg_spin     0.003373    0.256899 0.076596     0.018759           0.103078       0.077300             0.323860          0.211900        0.663560

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000002
flyballs_percent     fastball_avg_spin   0.000003
k_percent            pitch2_avg_spin     0.003373
p_era                fastball_avg_spin   0.011853
bb_percent           fastball_avg_spin   0.017246
batting_avg          pitch2_avg_spin     0.018759
exit_velocity_avg    fastball_avg_spin   0.046336
dtype: float64

2022: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.040666 0.000008     0.000000           0.334567       0.000000             0.000000          0.000001        0.000000
pitch2_avg_spin     0.000019    0.001861 0.003260     0.000008           0.068946       0.000427             0.826508          0.863145        0.088943

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000001
p_era                fastball_avg_spin   0.000008
batting_avg          pitch2_avg_spin     0.000008
k_percent            pitch2_avg_spin     0.000019
whiff_percent        pitch2_avg_spin     0.000427
bb_percent           pitch2_avg_spin     0.001861
p_era                pitch2_avg_spin     0.003260
bb_percent           fastball_avg_spin   0.040666
dtype: float64

2021: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.000238 0.000233     0.000000           0.246282       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.001348 0.073900     0.000000           0.053847       0.000000             0.350141          0.803763        0.002977

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000000
popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
whiff_percent        pitch2_avg_spin     0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          pitch2_avg_spin     0.000000
p_era                fastball_avg_spin   0.000233
bb_percent           fastball_avg_spin   0.000238
                     pitch2_avg_spin     0.001348
popups_percent       pitch2_avg_spin     0.002977
dtype: float64

2020: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.047310 0.010269     0.000000           0.266034       0.000000             0.000007          0.000281        0.000073
pitch2_avg_spin     0.000148    0.356114 0.000515     0.000172           0.285386       0.071479             0.310271          0.046058        0.542162

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000007
popups_percent       fastball_avg_spin   0.000073
k_percent            pitch2_avg_spin     0.000148
batting_avg          pitch2_avg_spin     0.000172
flyballs_percent     fastball_avg_spin   0.000281
p_era                pitch2_avg_spin     0.000515
                     fastball_avg_spin   0.010269
flyballs_percent     pitch2_avg_spin     0.046058
bb_percent           fastball_avg_spin   0.047310
dtype: float64

2019: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.016385 0.035657     0.000001           0.005820       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000000    0.001347 0.130905     0.002535           0.242583       0.000465             0.146435          0.061935        0.257985

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000000
batting_avg          fastball_avg_spin   0.000001
whiff_percent        pitch2_avg_spin     0.000465
bb_percent           pitch2_avg_spin     0.001347
batting_avg          pitch2_avg_spin     0.002535
exit_velocity_avg    fastball_avg_spin   0.005820
bb_percent           fastball_avg_spin   0.016385
p_era                fastball_avg_spin   0.035657
dtype: float64

2018: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.135721 0.001833     0.000000           0.110512       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000013    0.274898 0.052871     0.001341           0.530645       0.000180             0.382741          0.242414        0.114129

whiff_percent        fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000013
whiff_percent        pitch2_avg_spin     0.000180
batting_avg          pitch2_avg_spin     0.001341
p_era                fastball_avg_spin   0.001833
dtype: float64

2017: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.017546 0.000023     0.000000           0.664940       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000035    0.112367 0.000112     0.000114           0.026909       0.001338             0.583682          0.547332        0.716795

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
p_era                fastball_avg_spin   0.000023
k_percent            pitch2_avg_spin     0.000035
p_era                pitch2_avg_spin     0.000112
batting_avg          pitch2_avg_spin     0.000114
whiff_percent        pitch2_avg_spin     0.001338
bb_percent           fastball_avg_spin   0.017546
exit_velocity_avg    pitch2_avg_spin     0.026909
dtype: float64

2016: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.008719 0.010021     0.000000           0.108696       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.000001    0.292678 0.007418     0.000043           0.065793       0.000203             0.002991          0.156506        0.107725

k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
popups_percent       fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
k_percent            pitch2_avg_spin     0.000001
batting_avg          pitch2_avg_spin     0.000043
whiff_percent        pitch2_avg_spin     0.000203
groundballs_percent  pitch2_avg_spin     0.002991
p_era                pitch2_avg_spin     0.007418
bb_percent           fastball_avg_spin   0.008719
p_era                fastball_avg_spin   0.010021
dtype: float64

2015: P-Values
                   k_percent  bb_percent    p_era  batting_avg  exit_velocity_avg  whiff_percent  groundballs_percent  flyballs_percent  popups_percent
fastball_avg_spin   0.000000    0.415631 0.000970     0.000000           0.016411       0.000000             0.000000          0.000000        0.000000
pitch2_avg_spin     0.007602    0.225946 0.015076     0.004982           0.068573       0.454784             0.393288          0.266614        0.589300

popups_percent       fastball_avg_spin   0.000000
groundballs_percent  fastball_avg_spin   0.000000
k_percent            fastball_avg_spin   0.000000
whiff_percent        fastball_avg_spin   0.000000
flyballs_percent     fastball_avg_spin   0.000000
batting_avg          fastball_avg_spin   0.000000
p_era                fastball_avg_spin   0.000970
batting_avg          pitch2_avg_spin     0.004982
k_percent            pitch2_avg_spin     0.007602
p_era                pitch2_avg_spin     0.015076
exit_velocity_avg    fastball_avg_spin   0.016411
dtype: float64
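
The filtering step that produces the sorted lists above can be seen on a smaller scale. This sketch rebuilds three columns of the 2023 p-value table by hand (values copied from the output above) and applies the same mask, `unstack`, and `dropna` chain; the variable names are placeholders for this example.

```python
import pandas as pd

# Three columns of the 2023 p-value table, copied from the output above
pvals = pd.DataFrame(
    {
        "k_percent": [0.000000, 0.003373],
        "bb_percent": [0.017246, 0.256899],
        "p_era": [0.011853, 0.076596],
    },
    index=["fastball_avg_spin", "pitch2_avg_spin"],
)

# Masking turns values >= 0.05 into NaN; unstack flattens to one (stat, spin) pair per row
significant = pvals[pvals < 0.05].unstack().dropna().sort_values()
print(significant)
```

Four of the six pairs survive the filter; the two `pitch2_avg_spin` entries for `bb_percent` and `p_era` are dropped because their p-values are above 0.05.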

Pretty Pictures

After obtaining the p-values, I wanted to graph these relationships on scatterplots while also showing the regression line (or line of best fit). I overlaid each year on top of the others using the "hue" parameter, and graphed each relationship using the pitch's spin as the x-value and the various statistics as the y-value. After that, I created individual, year-by-year graphs for the statistics whose correlations I consider most important. This is all laid out below.
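
The regression line that seaborn draws is an ordinary least-squares fit. As a rough illustration of what that line is, the sketch below fits one with `np.polyfit` on just the first five 2015 rows from the table above. Five points are far too few to say anything about baseball; the point is only to show the mechanics.

```python
import numpy as np

# First five 2015 rows from the table above: fastball spin and K%
fastball_spin = np.array([2161, 2051, 2032, 1782, 2009], dtype=float)
k_percent = np.array([16.7, 21.0, 17.4, 17.1, 20.5])

# A degree-1 polynomial fit is the least-squares line; coefficients come back highest power first
slope, intercept = np.polyfit(fastball_spin, k_percent, 1)
predicted = slope * fastball_spin + intercept

print(f"k_percent ~ {slope:.5f} * fastball_avg_spin + {intercept:.2f}")
```

A least-squares line always passes through the point (mean x, mean y), which is a handy check that the fit behaved as expected.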

In [ ]:
# Graph Pitching Stats
custom_palette=sns.color_palette("Paired",9)
sns.set_theme(style="white",palette=custom_palette)
plt.rc('legend',fontsize=16, title_fontsize=16,markerscale=3.0)
pp_pitching = sns.pairplot(data=data_spin_all,y_vars=["k_percent", "bb_percent", "p_era", "batting_avg"],\
                  x_vars=["fastball_avg_spin", "pitch2_avg_spin"], kind='reg',markers='.',hue='year',height=2,aspect=2)
pp_pitching = pp_pitching.map(plt.scatter)
xlabels,ylabels = [],[]

for ax in pp_pitching.axes[-1,:]:
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in pp_pitching.axes[:,0]:
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)

for i in range(len(xlabels)):
    for j in range(len(ylabels)):
        pp_pitching.axes[j,i].xaxis.set_label_text(xlabels[i],visible=True)
        pp_pitching.axes[j,i].yaxis.set_label_text(ylabels[j],visible=True)

for ax in pp_pitching.axes.flat:
    ax.tick_params(axis='both', labelleft=True, labelbottom=True)

pp_pitching.fig.subplots_adjust(top=.95)
pp_pitching.fig.suptitle("All Years, Pitching Stats")

plt.subplots_adjust(wspace=0.3, hspace=0.9)
plt.show()
[Figure: All Years, Pitching Stats]
In [ ]:
# Graph Batting Stats
custom_palette=sns.color_palette("Paired",9)
sns.set_theme(style="white",palette=custom_palette)
plt.rc('legend',fontsize=16, title_fontsize=16,markerscale=3.0)

pp_batting = sns.pairplot(data=data_spin_all,y_vars=["exit_velocity_avg", "whiff_percent", "groundballs_percent", "flyballs_percent", "popups_percent"],\
                  x_vars=["fastball_avg_spin", "pitch2_avg_spin"], kind='reg',markers='.',hue='year',height=2,aspect=2)
pp_batting = pp_batting.map(plt.scatter)
xlabels,ylabels = [],[]

for ax in pp_batting.axes[-1,:]:
    xlabel = ax.xaxis.get_label_text()
    xlabels.append(xlabel)
for ax in pp_batting.axes[:,0]:
    ylabel = ax.yaxis.get_label_text()
    ylabels.append(ylabel)

for i in range(len(xlabels)):
    for j in range(len(ylabels)):
        pp_batting.axes[j,i].xaxis.set_label_text(xlabels[i],visible=True)
        pp_batting.axes[j,i].yaxis.set_label_text(ylabels[j],visible=True)

for ax in pp_batting.axes.flat:
    ax.tick_params(axis='both', labelleft=True, labelbottom=True)

pp_batting.fig.subplots_adjust(top=.95)
pp_batting.fig.suptitle("All Years, Batting Stats")

plt.subplots_adjust(wspace=0.3, hspace=0.9)
plt.show()
[Figure: All Years, Batting Stats]
In [ ]:
# drill down into a relationship across multiple years; this one is 'fastball_avg_spin' & 'k_percent'
lm=sns.lmplot(x='fastball_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')
# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & K%", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Fastball Spin & K%')
[Figure: Fastball Spin & K%]
In [ ]:
# 'pitch2_avg_spin' & 'k_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='k_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & K%", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Pitch-2 Spin & K%')
[Figure: Pitch-2 Spin & K%]
In [ ]:
# 'fastball_avg_spin' & 'p_era'
lm = sns.lmplot(x='fastball_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Format the y-axis ticks, then add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & ERA", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Fastball Spin & ERA')
[Figure: Fastball Spin & ERA]
In [ ]:
# 'pitch2_avg_spin' & 'p_era'
lm = sns.lmplot(x='pitch2_avg_spin', y='p_era', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Format the y-axis ticks, then add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.2f'))
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & ERA", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Pitch-2 Spin & ERA')
[Figure: Pitch-2 Spin & ERA]
In [ ]:
# 'fastball_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='fastball_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Format the y-axis ticks, then add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Batting Average Against", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Fastball Spin & Batting Average Against')
[Figure: Fastball Spin & Batting Average Against]
In [ ]:
# 'pitch2_avg_spin' & 'batting_avg'
lm = sns.lmplot(x='pitch2_avg_spin', y='batting_avg', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Format the y-axis ticks, then add a title
for ax in lm.axes.flat:
    ax.yaxis.set_major_formatter(FormatStrFormatter('%.3f'))
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Batting Average Against", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Pitch-2 Spin & Batting Average Against')
[Figure: Pitch-2 Spin & Batting Average Against]
In [ ]:
# 'fastball_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='fastball_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Fastball Spin & Whiff %", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Fastball Spin & Whiff %')
[Figure: Fastball Spin & Whiff %]
In [ ]:
# 'pitch2_avg_spin' & 'whiff_percent'
lm = sns.lmplot(x='pitch2_avg_spin', y='whiff_percent', data=data_spin_all, col='year',col_wrap=3,height=2.75,line_kws={'color':'red'},markers='.')

# Add a title
fig = lm.fig 
fig.subplots_adjust(top=.9)
fig.suptitle("Pitch-2 Spin & Whiff %", fontsize=14)
Out[ ]:
Text(0.5, 0.98, 'Pitch-2 Spin & Whiff %')
[Figure: Pitch-2 Spin & Whiff %]

Conclusion

In conclusion, based on the p-values and subsequent graphs for the years 2015-2023, there is a statistically significant correlation between the change in both pitch types' spin and the change in opponent batting average, K%, BB%, ERA, whiff rate, flyball rate, and popup rate. With groundball rate, there is only a correlation with the fastball's average spin, and not with Pitch 2's.

However, there is a gap here: the data that's available isn't as granular as I would like. Not much data exists for correlating spin rate with pitch-by-pitch outcomes. The data shown here is linked to the pitcher rather than to the individual pitch type. Moreover, the concept of spin rate is very new and will continue to develop and deepen over time, which will certainly help improve these findings.

While there are limitations here, the correlation is still valid and shows that there is a connection between the change in spin rate and the performance of a pitcher.


Special Thanks

  • Pitching Ninja
  • Stefan Crichton