Text Manipulation with Stringr

Having clean, structured data is a great thing for any data scientist. Unfortunately, that is almost never the case. In this post, we’ll take a look at cleaning and manipulating text data using the stringr package.

Getting Confident About Confidence Intervals

In most statistical research, we take a sample of data from a larger population to analyze. This allows us to come to conclusions that are representative of our population faster and at lower cost. Confidence intervals provide a range around a sample estimate that likely contains the actual population parameter. In this post, we’ll dive into how we can use and properly explain confidence intervals.

The Value of a Shot

It’s not news to anyone that the NBA has shifted dramatically in style over the past decade. A game once played in the post has extended to the three-point line for an obvious reason… three is greater than two.

Using Scouting Reports to Find Similar Draft Prospects

Most of the posts we’ve explored here have been focused on structured data. This data is organized in a way in which we can perform analysis easily, like the stats page for a player on NBA.com. That’s the great thing about doing experiments with a sport like basketball: there are a ton of sources for clean, structured data.

In this project, I wanted to change it up a bit. Rather than looking at data in a structured format, I went the unstructured route, specifically looking at text data. I wanted to see if we could take scouting reports of draft prospects, and compare them to historic scouting reports, allowing us to make comparisons between players. There are a lot of difficulties that come with attacking unstructured data, but it could allow us to come to better conclusions about players.

Bayesian Estimation of Shot Results

When we last left off, I talked a bit about using empirical Bayesian estimation to perform inference. We were able to predict the probability of Steph Curry making a shot based on both his historical stats and his current game performance.

I decided to expand a bit on those posts and set up a larger project in the same vein. In this post, I’ll be going over that project, the Bayesian inference shot dashboard (triumphant horns playing in the distance).

Basketball and R

Jake Campbell
