12/21/2018
by Marlon Ribunal
Comments Off on Forecasting Kyle Kuzma’s Field Goal Percentage (FG%)

Forecasting Kyle Kuzma’s Field Goal Percentage (FG%)

More and more tools are becoming available for data analysts and data scientists. These tools provide convenience and lower the bar of entry for aspiring data professional.

I’ve been looking for an easy way to get to learning predictive analysis and forecasting. Prophet provides that path. Prophet is released by Facebook’s Core Data Science Team.

“Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.”

Just to dip my toes into the waters, I tried Prophet’s Quick Start Guide in R.

Let’s forecast the Field Goal Percentage (FG%) of Kyle Kuzma of the Los Angeles Lakers for the next 6 Months.

I’m using the FG% data scraped from Kuzma’s Game Log from 10/18/2018  to 12/18/2018.

Prophet uses a “normal model fitting API with its prophet function that performs fitting and returns a model object.”

#data source https://www.basketball-reference.com/players/k/kuzmaky01/gamelog/2019/
#load prophet  
library(prophet)

#create data frame
df <- read.csv('C:/Users/Marlon.Ribunal/Desktop/JE Docs/Install/Kuzma_FG_Pct.csv')

#define model
m <- prophet(df=df, growth = "linear", yearly.seasonality = TRUE)

#set horizon (preiods to forecast)
future <- make_future_dataframe(m, periods = 180)

That pretty much read our csv into a data frame, performs fitting, and define the model.

To run a simple forecast on our data:

#plot forecast
plot(m,forecast)

kyle kuzma field goal percentage

We can also break down that forecast into the trend, weekly, and yearly:

#Plot forecast breakdown
prophet_plot_components(m, forecast, uncertainty = TRUE, yearly_start = 0, weekly_start = 0, plot_cap = TRUE)

Kyle Kuzma Field Goal Forecast

If you’re looking into doing some forecasting, Prophet might be a good starting point.  By the way, it’s an open source software so it gets a lot of support from the open source community, plus it’s from Facebook. Visit Prophet’s Github page here.

Marlon Ribunal - Data Professionals At Work

11/02/2018
by Marlon Ribunal
Comments Off on How I Work: Data Professionals at Work

How I Work: Data Professionals at Work

It has been a busy year so far. Last March, I did a review, as a Technical Reviewer, for a SQL 2017 Machine Learning book. In the summer, the whole family went to the Philippines for a 3-week vacation. Between the Machine Learning book and the vacation, I was working with a book publisher for a possible contract for a SQL book. Well, I was just too busy at that point that I decided not to do it. I still have that sitting in my Someday list, though.

In the first quarter of the year, I was busy trying to check off the goals that I set earlier this year, then suddenly I got uber busy at work and had to put those goals in the backburner. But, then again, that’s not enough reason to not get involved in another book. I wanted to write another book but I cannot just do that at this point. So, I was very excited when Mala asked if I wanted to be a part of her book. I said Yes right away.

Data Professionals at Work is a collection of interviews of, well, data professionals working in the various aspects of the data industry – Database Administrator, Data Development, Data Analysis, Business Intelligence, Data Science, among others. You’ll read about how these professionals got their start in the data world. You’ll read about their opinion on what’s on the leading edge in the industry. You’ll get to read about their a-day-in-the-life-of.

Well, you’ll get a glimpse of what I do for a living in Chapter 26. You’ll read about what I have to say about Agile Methodologies, Best Practices, cloud adoption, etc.

All the interviewees were asked the same set of questions, so you’ll get different takes on the same topics, which is great!

If you’re thinking about moving into the data industry, this is a good book to pick up. You’ll get an idea of how everybody started in their field. This is a good book if you’re just starting out. You’ll get tips on how we approach our day-to-day work, tools that we use, work-life-balance, etc.

A note: My company in the book is showing as JustEnough Software Corp. That since changed to Mi9 Retail Co.

Cast Of Characters

(The following list is copied over from I’m in a book! Data Professionals at Work, sorry Kenneth. I was too lazy to write my own list)

Editor and interviewer: Malathi Mahadevan (blog|twitter)

Foreword: Kevin Kline (blog|twitter)

Interviewees:
Kirsten Benzel (blog|twitter)
Jes Borland (blog|twitter)
Tim Costello (blog|twitter)
Kevin Feasel (blog|twitter)
Kenneth Fisher (blog|twitter)
Drew Furgiuele (blog|twitter)
Ginger Grant (blog|twitter)
Kathi Kellenberger (blog|twitter)
Kendra Little (blog|twitter)
Andy Mallon (blog|twitter)
Jimmy May (blog|twitter)
Marlon Ribunal (blog|twitter)
Julie Smith (blog|twitter)
Dave Walden (blog|twitter)
Tracy Boggiano (blog|twitter)
Jason Brimhall (blog|twitter)
Mindy Curnutt (blog|twitter)
Argenis Fernandez(blog|twitter)
Joseph Fleming (twitter)
Matt Gordon (blog|twitter)
Vicky Harp (blog|twitter)
Andy Leonard (blog|twitter)
Steph Locke (blog|twitter)
John Q. Martin (blog|twitter)
John Morehouse (blog|twitter)
Joe Sack (blog|twitter)
Jonathan Stewart (blog|twitter)

The Data Professionals at Work book is now available at Amazon.

ggplot2() facet_grid

04/10/2018
by Marlon Ribunal
1 Comment

Creating Facets in SQL Server 2017 Using R

SQL Server 2017 In-Database Machine learning has brought the analytics closer to the data. It is now convenient to do data analytics either by using the programming language R or Python with the database engine.

In my previous post, I have demonstrated how easy it is to create a bar graph in SQL Server 2017 In-Database Machine Learning using  R.

We’re going to build upon that basic graph.

Sometimes doing data analysis would require us to look at an overview of our data across specific partitions, say a year. For example, we want to see how our product groups fare on month-to-month basis across the last 4 years.

In a data analytics perspective, there are quite a handful of data points in this requirement – data aggregate (quantity), monthly periods, and year partitions.

One of the approaches to handle such requirement is by using a facet. Faceting is a way of plotting subsets of data into a matrix of panels based on one or more variables – or facets.

Again, let’s begin with the end in mind. Here’s what faceting looks like:

Dataset

The quantity aggregates based on monthly period per product group are laid into the year partition facet in the following query. You can also set up your data like this in the R engine.

SELECT DISTINCT
SUM(il.Quantity) OVER (PARTITION BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate),
sg.StockGroupName
ORDER BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate)
) AS Quantity,
DATEPART(YEAR, i.InvoiceDate) AS InvoiceYear,
DATEPART(WEEK, i.InvoiceDate) AS InVoiceWeek,
DATEPART(MONTH, i.InvoiceDate) AS InvoiceMonthNo,
DATENAME(MONTH, i.InvoiceDate) AS InvoiceMonth,
sg.StockGroupName
FROM Sales.Invoices i
JOIN Sales.InvoiceLines il
ON i.InvoiceID = il.InvoiceID
JOIN Warehouse.StockItems si
ON il.StockItemID = si.StockItemID
JOIN Warehouse.StockItemStockGroups sis
ON si.StockItemID = sis.StockItemID
JOIN Warehouse.StockGroups sg
ON sis.StockGroupID = sg.StockGroupID
ORDER BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate);

Setup

Please refer to Creating Bar Graph In SQL Server 2017 Using R for the setup. We need two (2) libraries for this demonstration:

  • ggplot2()
  • scales

Again, if you get “missing package” error, just install it. If you get a “no function x” error, chances are you have not declared the function’s parent package. The data is based on the WideWorldImporters database.

Properties

The x-axis is ordered by DATEPART(MONTH, i.InvoiceDate) AS InvoiceMonthNo.The substring function in R is more intuitive than other programming languages. This expression is used as the label for x-axis: substr(InvoiceMonth, start = 1 , stop = 3). The theme setting is simply set by theme_dark() function.

Chart

The whole chart creation code looks like this:

quantitytschart <- ggplot(groupqty , aes(x = reorder(substr(InvoiceMonth, start = 1 , stop = 3), InvoiceMonthNo), y = Quantity, group = StockGroupName, colour = StockGroupName)) +
geom_line(size=1) + geom_point(size=3) +
facet_wrap( ~ InvoiceYear) + theme_dark() +
labs( x= "Month", y="Stock Group Quantity") +
scale_y_continuous(labels = comma)

That’s it.

Facets*

The above matrix is of type facet_wrap. Another option is facet_grid.

  • facet_grid(. ~ InvoiceYear) 
  • facet_wrap( ~ InvoiceYear)

facet_grid(row ~ col): this is what they call “bivariate”, which creates 2-D matrix of panels based on two factor.

facet_wrap(~cell): “univariate”, creates 1-D strip of panels, based on one factor, and wrap the strip into a 2-D matrix.

*Source: ggplot2 Quick Reference: facet

Here’s the facet_grid (See the screenshot of facet_wrap above):

Complete Code

Here’s the rest of the code:

EXEC sp_execute_external_script @language = N'R',
@script = N'
#Library
library(ggplot2);
library(scales);

chartfile <- "C:\\Marlon_test\\Groupqty.png"
png(filename=chartfile, width=1800, height=900)

#Data Frame
groupqty <- InputDataSet

quantitytschart <- ggplot(groupqty , aes(x = reorder(substr(InvoiceMonth, start = 1 , stop = 3), InvoiceMonthNo), y = Quantity, group = StockGroupName, colour = StockGroupName)) +
geom_line(size=1) + geom_point(size=3) +
facet_grid(. ~ InvoiceYear) + theme_dark() +
#facet_wrap( ~ InvoiceYear) + theme_dark() +
labs( x= "Month", y="Stock Group Quantity") +
scale_y_continuous(labels = comma)

print(quantitytschart)',
@input_data_1 = N'
SELECT DISTINCT
SUM(il.Quantity) OVER (PARTITION BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate),
sg.StockGroupName
ORDER BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate)
) AS Quantity,
DATEPART(YEAR, i.InvoiceDate) AS InvoiceYear,
DATEPART(WEEK, i.InvoiceDate) AS InvoiceWeek,
DATEPART(MONTH, i.InvoiceDate) AS InvoiceMonthNo,
DATENAME(MONTH, i.InvoiceDate) AS InvoiceMonth,
sg.StockGroupName
FROM Sales.Invoices i
JOIN Sales.InvoiceLines il
ON i.InvoiceID = il.InvoiceID
JOIN Warehouse.StockItems si
ON il.StockItemID = si.StockItemID
JOIN Warehouse.StockItemStockGroups sis
ON si.StockItemID = sis.StockItemID
JOIN Warehouse.StockGroups sg
ON sis.StockGroupID = sg.StockGroupID
ORDER BY DATEPART(YEAR, i.InvoiceDate),
DATEPART(MONTH, i.InvoiceDate);
';

Have fun and happy faceting!