SQL, Code, Coffee, Etc.

08/13/2015
by Marlon Ribunal
4 Comments

Getting Started With Power BI Desktop: Installation

Power BI is becoming popular for a lot of reasons. Simply put, Power BI is “BI on your fingertips.” Companies looking to adopt BI technology need not look further. Power BI is easy to learn with its intuitive tools. Gone are the days when building BI solutions always involve massive resources.

Power BI lowers the bar in the BI know-how. It eliminates the complexities of building BI solutions.

Power BI has three (3) components: Power BI, Power BI Mobile, and Power BI Desktop. Power BI is “cloud-based business analytics service.” Power BI Mobile provides you the capability to get connected with your data anywhere you go. Power BI Desktop gives you the platform to author reports intuitively.

In this series, let’s take a look at Power BI Desktop.

Before we get started, let’s see how easy it is to download and install the app.

Go to http://powerbi.microsoft.com/desktop and download Power BI Desktop.

Once downloaded, execute the executable file to initialize the installation. Note: Power BI Desktop requires Internet Explorer 10 or higher versions.

Click Next, and just follow the installation wizard. Accept the terms in the app’s license agreement.

Specify where you want the app installed. I wouldn’t change the default.

Confirm installation of the Power BI Desktop app. Click the Install button

Then wait for the installation to complete. It should only take a couple of minutes.

After the installation is completed, the wizard confirms the success of the installation. Select the Launch Microsoft BI Desktop and click Finish.

Wait for the Power BI Desktop to initialize.

After the successful initialization of the app, the Power BI Desktop splash screen shows up. The screen provides some useful resources to learn more about Power BI. I’d keep the Show this page on startup option checked just so I have a bookmark to these useful resources. You can also access this screen in the File tab then select Getting Started.

That’s it. Power BI Desktop is now ready for analytics or report development!

Next, let’s get some data for our Power BI Desktop.

05/29/2015
by Marlon Ribunal
5 Comments

Finding gaps or missing dates in a date range using TSQL

Here’s a quick how-to on returning temporal data set that includes missing dates. Suppose you are tasked to query an employee’s “time sheet”. You’d want to return not only the days he’s reported to work but also all the days that he missed.

The expected result would look something like this:

Actually, this was asked in the Facebook page of a SQL Server user group.

These are the answers that I gave. There is more than one way to skin a cat (no cats were harmed in the writing of this post).

These are only two of the many options. Feel free to add yours in the comment section.

Test Data

Let’s create our sample primary table (TimeSheet)

CREATE TABLE TimeSheet
(
logdate DATETIME NULL
,empno CHAR(3) NULL
,timein SMALLDATETIME NULL
,timeout SMALLDATETIME NULL
);

Let’s insert some sample data

INSERT INTO dbo.TimeSheet
( logdate, empno, timein, timeout )
VALUES ( '5/18/2015', '001', '08:30AM', '04:30PM' )
,( '5/20/2015', '001', '09:00AM', '03:30PM' )
,( '5/22/2015', '001', '08:30AM', '05:30PM' );

Now that we have our sample data let’s run a couple of scripts.

Using Calendar Table

One of the most convenient ways of finding missing dates in a date range is by using a Calendar table. There’s a ton of materials on this topic. Here’s one that tells you why you need a Calendar table.

For our demo purpose, lets create a simple calendar table that will contain our date range.

CREATE TABLE calendar ( date SMALLDATETIME );

Then, let’s populate that with a small sample of date range:

INSERT INTO dbo.calendar
( date )
VALUES ( '05/18/2015' )
,( '05/19/2015' )
,( '05/20/2015' )
,( '05/21/2015' )
,( '05/22/2015' );

Let’s use the calendar table to come up with the expected result:

SELECT CONVERT(VARCHAR(10), c.date, 101) AS logdate
,t.empno
,CONVERT(CHAR(5), t.timein, 108) AS timein
,CONVERT(CHAR(5), t.timeout, 108) AS timeout
FROM dbo.TimeSheet t
RIGHT OUTER JOIN dbo.calendar c
ON t.logdate = c.date;

And here is what we got using that calendar table:

logdate empno timein timeout
---------- ----- ------ -------
05/18/2015 001 08:30 16:30
05/19/2015 NULL NULL NULL
05/20/2015 001 09:00 15:30
05/21/2015 NULL NULL NULL
05/22/2015 001 08:30 17:30

(5 row(s) affected)

Using Common Table Expression (CTE)

Introduced in SQL Server 2005, Common Table Expression (CTE) has been a convenient “tool” for many SQL Developers. Basically, a CTE is a temporary result set. It is not stored as an object in SQL Server. One of the advantages of CTE’s is it can reference itself. How is that useful? Well, we can see that in our example.

DECLARE @startdate DATETIME
,@enddate DATETIME;

SET @startdate = '05/18/2015';
SET @enddate = '05/22/2015';
WITH calendardates
AS ( SELECT date = @startdate
UNION ALL
SELECT DATEADD(DAY, 1, date)
FROM calendardates
WHERE DATEADD(DAY, 1, date) <= @enddate
)
SELECT CONVERT(VARCHAR(10), c.date, 101) AS logdate
,t.empno
,CONVERT(CHAR(5), t.timein, 108) AS timein
,CONVERT(CHAR(5), t.timeout, 108) AS timeout
FROM dbo.TimeSheet t
RIGHT JOIN calendardates c
ON t.logdate = c.date;

You probably noticed that our CTE named “calendardates” is referenced in the FROM clause within the CTE statement.

Here’s the result.

logdate empno timein timeout
---------- ----- ------ -------
05/18/2015 001 08:30 16:30
05/19/2015 NULL NULL NULL
05/20/2015 001 09:00 15:30
05/21/2015 NULL NULL NULL
05/22/2015 001 08:30 17:30

(5 row(s) affected)

Update: Jeff Moden called my attention to this beautiful article he wrote about the negative impact of Recursive CTE’s, which is a must-read for anyone interested in Recursive CTE’s or rCTE’s as he called them. He’s that good at making up terms like that. Well, if you heard about “RBAR” (ree-bar) and “Tally Table”, he coined those terms.

Please read Jeff’s “Hidden RBAR: Counting with Recursive CTE’s” before considering using rCTE’s in your TSQL codes.

03/19/2015
by Marlon Ribunal
3 Comments

How to resequence column based on numeric prefix using TSQL

Beginning in version 2005, SQL Server is making manipulation of a partitioned set of rows easier by using Window Functions (ROW_NUMBER(), aggregate functions with OVER(), etc.). You can manipulate partitions of rows on the fly with Window Functions.

I’ve seen how complicated queries that require recursion were simplified by using a combination of Common Table Expression (CTE) and Window Functions. There are fewer reasons now, or almost zero excuses, to justify using cursors in your queries.

I had a data requirement that I thought on the onset that I could apply the awesomeness of CTE’s and Window Functions on. I needed to process a dataset into a deterministic output. I had to output the data in pre-determined sort order. Each partition would have a maximum of 10 rows.

To preserve the deterministic order of my data, I would have to prefix each data with 0,1,2,3,5,6,7,8, and 9 within the partition group.

Easy, right? Not really. The problem was my input (bar delimited string) has values that could exceed that 10 max. In cases when the number of values exceeds 10, I would grab the last 10 values. The number of values is not fixed. Some record would have 2, 3, 11, 99, or who-knows-what number of values. Regardless of the number of values, I need to grab the last 10.

Since I don’t know how many values there are in a given record, I would have to read beginning from the last value by reversing the whole string input, and reversing the results back to normal order. I have a parse function that split the values in the bar delimited string input. Since I am parsing the input in reverse, the 0 index gets assigned to the actual last value in the string (reversed).

I was working on SQL Server 2000 for this requirement 🙁

To give you an idea of what I am talking about, here’s my input (left) and the desired output (right). Basically, I needed to resequence the numeric prefix in order, i.e., 0,1,2,3…9 from an input of x…3,2,1,0

Let me walk you through step by step…

Let’s build the sample dataset. Pardon my insert constructor. Remember this is SQL 2000. It could have been nicer, I know.

SELECT x.part_no, x.sub_part_no
INTO #temp
FROM ( SELECT 'a' AS part_no, '7ABC' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no,'6DEF' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '5GHI' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '4JKL' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '3MNO' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '2QRS' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '1TUV' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '0WXY' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '4A12' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '3B34' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '2C56' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '1D78' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '0E01' AS sub_part_no
) x

This is how the pre-processed dataset looks like:

SQL Server 2000

The next step is to find out the maximum, or highest, number of prefix (sequence) in each partition (part_no in this case). You probably already know what I am trying to do here.

SELECT part_no
,LEFT(sub_part_no, 1) AS sequence_no
FROM #temp y

A simple subquery (subselect) returns exactly that:

SELECT part_no
,( SELECT MAX(LEFT(sub_part_no, 1))
FROM #temp x
WHERE x.part_no = y.part_no
) AS max_sequence
,sub_part_no
FROM #temp y

Now that I have determined the max value of my prefix (sequence) within my partition, I lay that next to my original sequence and let simple arithmetic do its magic.

SELECT part_no, max_sequence, sequence_no, cast(max_sequence AS INT) - CAST(sequence_no AS INT) AS new_sequence
FROM(
SELECT part_no
,(SELECT MAX(LEFT(sub_part_no,1)) FROM #temp x WHERE x.part_no = y.part_no) AS max_sequence
,LEFT(sub_part_no,1) AS sequence_no
FROM #temp y
) xx

Putting that together…

SELECT z.part_no
,sub_part_no AS old_sub_part_no
,CAST(CAST(max_counter AS INT) - CAST(LEFT(sub_part_no, 1) AS INT) AS VARCHAR(2)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM ( SELECT part_no
,sub_part_no
,( SELECT MAX(LEFT(sub_part_no, 1))
FROM #temp x
WHERE x.part_no = y.part_no
) AS max_counter
FROM #temp y
GROUP BY part_no
,sub_part_no
) z
ORDER BY z.part_no
,z.new_sub_part_no

Another option is to simulate the ROW_NUMBER() window function (reference here)…

SELECT part_no
,sub_part_no AS old_sub_part_no
,CAST(z.rowNumber AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM ( SELECT part_no
,sub_part_no
,( SELECT COUNT(*) - 1
FROM #temp AS x
WHERE x.part_no = y.part_no
AND x.sub_part_no >= y.sub_part_no
) AS rowNumber
FROM #temp AS y
) AS z
ORDER BY part_no, new_sub_part_no

And there you have it. That’s a simple way of resequencing values within the partition in TSQL.

SQL Server 2008/2012/2014

UPDATE: (3/26/2015)

Thanks, Bob for the script (see comment below). I tested this with SQL Server 2008 R2 and SQL Server 2012

SELECT part_no
,sub_part_no AS old_sub_part_no
,CAST (ROW_NUMBER() OVER ( PARTITION BY part_no ORDER BY CONVERT(INT, LEFT(sub_part_no, 1)) DESC ) - 1 AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM #temp