05/29/2015
by Marlon Ribunal
4 Comments

Finding gaps or missing dates in a date range using TSQL

Here’s a quick how-to on returning temporal data set that includes missing dates. Suppose you are tasked to query an employee’s “time sheet”. You’d want to return not only the days he’s reported to work but also all the days that he missed.

The expected result would look something like this:

finding missing dates in sql server

Actually, this was asked in the Facebook page of a SQL Server user group.

These are the answers that I gave. There is more than one way to skin a cat (no cats were harmed in the writing of this post).

These are only two of the many options. Feel free to add yours in the comment section.

Test Data

Let’s create our sample primary table (TimeSheet)


CREATE TABLE TimeSheet
(
logdate DATETIME NULL
,empno CHAR(3) NULL
,timein SMALLDATETIME NULL
,timeout SMALLDATETIME NULL
);

Let’s insert some sample data


INSERT INTO dbo.TimeSheet
( logdate, empno, timein, timeout )
VALUES ( '5/18/2015', '001', '08:30AM', '04:30PM' )
,( '5/20/2015', '001', '09:00AM', '03:30PM' )
,( '5/22/2015', '001', '08:30AM', '05:30PM' );

Now that we have our sample data let’s run a couple of scripts.

Using Calendar Table

One of the most convenient ways of finding missing dates in a date range is by using a Calendar table. There’s a ton of materials on this topic. Here’s one that tells you why you need a Calendar table.

For our demo purpose, lets create a simple calendar table that will contain our date range.


CREATE TABLE calendar ( date SMALLDATETIME );

Then, let’s populate that with a small sample of date range:


INSERT INTO dbo.calendar
( date )
VALUES ( '05/18/2015' )
,( '05/19/2015' )
,( '05/20/2015' )
,( '05/21/2015' )
,( '05/22/2015' );

Let’s use the calendar table to come up with the expected result:


SELECT CONVERT(VARCHAR(10), c.date, 101) AS logdate
,t.empno
,CONVERT(CHAR(5), t.timein, 108) AS timein
,CONVERT(CHAR(5), t.timeout, 108) AS timeout
FROM dbo.TimeSheet t
RIGHT OUTER JOIN dbo.calendar c
ON t.logdate = c.date;

And here is what we got using that calendar table:


logdate empno timein timeout
---------- ----- ------ -------
05/18/2015 001 08:30 16:30
05/19/2015 NULL NULL NULL
05/20/2015 001 09:00 15:30
05/21/2015 NULL NULL NULL
05/22/2015 001 08:30 17:30

(5 row(s) affected)

Using Common Table Expression (CTE)

Introduced in SQL Server 2005, Common Table Expression (CTE) has been a convenient “tool” for many SQL Developers. Basically, a CTE is a temporary result set. It is not stored as an object in SQL Server. One of the advantages of CTE’s is it can reference itself. How is that useful? Well, we can see that in our example.


DECLARE @startdate DATETIME
,@enddate DATETIME;

SET @startdate = '05/18/2015';
SET @enddate = '05/22/2015';
WITH calendardates
AS ( SELECT date = @startdate
UNION ALL
SELECT DATEADD(DAY, 1, date)
FROM calendardates
WHERE DATEADD(DAY, 1, date) <= @enddate
)
SELECT CONVERT(VARCHAR(10), c.date, 101) AS logdate
,t.empno
,CONVERT(CHAR(5), t.timein, 108) AS timein
,CONVERT(CHAR(5), t.timeout, 108) AS timeout
FROM dbo.TimeSheet t
RIGHT JOIN calendardates c
ON t.logdate = c.date;

You probably noticed that our CTE named “calendardates” is referenced in the FROM clause within the CTE statement.

Here’s the result.


logdate empno timein timeout
---------- ----- ------ -------
05/18/2015 001 08:30 16:30
05/19/2015 NULL NULL NULL
05/20/2015 001 09:00 15:30
05/21/2015 NULL NULL NULL
05/22/2015 001 08:30 17:30

(5 row(s) affected)

Update: Jeff Moden called my attention to this beautiful article he wrote about the negative impact of Recursive CTE’s, which is a must-read for anyone interested in Recursive CTE’s or rCTE’s as he called them. He’s that good at making up terms like that. Well, if you heard about “RBAR” (ree-bar) and “Tally Table”, he coined those terms.

Please read Jeff’s “Hidden RBAR: Counting with Recursive CTE’s” before considering using rCTE’s in your TSQL codes.

03/19/2015
by Marlon Ribunal
3 Comments

How to resequence column based on numeric prefix using TSQL

Beginning in version 2005, SQL Server is making manipulation of  partitioned set of rows easier by using Window Functions (ROW_NUMBER(), aggregate functions with OVER(), etc.). You can manipulate partitions of rows on the fly with Window Functions.

I’ve seen how complicated queries that require recursion were simplified by using combination of   Common Table Expression (CTE) and Window Functions. There are less reasons now, or almost zero excuses, to justify using cursors in your queries.

I had a data requirement that I thought on the onset that I could apply the awesomeness of CTE’s and Window Functions on. I needed to process a dataset into deterministic output. I had to output the data in pre-determined sort order. Each partition would have maximum of 10 rows.

To preserve the deterministic order of my data, I would have to prefix each data with 0,1,2,3,5,6,7,8,and 9 within the partition group.

Easy, right? Not really. The problem was my input (bar delimited string) has values that could exceed that 10 max. In cases when the number of values exceeds 10, I would grab the last 10 values. The number of values is not fixed. Some record would have 2, 3, 11, 99, or who-knows-what number of values. Regardless of the number of values, I need to grab the last 10.

Since I don’t know how many values there are in a given record, I would have to read beginning from the last value by reversing the whole string input, and reversing the results back to normal order. I have a parse function that split the values in the bar delimited string input. Since I am parsing the input in reverse, the 0 index gets assigned to the actual last value in the string (reversed).

I was working on SQL Server 2000 for this requirement :-(

To give you an idea of what I am talking about, here’s my input (left) and the desired output (right). Basically, I needed to resequence the numeric prefix in order, i.e., 0,1,2,3…9 from an input of x…3,2,1,0

resequence tsql sql server

Let me walk you through step by step…

Let’s build the sample dataset. Pardon my insert constructor. Remember this is SQL 2000. It could have been nicer, I know.


SELECT x.part_no, x.sub_part_no
INTO #temp
FROM ( SELECT 'a' AS part_no, '7ABC' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no,'6DEF' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '5GHI' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '4JKL' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '3MNO' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '2QRS' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '1TUV' AS sub_part_no
UNION ALL
SELECT 'a' AS part_no, '0WXY' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '4A12' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '3B34' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '2C56' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '1D78' AS sub_part_no
UNION ALL
SELECT 'b' AS part_no, '0E01' AS sub_part_no
) x

This is how the pre-processed dataset looks like:

input sequence tsql

SQL Server 2000

The next step is to find out the maximum, or highest, number of prefix (sequence) in each partition (part_no in this case). You probably already know what I am trying to do here.


SELECT part_no
,LEFT(sub_part_no, 1) AS sequence_no
FROM #temp y

max sequence tsql

A simple subquery (subselect) returns exactly that:


SELECT part_no
 ,( SELECT MAX(LEFT(sub_part_no, 1))
 FROM #temp x
 WHERE x.part_no = y.part_no
 ) AS max_sequence
 ,sub_part_no
FROM #temp y

resequencing values in tsql

Now that I have determined the max value of my prefix (sequence) within my partition, I lay that next to my original sequence and let simple arithmetic do its magic.


SELECT part_no, max_sequence, sequence_no, cast(max_sequence AS INT) - CAST(sequence_no AS INT) AS new_sequence
FROM(
SELECT part_no
,(SELECT MAX(LEFT(sub_part_no,1)) FROM #temp x WHERE x.part_no = y.part_no) AS max_sequence
,LEFT(sub_part_no,1) AS sequence_no
FROM #temp y
) xx

new sequence tsql

Putting that together…

SELECT z.part_no
 ,sub_part_no AS old_sub_part_no
 ,CAST(CAST(max_counter AS INT) - CAST(LEFT(sub_part_no, 1) AS INT) AS VARCHAR(2)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM ( SELECT part_no
 ,sub_part_no
 ,( SELECT MAX(LEFT(sub_part_no, 1))
 FROM #temp x
 WHERE x.part_no = y.part_no
 ) AS max_counter
 FROM #temp y
 GROUP BY part_no
 ,sub_part_no
 ) z
ORDER BY z.part_no
 ,z.new_sub_part_no

Another option is to simulate the ROW_NUMBER() window function (reference here)…

SELECT part_no
,sub_part_no AS old_sub_part_no
,CAST(z.rowNumber AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM ( SELECT part_no
,sub_part_no
,( SELECT COUNT(*) - 1
FROM #temp AS x
WHERE x.part_no = y.part_no
AND x.sub_part_no >= y.sub_part_no
) AS rowNumber
FROM #temp AS y
) AS z
ORDER BY part_no, new_sub_part_no

tsql resequence output
And there you have it. That’s a simple way of resequencing values within partition in TSQL.

 SQL Server 2008/2012/2014

UPDATE: (3/26/2015)

Thanks, Bob for the script (see comment below). I tested this with SQL Server 2008 R2 and SQL Server 2012

SELECT part_no
,sub_part_no AS old_sub_part_no
,CAST (ROW_NUMBER() OVER ( PARTITION BY part_no ORDER BY CONVERT(INT, LEFT(sub_part_no, 1)) DESC ) - 1 AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no
FROM #temp

07/03/2014
by Marlon Ribunal
0 comments

Executing scripts on multiple servers the easy way

Central Management Server and server groups provide a convenient way to manage multiple servers and databases in one place and at the same time.You can register and manage any servers but only those in SQL Server versions 2008 and higher can be designated as Central Management Server. You can execute […] Continue Reading…