Beginning in version 2005, SQL Server is making manipulation of a partitioned set of rows easier by using Window Functions (ROW_NUMBER(), aggregate functions with OVER(), etc.). You can manipulate partitions of rows on the fly with Window Functions.
I’ve seen how complicated queries that require recursion were simplified by using a combination of Common Table Expression (CTE) and Window Functions. There are fewer reasons now, or almost zero excuses, to justify using cursors in your queries.
I had a data requirement that I thought on the onset that I could apply the awesomeness of CTE’s and Window Functions on. I needed to process a dataset into a deterministic output. I had to output the data in pre-determined sort order. Each partition would have a maximum of 10 rows.
To preserve the deterministic order of my data, I would have to prefix each data with 0,1,2,3,5,6,7,8, and 9 within the partition group.
Easy, right? Not really. The problem was my input (bar delimited string) has values that could exceed that 10 max. In cases when the number of values exceeds 10, I would grab the last 10 values. The number of values is not fixed. Some record would have 2, 3, 11, 99, or who-knows-what number of values. Regardless of the number of values, I need to grab the last 10.
Since I don’t know how many values there are in a given record, I would have to read beginning from the last value by reversing the whole string input, and reversing the results back to normal order. I have a parse function that split the values in the bar delimited string input. Since I am parsing the input in reverse, the 0 index gets assigned to the actual last value in the string (reversed).
I was working on SQL Server 2000 for this requirement 🙁
To give you an idea of what I am talking about, here’s my input (left) and the desired output (right). Basically, I needed to resequence the numeric prefix in order, i.e., 0,1,2,3…9 from an input of x…3,2,1,0
Let me walk you through step by step…
Let’s build the sample dataset. Pardon my insert constructor. Remember this is SQL 2000. It could have been nicer, I know.
SELECT x.part_no, x.sub_part_no INTO #temp FROM ( SELECT 'a' AS part_no, '7ABC' AS sub_part_no UNION ALL SELECT 'a' AS part_no,'6DEF' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '5GHI' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '4JKL' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '3MNO' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '2QRS' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '1TUV' AS sub_part_no UNION ALL SELECT 'a' AS part_no, '0WXY' AS sub_part_no UNION ALL SELECT 'b' AS part_no, '4A12' AS sub_part_no UNION ALL SELECT 'b' AS part_no, '3B34' AS sub_part_no UNION ALL SELECT 'b' AS part_no, '2C56' AS sub_part_no UNION ALL SELECT 'b' AS part_no, '1D78' AS sub_part_no UNION ALL SELECT 'b' AS part_no, '0E01' AS sub_part_no ) x
This is how the pre-processed dataset looks like:
SQL Server 2000
The next step is to find out the maximum, or highest, number of prefix (sequence) in each partition (part_no in this case). You probably already know what I am trying to do here.
SELECT part_no ,LEFT(sub_part_no, 1) AS sequence_no FROM #temp y
A simple subquery (subselect) returns exactly that:
SELECT part_no ,( SELECT MAX(LEFT(sub_part_no, 1)) FROM #temp x WHERE x.part_no = y.part_no ) AS max_sequence ,sub_part_no FROM #temp y
Now that I have determined the max value of my prefix (sequence) within my partition, I lay that next to my original sequence and let simple arithmetic do its magic.
SELECT part_no, max_sequence, sequence_no, cast(max_sequence AS INT) - CAST(sequence_no AS INT) AS new_sequence FROM( SELECT part_no ,(SELECT MAX(LEFT(sub_part_no,1)) FROM #temp x WHERE x.part_no = y.part_no) AS max_sequence ,LEFT(sub_part_no,1) AS sequence_no FROM #temp y ) xx
Putting that together…
SELECT z.part_no ,sub_part_no AS old_sub_part_no ,CAST(CAST(max_counter AS INT) - CAST(LEFT(sub_part_no, 1) AS INT) AS VARCHAR(2)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no FROM ( SELECT part_no ,sub_part_no ,( SELECT MAX(LEFT(sub_part_no, 1)) FROM #temp x WHERE x.part_no = y.part_no ) AS max_counter FROM #temp y GROUP BY part_no ,sub_part_no ) z ORDER BY z.part_no ,z.new_sub_part_no
Another option is to simulate the ROW_NUMBER() window function (reference here)…
SELECT part_no ,sub_part_no AS old_sub_part_no ,CAST(z.rowNumber AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no FROM ( SELECT part_no ,sub_part_no ,( SELECT COUNT(*) - 1 FROM #temp AS x WHERE x.part_no = y.part_no AND x.sub_part_no >= y.sub_part_no ) AS rowNumber FROM #temp AS y ) AS z ORDER BY part_no, new_sub_part_no
And there you have it. That’s a simple way of resequencing values within the partition in TSQL.
SQL Server 2008/2012/2014
UPDATE: (3/26/2015)
Thanks, Bob for the script (see comment below). I tested this with SQL Server 2008 R2 and SQL Server 2012
SELECT part_no ,sub_part_no AS old_sub_part_no ,CAST (ROW_NUMBER() OVER ( PARTITION BY part_no ORDER BY CONVERT(INT, LEFT(sub_part_no, 1)) DESC ) - 1 AS VARCHAR(1)) + RIGHT(sub_part_no, LEN(sub_part_no) - 1) AS new_sub_part_no FROM #temp
Pingback: Finding gaps or missing dates in a date range using TSQL - SQL Server - SQL Server - Toad World
Pingback: Finding gaps in or missing dates using TSQL - SQL Server - SQL Server - Toad World