03/08/2019
by Marlon Ribunal

SQL Server 2017 Reporting Services: The Basics Part 5

We’re now on Part 5 of our SQL Server Reporting Services Basics Series. In this installment, we’ll learn how to use a range parameter. If you want to review the series: Part 1, Part 2, Part 3, Part 4.

You may remember that in Part 4 we created a simple parameterized report. We're going to build upon the same report and add a few elements to how we use the parameter.

If you followed Part 4, you have everything that you need to run this report. We only need to modify it a bit to make it work with the range parameter. Let’s dig in.

Create Dataset For Range Parameter Source

When I say “range parameter”, I mean a parameter that uses a preset range of values; for example, 1 – 100.

For the purpose of this demo, I have created a lookup table for the ranges. You may remember from Part 4 that our existing parameter is called Order Total. We’re going to use the same parameter.

Let’s create a new table called DollarTotalRange:

CREATE TABLE DollarTotalRange
(
    RangeID INT NOT NULL IDENTITY(1, 1),
    MinTotal INT NULL,
    MaxTotal INT NULL,
    RangeLabel VARCHAR(150) NULL
) ;

Let’s populate that with some data that we’re going to use for our ranges:

INSERT INTO dbo.DollarTotalRange ( MinTotal, MaxTotal )
VALUES ( 1, 1000 )
	, ( 1001, 3000 )
	, ( 3001, 5000 )
	, ( 5001, 8000 )
	, ( 8001, 10000 )
	, ( 10001, 20000 )
	, ( 20001, 30000 )
	, ( 30001, NULL ) ;

Let’s also populate the RangeLabel column with the combination of the MinTotal and MaxTotal columns as ranges:

UPDATE dbo.DollarTotalRange
SET	RangeLabel = CAST(FORMAT( MinTotal, '#,#' ) AS VARCHAR (25)) + 
    ' - ' + 
	ISNULL( CAST(FORMAT( MaxTotal, '#,#' ) AS VARCHAR (25)), 'MAX' ) ;

Let’s take a look at our ranges:
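
A quick SELECT against the lookup table does the job; the labels listed below follow from the FORMAT expressions in the UPDATE above.

SELECT RangeID, MinTotal, MaxTotal, RangeLabel
FROM dbo.DollarTotalRange
ORDER BY RangeID ;

-- RangeLabel values produced by the UPDATE:
-- 1 - 1,000
-- 1,001 - 3,000
-- 3,001 - 5,000
-- 5,001 - 8,000
-- 8,001 - 10,000
-- 10,001 - 20,000
-- 20,001 - 30,000
-- 30,001 - MAX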

Modify the Stored Procedure with the new Parameter

Let's modify the existing stored procedure so it can use the new parameter type. It is important to note that the data type of the Order Total parameter was FLOAT; we're changing it to a string (VARCHAR in the procedure, Text in the report).

We also want to make sure we can accommodate a possible maximum value of 999,999,999 for ranges with no upper bound; you may, of course, set your own maximum value.

ALTER PROCEDURE [dbo].[OrderTotalQuery] (
    @vOrderTotal VARCHAR(255) = NULL
)
AS
BEGIN

    DECLARE @Min INT, @Max INT ;

    SELECT @Min = MinTotal, @Max = MaxTotal
    FROM dbo.DollarTotalRange
    WHERE RangeLabel = @vOrderTotal ;

    SELECT c.CustomerName
         , SUM(sol.Quantity * sol.UnitPrice) AS OrderTotal
    FROM Sales.Orders so
    JOIN Sales.OrderLines sol ON so.OrderID = sol.OrderID
    JOIN Sales.Customers c ON so.CustomerID = c.CustomerID
    GROUP BY c.CustomerName, sol.OrderID
    HAVING SUM(sol.Quantity * sol.UnitPrice) >= @Min
       AND SUM(sol.Quantity * sol.UnitPrice) <= ISNULL(@Max, 999999999)
    ORDER BY OrderTotal DESC
END
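
Before we touch the report, it's worth a quick smoke test of the modified procedure from SSMS. The label below is just an example value; any RangeLabel generated by the UPDATE above will do.

EXEC dbo.OrderTotalQuery @vOrderTotal = '1,001 - 3,000' ;

-- Returns one row per order whose total falls between 1,001 and 3,000.
-- The '30,001 - MAX' label has a NULL MaxTotal, so the ISNULL() in the
-- HAVING clause caps it at 999,999,999.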

Create a New Shared Dataset

First things first, let's create a new stored procedure based on the DollarTotalRange table we created above.

CREATE PROCEDURE [dbo].[DollarRangeSelect]
AS
BEGIN
   SELECT MinTotal, MaxTotal, RangeLabel
   FROM dbo.DollarTotalRange ;
END	;
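
A quick test of the new procedure, just to confirm it returns the eight ranges we inserted earlier:

EXEC dbo.DollarRangeSelect ;

-- Should return MinTotal, MaxTotal, and RangeLabel for all eight rows,
-- from '1 - 1,000' through '30,001 - MAX'.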

Open Visual Studio and load your SSRS solution if you haven't loaded it yet. Navigate to the Solution Explorer and, under Shared Datasets, create a new dataset called ds_SSRSTutorial_001_DollarRange. Select Stored Procedure for the Query Type, then select the stored procedure we created above.

Then leave everything else as is. Click OK.

Modify the Existing Shared Dataset

Under the Shared Dataset, open the existing ds_SSRSTutorial_001_OrderTotalQuery. Click on the Parameters tab and change the data type of the @vOrderTotal parameter from FLOAT to TEXT.

Click OK.

Create a New Report Dataset

In the report, create a new report dataset based on the shared dataset we just created above. Name it rpt_ds_SSRSTutorial_001_DollarRange.

Click OK.

Modify the Report Parameter

In the Report Data window, navigate to the report parameters.

  1. Edit the @vOrderTotal parameter.
  2. On the General tab, change the data type of the @vOrderTotal parameter to Text.
  3. On the Available Values tab, select the Get values from a query option. Select the rpt_ds_SSRSTutorial_001_DollarRange dataset we created above. Select the RangeLabel column for both the Value field and the Label field.

Let’s Preview the Report

Deploy the Report!

If you are not familiar with your options for deploying your report, go back to Part 3: Report Deployment.

Next, we will continue with another type of report parameter: the multi-value parameter.

03/01/2019
by Marlon Ribunal

Transforming Data With Azure Data Factory Data Flow

Azure Data Factory (ADF) offers a convenient cloud-based platform for orchestrating data between on-premises, cloud, and hybrid sources and destinations. But it is not a full Extract, Transform, and Load (ETL) tool. For those who are well-versed in SQL Server Integration Services (SSIS), ADF is roughly the Control Flow portion of a package.

You can still scale out your SSIS implementation, though. In fact, there are two options for running your SSIS packages: on-premises, using the SSIS runtime hosted by SQL Server, or in Azure, using the Azure-SSIS Integration Runtime.

Azure Data Factory is not quite the ETL tool that SSIS is. There is a transformation gap that needs to be filled for ADF to become a true cloud ETL tool. The second iteration of ADF, v2, is closing that gap with the introduction of Data Flow.

Let’s build and run a Data Flow in Azure Data Factory v2.

Overview

Just to give you an idea of what we're trying to do in this post: we're going to take a dataset from a local, on-premises SQL Server database, copy that data into an Azure SQL Database, and then use a Data Flow to transform it and load the result into blob storage in CSV format.

In this post we discuss the following topics in a step-by-step fashion:

  1. Azure Data Flow
  2. Data Transformation in Azure Data Factory with Data Flow
  3. Data Flow Components:
    • Select
    • Derived Column
    • Aggregate
    • Pivot
    • Sort
    • CSV Sink

First things first: if this is the first time you're using Azure Data Factory, you need to create a data factory in Azure, and for the purpose of the demo in this post, you also need to set up blob storage.

Prerequisite: Copy the Working Data

Create a new pipeline with a copy activity. If you followed the instructions in the previous post, Copy Data From On-Premise SQL Server To Azure Database Using Azure Data Factory, that is our first step.

Change the copy activity source and sink as follows:

Source Query:

SELECT c.CustomerName                    
       , p.FullName AS SalesPerson
       , o.OrderDate
       , s.StockItemName
       , l.Quantity
       , l.UnitPrice
 FROM Sales.Orders o
 JOIN Sales.OrderLines l
   ON o.OrderID = l.OrderID
 JOIN Sales.Customers c
   ON o.CustomerID = c.CustomerID
 JOIN Warehouse.StockItems s
   ON l.StockItemID = s.StockItemID
 JOIN Application.People p
   ON p.PersonID = o.SalespersonPersonID ;

Sink (New table)

CREATE TABLE SalesReport ( 
 CustomerName NVARCHAR (100), 
 SalesPerson NVARCHAR (50), 
 OrderDate DATE, 
 StockItemName NVARCHAR (100), 
 Quantity INT, 
 UnitPrice DECIMAL (18, 2)) ;
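
Once that copy activity has run successfully, a quick row count against the Azure SQL Database is an easy sanity check. This is just a sketch; it assumes the table landed in the default dbo schema.

-- Run this against the Azure SQL Database after the copy activity completes
SELECT COUNT(*) AS CopiedRows
FROM dbo.SalesReport ;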

Create A Data Flow

Navigate to your Azure Data Factory. Click the Author & Monitor tile to open the ADF home page. From there, click on the pencil icon on the left to open the author canvas.

  1. Click on the ellipsis next to Data Flows (which is still in preview as of this writing).
  2. Select Add Dataflow in the context menu.
  3. Let’s name our Data Flow DataFlowTest001.

Add the Data Source

As of this writing, there are only a few supported sources in Data Flow: Azure SQL Data Warehouse, Azure SQL Database, CSV, and Parquet (a column-oriented data storage format for Hadoop). Other sources such as Amazon S3, Oracle, ODBC, HTTP, etc. are coming soon.

Let’s set up the Source.

  1. Click on the Add Source tile.
  2. Let’s name our source as DataFlowTest001DataSource.
  3. Next, create a new dataset.

Create New Source Dataset

We need to set up a new dataset. You should have completed the Prerequisite (see above) by now, so you have working data for this demo.

  1. In the Source Dataset, click New.
  2. Select Azure SQL Database. Click Finish.

Set up the Source Dataset

Set the dataset as follows:

  1. Name the dataset as AzureSQLSalesReport.
  2. Select the Linked Service AzureSqlDatabase.
  3. For the Table, select the SalesReport. Click Finish.

If you are wondering what that Linked Service is, it is roughly the equivalent of a connection manager in SSIS. It is easy to set up. At the bottom of the left-hand blade, click on Connection. This gives you the list of your Linked Services and Integration Runtimes. You can also create new Linked Services from this page.

The AzureSqlDatabase linked service was set up as follows:

Add a Select Component

  1. To add a component to the Data Flow, click the Plus sign at the bottom of the DataFlowTest001DataSource component.
  2. In the Row Modifier section of the context menu, select Select.

Set up the Select Component

What the Select component does is just that: it selects columns, like you would when executing a SELECT statement.

  1. Name the component as SalesPersonData.
  2. Add the following columns if they aren’t yet added.
    • SalesPerson
    • OrderDate
    • Quantity
    • UnitPrice

You may notice the information on the left-hand side that says, “4 items: 2 column(s) from the inputs left unmapped.” That’s what we want.

You may want to Inspect and Preview the data, but make sure you switch on Debug mode at the top before you preview.

Add a Derived Column Component

  1. Name the Derived Column component (or stream) as DerivedYearOrderDate. Make sure that the incoming stream is SalesPersonData.
  2. Add a new derived column called Year and set the expression to year(OrderDate).

Add an Aggregate Component

  1. Name the Aggregate component as LineSalesTotal. The incoming stream should be DerivedYearOrderDate.
  2. Click Group By and add the following columns:
    • SalesPerson
    • Year (derived column based on the OrderDate)
  3. Click the Aggregates tab. Add a new column and call it LineSales. Set the expression to sum(multiply(UnitPrice, Quantity)) (see the T-SQL sketch after the Pivot setup below).

Add a Pivot Component

  1. Name the component as PivotSalesPersonPerYear. Make sure that the incoming stream is from our aggregate, LineSalesTotal.
  2. Click on the Group By tab. Select SalesPerson. Keep the alias as is or assign a different alias.
  3. Click on the Pivot Key tab. Select Year.
  4. Click on the Pivoted Column tab. Set up as follows:
    • For the Column name pattern, select the first pattern in the dropdown.
    • Select Lateral for Column arrangement.
    • Type [space] Total Sales in the Suffix box.
    • In the expression box, add sum(LineSales)
    • Leave the column prefix empty.
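
If it helps to reason about what the Select, Derived Column, Aggregate, and Pivot components add up to, here is a rough T-SQL equivalent you could run against the SalesReport table we loaded in the prerequisite step. This is only a sketch: the 2013–2016 year list is assumed from the sample data (it matches the output file name we'll use later), and the Data Flow, not this query, does the actual work.

-- Per-salesperson line sales, pivoted by order year, with the " Total Sales" suffix
SELECT SalesPerson
     , [2013] AS [2013 Total Sales]
     , [2014] AS [2014 Total Sales]
     , [2015] AS [2015 Total Sales]
     , [2016] AS [2016 Total Sales]
FROM (
    SELECT SalesPerson
         , YEAR(OrderDate) AS SalesYear
         , UnitPrice * Quantity AS LineSales
    FROM dbo.SalesReport
) AS src
PIVOT (
    SUM(LineSales) FOR SalesYear IN ([2013], [2014], [2015], [2016])
) AS pvt
ORDER BY SalesPerson ;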

Add a Sort Component

Let's sort the stream by salesperson name, in ascending alphabetical order.

  1. Name the Sort component as SortSalesPersonByName. Make sure that the incoming stream is PivotSalesPersonPerYear.
  2. On the Sort Condition, select SalesPerson.
  3. Toggle the Order arrow so it is pointing up.

If you preview the data on the Sort component, you should now see something like this:

Add Destination Sink Component

  1. Let's call the Sink SalesReportByYearCSV.
  2. I have previously created a CSV dataset on Azure Blob Storage. You can also create a new dataset in your Data Flow. Create, or in our case select, the SalesReportCSV dataset from the Sink dataset dropdown.
  3. Navigate to the Settings tab. On the File name option, select Output to single file.
  4. For the File name, type 2013-2016_SalesReportByYear.

Note: The column count on the last three components doesn't look right, and the mapping doesn't show the right number of streamed columns, either. The preview shows the expected results, though.

Overview of the Azure Data Factory Data Flow

If you’ve completed all the above steps, you should now have something like this:

Run the Data Flow

Just like in SSIS, you cannot run the Data Flow by itself in the data flow context; it needs to be executed from a pipeline, ADF's equivalent of the control flow.

  1. Create a new Pipeline. Name it DataFlowSalesReport.
  2. Navigate to Activities > Move & Transform. Drag the Data Flow activity component to the canvas. That opens the blade on the left-hand side where you can set up the Data Flow that we just created above with the pipeline activity.
  3. In the Adding Data Flow blade, select Use Existing Data Flow.
  4. Select the Existing Data Flow dropdown, DataFlowTest001. Click Finish.

The only setting I would change in the pipeline is the Timeout. Set the pipeline to time out after 30 minutes.

Publish the Changes and Trigger the Activity Now

  1. Click Publish (it should be at the top, right-hand side).
  2. Click on Trigger, then select Trigger Now to execute the pipeline (and, essentially, the Data Flow).

There are a few ways of checking the status of the pipeline run; the Monitor page of the ADF UI is the most direct.

Check the CSV File Output

  1. To check the file, open Azure Storage Explorer and navigate to your blob storage. You can download Azure Storage Explorer here.
  2. You should find the CSV output file produced by the Data Flow.

02/21/2019
by Marlon Ribunal

Copy Data From On-Premise SQL Server To Azure Database Using Azure Data Factory

Let’s take a break from our SQL Server 2017 Reporting Services Basics Series and jump to Azure Data Factory (v2). The latest installment on our SSRS Series is about adding a simple parameter to the report, which you can find here.

Azure Data Factory (ADF) is a data integration service for cloud and hybrid environments (which we will demo here). ADF provides a drag-and-drop UI that enables users to create data control flows with pipeline components, which consist of activities, linked services, and datasets. Of course, there's an option to set up components manually for more control. More info on ADF can be found here.

ADF is basically a fully-managed Azure service that provides a platform for orchestrating data movement or Extraction, Transformation, and Loading (ETL) of data from sources to target databases or analytics services. If you’re thinking of SSIS, that would be a similar concept.

In this post, we’ll learn how to copy data from our local instance of SQL Server into an Azure SQL Database by using Azure Data Factory (v2).

First things first: if you do not have access to an Azure subscription, sign up for a trial account here. Then, make sure you have the following in place:

  • Azure SQL Database
  • Azure Data Factory (v2)
  • Local Instance of SQL Server 2017
  • WideWorldImporters Database (in your local instance)

Choosing a Pipeline Template

From your Azure Portal, navigate to your Resources and click on your Azure Data Factory. In the ADF blade, click on Author & Monitor button.

That will open a separate tab for the Azure Data Factory UI. For this demo, we’re going to use a template pipeline.

From the Template Gallery, select Copy data from on-premise SQL Server to SQL Azure.

That will open the panel for setting up the pipeline. Since this is the first time we’re copying a local SQL Server into Azure SQL Database on this ADF, we need to set up the connection between our local database and ADF.

Create the Datasource

From the panel, in the Datasource field, click the dropdown and select New. You can also click the link to the documentation for this template, which gives you information about the capabilities, prerequisites, and other details of this pipeline.

Create A Linked Service

Clicking New as shown above will open another panel for setting up a Linked Service that will facilitate the connection between our local database and the Azure database. Let’s name our SQL Server linked service LocalDBToAzureDB. Add a description.

In the Connect via Integration Runtime, select New.

“But what is Integration Runtime?”, you ask:

The Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory to provide data integration capabilities across different network environments. The Azure Integration Runtime can be used to connect to data stores and compute services in public networks with publicly accessible endpoints. Use a self-hosted Integration Runtime for private/on-premises networks.

Azure Data Factory Integration Runtime Info

In the Integration Runtime Setup panel, select Self-Hosted.

Click Next. Let’s name our runtime LocalDBAzureDBRuntime. Add a description if you want. The runtime Type, Self-Hosted, should already be selected by the system. Click Next again.

The next panel will provide two options to install the runtime – Express or Manual. It also provides a couple of authentication keys to use if you want to add more nodes.

Let’s select Express setup. The link will download an executable. Save it to a convenient location. Run the executable.

The installer will download, install, and register all the necessary components for the runtime. It will also install a configuration manager that we can use to set up, diagnose, and update the runtime. When you get the successful installation message, click Close.

Setting up the Linked Service to Local SQL Server

Next, we need to change the settings of the runtime. Find and open the Microsoft Integration Runtime Configuration Manager.

Click the Settings tab. On the Status, there’s a link that says Change. Click that. For the purpose of this demo, let’s select the basic option. Of course, you may want to encrypt the service if this is a production server.

Click OK. That will restart the runtime. Wait until it reconnects to the cloud. You can test the connections on both ends – local SQL Server and Azure Database – in the Diagnostics tab.

Go back to the New Linked Service panel to complete the datasource setup. Provide the local Server name and Database name. I have a local sysadmin account (sqladmin) that I used to access the local database. Test the connection.

Click Finish.

Create the Destination Table

Now before we set up the destination Azure SQL Database, let’s connect to the Azure Database via SQL Server Management Studio (SSMS) and create a new table called StockGroupLineProfit.

CREATE TABLE StockGroupLineProfit ( 
       StockGroupName VARCHAR (255)
     , LineProfit FLOAT ) ;

Let's go back to the template setup and select the DataDestination. I have already set up my Azure SQL Database in advance; select that Azure SQL Database here, or set up a new one if you haven't done so.

Click Use This Template.

In the pipeline designer UI, set the Timeout of the General properties to 30 minutes. That’s enough time for the service to copy the amount of data that we’re going to set in the pipeline.

Set The Pipeline Data Source

Let’s create a new Source. Let’s name it LocalSQLWideWorldImporters. For the Connection, let’s select the Linked Service that we created above.

Let’s go back to the Source setting of our pipeline. Under the Source Dataset, select Query and input the following query.

SELECT sg.StockGroupName
	 , SUM( il.LineProfit ) AS LineProfit
FROM Sales.InvoiceLines il
JOIN Warehouse.StockItems si
  ON il.StockItemID = si.StockItemID
JOIN Warehouse.StockItemStockGroups sis
  ON si.StockItemID = sis.StockItemID
JOIN Warehouse.StockGroups sg
  ON sis.StockGroupID = sg.StockGroupID
GROUP BY sg.StockGroupName ;

If that T-SQL code looks familiar, it is because it's the same query we used for our SSRS report here.

Preview the data so we can get an idea of what we are loading into our Azure SQL Database.

In the Sink tab, let’s create a new sink dataset. Click New. Select Azure SQL Database. Click Finish.

Let's name our sink dataset AzureStockGroup. For the linked service, I am selecting the Azure SQL Database that I set up earlier; you will have to create a new one if you haven't done so. This is our target, or destination, Azure SQL Database. For the Table, select the table that we just created via SSMS, StockGroupLineProfit.

In the Schema tab of the sink setup, add the following columns:

  • StockGroupName (of type String)
  • LineProfit (of type Decimal)

The preview of the destination should be empty at this point.

Now that both the data source and destination for the pipeline are set up, go back to the pipeline designer. Select the Mapping tab. The source/destination mapping should now show up; if not, click the Import Schemas button.

Publish the Pipeline

To publish the pipeline and the linked services that we just created, click the Publish All button at the top of the designer.

You should get a notification on the status of the published objects.

Run the Pipeline

Let's click Debug in the pipeline designer. What Debug does here is actually run the pipeline, copying the dataset we defined in our data source into our destination. Wait until the Succeeded message appears.

And, if we query our Azure SQL Database, we should now have the data in our destination table.
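
A quick check along these lines (run against the Azure SQL Database, using the table and columns we created earlier) is enough to confirm it:

SELECT StockGroupName, LineProfit
FROM dbo.StockGroupLineProfit
ORDER BY LineProfit DESC ;

-- One row per stock group, with the summed line profit copied by the pipeline.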

Enjoy orchestrating your data using Azure Data Factory!