21 February 2010

More Date-related Functionality – Part II

Arithmetic Progression Sets
    In yesterday’s post on the same topic I was showing how to create a sequence of dates on the fly with the help of common table expressions and multi-statement table-valued functions. I was mentioning that similar functionality could be obtained using a numeric sequence (interval), following to create the dates when calling the function. Here’s the function that provides the respective functionality, the difference from fGetDatesInInterval, excepting the different data types, is that it has a third parameter for the increment value (default 1), being thus possible to generate a list of odd/even numbers or any other type of arithmetic progressions
-- SQL Server 2005+: Numeric Interval
CREATE FUNCTION dbo.fGetInterval(
@Start int
= 0
,
@End int
= 0
,
@Step int=1)

RETURNS
@Interval TABLE
(
Value int PRIMARY KEY)

AS
BEGIN
DECLARE
@Sign smallint
SET
@Sign = IsNull(NullIf(Sign(@End-@Start), 0), 1);

WITH
CTE(Sequence)

AS
(

     SELECT COALESCE(@Start, @End, 0) Sequence
     UNION ALL

     SELECT CTE.Sequence + @Step * @Sign Sequence
     FROM CTE
     WHERE (@End-CTE.Sequence-@Step)*@Sign>=0

)
INSERT @Interval
SELECT
Sequence

FROM
CTE

OPTION
(MAXRECURSION 0)

RETURN
END

-- numeric interval
SELECT *
FROM
dbo.fGetInterval(0, 1000, 1)

-- numeric interval (3rd parameter NULL)
SELECT *
FROM
dbo.fGetInterval(0, 999, NULL)

-- even numbers
SELECT *
FROM
dbo.fGetInterval(0, 1000, 2)

-- odd numbers
SELECT *
FROM
dbo.fGetInterval(1, 1000, 2)

-- single value
SELECT *
FROM
dbo.fGetInterval(1, 1, 2)

 
     The numeric values returned by the function could be used to form a date sequence by using the DateAdd function and a Cast to the date data type. The downside with this approach is that if the dates are used in more expressions within the same query, the expression used to calculate the dates increases the visual complexity of the query, a function like fGetDatesInInterval could be easier to use instead.
-- creating a date sequence
SELECT Value
,
Cast(DATEADD(d, Value, GETDATE()) as Date) DateSequence

FROM
dbo.fGetInterval(0, 1000, 1)


     The fGetInterval has a second particularity, the Value attribute from the table used to retrieve the data was declared as PRIMARY KEY, fact that allows to improve queries’ performance when searches are performed on table’s values. The DateSequence from the table returned by fGetDatesInInterval could be defined as PRIMARY KEY too.

    The common table expression and its use resumes in writing a single statement, and considering that the logic from fGetInterval function could be written as a single statement, in theory we could write an inline table-valued function instead. This technique works as long the OPTION clause is not used within the inline function, when this is attempted is returned the following error: Incorrect syntax near the keyword 'OPTION'. As it seems the OPTION clause can’t be used with single-statement objects like views or inline table-valued functions. Therefore common table expressions can be used in single-statement objects, the functionality being thus restrained to the 100 default limit number of recursion.

Missing Values in a Data Set
    While searching for something on the Web, today I found Aaron Akin’s post on Recursive Common Table Expressions, he’s using them in order to determine the missing numbers/dates in a sequence. This can be easily achieved using for example the fGetDatesInInterval  or fGetInterval functions with a left join to the tested data set. Using the AdventureWorks database and fGetDatesInInterval function, supposing that I would like to see the working dates within a time interval with no POs, the query could be written as follows:
-- working days with no PO
SELECT DI.DateSequence
FROM
dbo.fGetDatesInInterval('2003-01-01', '2003-12-31') DI
LEFT JOIN ( -- list of PO dates

    SELECT OrderDate
    FROM Purchasing.PurchaseOrderHeader 
    GROUP BY OrderDate
) POH
ON DATEDIFF(d, DI.DateSequence, POH.OrderDate)=0

WHERE
DATEPART(dw, DI.DateSequence) BETWEEN 2 AND 6 --working days

AND POH.OrderDate IS NULL

     A similar approach could be used also in case of numeric data sets, here’s an example based on fGetInterval  and SplitList table-valued functions:
SELECT DI.Value
FROM
dbo.fGetInterval(1, 10, 1) DI
   LEFT JOIN dbo.SplitList('1,3,6,7,10', ',') SL
      ON DI.Value = SL.Value

WHERE
SL.Value IS NULL

 

20 February 2010

More Date-related Functionality – Part I

Date Intervals Generation
     In reports is sometimes requested to aggregate transactional data at day level, showing all existing data for a given time interval. This can be easily solved by limiting the time interval in the WHERE clause, though the problem is that it might happen that there are no data in the system for one or more dates and the users want to see that in the report. The easiest approach is to do a left join between the table in which are stored the calendar days and the actual data aggregation, though such a table it’s not always available. What shall we do then? So we need a table-valued function that takes two parameters – the start and end date – and creates all the dates in between. Actually it would be enough to have a sequence of numbers between 0 and the number of days between the two dates, and use the DateAdd function on them. Both problems could be easily solved with the help of common table expressions introduced with SQL Server 2005, while for previous versions could be used a simple loop.
    Here’s the multi-statement table valued function supposed to create the sequence of dates:
-- SQL Server 2005+: Date Sequence in Interval
CREATE FUNCTION dbo.fGetDatesInInterval(
@StartDate
date
, @EndDate date)
RETURNS
@Dates TABLE
(
DateSequence date)

AS
BEGIN
    DECLARE
@Sign smallint
    SET
@Sign = Sign(DateDiff(d, @StartDate, @EndDate));
  
  

   WITH CTE(DateSequence)
   AS
(

   SELECT @StartDate DateSequence
    UNION
ALL

    SELECT
Cast(DATEADD(d, @Sign, CTE.DateSequence) as Date) DateSequence

   FROM
CTE

   WHERE
DATEDIFF(d, CTE.DateSequence, @EndDate)*@Sign>0
)

 
    INSERT @Dates
    SELECT
DateSequence

    FROM
CTE

    OPTION
(MAXRECURSION 0)

RETURN
END
Note:
     The OPTION (MAXRECURSION 0) has been introduced in order to avoid the 100 default limit for the number of recursions allowed for a query, a value of 0 equating with an unlimited number of recursions. The function might run also without this option, though if the number of days between the two input dates is greater or equal than 100 then you’ll get the following error:
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
 
   For example in order to get the dates within a year the function call would be:
-- days of a given year (positive increment)
SELECT DateSequence
,
DATEPART(wk, DateSequence) WeekOfYear

,
DATEPART(mm, DateSequence) MonthOfYear

,
DATENAME(w, DateSequence) DayOfWeekName

,
DATEPART(dw, DateSequence) DayOfWeek
FROM
dbo.fGetDatesInInterval('2010-01-01', '2010-12-31')

     The DatePart and DateName functions could be used to get the various date parts of a date. As can be seen in this case the EndDate is greater than the StartDate, what happens if the users inverse the two dates? The above function already handles this situation with the help of @Sign variable, the following query in which the two parameters are inversed returning the same date sequence as above:
-- days of a given year (negative increment)
SELECT DateSequence
FROM
dbo.fGetDatesInInterval('2010-12-31', '2010-01-01')

   The function handles also NULL values, returning the not-NULL value, and in case both parameters are NULL then it returns the current date:
-- second parameter is NULL
SELECT DateSequence
FROM
dbo.fGetDatesInInterval('2010-12-31', NULL)

-- first parameter is NULL
SELECT DateSequence
FROM
dbo.fGetDatesInInterval(NULL, '2010-12-31')

-- both parameters are NULL
SELECT DateSequence
FROM
dbo.fGetDatesInInterval(NULL, NULL)


   The above function can’t be used in earlier versions than SQL Server 2005, though the same output could be obtained using a loop – here’s the modified function:
-- SQL Server 2000: Date Sequence in Interval
CREATE FUNCTION dbo.fGetDatesInIntervalOld(
@StartDate
date
, @EndDate date)
RETURNS
@Dates TABLE

(
DateSequence date)

AS
BEGIN
    DECLARE
@NumberDays int
    DECLARE
@Counter int
    DECLARE
@Sign smallint
    SET
@NumberDays = IsNull(DateDiff(d, IsNull(@StartDate, @EndDate), @EndDate), 0)

    SET
@Counter = 0

    SET
@Sign = IsNull(NUllIf(Sign(@NumberDays), 0), 1)

    WHILE
(@Counter* @Sign<=@NumberDays * @Sign)

    BEGIN

   INSERT @Dates
   SELECT DateAdd(d, @Counter, COALESCE(@StartDate, @EndDate, GetDate()))


   SET @Counter = @Counter + @Sign

END
RETURN
END
   As can be seen from the below test cases the function returns the same output as the common table expression-based implementation:
-- days of a given year (positive increment)
SELECT DateSequence
FROM
dbo.fGetDatesInIntervalOld('2010-01-01', '2010-12-31')

-- days of a given year (negative increment)

SELECT
DateSequence

FROM
dbo.fGetDatesInIntervalOld('2010-12-31', '2010-01-01')
-- second parameter is NULL
SELECT DateSequence
FROM
dbo.fGetDatesInIntervalOld('2010-12-31', NULL)

-- first parameter is NULL
SELECT DateSequence
FROM
dbo.fGetDatesInIntervalOld(NULL, '2010-12-31')

-- both parameters are NULL
SELECT DateSequence
FROM
dbo.fGetDatesInIntervalOld(NULL, NULL)


Number of Working Days
   Given the fact that for each date could be obtained the day of the week, any of the two above functions could be used also to determine the number of (not) working days in a certain time interval:
-- number of (not) working days
SELECT SUM(CASE WHEN DATEPART(dw, DateSequence) BETWEEN 2 AND 6 THEN 1 ELSE 0 END) WorkingDays
,
SUM(CASE WHEN DATEPART(dw, DateSequence) IN (1,7) THEN 1 ELSE 0 END) WorkingDays

,
COUNT(1) NumberDays

FROM
dbo.fGetDatesInIntervalOld('2010-01-01', '2010-12-31')

 
   Even if handy maybe this isn’t the best approach in order to get the number of (not) working days. Some time ago I developed a similar function in VBA, the respective function subtracting the number of weekends falling in the extended time interval. Because I’ve calculated the number of weekends falling in the extended time interval (including first day of the week in which the Start Date falls, respectively the last day of the week in which the End Date falls), I have to add them again – this being the meaning of the two CASE uses.
-- SQL Server 2000+: Date Sequence in Interval
CREATE FUNCTION dbo.fGetWorkingDays(
@StartDate
date
, @EndDate date)
RETURNS
int
AS
BEGIN
DECLARE
@NumberDays int
SET @NumberDays = IsNull(DateDiff(d, IsNull(@StartDate, @EndDate), @EndDate), 0)
RETURN
@NumberDays -- difference days

- 2 * (@NumberDays + DatePart(dw, @StartDate) + (7 - DatePart(dw, @EndDate))) / 7 -- number weekends in extended weeks

+ (1 - CASE DatePart(dw, @StartDate) WHEN 1 THEN 1 ELSE 0 END) -- weekend to be added (first week)

+ (1 - CASE DatePart(dw, @EndDate) WHEN 7 THEN 1 ELSE 0 END) -- weekend to be added (last week)

+ 1 -- first day

END

   Should be enough to test the week for the various combinations of weekend/working days for 2-3 consecutive weeks:
SELECT dbo.fGetWorkingDays('2010-01-30', '2010-02-13') -- Sa-So (10 days)
SELECT
dbo.fGetWorkingDays('2010-01-31', '2010-02-13') -- So-So (10 days)

SELECT
dbo.fGetWorkingDays('2010-02-01', '2010-02-13') -- Mo-So (10 days)

SELECT
dbo.fGetWorkingDays('2010-02-01', '2010-02-12') -- Mo-Sa (10 days)

SELECT
dbo.fGetWorkingDays('2010-01-30', '2010-02-12') -- So-Sa (10 days)

SELECT
dbo.fGetWorkingDays('2010-01-31', '2010-02-12') -- So-Sa (10 days)

SELECT
dbo.fGetWorkingDays('2010-02-06', '2010-02-12') -- Sa-Sa (5 days)

SELECT
dbo.fGetWorkingDays('2010-02-06', '2010-02-13') -- Sa-Sa (5 days)

SELECT
dbo.fGetWorkingDays('2010-02-06', '2010-02-11') -- Sa-Th (4 days)

SELECT
dbo.fGetWorkingDays('2010-02-07', '2010-02-11') -- So-Th (4 days)

SELECT
dbo.fGetWorkingDays('2010-02-06', '2010-02-07') -- Sa-So (0 days)

SELECT
dbo.fGetWorkingDays('2010-02-06', '2010-02-08') -- Sa-Mo (1 day)

SELECT
dbo.fGetWorkingDays('2010-02-05', '2010-02-08') -- Fr-Mo (2 days)

SELECT
dbo.fGetWorkingDays('2010-02-05', '2010-02-05') -- same day (1 day)

 
   In case of doubts the dbo.fGetDatesInIntervalOld could be used to check whether the results are correct. For easier checking of the output of the above test queries here’s the calendar covering the 2-3 weeks considered.
Date Calendar

19 February 2010

Visual Studio 2010 CR doesn’t support SSIS Projects

    I’ve installed today Visual Studio 2010 CR (Candidate Release) without any problems - nice look and feel, new types of projects targeting several new Microsoft solutions (SharePoint 2010, MS Office 2010, Windows Azure, ASP.NET MVC 2, Silverlight 3), everything looked nice until I wanted to create a SSIS project – unfortunately no support for SSIS 2005/2008 within VS 2010. As it seems I’m not the only person looking for that functionality, several discussion forums approaching already this topic on MSDN, SQL Server Developer Center or Microsoft Connect.

      The next version of SQL Server is not 2010, as many would expect, but Microsoft is working on SQL Server 2008 R2 avoiding somehow to synchronize the SQL Server Service solutions with VS 2010. Not sure if this strategic decision was taken for the sake of profit, but there is something fishy about it.

      So if I want to buy VS 2010 and SQL Server 2008 then I need to install also VS 2008, having thus 2 IDEs on my system, they might work together without problems though that means more space, more effort on managing the patches for the two IDE and I would expect that there will be synchronization differences between them, migrate forth and back solutions between frameworks, more issues to troubleshoot. Is this what Microsoft understands by “Integration”?! In the end Microsoft looses because I would expect there will be customers that would avoid moving to VS 2010 just because of the lack of support for SSIS 2005/2008. I wonder how much will hurt customers this lack of SSIS solution integration between VS 2010 and SQL Server 2008?! Quoting an unofficial comment made on one of the forums, it seems that “there will be a separate add-on that will give limited functionality” [1] in working with SSIS, though is this the right approach?

    Being used with the DTS packages functionality available on SQL Server 2000, I found it somehow awkward that Microsoft decided to separate SSIS from the SQL Server Management Studio, on one side I understand that the complexity of SSIS projects requires a IDE much like the one provided by Visual Studio, though for the developer used to SQL Server 2000 not sure if this approach was welcome. Of course, data export/import functionality is available using the’ SQL Server Import and Export Wizard’ though it’s not the same thing, and even if I find the ‘Execute Package Utility’ quite easy to use, I can’t say I’m a fan of it. I wonder which Microsoft’s plans are for the future…

    It seems there are even more surprises from Microsoft that could come with SQL Server R2, for example "using SSIS package as a data source for SSRS report datasets" [2], even if the respective feature is a non-production feature, what happens with the customers that already built their solutions on it?! Of course, Microsoft doesn’t recommend the use of non-standard features as they might not be supported on upper versions, but you know developers, why reinvent the wheel when there is already some functionality for that purpose! 

    I would expect that more such issues will be discovered once developers start to play with VS 2010 and the coming SQL Server 2008 R2.

References:
[1] Microsoft Connect. (2010). SSIS VS2010 project type. [Online] Available from: https://connect.microsoft.com/SQLServer/feedback/details/508552/ssis-vs2010-project-type (Accessed: 19 February 2010)
[2] Siddhumehta. (2010). SSIS Package not supported as a data source in SQL Server Reporting Services ( SSRS ) 2008 R2. [Online] Available from: http://siddhumehta.blogspot.com/2010/02/ssis-package-not-supported-as-data.html (Accessed: 19 February 2010)

18 February 2010

Just in CASE – Part V: Dynamic Queries (Update)

    In the previous post on the same topic: Just in CASE - Part V: Dynamic queries I shown how the dynamic queries built inside of a stored procedure could be used in order to handle multiple parameters requests combinations. Such technique could be useful also when dealing with more example queries, for example when needed to show the same data but at different levels of detail, or when needed to join different database objects (tables, views, table-valued functions) based on given criteria.
-- Retrieve aggregated POs at different levels of detail (1 - Vendor, 2 - Product, 3 - Purchase Order)
CREATE PROCEDURE dbo.pGetAggregatedPOs(
@LevelOfDetail smallint = 0)

AS
BEGIN
DECLARE @Query varchar(2000)
DECLARE @GroupBy varchar(200)

-- Building GROUP BY clause
SET @GroupBy = CASE
    WHEN @LevelOfDetail IN (1, 3) THEN ', POD.VendorNumber, POD.Vendor, POD.VendorActiveFlag'

    ELSE ''

END

+ CASE

    WHEN @LevelOfDetail = 2 THEN ', ITM.ProductName, ITM.ProductNumber, ITM.ProductModel'

     ELSE ''

END
+ CASE

    WHEN @LevelOfDetail = 3 THEN ', POD.PurchaseOrder'

    ELSE ''

END

IF LEN(@GroupBy)>0
    SET
@GroupBy = RIGHT(@GroupBy, Len(@GroupBy)-1)

ELSE
    SET
@GroupBy = ' POD.VendorNumber, POD.Vendor, POD.VendorActiveFlag, ITM.ProductName, ITM.ProductNumber, ITM.ProductModel, POD.PurchaseOrder'


SET @Query = 'SELECT '
+ @GroupBy
+ ', SUM(POD.OrderQty) TotalOrderQty'

+ ', SUM(POD.ReceivedQty) TotalReceivedQty'
+ ', SUM(POD.RejectedQty) TotalRejectedQty'

+ ', SUM(POD.OpenQty) TotalOpenQty'

+ ' FROM Purchasing.vPurchaseOrderDetailEx POD'

+ ' JOIN Production.vProducts ITM ON POD.ProductID = ITM.ProductID'

+ ' GROUP BY '

+ @GroupBy
+ ' ORDER BY '

+ @GroupBy
EXEC(@Query)
END
   In the above stored procedures for @LevelOfDetail parameter should be provided 1 in case is needed to aggregated the data at Vendor, 2 for Product level, and 3 at Purchase Order. In case no parameter is provided or it has other value than 1, 2 or 3, the result will be aggregated at Vendor, Product, Purchase Order combined. Here are the test cases for each case:
EXEC dbo.pGetAggregatedPOs 1 -- Vendors test
EXEC
dbo.pGetAggregatedPOs 2 -- Products test

EXEC
dbo.pGetAggregatedPOs 3 -- Purchase Orders test

EXEC
dbo.pGetAggregatedPOs -- no parameters test

Note:
     You should always treat also the case in which the user provides other value for the parameter than the expected list of values.
     The above stored procedure could be modified in order to provide also the current On Hand in case the User has chosen to aggregate the records at Product level, the easiest way to achieve this is to encapsulate the logic in a view or table-valued function and join it to the main query, adding the OnHand in the GROUP BY clause.

    In the previous post I shown also how a dynamic query could be rewritten using specific techniques for dates, single attributes and list of values. Even the respective approach could affect the performance of the query it allows to encapsulate the same logic also in a table-valued function, making possible to reuse the functionality in other queries and even apply other types of constraints not already handled by the parameters in scope.
-- Retrieve Products using table-valued function
CREATE FUNCTION dbo.fSearchProducts(
@ProductID int

,
@ProductSubcategoryID int

,
@ProductModelIDs varchar(500)
,
@ProductName nvarchar(50)
,
@ProductNumber nvarchar(25)
,
@MakeFlag bit

,
@FinishedGoodsFlag bit

,
@StartStandardCost money

,
@EndStandardCost money

,
@StartSellStartDate varchar(10)
,
@EndSellStartDate varchar(10))

RETURNS
TABLE
AS
RETURN
(

SELECT
ProductID, ProductName, ProductNumber, ProductModel, CatalogDescription, ProductSubcategory

,
MakeFlag, FinishedGoodsFlag, Color, SafetyStockLevel, ReorderPoint, StandardCost, ListPrice

,
Size, SizeUnitMeasureCode, SizeUnitMeasure, WeightUnitMeasureCode, WeightUnitMeasure, Weight

,
DaysToManufacture, ProductLine, Class, Style, dbo.GetDateAsString(SellStartDate) SellStartDate

,
dbo.GetDateAsString(SellEndDate) SellEndDate, dbo.GetDateAsString(ModifiedDate) ModifiedDate

FROM
Production.vProducts

WHERE
IsNull(ProductID, -1) = COALESCE(NullIf(@ProductID, 0), ProductID, -1)

AND IsNull(ProductSubcategoryID, -1) = COALESCE(NullIf(@ProductSubcategoryID, 0), ProductSubcategoryID, -1)

AND IsNull(ProductModelID, -1) IN (SELECT value FROM dbo.SplitList(IsNull(NullIf(@ProductModelIDs, ''), CAST(IsNull(ProductModelID, -1) as varchar(20)) ), ','))
AND IsNull(ProductName, ' ') LIKE IsNull(NullIf(@ProductName, ''), IsNull(ProductName, ' '))
AND IsNull(ProductNumber, ' ') LIKE IsNull(NullIf(@ProductNumber, ''), IsNull(ProductNumber, ' '))
AND IsNull(MakeFlag, 0) = COALESCE(NullIf(@MakeFlag, 0), MakeFlag, 0)
AND IsNull(FinishedGoodsFlag, 0) = COALESCE(NullIf(@FinishedGoodsFlag, 0), FinishedGoodsFlag, 0)

AND StandardCost BETWEEN ISNULL(NullIf(@StartStandardCost, 0), StandardCost) AND ISNULL(NullIf(@EndStandardCost, 0), StandardCost)

AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@StartSellStartDate, ''), '01-01-1753')))<=0
AND DateDiff(d, IsNull(SellStartDate, '1753-01-01'), dbo.GetStringDate(IsNull(NullIf(@EndSellStartDate, ''), '31-12-2009')))>=0
)
     Here are the rewritten test cases for the tabled-defined functions, the output being the same:
SELECT *
FROM
dbo.fSearchProducts(0, 0, '', '', '', 0, 0, 0, 0, '', '') -- all records

SELECT
*

FROM
dbo.fSearchProducts(1, 0, '', '', '', 0, 0, 0, 0, '', '') -- ProductID test

SELECT
*

FROM
dbo.fSearchProducts(0, 2, '', '', '', 0, 0, 0, 0, '', '') -- ProductSubcategoryID test

SELECT
*

FROM
dbo.fSearchProducts(0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '', '') -- ProductModelIDs test

SELECT
*

FROM
dbo.fSearchProducts(0, 0, '', 'Mountain%', '', 0, 0, 0, 0, '', '') -- ProductName test

SELECT
*
FROM dbo.fSearchProducts(0, 0, '', '', 'BK-T18U%', 0, 0, 0, 0, '', '') -- ProductNumber test
SELECT
*

FROM
dbo.fSearchProducts(0, 0, '', '', '', 1, 1, 0, 0, '', '') -- MakeFlag & FinishedGoodsFlag test

SELECT
*

FROM
dbo.fSearchProducts(0, 0, '', '', '', 0, 0, 50, 110, '', '') -- StandardCost test

SELECT
*

FROM
dbo.fSearchProducts(0, 0, '', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002') -- SellStartDate test

    Here’s another example in which several other constraints are used with the table-valued function:
SELECT *
FROM
dbo.fSearchProducts(0, 0, '3, 4, 5', '', '', 0, 0, 0, 0, '01/01/2001', '31/12/2002')

WHERE
Color = 'Black'

AND Size BETWEEN '32' AND '44'

ORDER
BY ProductNumber


     In such queries should be targeted to use function’s parameters rather than using the same parameters in the WHERE constraint of a query! There could be also exceptions in which complex constraints based on multiple attributes need to appear in WHERE clause.