22 September 2009

SQL Reloaded: Incremental Update Technique(s)

In Cognos Enterprise Business Intelligence Developer Guides mentioned in the previous post is repeated over and over again in the WHERE clause a set of constraints like in the below example(e.g. Accounts Payable Analysis for Oracle e-Business Suite, p. 21), involving more or less than 3 tables:
 (A.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND A.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (B.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND B.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (C.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND C.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     For 3 tables it's not a lot of work, though the set of constraints becomes more complicated with the increase in the number of tables. It can be simplified by using the LEAST & GREATEST Oracle functions, resulting the equivalent writting: (LEAST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     The second formulation is simpler and should have the same performance as the first formulation, however it has a small impediment – the LEFT JOINs and LEAST/GREATEST behaviour when working with NULL values. If one of the parameters provided to the two functions is NULL then the result will be NULL, thing which can mess up a query. That’s not difficult to avoid by using for example the NVL function. Supposing that there is a LEFT JOIN between A and two other tables B & C, the above constraints can be rewritten as: (LEAST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     One might arrive to write such constraints for example when attempting to do an incremental update in a data warehouse based on the Creation Date from multiple source tables, loading the records having a Creation Date greater than the date of the previous update (of course the records existing already in data warehouse needs to be deleted or updated). The Creation Date works only for transactional data which don’t change after their creation, for the other types of cases it should be considered the Last Update Date (Date of the Last Update for a record). Also this approach has its flows, working well when records are not deleted from the table; this issue could be fixed for example by creating a trigger on deletion and save the ID of the record deleted in a third party table or actually delete the record from the third party database when the deletion happens in the source table.

ERP Systems: Learning about Oracle APPS internals II

    In a previous post I mentioned the FND_TABLES & FND _VIEWS as sources that can be used to find more about Oracle APPS internals. The two tables are pretty useful though they don't contain all the tables/views created in Oracle unless they were registered in them. So some of the objects developed for example during the various customization or integration phases might not be found in them, therefore it makes sense to check also ALL_ALL_TABLES & ALL_VIEWS tables, providing all the tables available for a certain schema.

SELECT *
FROM ALL_ALL_TABLES
WHERE OWNER = :OWNER

SELECT *
FROM ALL_VIEWS
WHERE OWNER = :OWNER

    Getting the differences between the two sources it’s pretty easy:

SELECT A.*
FROM ALL_ALL_TABLES A
    LEFT JOIN APPS.FND_TABLES B
        ON A.TABLE_NAME = B.TABLE_NAME
WHERE A.OWNER = :OWNER
AND B.TABLE_NAME IS NULL

    If you are interested on Oracle internals you can find more goodies by querying the Oracle dictionary table:

SELECT *
FROM DICT
WHERE TABLE_NAME LIKE '%VIEW%'

    One of such goodies is the view that stores the list of Materialized views:

SELECT *
FROM DBA_MVIEWS

    PS: I know that these are pretty basic stuff for APPS DBAs though not all of us discovered them from beginning.

13 September 2009

SQL Reloaded: Randomized Data Selection

 A few years back I was asked to provide a random set of data from a data collection. Dealing only with a small number of records and as they were in Excel, it was much easier to use the Excel’s RAND function. Even if SQL Server provides a RAND function too, it returns the same value within a data set when the same parameter is provided. In order to avoid this behavior it would be useful if could be provided a distinct parameter value for each row, for example using the data set's integer unique identifier or eventually create one, thus if ID is such a column the query would look something like that: 

SELECT TOP 10 RAND(ID) PERC, ID, N1, N2 
FROM 
ORDER BY RAND(ID)     

The fact that RAND will return always the same value for the same parameter value might be a problem because the above query will always return the same records. In exchange you can use a multiplier for the parameter provided to RAND function - for example in a first attempt you can use RAND(id), in a second RAND(id*2), in a third RAND(id*3), etc.     

This approach offers unfortunately a relatively "weak" randomization which might be not sufficient for repetitive selections, therefore under SQL Server 2005+ could be used instead a CLR-based UDF (user defined function) and use the random functions available in VB, C# or any other CLR programming language, or even the .Net Random class designed to produce a sequence of numbers that meet certain statistical requirements for randomness.

12 September 2009

SQL Reloaded: Number of Records I (via COUNT)

Getting the number of records from a table is straightforward, a simple SELECT doing the job:

-- general syntax 
SELECT count(*) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(*) NoRecords
FROM [Person].[Address]

Some prefer using a count(1) instead of a count(*), though it doesn't make a difference considering how SQL Server handles the two scripts in the background.

-- general syntax 
SELECT count(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT count(1) NoRecords
FROM [Person].[Address]

Same can be achieved by using a SUM:

-- general syntax 
SELECT SUM(1) NoRecords
FROM <schema>.<table>

-- example 
SELECT SUM(1) NoRecords
FROM [Person].[Address]

However the SUM is typically used when counting only certain records within the same line (the attributes must be know however beforehand):

-- counting the occurrences for a certain values via SUM
SELECT SUM(CASE WHEN Class IS NULL THEN 1 ELSE 0 END) [Nulls]
, SUM(CASE WHEN Class = 'H' THEN 1 ELSE 0 END) [H]
, SUM(CASE WHEN Class = 'L' THEN 1 ELSE 0 END) [L]
, SUM(CASE WHEN Class = 'M' THEN 1 ELSE 0 END) [M]
, count(*) NoRecords
FROM [Production].[Product]

The sane output can be used via a GROUP BY based on the grouping attributes:

-- counting the occurrences for a certain attribute via COUNT
SELECT Class
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class

-- counting the occurrences for a group of attributes via COUNT
SELECT Class
, Color
, count(*) NoRecords
FROM [Production].[Product]
GROUP BY Class
, Color

If you want to implement the functionality in a function or stored procedure and provide the name of the table as parameter, then things get a little more complicated as the query needs to be built dynamically on runtime. In stored procedures a query built dynamically can be usually executed using EXEC or EXECUTE stored procedures, though none of them will do as we need to reuse the actual value returned by the query. This can be achieved with the help of a third stored procedure called sp_executesql, and the work can be done relatively easily with the below stored procedure:

-- dropping the stored procedure
--DROP PROCEDURE IF EXISTS dbo.pGetNumberRecords

-- creating the stored procedure
CREATE PROCEDURE dbo.pGetNumberRecords(
  @Table varchar(100) 
, @NumberRecords int OUTPUT 
)
AS 
DECLARE @SglQuery nvarchar(250) 

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @SglQuery = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @SglQuery
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

-- testing the stored procedure 
DECLARE @NumberRecords int 
EXEC dbo.pGetNumberRecords '[Production].[Product]', @NumberRecords OUTPUT 
SELECT @NumberRecords 

The overhead of using a stored procedure for getting the records count is bearable though the solution is not necessarily elegant, a nicer approach being the use of functions for the same purpose:

-- dropping the function
--DROP FUNCTION IF EXISTS dbo.GetNumberRecords

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@table as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
DECLARE @sql varchar(200) 
DECLARE @NumberRecords int

IF IsNull(@table,'')<>'' 
BEGIN 
  SET @sql = N'SELECT @NumberRecords = count(*) FROM ' + @table 

  --get the number of records 
  EXEC sp_executesql @Query = @sql
                 , @params = N'@NumberRecords INT OUTPUT' 
                 , @NumberRecords = @NumberRecords OUTPUT 
END 

RETURN @NumberRecords
END

-- testing the function
SELECT dbo.GetNumberRecords('[AdventureWorks2014].[Production].[Product]')

Unfortunately UDFs have the restriction that they can’t execute stored procedures, and thus the above technique can’t be used. Even if the function is created, when attempting to run it, the following error occurs:
Msg 557, Level 16, State 2, Line 185
Only functions and some extended stored procedures can be executed from within a function.

There is actually an alternative for executing pass-through queries in functions by using OPENQUERY, OPENROWSET or OPENDATASOURCE functions, allowing thus to execute DML statements against a given linked server, as is the case of first two functions, respectively using ad-hoc connection information as is the case of the third function. For exemplification I chose OPENQUERY function, this being the code created to return the number of records from a specified table (you'll need to replace with your server, see below):

--dropping the test function 
-- DROP FUNCTION IF EXISTS dbo.GetNumberRecordsTest

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecordsTest() 
RETURNS int 
AS 
BEGIN 
  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( <server_name>, 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM [AdventureWorks2014].[Production].[Product]') AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecordsTest()

 For this piece of code to work is first necessary to refer the server as a linked server:

EXEC sp_serveroption <server_name>, 'Data Access', true

Withought doing this first you’ll get the error:
Msg 7411, Level 16, State 1, Procedure GetNumberRecords, Line 4 Server '>server_name<' is not configured for DATA ACCESS.

You can use sp_helpserver to get server's name:
-- getting server's name
EXEC sp_helpserver 

The above function doesn’t help much. It would be helpful if we could provide the table’s name as parameter and store the query string in a variable, something like:

-- creating the test function
CREATE FUNCTION dbo.GetNumberRecords(
@TableName as nvarchar(100)) 
RETURNS int 
AS 
BEGIN 
  DECLARE @sql varchar(200) 
  SET @sql = 'SET NOCOUNT ON; SELECT count(*) NoRecords FROM ' + @TableName

  RETURN ( 
     SELECT NoRecords 
  FROM OPENQUERY( , @sql) AS O) 
END 

-- retrieving the number of records
SELECT dbo.GetNumberRecords()

Unfortunately OPENQUERY function same as OPENROWSET and OPENDATASOURCE expect a constant parameter as query and therefore the error (as per SQL Server 2005):

Msg 102, Level 15, State 1, Procedure GetNumberRecords, Line 12 Incorrect syntax near '@sql'. Expecting STRING or TEXT_LEX.

I wonder why the SQL Server team hasn’t allowed the use of variables as parameters for OPENQUERY functions, if not in the previous SQL Server versions at least in SQL Server 2008! Anyway, another impediment would be raised by the deterministic nature of functions.

SQL Reloaded: Deterministic Functions (GetDate SQL Server 2000/2005)

In an answer on LinkedIn, one of the users made me attentive that in SQL Server 2005 and later versions the GetDate() and other non-deterministic UDFs (user-defined functions) can be used inside functions and views, thing which was not possible under SQL Server 2000. This lack of functionality involved considerable overhead when was needed to use current date in table-valued functions. The solution was to send the date as parameter to the respective objects.     

For exemplification the following piece of code would run successfully on SQL Server 2005 but fail on SQL Server 2000, returning the below error message. 

-- creating the test function
CREATE FUNCTION dbo.fGetCurrentDate() 
RETURNS smalldatetime AS 
BEGIN 
    RETURN GetDate()
END

-- testing the function
SELECT dbo.fGetCurrentDate() 

Error message:
Server: Msg 443, Level 16, State 1, Procedure GetCurrentDate, Line 6 Invalid use of 'getdate' within a function.

A similar error message resulted when using the GetDate() function inside of a view:

-- creating the test view
CREATE VIEW dbo.vGetCurrentDate
AS
SELECT GetDate() AS CurrentDate

-- testing the view
SELECT * 
FROM dbo.vGetCurrentDate

The alternative was to use an UDF instead, which functions like a parameterized view, something like:

-- creating the test UDF
CREATE FUNCTION dbo.udfFilterByDate(
 @date datetime)
 RETURNS TABLE
 AS 
 RETURN(
 SELECT * 
 FROM [AdventureWorks2014].Production.Product
 WHERE [SellStartDate]>=@Date)

 -- testing the UDF
 SELECT *
 FROM dbo.udfFilterByDate(GetDate())

 -- testing the UDF (alternative)
 DECLARE @date as datetime
 SET @date = GetDate()
 SELECT *
 FROM dbo.udfFilterByDate(@date)

Using an UDF can still be necessary when is needed to pass dates and other types of parameters used inside of a correlated query or within the logic. 

Related Posts Plugin for WordPress, Blogger...

About Me

My photo
IT Professional with more than 24 years experience in IT in the area of full life-cycle of Web/Desktop/Database Applications Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP implementations & support, Team/Project/IT Management, etc.