About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Tuesday, September 22, 2009

Incremental Update Technique(s)     In Cognos Enterprise Business Intelligence Developer Guides mentioned in the previous post is repeated over and over again in the WHERE clause a set of constraints like in the below example(e.g. Accounts Payable Analysis for Oracle e-Business Suite, p. 21), involving more or less than 3 tables: (A.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND A.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (B.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND B.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS')) OR (C.CREATION_DATE > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND C.CREATION_DATE <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     For 3 tables it's not a lot of work, though the set of constraints becomes more complicated with the increase in the number of tables. It can be simplified by using the LEAST & GREATEST Oracle functions, resulting the equivalent writting: (LEAST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, B.CREATION_DATE, C.CREATION_DATE) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     The second formulation is simpler and should have the same performance as the first formulation, however it has a small impediment – the LEFT JOINs and LEAST/GREATEST behaviour when working with NULL values. If one of the parameters provided to the two functions is NULL then the result will be NULL, thing which can mess up a query. That’s not difficult to avoid by using for example the NVL function. Supposing that there is a LEFT JOIN between A and two other tables B & C, the above constraints can be rewritten as: (LEAST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) > TO_DATE(:FROM_DATETIME,'YYYYMMDDHH24MISS') AND GREATEST(A.CREATION_DATE, NVL(B.CREATION_DATE, A.CREATION_DATE), NVL(C.CREATION_DATE, A.CREATION_DATE)) <= TO_DATE(':TO_DATETIME','YYYYMMDDHH24MISS'))     One might arrive to write such constraints for example when attempting to do an incremental update in a data warehouse based on the Creation Date from multiple source tables, loading the records having a Creation Date greater than the date of the previous update (of course the records existing already in data warehouse needs to be deleted or updated). The Creation Date works only for transactional data which don’t change after their creation, for the other types of cases it should be considered the Last Update Date (Date of the Last Update for a record). Also this approach has its flows, working well when records are not deleted from the table; this issue could be fixed for example by creating a trigger on deletion and save the ID of the record deleted in a third party table or actually delete the record from the third party database when the deletion happens in the source table.
Cognos Enterprise Business Intelligence for Oracle APPS     Recently I discovered on IBM site several Developer Guides for Cognos Enterprise Business Intelligence related to the data warehouse based on Oracle APPS schema: Accounts Payable Analysis for Oracle e-Business Suite Accounts Receivable Analysis for Oracle e-Business Suite General Ledger Analysis for Oracle e-Business Suite Product Analysis for Oracle e-Business Suite Inventory Analysis for Oracle e-Business Suite Procurement Analysis for Oracle e-Business Suite Sales Analysis for Oracle e-Business Suite     I scanned roughly the documents, looking at the queries based on Oracle APPS, the documents confirming to some degree the logic I used in my queries, thing I found comforting. In addition it was interesting to see the fact and dimension tables created on top of Oracle APPS. From what I saw they cover only a small part of the available attributes, having the same issues as most of the Vendor-based data warehouses (e.g. JAROS). Anyway, they are a good starting point for the ones who want to built an in-house data warehouse on Oracle APPS schema.     Cognos Enterprise Business Intelligence comes also in SAP and J.D. Edwards flavours.

Learning about Oracle APPS internals - Part II

    In a previous post I mentioned the FND_TABLES & FND _VIEWS as sources that can be used to find more about Oracle APPS internals. The two tables are pretty useful though they don't contain all the tables/views created in Oracle unless they were registered in them. So some of the objects developed for example during the various customization or integration phases might not be found in them, therefore it makes sense to check also ALL_ALL_TABLES & ALL_VIEWS tables, providing all the tables available for a certain schema.

SELECT *
FROM ALL_ALL_TABLES
WHERE OWNER = :OWNER

SELECT *
FROM ALL_VIEWS
WHERE OWNER = :OWNER

    Getting the differences between the two sources (I hope nobody is upset
I haven't used EXISTS instead of a LEFT JOIN) it’s pretty easy:

SELECT A.*
FROM ALL_ALL_TABLES A
    LEFT JOIN APPS.FND_TABLES B
        ON A.TABLE_NAME = B.TABLE_NAME
WHERE A.OWNER = :OWNER
AND B.TABLE_NAME IS NULL


    If you are interested on Oracle internals you can find more goodies by querying the Oracle dictionary table:

SELECT *
FROM DICT
WHERE TABLE_NAME LIKE '%VIEW%'

    One of such goodies is the view that stores the list of Materialized views:

SELECT *
FROM DBA_MVIEWS

    PS: I know that these are pretty basic stuff for APPS DBAs though not all of us discovered them from beginning.

Sunday, September 13, 2009

Randomized data selection     A few years back I was asked to provide a random set of data from a data collection. Dealing only with a small number of records and as they were in Excel, it was much easier to use the Excel’s RAND function. Even if SQL Server provides a RAND function too, it returns the same value within a data set when the same parameter is provided. In order to avoid this behavior it would be useful if could be provided a distinct parameter value for each row, for example using the data set's integer unique identifier or eventually create one, thus if ID is such a column the query would look something like that: SELECT TOP 10 RAND(ID) PERC, ID, N1, N2 FROM ORDER BY RAND(ID)     The fact that RAND will return always the same value for the same parameter value might be a problem because the above query will always return the same records. In exchange you can use a multiplier for the parameter provided to RAND function - for example in a first attempt you can use RAND(id), in a second RAND(id*2), in a third RAND(id*3), etc.     This approach offers unfortunately a relatively "weak" randomization which might be not sufficient for repetitive selections, therefore under SQL Server 2005+ could be used instead a CLR-based UDF (user defined function) and use the random functions available in VB, C# or any other CLR programming language, or even the .Net Random class designed to produce a sequence of numbers that meet certain statistical requirements for randomness.

Saturday, September 12, 2009

The Number of Records

    Getting the number of records from a table is straightforward, a simple select doing the job:

SELECT count(*) NumberRecords
FROM <your_table>

Actually it’s advisable to use count(1) instead of count(*):

SELECT count(1) NumberRecords
FROM <your_table>

    If you want to implement the functionality in a function or stored procedure and provide the name of the table as parameter, then things get a little more complicated as the query needs to be built dynamically on runtime. In stored procedures a query built dynamically can be usually executed using EXEC or EXECUTE stored procedures, though none of them will do as we need to reuse the actual value returned by the query. This can be achieved with the help of a third stored procedure called sp_executesql, and the work can be done relatively easily with the below stored procedure:
    The overhead is relatively minimum but not necessarily elegant, a nicer approach being the use of functions for the same purpose. Unfortunately UDFs have the restriction that they can’t execute stored procedures, and thus the above technique can’t be used. There is actually an alternative for executing pass-through queries in functions by using OPENQUERY, OPENROWSET or OPENDATASOURCE functions, allowing thus to execute DML statements against a given linked server, as is the case of first two functions, respectively using ad-hoc connection information as is the case of the third function. For exemplification I chose OPENQUERY function, this being the code created to return the number of records from a specified table:
    For this piece of code to work is first necessary to refer the server as a linked server:

EXEC sp_serveroption <server_name>, 'Data Access', true

    Withought doing this first you’ll get the error:
Msg 7411, Level 16, State 1, Procedure GetNumberRecords, Line 4 Server '>server_name<' is not configured for DATA ACCESS.


    The above function doesn’t help too much, it would be helpful if we could provide the table’s name as parameter and store the query string in a variable, something like:
Unfortunately OPENQUERY function same as OPENROWSET and OPENDATASOURCE are expecting a constant parameter as query and therefore the error:

Msg 102, Level 15, State 1, Procedure GetNumberRecords, Line 12 Incorrect syntax near '@sql'. Expecting STRING or TEXT_LEX.

    I wonder why the SQL Server team hasn’t allowed the use of variables as parameters for OPENQUERY functions, if not in the previous SQL Server versions at least in SQL Server 2008! Anyway, another impediment would be raised by the deterministic nature of functions.
GetDate function used in functions SQL Server 2000/2005     In an answer on LinkedIn one of the users made me attentive that in SQL Server 2005 and later versions the GetDate and other non-deterministic UDFs can be used inside functions, thing not possible under SQL Server 2000. This lack of functionality, cumbersome especially when needed to use current date in table-valued functions, was requesting the date to be sent as parameter.     For exemplification the following piece of code would run successfully on SQL Server 2005 but fail on SQL Server 2000, returning the below error message. CREATE FUNCTION dbo.GetCurrentDate() RETURNS smalldatetime AS BEGIN RETURN (GetDate()) END GO SELECT dbo.GetCurrentDate() Error message: Server: Msg 443, Level 16, State 1, Procedure GetCurrentDate, Line 6 Invalid use of 'getdate' within a function.