29 June 2010

Number of Records – The CLR Version

    While rapidly scanning the daily posts from the MSDN blogs, I stopped at Chris Skorlinski’s “Example code executing TSQL for each Table in a database” post, wondering whether he uses techniques other than the ones I used in the past. I ran into this type of need several times, especially when I wanted to report in an automated manner the number of records for a given set of tables, typically during ETL tasks. The problem can be solved fairly easily with the help of a cursor, though it’s not the most elegant technique. I remember how I tried to find a way to execute a dynamic statement inside a UDF (user-defined function), though that’s not possible because a stored procedure can’t be executed inside a function. An alternative would be to use the OPENROWSET function and execute the statement through a linked server, though the OPENROWSET function doesn’t work with dynamically created statements.

    Five years ago, soon after SQL Server 2005 RTM appeared on the market, I tried to create a UDF programmatically using an example from Bob Beauchemin & co.’s “A First Look at SQL Server 2005 for Developers” book. The example worked then, but not without some inherent headaches: I had to spend some time troubleshooting the errors received which, if I remember correctly, were caused by a change in the way the connection was created from the SQL Server SqlClient. Anyway, that’s already history, though since then I have been wondering whether it is possible to use a programmatically created UDF to return the number of records for a table sent as a parameter. I haven’t had the chance to test that, even if it seemed logically possible, but having seen Chris Skorlinski’s post, and as last week I installed Visual Studio 2010 on my home computer, I said to myself that this is a nice way to see the tool at work. Thought and done, but again not without problems, as I ran into several small but time-consuming issues.

    So, armed with some patience, I did a quick search on the Microsoft site on how to use CLR scalar-valued functions under SQL Server 2008 and ran into this MSDN resource, whose example returns the record count using a static query. I opened Visual Studio 2010, created a new Database/SQL Server/Visual Basic SQL CLR Database Project named SQLServerProject1 (I forgot to give the project a meaningful name) and added a database reference to the AdventureWorks database using Windows Authentication.

CLR UDF - Database Reference

    Then I added a UDF to the newly created project (Add/New Item/User Defined Function), naming the new class CLRLibrary. I replaced the existing function with the one from the MSDN article, added a new String input parameter called TableName, modified the query and changed the UDF’s name to NumberRecords, as per the piece of code below:
Imports System
Imports System.Data
Imports System.Data.SqlClient
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Runtime.InteropServices 'added manually

Partial Public Class UserDefinedFunctions
    <SqlFunction(DataAccess:=DataAccessKind.Read)> _
    Public Shared Function NumberRecords(ByVal TableName As String) As Integer

        'the context connection reuses the caller's session
        Using conn As New SqlConnection("context connection=true")
            conn.Open()
            'the table name is concatenated as-is into the statement
            Dim cmd As New SqlCommand("SELECT COUNT(1) AS [NumberRecords] FROM " & TableName, conn)
            Return CType(cmd.ExecuteScalar(), Integer)
        End Using

    End Function
End Class


    In addition, I had to add a reference to System.Runtime.InteropServices. With that the function was ready for testing, so I built the solution, copied CLRLibrary’s Full Path into SQL Server Management Studio and attempted to follow the steps described in the MSDN article. Now it’s time for some troubleshooting fun… First I enabled the CLR integration configuration option by running the script below:
--enable CLR integration
use master
go
sp_configure 'clr enabled', 1
go
reconfigure
go

    Using the master database, I attempted to create a reference to the assembly using the Full Path and the dll name:
CREATE ASSEMBLY SqlServerProject1
FROM 'D:\<give the full path here>\SqlServerProject1.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS


   The next step was to declare the function:
CREATE FUNCTION dbo.NumberRecords(@table_name nvarchar(100))
RETURNS int
AS
EXTERNAL NAME SqlServerProject1.[SqlServerProject1.UserDefinedFunctions].NumberRecords
GO


    This is actually the final form of the DDL script; initially I used only a combination of the project, class and function name (e.g. SqlServerProject1.UserDefinedFunctions.NumberRecords), receiving the following error message:
Msg 6505, Level 16, State 2, Procedure NumberRecords, Line 1
Could not find Type 'UserDefinedFunctions' in assembly 'SqlServerProject1'.

    After a quick search on Google I found out from an Eggheadcafe forum that the namespace was missing from the declaration, and once I made this change the above statement worked without problems. The next step was naturally to test whether the function works:
SELECT dbo.NumberRecords('AdventureWorks.HumanResources.Department')
   And it worked! So it’s time to test it against a set of tables from AdventureWorks database:
SELECT s.name [schema_name]
, t.name table_Name
, dbo.NumberRecords('AdventureWorks.' + s.name + '.' + t.name) NumberRecords
FROM AdventureWorks.sys.tables t
      JOIN AdventureWorks.sys.schemas s
        ON t.schema_id = s.schema_id
ORDER BY s.name
, t.name


  Actually I missed an important thing from the above steps: somewhere while attempting to test the function or to register the assembly, I received the following error message:
A .NET Framework error occurred during execution of user-defined routine or aggregate "NumberRecords": System.InvalidOperationException: Data access is not allowed in this context.  Either the context is a function or method not marked with DataAccessKind.Read or SystemDataAccessKind.Read, is a callback to obtain data from FillRow method of a Table Valued Function, or is a UDT validation method.

    A post on the MSDN SQL Server forum reminded me that I forgot to check whether I’m using the 3.5 or the 4.0 .Net Framework. So in the SQLServerProject1 project, from main menu/Project/SQLServerProject1 Properties/Compile/Advanced Compile Options/Target Framework, I changed the value to .Net Framework 3.5. This change requires the project to be closed and reopened.

    In between the above steps I also had to check whether the assembly was registered, using the script below:
--assembly file details
SELECT A.name [Assembly]
, AF.name File_Path
FROM sys.assemblies A
     JOIN sys.assembly_files AF
        ON A.assembly_id = AF.assembly_id
WHERE A.is_user_defined=1
--AND A.name='SqlServerProject1'

    Also, if you’d like to remove the assembly from the database, you’ll first have to drop the function, and only then drop the assembly:
DROP FUNCTION NumberRecords
DROP ASSEMBLY SqlServerProject1


    It can be further discussed whether this approach is acceptable from a performance standpoint, whether it can be misused, etc. Actually, a nice misuse is the following statement, which passes a WHERE constraint together with the table name:
SELECT dbo.NumberRecords('AdventureWorks.HumanResources.Department WHERE Name LIKE ''P%''')

    A second parameter could be created for passing the WHERE constraint or, why not, the whole query could be passed as a parameter, as long as only a scalar is returned. I hope I haven’t forgotten any step; I’m kind of mixing up the facts because I also attempted to create the same function using a newly created login, running into other types of issues that maybe deserve a second post.
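
    The misuse shown above works because the function concatenates the table name into the statement as-is. One way to limit it, sketched here and by no means a complete defense, is to resolve the name with OBJECT_ID before it reaches dbo.NumberRecords, so that anything that does not resolve to an existing user table is rejected. The @table_name variable is only an illustrative name, and the sketch assumes it is run from the database in which the function was created:

```sql
-- sketch: call the function only when the name resolves to an existing user table
-- (@table_name is an illustrative variable, not part of the original scripts)
DECLARE @table_name nvarchar(300) = N'AdventureWorks.HumanResources.Department WHERE Name LIKE ''P%''';

-- OBJECT_ID returns NULL when the name does not resolve to an object of type 'U' (user table)
IF OBJECT_ID(@table_name, N'U') IS NULL
    SELECT CAST(NULL AS int) NumberRecords -- rejected, the function is not called
ELSE
    SELECT dbo.NumberRecords(@table_name) NumberRecords
```

    With the plain 'AdventureWorks.HumanResources.Department' value the ELSE branch is taken and the function is called as before.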

27 June 2010

What’s New #2

Microsoft Office - Cloud Computing is the Word

    Two weeks ago, on the 15th of June 2010, Microsoft Office was shipped together with Visio and Project 2010, closing the cycle of releases started with SQL Server 2008 R2, Visual Studio 2010, SharePoint 2010 (all 3 shipped in April 2010) and Windows Azure (available also in April). The words that best describe and unite these software tools are cloud computing and collaboration. Why is that? First we have to consider Azure, the new product in Microsoft’s portfolio, a framework for cloud computing and SaaS (Software as a Service) architectures composed of 3 components: Windows Azure, which allows running applications and accessing data in the cloud, SQL Azure Database, which provides data services in the cloud, and Windows Azure platform AppFabric, which allows the communication between the applications residing in the cloud. MS Office 2010 is also part of Microsoft’s strategy toward cloud computing, the weight falling on SharePoint 2010, a business collaboration platform that, together with the other MS Office tools, allows managing information, automating and managing business processes, facilitating the decision-making process, etc. A cornerstone of the framework is the co-authoring tool that “allows multiple people to work on a single copy of a document at the same time or at different times, seamlessly, whether they are online or offline”. As it seems, there are also provided “community features that allows users to share data as they do on Twitter and Facebook”, a step toward social computing. Microsoft plans to offer an online version of Office 2010, called Office Web Apps (OWA), supposed to be also a competitor for Google Docs.

    There are also people who question the steps taken by Microsoft toward cloud computing, but in the end it is important to establish the software infrastructure on which cloud computing-based applications can be developed; features that don’t exist currently could appear in future versions or could be provided by third-party vendors.

    Microsoft also comes with some unpleasant surprises: as it seems, Microsoft’s SharePoint Server runs only on 64-bit hardware and also requires a 64-bit SQL Server edition, which could be quite an important constraint for many customers. The most unpleasant surprise is that Microsoft gives up the well-known upgrade scheme, the reason for that, as mentioned in Ars Technica quoting a Microsoft spokesman, being the need to simplify the product lineup and pricing, based on “partner and customer feedback” (I’m sorry but I can’t really buy that!). The same source expects that upgrades will be available with promotions after Office’s launch. The only promotion I heard of is the Microsoft Office 2010 Technology Guarantee program, but it refers only to the customers who “purchased, installed, and activated a qualifying Microsoft Office 2007 product between March 5, 2010, and September 30, 2010”, they being eligible to download Office 2010 at no additional cost. How about the ones who bought a Microsoft Office 2007 copy in 2010 but before the 5th of March (like I did)?!

Microsoft TechEd North America Sessions are Online

    The Microsoft TechEd North America sessions held in New Orleans were made available online (video and slides), an opportunity for technical professionals to get an overview of the new advancements in Microsoft technologies, the topics approached being related to the various platforms of Windows, MS Office, Dynamics, Web, Cloud Computing & Online Services, etc. I really like the way Microsoft makes its technologies available to the public, especially the fact that it also provides Express versions of its software, allowing newbies and developers to get acquainted with and use the essential basic functionality. MSDN, TechNet, the webcasts, Channel9 and the community and personal blogs bring technical and non-technical people closer to the company and its technologies.

25 June 2010

Trivial Equalities in Queries

    If I remember correctly from the Math literature, equalities of the type 0=0, 1=1 or, more generally, n=n could also be referred to as trivial equalities. I tried to find, without success though, an exact definition of what is meant by trivial equality; the closest I could get is Wikipedia’s content on the use of the adjective trivial in Mathematics for objects that have a simple structure. As per my perception, and I’m not sure if I’m correct, a trivial equality is an equality in which one of the members is a constant, the same definition holding also for trivial inequalities.

    Now, what do trivial equalities like 1=1 have to do with queries?! As 1=1 always evaluates to true, same as 1=0 always evaluates to false, they could be used in certain scenarios as a simple technique to return all the records, respectively no records, from a table. For example, given the fact that 1=0 evaluates to false, I used such a constraint under ADO in queries like the one below in order to retrieve metadata from a database.
SELECT *
FROM Production.Product
WHERE 1=0


    In the same way 1=1 could be used to retrieve all the records from a table:
SELECT *
FROM Production.Product
WHERE 1=1


    Now, if we add different constraints to the above query, the 1=1 allows commenting/uncommenting the constraints as we wish with a minimum of changes.
SELECT *
FROM Production.Product
WHERE 1=1
  AND Color = 'Black'
  --AND SafetyStockLevel>800
  AND StandardCost>300


    This makes it quite useful when creating dynamic queries (see the first constraint from the stored procedure given as example in the Just In Case – Part IV: Dynamic Queries post). I have actually seen this way of writing a query quite often, especially among Oracle developers. A long time ago I asked somebody what the consideration behind its use is, apart from the fact that it allows adding or removing (commenting out) constraints. The answer was quite fuzzy, a possible improvement in the query’s performance being mentioned. I remember that some time ago somebody mentioned that in older SQL Server database engines such a constraint could lead to an unexpected query plan, and thus to poorer performance. Normally the database engine should recognize such statements as meaningless and ignore them; however, from theory to implementation it is a long way and anything is possible. An example in this direction can be found in The real cost of performance, the post and the comments that followed revealing the various facets of using the equality sign in programming languages like C# or VB. SQL shouldn’t be so complex, as it isn’t typically used to work with objects.
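
    To make the dynamic-query use more concrete, here is a minimal sketch (not the stored procedure from the referenced post) in which the 1=1 constraint lets every optional filter be appended uniformly as an AND clause. The @Color and @MinCost variables are hypothetical filter parameters, and the values are passed through sp_executesql rather than concatenated into the statement:

```sql
-- sketch: optional filters appended after a WHERE 1=1 anchor
-- (@Color and @MinCost are hypothetical parameters; NULL means "no filter")
DECLARE @Color nvarchar(15) = N'Black';
DECLARE @MinCost money = NULL;

DECLARE @sql nvarchar(max) = N'SELECT ProductID, Name, Color, StandardCost
FROM Production.Product
WHERE 1=1';

-- each filter can be added without checking whether it is the first constraint
IF @Color IS NOT NULL
    SET @sql = @sql + N' AND Color = @Color';
IF @MinCost IS NOT NULL
    SET @sql = @sql + N' AND StandardCost > @MinCost';

-- the values are passed as parameters, not concatenated into the statement
EXEC sp_executesql @sql
   , N'@Color nvarchar(15), @MinCost money'
   , @Color = @Color
   , @MinCost = @MinCost;
```

    Without the 1=1 anchor, the code would have to track whether WHERE or AND must precede each appended constraint.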

    I also used such trivial equalities in cartesian JOINs as an artifice in order to use the ANSI syntax in queries; here’s an example using the SplitList table-valued function introduced a long time ago:

SELECT *
FROM dbo.SplitList('1,2,3,4', ',') A
       JOIN dbo.SplitList('A,B,C,D', ',') B
          ON 1=1


    Given the above definition for trivial equalities, in theory all constraints in which one of the members is a hard-coded value could be regarded as trivial. Actually, all constraints on numeric data types could be written as trivial equalities by moving all the attributes to one of the members, the other being a constant, for example:
SELECT P.Name Product
, PM.Name ProductModel
FROM Production.Product P
        JOIN Production.ProductModel PM
            ON P.ProductModelID - PM.ProductModelID = 0


    Even if such a statement is logically correct, it should be avoided when possible, given the fact that it considerably decreases the performance of the query (just compare the query plan of the above query with that of the typical join), the indexes, if they exist, not being used efficiently.

20 June 2010

SSIS and Oracle

    Making SSIS work with Oracle doesn’t seem to be a complicated task, especially after several years of experience in doing that, though, as usual, something new appears in the landscape: new software versions, new requirements, an environment with its own particularities, etc. In general, when an application needs a connection to an Oracle server, it is necessary to install on the client computer several components that come with the Oracle Client, add/configure the TNS (Transparent Network Substrate) name, eventually set up some global (environment) variables in case the components were not installed in the default location and, depending on the chosen provider, it might also be necessary to configure a DSN (Data Source Name) pointing to the Oracle server. SSIS makes no exception from these steps; once they are performed you should, in theory, be ready to develop, test, deploy, schedule and run packages using a connection to an Oracle server.

    Unfortunately, from my experience, every 2-3 installations there is a problem with the Oracle Client and its configuration, most of the time the solution being quite simple: removing some declarations from the set-up files, correcting the global variables or the TNS name. As it seems, the Oracle Client is quite sensitive to changes in the default installation path, therefore it is advisable to install the Client using the default path, unless you’d like to gain more experience in troubleshooting such installation issues. In addition, as with any respectable company, some of the products come with their own defects, and patching such issues is not so easy; therefore it’s advisable to check beforehand the known issues of the Client version you need to install. In case you need to take advantage of the latest Oracle features, it makes sense to install the latest stable Client.

    Because the Oracle Client downloadable package has a few hundred MB, Oracle provides a thinner alternative, namely the Instant Client package. Its components can be downloaded from the Oracle site (here) and installed individually, the minimum being the Instant Client Package – Basic or Basic Lite version, the ODBC libraries and the SQL*Plus libraries in case you want to test the connection. Of course, depending on the case, the JDBC, SDK or any other packages Oracle made available could be installed as well.

    On 64-bit platforms it might be necessary to install the 32-bit and 64-bit Oracle/Instant Clients in parallel (see the SSIS, Oracle and X64 post from the Business Vision DEV Team), while in order to troubleshoot the various issues it could be a good idea to check the differences between the 32-bit and 64-bit registry (here).

    There are several drivers that allow you to connect to an Oracle database using SSIS, the most popular ones being:
- Microsoft OLE DB Provider for Oracle
- Oracle Provider for OLE DB
- .Net Framework Data Provider for Oracle
- .Net Framework Data Provider for Odbc
- Oracle Data Provider for .NET

    The Microsoft drivers come with MDAC (Microsoft Data Access Components), its latest version being MDAC 2.8. Starting with Windows Vista and Windows Server 2008, Microsoft changed MDAC into WDAC (Windows Data Access Components), including it as part of the operating system and thus removing the need to redistribute the components. See Data Access Technologies Road Map, FAQ and Troubleshooting MDAC/WDAC for the MDAC/WDAC architecture, components and releases, respectively troubleshooting. Given the various issues that exist with a particular MDAC/WDAC library, see in KB301202 how you can check the current version by using the registry, while for Windows versions up to Windows Server 2003 the Component Checker MDAC Utility could be used.

    While the .Net Framework Data Provider for Oracle, respectively for Odbc, come with the .Net Framework, the Oracle Data Provider for .NET (see also the FAQ) is Oracle’s implementation for ADO.NET data access, supposed to take advantage of advanced Oracle database functionality; it comes with ODAC (Oracle Data Access Components). Please note that you might need to install the respective components in addition to the Oracle Client.

    There are also third-party drivers for Oracle, or even specifically for SSIS, for example Microsoft Connectors Version 1.1 for Oracle and Teradata for use with SSIS from Attunity; see also a few tips from the SQL Server Performance blog.

    Before taking any decision on which driver to use, it might be a good idea to look also at the drivers’ limitations and advantages. In the past I often used the Microsoft Oracle ODBC Driver until I ran into an important limitation, namely its nonexistent support for Unicode, this residing, according to KB244661, in the fact that “from Microsoft Data Access Components (MDAC) version 2.5 and later versions, both the Microsoft ODBC Driver and OLE DB Provider support ONLY Oracle 7 and Oracle 8i”. Also, the Oracle Provider for OLE DB seems to have its own limitations.

    SQL Server provides linked servers, a powerful functionality for executing commands against OLE DB data sources on remote servers, including Oracle. As linked servers offer the ability to create cross-vendor distributed queries, in certain scenarios they could prove to be a powerful alternative for querying Oracle databases, in such cases not being necessary to create an additional connection to Oracle in the SSIS package. KB280106 and an article on Oracle Provider for OLE DB describe how to set up and troubleshoot a linked server to an Oracle database in SQL Server.
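
    As a rough illustration of the linked server alternative, the sketch below registers an Oracle server through the Oracle Provider for OLE DB and runs a pass-through query against it. The linked server name ORA_LINK, the TNS alias OraTnsAlias and the credentials are placeholders, and the sketch assumes that the Oracle Client and the provider are already installed and configured on the SQL Server machine:

```sql
-- sketch: register an Oracle linked server (all names and credentials are placeholders)
EXEC sp_addlinkedserver
     @server = N'ORA_LINK'            -- hypothetical linked server name
   , @srvproduct = N'Oracle'
   , @provider = N'OraOLEDB.Oracle'   -- Oracle Provider for OLE DB
   , @datasrc = N'OraTnsAlias';       -- hypothetical TNS alias

-- map all local logins to one remote Oracle account
EXEC sp_addlinkedsrvlogin
     @rmtsrvname = N'ORA_LINK'
   , @useself = N'False'
   , @locallogin = NULL
   , @rmtuser = N'<oracle user>'
   , @rmtpassword = N'<oracle password>';

-- pass-through query executed on the Oracle side
SELECT *
FROM OPENQUERY(ORA_LINK, 'SELECT sysdate FROM dual');
```

    The result of OPENQUERY can be joined with local tables like any other rowset, which is what makes cross-vendor distributed queries possible.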

    You might also want to check the SQL Server Integration Services with Oracle Database 10 White Paper from Microsoft, the “Connectivity and SQL Server 2005 Integration Services” MSDN article written by Bob Beauchemin, Scott Barrett’s blog or this SSIS 64 bit – Using Oracle Provider post.

Troubleshooting – Part III: Troubleshooting query issues: Logical errors

    In the last post I provided some general guidelines on how to troubleshoot query issues related to errors thrown by the database engine. That’s the fortunate case, because the engine’s validation shows that something went wrong. I’m saying fortunate because there are also logical errors that result in unexpected output, and such errors can remain undiscovered even for years. It’s true that they are hard to discover, but not impossible; adequate testing combined with defensive programming techniques can decrease the volume of such errors. The simplest and handiest test is to look, in a first phase, at the number of records returned by the query and, secondly, to study the evolution of the cardinality with each table added to the query. The clearest signal that something went wrong is when no records, too few or too many records are returned compared with what was expected; the problem then most probably resides in the constraints used, namely in the JOIN, WHERE or HAVING clause constraints.

WHERE clause constraints

    If your query is just a simple statement without joins of any type, then most likely the problem resides in the constraints used in the WHERE clause, at least one of them being evaluated to false. Here are several possible reasons:
- One of the hardcoded values or one of the provided parameters is not found among the values available in the table;
- NULL cases are not handled;
- Wrong use of predicates or parentheses.

    The simplest constraint evaluated to false is “0=1” or something similar; actually any pair of different values can be used (e.g. ‘A’=’B’, ‘ABC’=’’, etc.). In contrast, “1=1” always evaluates to true. The below query returns no records because the constraint evaluates to false:
-- single constraint evaluated to false
SELECT *
FROM Production.Product
WHERE 1=0


    A more complex scenario is when there are multiple predicates and at least one of them always evaluates to true or to false, the task in such situations being to identify such constraints:
-- multiple constraints, one evaluated to false
SELECT *
FROM Production.Product
WHERE 1=0
    AND Class = 'L'
    AND SafetyStockLevel >100


    The constraints could also be grouped using parentheses:
-- multiple constraints with parentheses, one evaluated to false
SELECT *
FROM Production.Product
WHERE 1=0
    AND ((Class = 'L'
           AND SafetyStockLevel >100)
      OR (Class = 'M'
           AND SafetyStockLevel =100))


    Dealing with multiple such constraints is not a question of guessing but of pure applied Boolean algebra, essential in writing accurate queries. So if you feel that you haven’t mastered such concepts, then maybe it’s a good idea to consult some material on the topic.

    There could be situations in which the total outcome of a set of constraints is not so easy to see; in such cases the constraints could be brought into the SELECT clause. Which constraints should be tested depends also on the particularities of your query; here is an example:
--testing constraints' outcome for each record
SELECT *
, CASE
     WHEN Class = 'L' AND SafetyStockLevel >100 THEN 1
     ELSE 0
  END Constraint1
, CASE
     WHEN Class = 'M' AND SafetyStockLevel =100 THEN 1
     ELSE 0
  END Constraint2
, CASE
     WHEN (Class = 'L' AND SafetyStockLevel >100)
        OR (Class = 'M' AND SafetyStockLevel =100) THEN 1
     ELSE 0
  END Constraint3
FROM Production.Product

  Constraints in which one of the members is NULL and in which the IS NULL operator or the NULL-handling functions are not used are incorrectly evaluated (they never evaluate to true), and in certain cases they might even look correct, though this depends also on the expressions used. For example, the below query returns no records, even if at first sight one of the two constraints should hold for every row, whereas the same constraints written with IS NULL would return records. For more on the correct handling of NULLs please see Null-ifying the world or similar posts on the web.

--misusing NULL-based constraints
SELECT *
FROM Production.Product
WHERE (NOT 1=NULL) OR (1=NULL)
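
    The difference between comparing with NULL and testing with IS NULL can be seen on a small self-contained example. The sketch below uses a made-up table variable instead of an AdventureWorks table and assumes the default ANSI_NULLS ON setting, under which any comparison with NULL evaluates to unknown:

```sql
-- sketch: "= NULL" versus "IS NULL" on a made-up table variable
SET ANSI_NULLS ON; -- the default; comparisons with NULL evaluate to UNKNOWN

DECLARE @Product TABLE (Name nvarchar(50) NOT NULL, Color nvarchar(15) NULL);
INSERT INTO @Product (Name, Color)
VALUES (N'Chain', NULL), (N'Helmet', N'Black');

-- both comparisons evaluate to UNKNOWN for every row, so no record qualifies
SELECT count(1) NumberRecords
FROM @Product
WHERE (Color = NULL) OR (NOT Color = NULL);

-- IS NULL / IS NOT NULL evaluate to true or false, so both records qualify
SELECT count(1) NumberRecords
FROM @Product
WHERE (Color IS NULL) OR (Color IS NOT NULL);
```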


JOIN constraints

    JOINs based on a single JOIN constraint should in theory pose no problem, unless the wrong attributes are used in the JOIN, and that’s easy to do, especially when no documentation is available on the topic or it is incomplete. Functional or other technical specifications, physical or semantic models, metadata or other developers’ knowledge could help the developer to figure out the relations between the various entities. On the other side, it’s in human nature to make mistakes: forgetting to add a constraint, using the wrong type of join or the wrong attributes in the join constraint. In addition, all the situations described above for WHERE clause constraints apply also to join constraints, independently of whether the ANSI or the non-ANSI syntax is used. The latter is based on writing the JOIN constraints in the WHERE clause, which makes the JOIN constraints harder to identify, this being one of the reasons for which I recommend using the ANSI syntax.

    A technique I use on a daily basis in order to test my scripts is to check the changes in the number of records with the addition of each new table to the initial query. In a first phase it is enough to count the number of records from the table with the lowest level of detail and the number of records from the final query, ignoring the WHERE constraints. For example, let’s consider the following query written some time ago (most of the attributes were removed for simplicity):
-- Purchase Orders
SELECT POD.PurchaseOrderDetailID
, POD.PurchaseOrderID
, POD.ProductID
, POH.ShipMethodID
, POH.VendorID
FROM Purchasing.PurchaseOrderDetail POD
     JOIN Purchasing.PurchaseOrderHeader POH
       ON POD.PurchaseOrderID = POH.PurchaseOrderID
           JOIN Purchasing.ShipMethod PSM
              ON POH.ShipMethodID = PSM.ShipMethodID
           JOIN Purchasing.Vendor SMF
              ON POH.VendorID = SMF.VendorID
     JOIN Production.Product ITM
        ON POD.ProductID = ITM.ProductID
WHERE POD.StockedQty >100
     AND ITM.Class = 'L'


    Usually I write the table with the lowest level of detail (in theory also the one with the highest number of records) first, adding the referenced tables gradually, starting with the tables with the lowest number of records, thus the lookup tables come first. This usually doesn’t affect the way the database engine processes the query (unless special techniques are used), and it allows a somewhat systematic approach. It also allows me to test the query without making important changes. As I wrote above, the first test is done against the first table, commenting out for this the other joins and, eventually, the WHERE constraints:
-- testing changes in the number of records - test 1
SELECT count(1) NumberRecords
FROM Purchasing.PurchaseOrderDetail POD
/* JOIN Purchasing.PurchaseOrderHeader POH
     ON POD.PurchaseOrderID = POH.PurchaseOrderID
          JOIN Purchasing.ShipMethod PSM
             ON POH.ShipMethodID = PSM.ShipMethodID
          JOIN Purchasing.Vendor SMF
             ON POH.VendorID = SMF.VendorID
      JOIN Production.Product ITM
         ON POD.ProductID = ITM.ProductID
WHERE POD.StockedQty >100
AND ITM.Class = 'L' */


    In the second step it is enough to check the number of records returned by the whole query without the WHERE constraints:
-- testing changes in the number of records - test 2
SELECT count(1) NumberRecords
FROM Purchasing.PurchaseOrderDetail POD
     JOIN Purchasing.PurchaseOrderHeader POH
        ON POD.PurchaseOrderID = POH.PurchaseOrderID
            JOIN Purchasing.ShipMethod PSM
               ON POH.ShipMethodID = PSM.ShipMethodID
            JOIN Purchasing.Vendor SMF
               ON POH.VendorID = SMF.VendorID
      JOIN Production.Product ITM
          ON POD.ProductID = ITM.ProductID

/* WHERE POD.StockedQty >100
     AND ITM.Class = 'L' */

    If the number of records is correct then most probably, though not certainly, the query is correct, because there could be incorrect LEFT JOINs that are reflected not in the number of records but in the fact that the corresponding attributes are missing. So don’t rely entirely on this type of test, but take 1-2 examples for which you are sure that records must be retrieved through the LEFT JOIN, and check whether the expected values are shown. Eventually, only for testing purposes, the LEFT JOIN could be changed into a FULL JOIN to see what records are returned.

    Now, if there are changes in the number of records from one test to the other, then take the first query and move the comment one join further (see below), repeating the step until the JOIN that causes the variance is identified.
 
-- testing changes in the number of records - test 3
SELECT count(1) NumberRecords
FROM Purchasing.PurchaseOrderDetail POD
     JOIN Purchasing.PurchaseOrderHeader POH
        ON POD.PurchaseOrderID = POH.PurchaseOrderID
/*      JOIN Purchasing.ShipMethod PSM
           ON POH.ShipMethodID = PSM.ShipMethodID
         JOIN Purchasing.Vendor SMF
            ON POH.VendorID = SMF.VendorID
     JOIN Production.Product ITM
        ON POD.ProductID = ITM.ProductID
WHERE POD.StockedQty >100
AND ITM.Class = 'L' */ 

    Once the join that causes the variance is found, it might be necessary to add a new join constraint or to use a LEFT JOIN instead of an inner JOIN. For example, if in the above query there are records with NULL values in ShipMethodID, then the query should have been written using a LEFT JOIN. On the other side, if duplicates are found, then do a grouping using the primary key of the table with the lowest level of detail, in this case Purchasing.PurchaseOrderDetail.
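
    A grouping on the key of the lowest-level table can also be used to find out which records were duplicated by a join. The sketch below is one possible way of doing this for the purchase orders query; it assumes that PurchaseOrderID together with PurchaseOrderDetailID identifies a detail record, and it already uses a LEFT JOIN for the ship methods:

```sql
-- sketch: detail records that appear more than once after the joins
SELECT POD.PurchaseOrderID
, POD.PurchaseOrderDetailID
, count(1) NumberRecords
FROM Purchasing.PurchaseOrderDetail POD
     JOIN Purchasing.PurchaseOrderHeader POH
       ON POD.PurchaseOrderID = POH.PurchaseOrderID
           LEFT JOIN Purchasing.ShipMethod PSM
              ON POH.ShipMethodID = PSM.ShipMethodID
GROUP BY POD.PurchaseOrderID
, POD.PurchaseOrderDetailID
HAVING count(1) > 1
```

    An empty result means that the joins included so far don’t multiply the detail records; any row returned points to a key for which the join constraint is incomplete.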

HAVING clause constraints

    HAVING clause constraints behave much like WHERE clause constraints, the difference residing in the level at which the constraints are applied, so the same technique can be applied in this case too, with the difference that the constraints need to be adapted to support aggregate functions.
--Vendors with total purchase value >100000
SELECT POH.VendorID
, SUM(POH.SubTotal) TotalSubTotal
FROM Purchasing.PurchaseOrderHeader POH
WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete
GROUP BY POH.VendorID
HAVING SUM(POH.SubTotal)>100000


    Sometimes it helps to bring the HAVING constraints into the SELECT, though that offers limited functionality; therefore it’s maybe more productive to link the aggregated query back to the base table(s) on which the statement is based:
--Vendors with total purchase value >100000 - base records
SELECT A.*
, B.TotalSubTotal
, CASE
     WHEN B.VendorID IS NOT NULL THEN 1
     ELSE 0
  END ConsideredFlag
FROM Purchasing.PurchaseOrderHeader A
     LEFT JOIN ( --aggregated data
      SELECT POH.VendorID
      , SUM(POH.SubTotal) TotalSubTotal
      FROM Purchasing.PurchaseOrderHeader POH
      WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete
      GROUP BY POH.VendorID
      HAVING SUM(POH.SubTotal)>100000
     ) B
     ON A.VendorID = B.VendorID
WHERE A.Status IN (2,4) -- 2-Approved, 4-Complete

 
    This technique could also prove to be limited, though I found it useful in several cases, especially when a complicated formula is included in the aggregate function. The same query could be rewritten with a window aggregate function, introduced with SQL Server 2005.
--Vendors with total purchase value >100000 - window aggregate function
SELECT POH.*
, SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID) TotalSubTotal
, CASE
     WHEN SUM(POH.SubTotal) OVER (PARTITION BY POH.VendorID)>100000 THEN 1
     ELSE 0
  END ConsideredFlag
FROM Purchasing.PurchaseOrderHeader POH
WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete

    When working with amounts, quantities or other numeric values, it could be useful to check the total amount/quantity based on the query logic against the total amount/quantity based on the base table.
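
    Such a check could look like the sketch below, which puts the total computed directly from the base table next to the total computed through a query that joins the lookup tables. The joined query is only an illustrative stand-in for the query under test; if the two values differ, one of the joins either drops or multiplies records:

```sql
-- sketch: total from the base table versus total from the query under test
SELECT (
     SELECT SUM(POH.SubTotal)
     FROM Purchasing.PurchaseOrderHeader POH
     WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete
  ) BaseTotal
, (
     SELECT SUM(DAT.SubTotal)
     FROM ( -- illustrative query under test
          SELECT POH.SubTotal
          FROM Purchasing.PurchaseOrderHeader POH
               JOIN Purchasing.ShipMethod PSM
                  ON POH.ShipMethodID = PSM.ShipMethodID
               JOIN Purchasing.Vendor SMF
                  ON POH.VendorID = SMF.VendorID
          WHERE POH.Status IN (2,4) -- 2-Approved, 4-Complete
     ) DAT
  ) QueryTotal
```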