About Me

IT Professional with more than 16 years experience in IT especially in the area of full life-cycle of Web/Desktop Applications Development, Database Development, Software Engineering, Consultancy, Data Management, Data Quality, Data Migrations, Reporting, ERP support, etc.

Sunday, November 20, 2005

Who's gonna rule the world?      This morning I started to read an article from Information Week about the future of database administrators, as it seems they will rule the world. Is not a joke, their role will become more and more important over the next years, why this? 1. Databases become bigger and more complex      The size and complexity of databases is increasing from year to year, same their importance in making faster decision based on current or historical data.      Bigger the size and complexity, bigger also the need for ad-hoc or one mouse click reports, and who’s the person who has the knowledge and resources to do this?! 2. Data mining      Historical data doesn't resume only to simple reporting, but data mining is becoming more popular, having as purpose the discovery of trends in the business, facilitating somehow predictions for business' future.      Let’s not forget that the big players on the database market started to offer data mining solutions, incorporated or not in their database platforms, the data mining features are a start in providing new views over the data.      Big companies will opt probably for data analysts, but for medium-sized and small companies, the role of a data analyst could be easy taken by a database administrator. 3. The knowledge that matters      Even if databases are backend for complex UI, in time, if not immediately, the database administrator becomes aware of underlaying overall structure of the database and, why not, leaded by curiosity or business' requirements he becomes aware of the content of the data. Even if is a paradox, the database administrator has sometimes more data knowledge than a Manager and the skills to do with data that magic that is important for Managers. 4. Data Cleansing      More and more companies are becoming aware of the outliers from their data, so data cleansing is becoming a priority, and again the database administrator has an important role in this process. 5. Database Security      Database platforms are growing, unfortunately same thing happens with their security holes and the number of attacks. A skilled database administrator must have the knowledge to overcome these problems and solve the issues as soon as possible. 6. Training      Even if can be considered a consequence of the previous enumerated facts, it deserves to be considered separately. Companies start to realize how important it to have a skilled database administrator, so probably they will provide the requested training or help the DBA to achieve this. 7. On site/off shore compromise      If in the past years, has been en vogue the concept of off shoring people, including database administrators; probably soon companies will become aware that reducing direct maintenance costs doesn’t mean a reduction of overall costs, which may contain also indirect costs. The delayed response time generated by communication issues, availability or resources plays an important role in the increase indirect costs, most of the big companies will be forced then to opt for a compromise between on site and off shore.      In this case the direct benefit for a database administrator is not so obvious, except the financial benefit there is also an image advantage, even if is disregarded, the image and the fact a person is not anymore isolated is important.

Saturday, November 19, 2005

Cursors and Lists

    I found Cursors really useful when the linear logic provided by a query is not solving a special problem (e.g. concatenate in a list the values returned by a select, multiple updates/insert/deletions based on the success or failure of previous logic).     Cursors are not complicated, but writing the basic code again and again could be painful when the time is limited, so I prefer to have the simplest code and then modify it according to the new requirements.     Just a simple sample of using a Cursor, it returns a list of emails. --creating the sample table CREATE TABLE EmailAddresses(ID int, Email varchar(50)) --insert test records INSERT EmailAddresses VALUES (1, 'John.Travolta@star.com') INSERT EmailAddresses VALUES (2, 'Robert.DeNiro@star.com') INSERT EmailAddresses VALUES (3, 'MegRyan@star.com') INSERT EmailAddresses VALUES (4, 'Helen.Hunt@star.com') INSERT EmailAddresses VALUES (5, 'Jodie.Foster@star.com') CREATE FUNCTION dbo.GetEmails() RETURNS varchar(1000) /* Purpose: returns a list of Emails Parameters: Notes: Sample: SELECT dbo.GetEmails() AS ListEmails */ AS BEGIN DECLARE @Email varchar(50) DECLARE @Emails varchar(1000) SET @Emails = '' -- Create Emails Cursor DECLARE Emails CURSOR FOR SELECT Email FROM EmailAddresses ORDER BY Email OPEN Emails -- Open Emails Cursor --fetch first set of records from Emails Cursor FETCH NEXT FROM Emails INTO @Email WHILE @@FETCH_STATUS = 0 --if the fatch was successful BEGIN SET @Emails = @Emails + @Email + ',' --fetch next set of records from Emails Cursor FETCH NEXT FROM Emails INTO @Email END CLOSE Emails -- close Emails cursor DEALLOCATE Emails --deallocate Emails cursor --remove the extra comma IF Len(@Email)>0 SET @Emails = Left(@Emails, Len(@Emails)-1) RETURN @Emails END     The sample was tested on SQL Server 2000+.     The source query is not complicated, what if the content of the query is not known until runtime, all is known is that only one column of data is returned. Here interferes the magic of EXEC function and a table data type. --creating the sample table CREATE TABLE Countries(ID int, Country varchar(50)) --insert test records INSERT Countries VALUES (1, 'US') INSERT Countries VALUES (2, 'UK') INSERT Countries VALUES (3, 'Germany') INSERT Countries VALUES (4, 'Spain') INSERT Countries VALUES (5, 'France') CREATE PROCEDURE dbo.pListCountries AS SELECT Country FROM Countries ORDER BY Country CREATE FUNCTION dbo.GetList( @Query varchar(250)) RETURNS varchar(1000) /* Purpose: returns a list of List Parameters: Notes: Sample: SELECT dbo.GetList('SELECT Country FROM Countries ORDER BY Country') AS List */ AS BEGIN DECLARE @Result varchar(50) DECLARE @List varchar(1000) DECLARE @Temp TABLE (Val varchar(10)) SET @List = '' --insert in a table data type the results returned by query INSERT @Temp EXEC (@Query) -- Create List Cursor DECLARE List CURSOR FOR SELECT Val FROM @Temp OPEN List -- Open List Cursor --fetch first set of records from List Cursor FETCH NEXT FROM List INTO @Result WHILE @@FETCH_STATUS = 0 --if the fatch was successful BEGIN SET @List = @List + @Result + ',' --fetch next set of records from List Cursor FETCH NEXT FROM List INTO @Result END CLOSE List -- close List cursor DEALLOCATE List --deallocate List cursor --remove the extra comma IF Len(@Result)>0 SET @List = Left(@List, Len(@List)-1) RETURN @List END     Now let’s test the function using a simple SELECT SELECT dbo.GetList('SELECT Country FROM Countries ORDER BY Country') AS List Result: List -------------------------- France,Germany,Spain,UK,US (1 row(s) affected)     Let’s create a stored procedure using the table created above: CREATE PROCEDURE dbo.pListCountries AS SELECT Country FROM Countries ORDER BY Country     And use the stored procedure as parameter for GeList. SELECT dbo.GetList('dbo.pListCountries') AS List Result: List -------------------------- France,Germany,Spain,UK,US (1 row(s) affected)     Let’s create a table defined function based on the table created above: CREATE FUNCTION dbo.ListCountries() RETURNS TABLE AS RETURN(SELECT Country FROM Countries)     And use the stored procedure as parameter for GeList. SELECT dbo.GetList('SELECT * FROM ListCountries ORDER BY Country') AS List Result: List -------------------------- France,Germany,Spain,UK,US (1 row(s) affected)     Can be used as source any other table as long the length of the target column is less than 50 characters (@Result was defined as varchar(50)).     The method could be useful in SQL Server 2000, while in 2005 version can be used a managed table valued function instead .

Problems in SQL Server 2000 DTS Packages

    There are cases when a problem can not be solved with a query, additional data processing being requested. In this category of cases can be useful to use a temporary table or the table data type. Once the logic built you want to export the data using a simple DTS project having as Source the built query. Now the surprises appear: 1. Temporary table Sample: CREATE PROCEDURE dbo.pTemporaryTable AS    CREATE TABLE #Temp(ID int, Country varchar(10)) -- create a temporary table    -- insert a few records    INSERT #Temp VALUES (1, 'US')    INSERT #Temp VALUES (2, 'UK')    INSERT #Temp VALUES (3, 'Germany')    INSERT #Temp VALUES (4, 'France')    SELECT * FROM #Temp -- select records    DROP TABLE #Temp – drop the temporary table Test the stored procedure in Query Analyzer EXEC dbo.pTemporaryTable Output: ID Country ----------- ---------- 1 US 2 UK 3 Germany 4 France (4 row(s) affected)     Now trying to use the stored procedure as Source in a DTS package will bring the error message: Error Source: Microsoft OLE DB Provider for SQL Server Error Description: Invalid object name ‘#Temp’     I hoped this has been fixed in SQL Server 2005 (Beta 3) version, but when I tried to use the stored procedure as a Source I got the error: Error at Data Flow Task [OLE DB Source [1]]: An OLE DB error has occurred. Error code: 0x80004005 An OLE DB record is available. Source: "Microsoft OLE DB Provider for SQL Server" Hresult: 0x80004005 Description: "Invalid object name '#Temp'." Error at Data Flow Task [OLE DB Source [1]]: Unable to retrieve column information from the data source. Make sure your target table in the database is available. ADDITIONAL INFORMATION: Exception from HRESULT: 0xC020204A (Microsoft.SqlServer.DTSPipelineWrap) 2. Table data type sample CREATE PROCEDURE dbo.pTableDataType AS    DECLARE @Temp TABLE (ID int, Country varchar(10)) -- create table    -- insert a few records    INSERT @Temp VALUES (1, 'US')    INSERT @Temp VALUES (2, 'UK')    INSERT @Temp VALUES (3, 'Germany')    INSERT @Temp VALUES (4, 'France')    SELECT * FROM @Temp -- select records Test the stored procedure in Query Analyzer EXEC dbo.pTableDataType Output: ID Country ----------- ---------- 1 US 2 UK 3 Germany 4 France (4 row(s) affected)     This time the error will not appear when the Source is provided, but when the package is run: Error message: Invalid Pointer     In Server 2005 (Beta 3), when I tried to run the package this went bad again: SSIS package "Package6.dtsx" starting. Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning Information: 0x4004300A at Data Flow Task, DTS.Pipeline: Validation phase is beginning Information: 0x40043006 at Data Flow Task, DTS.Pipeline: Prepare for Execute phase is beginning Information: 0x40043007 at Data Flow Task, DTS.Pipeline: Pre-Execute phase is beginning SSIS package "Package6.dtsx" finished: Canceled.     In fact I saw there will be problems because in Flat File Connection Manager’s Preview no data were returned. I hope these problems were solved in the last SQL Server Release, if not we are in troubles! Possible Solutions:     If the Stored Procedures provided above are run in SQL Query Analyzer or using ADO, they will work without problems.     This behavior is frustrating, especially when the logic is really complicated and you put lot of effort in it; however there are two possible solutions, both involving affordable drawbacks: 1. Create a physical table manually or at runtime, save the data in it and later remove the table. This will not work for concurrent use (it works maybe if the name of the table will be time stamped), but for ad-hoc reports might be acceptable. 2. Built an ADO based component which exports the data to a file in the format you want. The component is not difficult to implement, but additional coding might be requested for each destination type. The component can be run from a UI or from a DTS package. 3. In SQL Server Yukon is possible to use the CLR and implement the logic in a managed stored procedure or table valued function. I just hope it works.

Friday, November 18, 2005

   This evening home is snowing, so nice, I miss the snow and flakes... I was just thinking about prime numbers and why people had such a fascination about them. In mysticism every number has a powerful meaning, a power coming from inside, and therewith attracts certain powers from exterior, is strange, hard to understand all this, especially how all numbers can be reduced to a number between 1 and 9.    Every philosophy has a little truth in it.
   I was trying to find a nice article on number reduction, for example the one from sourceryforge could be useful. Last week I implemented a function which returns the reduction of any sequence, could be date or number. I was intending to apply it on prime numbers, but even if can be observed certain similarities between different chains, is difficult to believe this will lead anywhere. CREATE FUNCTION dbo.NumberReduction( @text varchar(50)) RETURNS smallint /* Purpose: reduces to number a sequence of digits Parameters: @text varchar(50) - digits sequence Notes: the function works with any sequnce of alphanumeric, not numeric values being ignored Sample: SELECT dbo.NumberReduction('11/17/2005') SELECT dbo.NumberReduction('1234') SELECT dbo.NumberReduction('1234 8993 90003 3456') */ BEGIN    DECLARE @index smallint    DECLARE @total int    SET @index = 1    SET @total = 0    --for each char in alphanumeric sequence    WHILE (@index<=len(@text))    BEGIN          IF IsNumeric(Substring(@text, @index, 1)) = 1          SET @total = @total + Cast(Substring(@text, @index, 1) as smallint)       SET @index = @index+1    END    --if the number is not in 1..9 range, it calls further the NumberReduction for the new total    IF len(@total)>1       SET @total = dbo.NumberReduction(@total)    RETURN @total END Now, returning to prime numbers, I tried to generate the prime numbers between 1 and 1000000 and analize how many prime numbers are for each thousand. SELECT PrimeNumber/1000 DataSet , count(*) NumberPrimeNumbers FROM PrimeNUmbers GROUP BY PrimeNumber/1000 ORDER BY DataSet As it seems the maximum number of prime numbers is reached for the first thousand and decreases. DataSet*1000 Number Prime Numbers -------------------- ------------------ 0 (2..1000) 168 1 (1001..2000) 135 2 (2001..3000) 127 3 ........... 120 4 ........... 119 5 ........... 114 6 ........... 117 7 ........... 107 8 ........... 110 9 ........... 112 10 .......... 106 11 .......... 103 12 .......... 109 13 .......... 105 14 .......... 102 15 .......... 108 16 .......... 98 17 .......... 104 18 .......... 94 19 .......... 104 20 .......... 98 ........................... 980 ......... 67 981 ......... 75 982 ......... 70 983 ......... 70 984 ......... 70 985 ......... 76 986 ......... 76 987 ......... 63 988 ......... 71 989 ......... 72 990 ......... 71 991 ......... 79 992 ......... 65 993 ......... 68 994 ......... 78 995 ......... 69 996 ......... 69 997 ......... 83 998 ......... 74 999 ......... 65 That's all for today, is hard to believe somebody will read this :) today, anyway a nice weekend to everybody!

Thursday, November 17, 2005

To be or not to be... prime

    First of all, what is a prime number? According to Wolfram a prime number is "any positive integer p>1 that has no positive integer divisors other than 1 and p itself. Not bad, in other words there is no positive integer numbers p>m>1 that p = m*n, n>1.     So, all we have to do is to check if any number m between 2 and p-1 is not a divisor for p.     If m is not a divisor for p then we could write: p/m = n + r, r > 0, this is in mathematics, but in programming even if is sql, C++ or any programming language, when working with integers, by p/m is understood n while by p mod m is r. Respecting the order of operations (p/m)*m = p only if m is a divisor for p. Now, is it really necessary to check all the numbers between 2 and p-1 if they are or not divisors for p? Probably not, when m increases, p/m decreases, and also the range of possible divisors, which is between m+1 and p/m-1 until m becomes greater or equal with p/m. m=p/m only when m is the square from p, so is necessary and sufficient to check only the values between 2 and √p.     Complicated? maybe at the first sight if somebody is not used with mathematics.     To summarize, if all the numbers m betwen 2 and √p are not divisors of p then p is prime. This is the "theory" used in IsPrime function. CREATE FUNCTION dbo.IsPrime( @number bigint) RETURNS smallint /* Purpose: checks if a positive integer is prime number Parameters: @number bigint - the number to be checked if prime Notes: Sample: SELECT dbo.IsPrime(1000) SELECT dbo.IsPrime(947) SELECT dbo.IsPrime(111913) SELECT dbo.IsPrime(113083) */ BEGIN DECLARE @index bigint DECLARE @result bit SET @index = 2 SET @result = 1 WHILE (@index<=sqrt(@number)) BEGIN IF (@number = (@number/@index)*@index) SET @result = 0 SET @index = @index+1 END RETURN @result END     The function is ready, could be useful a piece of code to test the function for a certain interval. This is what pListPrimeNumbers is supposed to do. CREATE PROCEDURE dbo.pListPrimeNumbers( @startInterval bigint , @endInterval bigint) AS /* Purpose: lists the prime numbers from a the [@startInterval, @endInterval] interval Parameters: @startInterval bigint - the left side of the interval in which will be searched the prime numbers @endInterval bigint - the right side of the interval in which will be searched the prime numbers Notes: Sample: EXEC dbo.pListPrimeNumbers 900, 1000 EXEC dbo.pListPrimeNumbers 111900, 112900 */ DECLARE @index bigint SET @index = @startInterval --for each integer in interval WHILE (@index<=@endInterval) BEGIN -- list a number if prime IF (dbo.IsPrime(@index)=1) SELECT @index [Prime Numbers] SET @index = @index+1 END     The only problem with the stored procedure is that when its results are listed in grid and it returns more than 100 prime numbers will get receive the message:     The query has exceeded the maximum number of result sets that can be displayed in the results grid. Only the first 100 result sets are displayed in the grid.     Cool message, isn't it? Happens, but no problem appears if the results are displayed to text or file: Prime Numbers -------------------- 907 (1 row(s) affected) Prime Numbers -------------------- 911 (1 row(s) affected) Prime Numbers -------------------- 919     Still, the result is not what has been expected. Using a DTS Package will not bring the wanted result because it will return only the first record.     It's easier to create a simple table, which will allow further analyssis of the data. CREATE TABLE PrimeNumbers (PrimeNumber bigint) and insert in it the results returned by pListPrimeNumbers stored procedure INSERT PrimeNumbers EXEC dbo.pListPrimeNumbers 1, 10000 Note:     The utility of IsPrime function is doubtful, there are not many the cases when such a function is needed in SQL.     Initially I wanted to test the performance of implementing IsPrime in SQL vs implementing it as a managed function, but another surprize: Msg 6505, Level 16, State 1, Procedure IsPrimeNumber, Line 1 Could not find Type 'UserDefinedFunctions' in assembly 'MyLibrary'.     And what's even more interesting is that 3 months ago the same functionality worked without problems, but since then I had to reinstall the SQL Server because I couln't use DTS packages after I installed LINQ preview. Welcome to the "real" world of Microsoft :(.

Monday, November 14, 2005

Out of Space

    Sometime ago I found a post in which somebody was annoyed of writing code like this: DECLARE @zipcode char(5) SET @zipcode = '90' SELECT CASE len(@zipcode) WHEN 1 THEN '0000' + @zipcode WHEN 2 THEN '000' + @zipcode WHEN 3 THEN '00' + @zipcode WHEN 4 THEN '0' + @zipcode WHEN 5 THEN @zipcode END     I remembered I met the same situation in a project I worked long time ago, but instead of wroting a line of code for each case, I prefered to write a simple function, which used the Space and Replace built-in functions to add first a number of spaces and then replace them with a defined character. Maybe somebody will argue about the performance of such a function, but I prefer in many cases the reusability against performance, when the difference is not so big. Additionally this enhances code’s readability.     So, here is the function: CREATE FUNCTION dbo.FillBefore( @target varchar(50) , @length int , @filler char(1)) /* Purpose: enlarges a string to a predefiend length by puting in front of it the same character Parameters: @target varchar(50) - the string to be transformed , @length int - the length of output string , @filler char(1) - the character which will be used as filler Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL Sample: SELECT dbo.FillBefore('1234', 5, '0') SELECT dbo.FillBefore('1234', 10, ' ') SELECT dbo.FillBefore(1234, 10, ' ') SELECT dbo.FillBefore(Cast(1234 as varchar(10)), 10, '0') SELECT dbo.FillBefore(NULL, 10, '0') */ RETURNS varchar(50) AS BEGIN RETURN Replace(Space(@length-len(IsNull(@target, ''))), ' ', @filler) + IsNull(@target, '') END and how it can be used: SELECT dbo.FillBefore('1234', 5, '0') SELECT dbo.FillBefore('1234', 10, ' ') SELECT dbo.FillBefore(1234, 10, ' ') SELECT dbo.FillBefore(Cast(1234 as varchar(10)), 10, '0')     Another situation, when Space function was useful, was when I had to save the result of a query to a text file in fixed width format. It resumes to the same logic used in FillBefore function, except the spaces must be added at the end of the string. For this I just needed to modify the order of concatenation and create a new function. CREATE FUNCTION dbo.FillAfter( @target varchar(50) , @length int , @filler char(1)) /* Purpose: enlarges a string to a predefiend length by adding at the end the same character Parameters: @target varchar(50) - the string to be transformed , @length int - the length of output string , @filler char(1) - the character which will be used as filler Notes: the function works if the length of @length is greater or equal with the one of the @target, otherwise will return a NULL Sample: SELECT dbo.FillAfter('1234', 5, '0') SELECT dbo.FillAfter('1234', 10, ' ') SELECT dbo.FillAfter(1234, 10, '0') SELECT dbo.FillAfter(Cast(1234 as varchar(10)), 10, '0') SELECT dbo.FillAfter(NULL, 10, '0') */ RETURNS varchar(50) AS BEGIN RETURN IsNull(@target, '') + Replace(Space(@length-len(IsNull(@target,''))), ' ', @filler) END Notes: 1.    In SQL Server 2005, the output of a query can be saved using a DTS Package with fix format directly to a text file using ‘fixed width’ (the columns are defined by fix widths) or ‘ragged right’ (the columns are defined by fix widths, except the last one which is delimited by the new line character) format for the destination file. 2.    To document or not to document. I know that many times we don't have the time to document the database objects we created, but I found that a simple comment saved more time later, even if was about my or other developer's time. 3.    If is really needed to use the CASE function, is good do a analysis of the data and put as first evaluation the case with the highest probability to occur, as second evaluation the case with the second probability to occur, and so on. The reason for this is that each option is evaluated until the comparision operation evaluates to TRUE.     For example, if the zip codes are made in most of the cases of 5 characters, then it makes sense to put it in the first WHEN position.

Tuesday, November 08, 2005

Resources

So, for the beginning a few links I found useful over the time: 1. www.sqlservercentral.com 2. www.sqldts.com a site mainly about DTS Packages 3. www.sswug.org even if it requires a membership it has also free article about data technologies and programming. I consider it a useful source, otherwise I do not want to pay the membership for nothing 4. www.databasejournal.com is my first source of information, it has good tutorials and articles. 5. www.fawcette.com it has resources about databases and programming, is really useful and covers a range of topics, not only Microsoft 6. www.informit.com is the first source of information about programming stuff I worked with, I watched it growing over the years. It has also a library of free books but also one of the biggest library of programming books available for reading with different memberships. SQL Server Reference Guide section could be useful. 7. www.sqljunkies.com 8. http://staff.develop.com/bobb/weblog/default.aspx Bob Beauchemin's Blog, will be moved soon so... 9. www.microsoft.com/events/series/default.mspx Microsoft webcasts about all their technologies 10. www.sqlteam.com 11. www.sqlserverdatamining.com 12. http://msdn.microsoft.com/data/Default.aspx Microsoft website about their data solutions 13. www.15seconds.com even if it has only a few information about data solution, I found it useful over time 14. www.yukonxml.com interesting website, it worth a look There are a few more websites I found useful over time, I will provide them when I will have the chance. Sometimes is useful and faster to do a search on a search engine like www.google.com or www.yahoo.com.

Introduction

Hi everybody, so I started to blog, this until I will get sufficient information for a website, there is nothing extraordinary in here, things you probably will find searching on the web. I will try to post all the problems I am running into, this also because I am having a predilection in running into all kind of problems. :) Happy and errorless programming! Adrian