# Expert SQL Server 2008 Development- P5

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

0
58
lượt xem
5

## Expert SQL Server 2008 Development- P5

Mô tả tài liệu

Tham khảo tài liệu 'expert sql server 2008 development- p5', công nghệ thông tin, cơ sở dữ liệu phục vụ nhu cầu học tập, nghiên cứu và làm việc hiệu quả

Chủ đề:

Bình luận(0)

Lưu

## Nội dung Text: Expert SQL Server 2008 Development- P5

1. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS server. For a further discussion of the correct placement of data and application logic, refer back to Chapter 1. Examples of commonly cited situations in which SQLCLR is perhaps a better choice than TSQL include manipulation of string or XML data, certain math functions that are provided by dedicated methods in the .NET Base Class Library, and situations where procedural code is more efficient than set- based logic. In order to test the validity of these claims, I decided to set up some simple test cases to compare the relative performance of T-SQL against SQLCLR, which are described in the following sections. Creating a “Simple Sieve” for Prime Numbers For this test, I created two simple procedures that return a list of all prime numbers up to a supplied maximum value—one implemented in T-SQL, and one using SQLCLR. The logic of these tests was made as simple as possible: each is supplied with a maximum value that is decremented in a loop, and in each iteration of the loop, the modulo operator is used to determine the remainder when that value is divided by every lesser number. If the remainder of the division is 0 (in other words, we have found a factor), we know that the current value is not a prime number. The loop therefore moves on to test the next possible value. If the inner loop tests every possible divisor and has not found any factors, then we know the value must be a prime. Using this kind of “simple sieve” algorithm for finding prime numbers relies on basic mathematical functions and procedural logic, which makes it a good test to compare the performance of T-SQL and SQLCLR. Here’s the T-SQL implementation: CREATE PROCEDURE ListPrimesTSQL ( @Limit int ) Download at WoweBook.com AS BEGIN DECLARE -- @n is the number we're testing to see if it's a prime @n int = @Limit, --@m is all the possible numbers that could be a factor of @n @m int = @Limit - 1; -- Loop descending through the candidate primes WHILE (@n > 1) BEGIN -- Loop descending through the candidate factors WHILE (@m > 0) BEGIN -- We've got all the way to 2 and haven't found any factors IF(@m = 1) BEGIN PRINT CAST(@n AS varchar(32)) + ' is a prime' BREAK; END -- Is this @m a factor of this prime? IF(@n%@m) 0 BEGIN -- Not a factor, so move on to the next @m SET @m = @m - 1; CONTINUE; END 179
2. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS ELSE BREAK; END SET @n = @n-1; SET @m = @n-1; END END; GO And here’s the SQLCLR implementation using exactly the same logic: [SqlProcedure] public static void ListPrimesCLR(SqlInt32 Limit) { int n = (int)Limit; int m = (int)Limit - 1; while(n > 1) { while(m > 0) { if(m == 1) { SqlContext.Pipe.Send(n.ToString() + " is a prime"); } if(n%m != 0) { m = m - 1; continue; } else { break; } } n = n - 1; m = n - 1; } } Note Clearly, if you actually wanted to get a list of the prime numbers, you would NOT use such a naive approach as this. The example used here is intended to provide a simple procedure that can be implemented consistently across both T-SQL and SQLCLR. 180
3. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS I tested each solution several times, supplying different values for the maximum limit from which the loop starts. The average execution time for each solution is shown in the graph illustrated in Figure 7-1. Figure 7-1. Comparison of prime number sieve implemented in T-SQL and SQLCLR The results should come as no surprise—since the approach taken relies on mathematical operations in an iterative loop, SQLCLR is always likely to outperform set-based T-SQL. However, you might be surprised by the magnitude of the difference between the two solutions, especially as the number of iterations increases. If we were to compare simple inline, or nonprocedural, calculations then there would likely not be such a stark contrast between the two methods. Calculating Running Aggregates Few practical database applications need to produce a list of prime numbers—a more common type of mathematical query operation that might benefit from the use of SQLCLR is when you need to calculate a value in a row based on the value of previous rows’ data. The most common example of such a linear query is in the calculation of aggregates, such as running sums of columns. The typical approach using T-SQL is to make use of a self-join on the table, such as follows: SELECT T1.x, SUM(T2.x) AS running_x FROM T AS T1 INNER JOIN T AS T2 ON T1.x >= T2.x GROUP BY T1.x; 181
4. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS Unfortunately, the process required to satisfy this query is not very efficient. Assuming that an index exists on the x column of the table, the preceding query generates the execution plan shown in Figure 7-2. Figure 7-2. A nested index seek used to create a running sum in T-SQL To sum all of the previous values in the column requires a nested loop containing an index seek. The number of rows returned by this seek increases exponentially as more rows are processed. On the first row, this seek must only sum one value, but to find the running sum over a set of 100 rows, 5,050 total rows need to be read. For a set of 200 rows, the query processor needs to process 20,100 total rows—four times the amount of work required to satisfy the previous query. Thus, the performance of this approach to calculate running aggregates degrades rapidly as more rows are added to the table. An alternative solution, which can yield some significant performance benefits, is to make use of a cursor. There is a commonly held perception in the SQL Server development world that cursors are a bad thing, but they do have valid use cases, and this might be one of them. However, there are a number of good reasons why many developers are reluctant to use cursors, and I’m certainly not advocating their use in general. A better approach would be to use SQLCLR to loop through and store the running values using local variables, and then stream the results one row at a time via the SqlPipe. An example of such a solution is given in the following code listing: [Microsoft.SqlServer.Server.SqlProcedure] public static void RunningSum() { using (SqlConnection conn = new SqlConnection("context connection=true;")) { SqlCommand comm = new SqlCommand(); comm.Connection = conn; comm.CommandText = "SELECT x FROM T ORDER BY x"; SqlMetaData[] columns = new SqlMetaData[2]; columns[0] = new SqlMetaData("Value", SqlDbType.Int); columns[1] = new SqlMetaData("RunningSum", SqlDbType.Int); int RunningSum = 0; SqlDataRecord record = new SqlDataRecord(columns); 182
5. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS SqlContext.Pipe.SendResultsStart(record); conn.Open(); SqlDataReader reader = comm.ExecuteReader(); while (reader.Read()) { int Value = (int)reader[0]; RunningSum += Value; record.SetInt32(0, (int)reader[0]); record.SetInt32(1, RunningSum); SqlContext.Pipe.SendResultsRow(record); } SqlContext.Pipe.SendResultsEnd(); } } I’ve used this solution on a number of occasions and find it to be very efficient and maintainable, and it avoids the need for any temp tables to be used to hold the running sums as required by the alternatives. When testing against a table containing 100,000 rows, I achieve an average execution time of 2.7 seconds for the SQLCLR query, compared to over 5 minutes for the TSQL equivalent. String Manipulation To compare the performance of string-handling functions between T-SQL and SQLCLR, I wanted to come up with a fair, practical test. The problem is that there are lots of ingenious techniques for working with string data: in T-SQL, some of the best performing methods use one or more common table expressions (CTEs), CROSS APPLY operators, or number tables; or convert text strings to XML in or order to perform nontrivial manipulation of character data. Likewise, in SQLCLR, the techniques available differ considerably depending on whether you use the native String methods or those provided by the StringBuilder class. I decided that, rather than try to define a scenario that required a string-handling technique, the only fair test was to perform a direct comparison of two built-in methods that provided the equivalent functionality in either environment. I decided to settle on the T-SQL CHARINDEX and .NET’s String.IndexOf(), each of which searches for and returns the position of one string inside another string. For the purposes of the test, I created nvarchar(max) strings of different lengths, each composed entirely of the repeating character a. I then appended a single character x onto the end of each string, and timed the performance of the respective methods to find the position of that character over 10,000 iterations. 183
6. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS The following code listing demonstrates the T-SQL method: CREATE PROCEDURE SearchCharTSQL ( @needle nchar(1), @haystack nvarchar(max) ) AS BEGIN PRINT CHARINDEX(@needle, @haystack); END; And here’s the() CLR equivalent: [SqlProcedure] public static void SearchCharCLR(SqlString needle, SqlString haystack) { SqlContext.Pipe.Send( haystack.ToString().IndexOf(needle.ToString()).ToString() ); } Note that the starting position for CHARINDEX is 1-based, whereas the index numbering used by IndexOf() is 0-based. The results of each method will therefore differ, but they will have done the same amount of work obtaining that result. I tested each procedure as follows, substituting different parameter values for the REPLICATE method to change the length of the string to be searched: DECLARE @needle nvarchar(1) = 'x'; DECLARE @haystack nvarchar(max); SELECT @haystack = REPLICATE(CAST('a' AS varchar(max)), 8000) + 'x'; EXEC dbo.SearchCharTSQL @needle, @haystack; The execution times required for 10,000 runs of each method are shown in Figure 7-3. Figure 7-3. Comparing performance of CHARINDEX against String. ()IndexOf() 184
7. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS As with the prime number sieve example given earlier, the logic required for string searching, matching, and replacing is best suited to the highly efficient routines provided by the .NET Base Class Library. If you currently have code logic that relies heavily on T-SQL string functionality including CHARINDEX, PATINDEX, or REPLACE, I highly recommend that you investigate the alternative options available through SQLCLR—you might be surprised by the performance gain you achieve. Enhancing Service Broker Scale-Out with SQLCLR Having discussed some of the theory behind working with the SQLCLR and given some isolated performance comparisons, let’s now turn our attention to a more detailed example that puts these ideas into practice. Service Broker is frequently mentioned as an excellent choice for helping to scale out database services. One of the more compelling use cases is a Service Broker service that can be used to asynchronously request data from a remote system. In such a case, a request message would be sent to the remote data service from a local stored procedure, which could do some other work while waiting for the response—the requested data—to come back. There are many ways to architect such a system, and given that Service Broker allows messages to be sent either as binary or XML, I wondered which would provide the best overall performance and value from a code reuse perspective. In the following sections, I’ll guide you through my investigations into XML and binary serialization using SQLCLR. XML Serialization I started working with the HumanResources.Employee table from the AdventureWorks2008 database as a sample data set, imagining a remote data service requesting a list of employees along with their attributes. After some experimentation, I determined that the FOR XML RAW option is the easiest way to serialize a table in XML format, and I used the ROOT option to make the XML valid: DECLARE @x xml; SET @x = ( SELECT * FROM HumanResources.Employee FOR XML RAW, ROOT('Employees') ); GO XML is, of course, known to be an extremely verbose data interchange format, and I was not surprised to discover that the data size of the resultant XML is 105KB, despite the fact that the HumanResources.Employee table itself has only 56KB of data. I experimented with setting shorter column names, but it had very little effect on the size and created what I feel to be unmaintainable code. Next, I set up a trace to gather some idea of the performance of the XML serialization (for more information on traces, refer to Chapter 3). The trace results revealed that the average execution time for the preceding query on my system, averaged over 1,000 iterations, was a decidedly unimpressive 3.9095 seconds per iteration. After some trial and error, I discovered that XML serialization could be made to perform better by using the TYPE directive, as follows: DECLARE @x xml; SET @x = ( 185
8. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS SELECT * FROM HumanResources.Employee FOR XML RAW, ROOT('Employees'), TYPE ); GO This change brought the average time per iteration down slightly, to 3.6687 seconds—an improvement, but still not a very good result. XML Deserialization Even though XML serialization had not yielded impressive performance, I decided to carry on and test deserialization. The first problem was the code required to deserialize the XML back into a table. In order to get back the same table I started with, I had to explicitly define every column for the result set; this made the code quite a bit more complex than I’d hoped for. Furthermore, since the XQuery value syntax does not support the hierarchyid datatype, the values in the OrganizationNode column must be read as nvarchar and then CAST to hierarchyid. The resulting code is as follows: DECLARE @x xml; SET @x = ( SELECT * FROM HumanResources.Employee FOR XML RAW, ROOT('Employees'), TYPE ); SELECT col.value('@BusinessEntityID', 'int') AS BusinessEntityID, col.value('@NationalIDNumber', 'nvarchar(15)') AS NationalIDNumber, col.value('@LoginID', 'nvarchar(256)') AS LoginID, CAST(col.value('@OrganizationNode', 'nvarchar(256)') AS hierarchyid) AS OrganizationNode, col.value('@JobTitle', 'nvarchar(50)') AS JobTitle, col.value('@BirthDate', 'datetime') AS BirthDate, col.value('@MaritalStatus', 'nchar(1)') AS MaritalStatus, col.value('@Gender', 'nchar(1)') AS Gender, col.value('@HireDate', 'datetime') AS HireDate, col.value('@SalariedFlag', 'bit') AS SalariedFlag, col.value('@VacationHours', 'smallint') AS VacationHours, col.value('@SickLeaveHours', 'smallint') AS SickLeaveHours, col.value('@CurrentFlag', 'bit') AS CurrentFlag, col.value('@rowguid', 'uniqueidentifier') AS rowguid, col.value('@ModifiedDate', 'datetime') AS ModifiedDate FROM @x.nodes ('/Employees/row') x (col); GO The next problem was performance. When I tested deserializing the XML using the preceding query, performance went from poor to downright abysmal—averaging 6.8157 seconds per iteration. At this point, I decided to investigate SQLCLR options for solving the problem, focusing on both reuse potential and performance. 186
9. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS Binary Serialization with SQLCLR My first thought was to return binary serialized DataTables; in order to make that happen, I needed a way to return binary-formatted data from my CLR routines. This of course called for .NET’s BinaryFormatter class, so I created a class called serialization_helper, cataloged in an EXTERNAL_ACCESS assembly (required for System.IO access): using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Security.Permissions; using System.Runtime.Serialization.Formatters.Binary; public partial class serialization_helper { public static byte[] getBytes(object o) { SecurityPermission sp = new SecurityPermission( SecurityPermissionFlag.SerializationFormatter); sp.Assert(); BinaryFormatter bf = new BinaryFormatter(); using (System.IO.MemoryStream ms = new System.IO.MemoryStream()) { bf.Serialize(ms, o); return(ms.ToArray()); } } public static object getObject(byte[] theBytes) { using (System.IO.MemoryStream ms = new System.IO.MemoryStream(theBytes, false)) { return(getObject(ms)); } } public static object getObject(System.IO.Stream s) { SecurityPermission sp = new SecurityPermission( SecurityPermissionFlag.SerializationFormatter); sp.Assert(); 187
13. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS //Serialize and return the output return new SqlBytes( serialization_helper.getBytes(theList.ToArray())); } Once this function is created, calling it is almost identical to calling GetDataTable_Binary: DECLARE @sql nvarchar(max); SET @sql = 'SELECT BusinessEntityID, NationalIDNumber, LoginID, OrganizationNode.ToString(), OrganizationLevel, JobTitle, BirthDate, MaritalStatus, Gender, HireDate, SalariedFlag, VacationHours, SickLeaveHours, CurrentFlag, rowguid, ModifiedDate FROM HumanResources.Employee' DECLARE @x varbinary(max); SET @x = dbo.GetBinaryFromQueryResult(@sql); GO The result, 57KB worth of binary data, represented a 15 percent reduction in size compared to the DataTable method. If using this method to transfer data between broker instances on remote servers, the associated decrease in network traffic could make a big difference to performance. What’s more, the serialization performance using SqlDataReader was the fastest yet, with an average query execution time of just 0.0490 seconds. Binary Deserialization Pleased with the results of binary serialization using SQLCLR, I decided to go ahead with deserialization. Continuing with my stress on reuse potential, I decided that a stored procedure would be a better choice than a UDF. A stored procedure does not have a fixed output as does a UDF, so any input table can be deserialized and returned without worrying about violating column list contracts. The first part of the stored procedure follows: [Microsoft.SqlServer.Server.SqlProcedure] public static void GetTableFromBinary(SqlBytes theTable) { //Deserialize the input object[] dt = (object[])( serialization_helper.getObject(theTable.Value)); 191
14. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS //First, get the fields object[] fields = (object[])(dt[0]); SqlMetaData[] cols = new SqlMetaData[fields.Length]; //Loop over the fields and populate SqlMetaData objects for (int i = 0; i
15. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS } break; default: cols[i] = new SqlMetaData( (string)field[0], dbType); break; } } Once population of the columns collection has been completed, the data can be sent back to the caller using the SqlPipe class’s SendResults methods. After starting the stream, the remainder of the objects in the input collection are looped over, cast to object[], and sent back as SqlDataRecords: //Start the result stream SqlDataRecord rec = new SqlDataRecord(cols); SqlContext.Pipe.SendResultsStart(rec); for (int i = 1; i < dt.Length; i++) { rec.SetValues((object[])dt[i]); SqlContext.Pipe.SendResultsRow(rec); } //End the result stream SqlContext.Pipe.SendResultsEnd(); } Not only had the binary serialization test yielded positive results, it turns out that deserialization of data prepared in this manner is exceptionally fast compared with the alternatives. The performance test revealed that average time for deserialization of the SqlDataReader data was just 0.2666 seconds—an order of magnitude faster than deserialization of similar XML. The results of the fastest refinements of each of the three methods discussed in this section are shown in Table 7-1. Table 7-1. Results of different serialization approaches Method Average Serialization Time Average Deserialization Time Size XML (with TYPE) 3.6687 6.8157 105KB Binary (DataTable) 0.0576 68KB Binary (SqlDataReader) 0.0490 0.2666 57KB The combination of better network utilization and much faster serialization/deserialization demonstrated in this example shows how SQLCLR can be a great technique for transferring tabular data between Service Broker instances in scale-out and distributed processing scenarios. 193
16. CHAPTER 7 SQLCLR: ARCHITECTURE AND DESIGN CONSIDERATIONS Summary Getting the most out of SQLCLR routines involves a bit of thought investment. Up-front design and architecture considerations will yield great benefits in terms of security, reliability, and performance. You should also consider reuse at every stage, in order to minimize the amount of work that must be done when you need the same functionality six months or a year down the road. If you’ve already coded it once, why code it again? To illustrate these concepts, I showed an example that serialized tables using the BinaryFormatter, which could be used to extend SQL Server Service Broker. I used a common, core set of more highly privileged utility assemblies in order to limit the outer surface area, and tried to design the solution to promote flexibility and potential for use in many projects throughout the lifetime of the code. 194
17. CHAPTER 8 Dynamic T-SQL The general objective of any software application is to provide consistent, reliable functionality that allows users to perform given tasks in an effective manner. The first step in meeting this objective is therefore to keep the application bug-free and working as designed, to expected standards. However, once you’ve gotten past these basic requirements, the next step is to try to create a great user experience, which raises the question, “What do the users want?” More often than not, the answer is that users want flexible interfaces that let them control the data the way they want to. It’s common for software customer support teams to receive requests for slightly different sort orders, filtering mechanisms, or outputs for data, making it imperative that applications be designed to support extensibility along these lines. As with other data-related development challenges, such requests for flexible data output tend to fall through the application hierarchy, eventually landing on the database (and, therefore, the database developer). This is especially true in web-based application development, where client-side grid controls that enable sorting and filtering are still relatively rare, and where many applications still use a lightweight two-tier model without a dedicated business layer to handle data caching and filtering. “Flexibility” in the database can mean many things, and I have encountered some very interesting approaches in applications I’ve worked with over the years, often involving creation of a multitude of stored procedures or complex, nested control-of-flow blocks. These solutions invariably seem to create more problems than they solve, and make application development much more difficult than it needs to be by introducing a lot of additional complexity in the database layer. In this chapter, I will discuss how dynamic SQL can be used to solve these problems as well as to create more flexible stored procedures. Some DBAs and developers scorn dynamic SQL, often believing that it will cause performance, security, or maintainability problems, whereas in many cases it is simply that they don’t understand how to use it properly. Dynamic SQL is a powerful tool that, if used correctly, is a tremendous asset to the database developer’s toolbox. There is a lot of misinformation floating around about what it is and when or why it should be used, and I hope to clear up some myths and misconceptions in these pages. Note Throughout this chapter, I will illustrate the discussion of various methods with performance measures and timings recorded on my laptop. For more information on how to capture these measures on your own system environment, please refer to the discussion of performance monitoring tools in Chapter 3. 195