CLR User-Defined Functions - SQL Server 2008

SQL Server 2000 introduced T-SQL user-defined functions, a feature that has greatly improved the programmability of SQL Server. Scalar user-defined functions allow developers to easily maximize encapsulation and reuse of logic. They return a single, scalar value based on zero or more parameters. These types of functions are useful for defining “black-box” methods; for instance, logic that needs to be used in exactly the same way throughout many stored procedures can be embedded in a scalar function. If the logic ever needs to change, only the function needs to be modified. Table-valued user-defined functions, on the other hand, can be thought of as parameterized views. These functions return a rowset of one or more columns and are useful for situations in which a view would return too much data. Because these functions are parameterized, developers can force users to filter the returned data.

Much like T-SQL user-defined functions, CLR functions come in both scalar and table-valued varieties. Scalar functions must return a single value of an intrinsic SQL Server type (such as an integer or a string). Table-valued functions, on the other hand, must return a single, well-defined table. This is in contrast to stored procedures, which can return both an integer value and one or more tables at the same time. Also unlike stored procedures, functions do not support output parameters.

CLR functions are also similar to T-SQL functions in that data manipulation from within a function is limited. When using the context connection (covered in Chapter 14), data cannot be modified. Connecting via a noncontext connection does allow data modification, but this is not recommended in most scenarios, due to the fact that a scalar function can be called once per row of a table and the data modification could occur on each call, incurring a large performance hit compared to doing a single modification using set-based techniques.

Much like CLR stored procedures, the key to deciding when to use a CLR user-defined function instead of a T-SQL user-defined function is whether the power afforded by the .NET base class library is necessary. If a T-SQL user-defined function can do the job in question, T-SQL is preferred—most of the time it will deliver better performance and quicker development turnaround. However, for those cases in which additional functionality is required—such as the compression example in this chapter—CLR user-defined functions will prove invaluable.

In this section, we’ll look at scalar CLR user-defined functions that enable binary data compression in the database server and table-valued CLR user-defined functions that return rowsets from various sources.

Adding a User-Defined Function to a Visual Studio Project

To add a function to a preexisting SQL Server project in Visual Studio 2008, right-click the project name in Solution Explorer and select Add ➤ User-Defined Function, as shown in the following figure.

Adding a user-defined function to a SQL Server project


The Visual Studio 2008 User-Defined Function Template

Adding a user-defined function called NewFunction to a Visual Studio 2008 SQL Server project will produce a template similar to the following:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString NewFunction()
    {
        // Put your code here
        return new SqlString("Hello");
    }
};

This template is quite a bit simpler than the user-defined type template shown previously in this chapter. A user-defined function requires nothing more than a public static method decorated with the SqlFunction attribute. The function shown here is a scalar function that returns a SqlString. A few additions are necessary to create a table-valued function. Let’s first take a look at the SqlFunction attribute.

The SqlFunction Attribute

The SqlFunction attribute has several parameters, none of which is required:

  • DataAccess: This parameter can be set to one of two values of the DataAccessKind enumerator. The possible values are None and Read. A value of None indicates that the function performs no data access, and this is enforced by the SQL Server engine. Attempts to perform data access will be met with an exception. Read indicates that the function is allowed to read data from the context connection (execute a T-SQL query against the database). User-defined functions cannot modify data in the context connection. The default value for this parameter is None.
  • FillRowMethodName: This parameter is used to define a method for outputting rows in a table-valued user-defined function. See the section “Defining a Table-Valued User-Defined Function” later in this chapter for information on building table-valued functions.
  • IsDeterministic: This parameter indicates whether a scalar function should be treated by the query optimizer as deterministic. This means that every call with the same exact input parameters will yield the same exact output. For instance, a function that adds 1 to an input integer is deterministic; the output will always be the input value plus 1. On the other hand, the GETDATE function is nondeterministic; a call with the same set of input parameters (none; it has no input parameters) can yield different output as time passes. Certain SQL Server features, such as indexed views, depend on determinism, so treat this parameter carefully. The default value is false.
  • IsPrecise: This parameter allows the developer to specify whether the function internally rounds or truncates, thereby eliminating precision of the result. Even a function that does not use floating-point numbers as inputs or outputs may be imprecise if floating-point arithmetic is used within the function. Knowing whether the results are precise can help the optimizer when calculating values for indexed views and other features. To be on the safe side, always set this parameter’s value to false when working with floating-point computations. The default value for this parameter is false.
  • Name: This parameter is used by Visual Studio (and possibly other third-party tools) to override the name that will be used for the user-defined function when deployed. If this parameter is set, the name specified in the parameter will be used. Otherwise, the name of the method decorated with the SqlFunctionAttribute will be used.
  • SystemDataAccess: This parameter determines whether the function has access to system data from the context connection. Possible values for this parameter are the two values of the SystemDataAccessKind enumerator: None and Read. If the value is set to None, the function will not be able to access views in the sys schema. The default value is None.
  • TableDefinition: This parameter is used to define the output table format for table-valued user-defined functions. Its input is a string-literal column list, defined in terms of SQL Server types and/or user-defined types. This parameter is covered in more detail in the section, “Defining a Table-Valued User-Defined Function” later in this chapter.
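To see how these parameters fit together, the following sketch decorates a trivial scalar function with several of them; the function name and body are hypothetical examples, not code from this chapter:

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        DataAccess = DataAccessKind.None,   // no queries against the context connection
        IsDeterministic = true,             // same input always yields the same output
        IsPrecise = true,                   // integer math only; no floating point
        Name = "AddOne")]                   // name used at deployment time
    public static SqlInt32 AddOne(SqlInt32 input)
    {
        // Because the function is marked deterministic and precise, SQL
        // Server can consider it for features such as indexed views.
        return input + 1;
    }
};
```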

Scalar User-Defined Functions

When most developers think of functions, they think of scalar functions, which return exactly one value. The utility of such functions is quite obvious. They can be used to encapsulate complex logic such that it doesn’t need to be repeated throughout many queries in the database. By using scalar functions, developers can ensure that changes to logic can be made in a single centralized location, which can be a boon for code maintainability.

A somewhat less obvious use for scalar functions, which is made much more desirable by CLR integration, is to expose library functionality not available natively within SQL Server. Examples include such common libraries as regular expressions, enhanced encryption (beyond what SQL Server offers), and data compression. The CLR exposes a variety of very useful libraries that are now easy to consume for T-SQL programming projects.
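As a sketch of this idea, the following hypothetical scalar function exposes regular expression matching, which has no native T-SQL equivalent, via the System.Text.RegularExpressions namespace:

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(IsDeterministic = true)]
    public static SqlBoolean RegexMatch(SqlString input, SqlString pattern)
    {
        // Propagate NULL inputs as NULL output, mirroring T-SQL semantics
        if (input.IsNull || pattern.IsNull)
            return SqlBoolean.Null;

        return Regex.IsMatch(input.Value, pattern.Value);
    }
};
```

Once deployed, such a function could be called from any query, for example in a WHERE clause to filter rows by pattern.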

Binary Data Compression Using a Scalar User-Defined Function

The .NET Framework 3.5 base class library exposes a namespace called System.IO.Compression, which includes classes for compressing data streams using the GZip and Deflate algorithms. The power of these algorithms can be harnessed for data applications by defining scalar functions to compress and decompress binary data. These functions can be used in document repositories to greatly reduce disk space, and moving compression into the data layer means that applications need only be concerned with the data itself, not its on-disk storage format. However, there is a downside to moving compression from the application into the data tier. Compression is expensive in terms of processor and memory utilization. Before moving compression into production databases, ensure that the servers can handle the additional load, lest performance suffer.

The first step in modifying the function template to handle compression is to add using directives for the IO and Compression namespaces.

using System.IO;
using System.IO.Compression;

The System.IO namespace is necessary, as it contains the classes that define streams. A MemoryStream will be used as a temporary holder for the bytes to be compressed and decompressed.

To facilitate the compression using the GZip algorithm, the function must create a memory stream using binary data from the SQL Server caller and pass the stream to the specialized GZipStream to get the compressed output. The BinaryCompress function is implemented in the following code:

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBytes BinaryCompress(SqlBytes inputStream)
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (GZipStream x =
            new GZipStream(ms, CompressionMode.Compress, true))
        {
            byte[] inputBytes = (byte[])inputStream.Value;
            x.Write(inputBytes, 0, inputBytes.Length);
        }

        return (new SqlBytes(ms.ToArray()));
    }
}

Note that this function uses the SqlBytes type for both input and output. The SqlTypes namespace includes definitions of both the SqlBinary and SqlBytes types, and according to the .NET documentation, these are equivalent. However, the Visual Studio SQL Server Project deployment task treats them differently. SqlBinary is mapped to SQL Server’s varbinary(8000) type, whereas SqlBytes is mapped to varbinary(max), which can store 2GB of data per instance. Since this function is intended for compression of large documents to be stored in a SQL Server database, varbinary(max) makes a lot more sense. Limiting the document size to 8,000 bytes would be quite restrictive.

For developers working with character data instead of binary, note that this same situation exists with the SqlString and SqlChars types. The former maps to nvarchar(4000); the latter maps to nvarchar(max). Also note that these are mappings as done by Visual Studio only. In the case of manual deployments, these mappings do not apply—SqlString will behave identically to SqlChars for any size nvarchar, and SqlBinary will be interchangeable with SqlBytes for any size varbinary.

You should also consider the use of the using statement within the function. This statement defines a scope for the declared object, at the end of which the Dispose method is called on that object, if the type implements IDisposable. It is generally considered good practice in .NET development to use the using statement when working with types that implement IDisposable, such that a call to Dispose is guaranteed.

This is doubly important when working in the SQL Server-hosted CLR environment, since both the database engine and the CLR will be contending for the same resources. Calling Dispose helps the CLR to more quickly clean up the memory consumed by the streams, which can be considerable if a large amount of binary data is passed in.

Decompression using GZipStream is very similar to compression, except that two memory streams are used. The following function implements decompression:

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlBytes BinaryDecompress(SqlBytes inputBinary)
{
    byte[] inputBytes = (byte[])inputBinary.Value;

    using (MemoryStream memStreamIn = new MemoryStream(inputBytes))
    {
        using (GZipStream s =
            new GZipStream(memStreamIn, CompressionMode.Decompress))
        {
            using (MemoryStream memStreamOut = new MemoryStream())
            {
                for (int num = s.ReadByte(); num != -1; num = s.ReadByte())
                {
                    memStreamOut.WriteByte((byte)num);
                }

                return (new SqlBytes(memStreamOut.ToArray()));
            }
        }
    }
}

Using the Compression Routines

The code can now be compiled and deployed using either the Visual Studio deployment task or manually with the T-SQL CREATE FUNCTION statement (see the upcoming section titled “Managing CLR User-Defined Functions” for more information). To compress data, simply use the BinaryCompress function the same way any T-SQL function would be used. For instance, to get the compressed binary for all of the documents in the Production.Document table in the AdventureWorks database, you could use the following T-SQL:

SELECT dbo.BinaryCompress(Document)
FROM Production.Document

And, of course, the output of the BinaryCompress function can be passed to BinaryDecompress to get back the original binary.
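To verify the round trip, the two functions can be composed in a single query; a sketch against the same AdventureWorks table:

```sql
SELECT TOP (1)
    DATALENGTH(Document) AS OriginalBytes,
    DATALENGTH(dbo.BinaryDecompress(dbo.BinaryCompress(Document)))
        AS RoundTripBytes
FROM Production.Document;
```

The two DATALENGTH values should be identical, confirming that no data is lost in the compression cycle.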

You should take care to ensure that the data being compressed is actually compressible. The nature of the GZip algorithm is such that incompressible data will actually produce a larger output—the opposite of the goal of compression. For instance, you could use the following query to compare compressed and uncompressed data sizes for documents in the Production.Document table:

SELECT
DATALENGTH(Document),
DATALENGTH(dbo.BinaryCompress(Document))
FROM Production.Document

The results of this query show that, on average, compression rates of around 50 percent are seen. That’s not bad. But trying the experiment on the photographs in the Production.ProductPhoto table has a very different outcome: compressing that data yields around a 50 percent increase in data size! The following query can be used to test the photograph data:

SELECT
DATALENGTH(LargePhoto),
DATALENGTH(dbo.BinaryCompress(LargePhoto))
FROM Production.ProductPhoto

The lesson to be learned is to always test carefully. Compression can work very well in many cases, but it can incur hidden costs if developers are not aware of its caveats.

Table-Valued User-Defined Functions

User-defined functions, as mentioned previously, come in two varieties: scalar and table-valued. The former must return exactly one value, whereas the latter can return a table of values, with many columns and rows. A table-valued user-defined function can be thought of as a parameterized view: the query logic is encapsulated within the body of the function, and parameters can be passed in to control the output. In addition, a table-valued function can be used anywhere in T-SQL queries that a view can.

CLR user-defined functions are somewhat different from their T-SQL counterparts, in that they have the capability to stream their data to the client (the calling SQL Server process) instead of writing it to a temporary table as multistatement T-SQL user-defined functions do. This can mean, in some cases, that CLR user-defined functions will be able to outperform their T-SQL counterparts. Remember, however, that as with any performance-boosting methodology, you should test both methods in most cases to ensure that you make the best choice.

Defining a Table-Valued User-Defined Function

Creating a table-valued user-defined function involves defining a function that returns an instance of a collection that implements the IEnumerable interface. This collection will be enumerated by the query engine, and that enumeration will result in calling a second function for each member of the collection, in order to map the member’s attributes to a set of output parameters corresponding to the column list for the table.

This process is better described using a concrete example. Assume that you wish to encapsulate the following query in a user-defined function:

SELECT Name, GroupName FROM HumanResources.Department

This query can be evaluated and used to populate a DataTable. Since the table’s Rows property returns a DataRowCollection, which implements IEnumerable, it is a valid return type for a table-valued function. The following code defines a method called GetDepartments that retrieves and returns the data using a context connection:

[Microsoft.SqlServer.Server.SqlFunction(
    DataAccess = DataAccessKind.Read,
    FillRowMethodName = "GetNextDepartment",
    TableDefinition = "Name nvarchar(50), GroupName nvarchar(50)")]
public static IEnumerable GetDepartments()
{
    using (SqlConnection conn =
        new SqlConnection("context connection=true;"))
    {
        string sql =
            "SELECT Name, GroupName FROM HumanResources.Department";

        conn.Open();

        SqlCommand comm = new SqlCommand(sql, conn);
        SqlDataAdapter adapter = new SqlDataAdapter(comm);
        DataSet dSet = new DataSet();
        adapter.Fill(dSet);

        return (dSet.Tables[0].Rows);
    }
}

Aside from the fact that this method contains no exception-handling logic—and will behave very poorly if the Department table is empty—an important thing to note in this code listing is the SqlFunction attribute. Since the function is reading data from the database using the context connection, the DataAccess parameter is set to DataAccessKind.Read.

But more important, because this is a table-valued function, both the FillRowMethodName and TableDefinition parameters are used. The FillRowMethodName parameter defines the name of the method that will be used to map each member of the IEnumerable collection returned by the method to a column. The column list that the method must support is defined by the TableDefinition parameter.

In this case, the method is called GetNextDepartment. The method must have a single input parameter of type object, followed by an output parameter for each column defined in the TableDefinition parameter. The following code implements the GetNextDepartment method:

public static void GetNextDepartment(object row,
    out string name,
    out string groupName)
{
    DataRow theRow = (DataRow)row;
    name = (string)theRow["Name"];
    groupName = (string)theRow["GroupName"];
}

When the user-defined function is called, it will return a reference to the table’s DataRowCollection, which implements IEnumerable. The SQL Server engine will obtain an enumerator from the collection and call its MoveNext method (defined by the IEnumerator interface) once for each row of output. Each row is an instance of DataRow, exposed through the enumerator’s Current property, and will be passed to the GetNextDepartment function. Finally, that function will map the data in the row to the proper output parameters, which will become the columns in the output table.

This architecture is extremely flexible in terms of ability to define output columns. If a DataTable or other collection that implements IEnumerable does not exist in the .NET class library to satisfy a given requirement, it is simple to define a type that does. Keep in mind that the output types can be either intrinsic SQL Server types or user-defined types. This added flexibility is a sign of the tight integration provided since SQL Server 2005 for the hosted CLR environment.
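One such alternative collection is a C# iterator, which can stream rows to the engine one at a time rather than buffering them in a DataSet as GetDepartments does. The following hypothetical sketch streams a sequence of integers using yield return:

```csharp
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        FillRowMethodName = "FillNumberRow",
        TableDefinition = "Number int")]
    public static IEnumerable GetNumbers(SqlInt32 maxValue)
    {
        // yield return produces one element per MoveNext call, so SQL
        // Server can begin consuming rows before the loop completes.
        for (int i = 1; i <= maxValue.Value; i++)
            yield return i;
    }

    public static void FillNumberRow(object row, out SqlInt32 number)
    {
        number = (int)row;
    }
};
```

Because nothing is materialized up front, memory use stays flat regardless of how many rows the function produces.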

The full code for the GetDepartments function follows:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        DataAccess = DataAccessKind.Read,
        FillRowMethodName = "GetNextDepartment",
        TableDefinition = "Name nvarchar(50), GroupName nvarchar(50)")]
    public static IEnumerable GetDepartments()
    {
        using (SqlConnection conn =
            new SqlConnection("context connection=true;"))
        {
            string sql =
                "SELECT Name, GroupName FROM HumanResources.Department";

            conn.Open();

            SqlCommand comm = new SqlCommand(sql, conn);
            SqlDataAdapter adapter = new SqlDataAdapter(comm);
            DataSet dSet = new DataSet();
            adapter.Fill(dSet);

            return (dSet.Tables[0].Rows);
        }
    }

    public static void GetNextDepartment(object row,
        out string name,
        out string groupName)
    {
        DataRow theRow = (DataRow)row;
        name = (string)theRow["Name"];
        groupName = (string)theRow["GroupName"];
    }
};

References in CLR Projects: Splitting the StringArray into a Table

An important feature to be aware of when working with SQL Server’s CLR integration is that assemblies loaded into SQL Server databases can reference one another. On top of that, not every assembly loaded into the database needs to expose SQL Server routines or types. A developer can, therefore, reference third-party libraries within SQL CLR classes or reference other SQL CLR classes.

To reference a third-party assembly within another assembly to be loaded within a SQL Server database, the third-party assembly must first be loaded using CREATE ASSEMBLY. For instance, assuming an assembly called MyThirdPartyAssembly was in the C:\Assemblies folder, the following code would load it into a SQL Server database:

CREATE ASSEMBLY MyThirdPartyAssembly
FROM 'C:\Assemblies\MyThirdPartyAssembly.DLL'
WITH PERMISSION_SET = EXTERNAL_ACCESS

Note that the permission set as defined on the assembly will be enforced, even if the referencing assembly is given more permission. Therefore, even if an assembly that references MyThirdPartyAssembly has the UNSAFE permission, any operation that occurs within MyThirdPartyAssembly will be limited to those allowed by EXTERNAL_ACCESS.

When working in Visual Studio 2008, a reference can be added to a SQL Server project only once the assembly to be referenced has been loaded into the database using either CREATE ASSEMBLY or a Visual Studio deployment task. To add a reference to an assembly that has already been loaded, right-click References in Solution Explorer and select Add Reference. A dialog box appears listing the available assemblies; select the assembly to reference and click OK. The following figure shows adding a reference to the StringArray assembly defined earlier in this chapter.

Adding a reference to the StringArray assembly


Once a reference has been added, the referenced assembly can be treated like any other library. A using directive can be used to alias namespaces, and any public classes, properties, and methods are available.

The following code defines a table-valued user-defined function that takes an instance of the StringArray type as input and outputs a table:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction(
        FillRowMethodName = "GetNextString",
        TableDefinition = "StringCol nvarchar(max)")]
    public static IEnumerable GetTableFromStringArray(StringArray strings)
    {
        string csv = strings.ToString();
        string[] arr = csv.Split(',');
        return arr;
    }

    public static void GetNextString(object row, out string theString)
    {
        theString = (string)row;
    }
};

The GetTableFromStringArray method retrieves the comma-delimited list of values from the StringArray using the ToString method. This is then split into an array using the String.Split method. Since all arrays are derived from System.Array, and since that class implements IEnumerable, this collection is valid for a return value without any further manipulation.

Each element of the array is nothing more than a string, so the GetNextString method merely needs to cast the row as a string and set theString appropriately. The result is a table of strings that can be joined to another table, inserted into a table, or returned to a client as a result set.
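Once deployed, the function can be queried like any table source. Assuming the StringArray type’s Parse method accepts a comma-delimited string, as defined earlier in this chapter, a usage sketch might look like this:

```sql
DECLARE @arr StringArray = 'alpha,beta,gamma';

SELECT StringCol
FROM dbo.GetTableFromStringArray(@arr);
```

The query returns one row per delimited element, ready to be joined or inserted like any other rowset.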

Note that in a real-world scenario, it might make more sense to define the GetTableFromStringArray method to directly consume a comma-delimited list instead of the StringArray type. This would extend the method beyond the 8,000-character limit imposed by CLR user-defined types and make it slightly more flexible. The example listed here is mainly intended to convey the utility of assembly references, and as such, it may not be the best possible solution in every case.

Managing CLR User-Defined Functions

If an assembly has been loaded into the database using CREATE ASSEMBLY, functions can be created or dropped without using the Visual Studio deployment task.

To create a function that is exposed in an assembly, use CREATE FUNCTION and specify the name of the assembly, the name of the class the function resides on, and the name of the method that defines the function. The following code creates the BinaryCompress function from an assembly called CompressionRoutines that contains a class called UserDefinedFunctions:

CREATE FUNCTION BinaryCompress(@Input varbinary(max))
RETURNS varbinary(max)
AS
EXTERNAL NAME CompressionRoutines.UserDefinedFunctions.BinaryCompress

To drop a function, use DROP FUNCTION. A function cannot be dropped if it is referenced by a constraint or schema-bound view. The following code drops the BinaryCompress function:

DROP FUNCTION BinaryCompress
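In deployment scripts that may run more than once, it can be convenient to guard the drop with an existence check; a sketch using OBJECT_ID with the CLR scalar function type code:

```sql
IF OBJECT_ID('dbo.BinaryCompress', 'FS') IS NOT NULL
    DROP FUNCTION dbo.BinaryCompress;
```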

Functions can also be altered by using ALTER FUNCTION, which is generally used to modify the input or output datatypes. For instance, you may wish to modify the BinaryCompress function, limiting the input to 1,000 bytes:

ALTER FUNCTION BinaryCompress(@Input varbinary(1000))
RETURNS varbinary(MAX)
AS
EXTERNAL NAME CompressionRoutines.UserDefinedFunctions.BinaryCompress

Although there is no dedicated view for user-defined functions, they can be enumerated using the sys.objects catalog view. To do so, use the type column and filter on FT for table-valued CLR functions or FS for scalar CLR functions. The following T-SQL will return data about both types:

SELECT *
FROM sys.objects
WHERE type in ('FS', 'FT')
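To see which assembly and method back each CLR function, sys.objects can be joined to the sys.assembly_modules and sys.assemblies catalog views; a sketch:

```sql
SELECT o.name AS function_name,
       a.name AS assembly_name,
       am.assembly_class,
       am.assembly_method
FROM sys.objects AS o
JOIN sys.assembly_modules AS am ON o.object_id = am.object_id
JOIN sys.assemblies AS a ON am.assembly_id = a.assembly_id
WHERE o.type IN ('FS', 'FT');
```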
