Monday, 12 January 2009

Adventures in Filestream Data

Regular readers of this blog may have seen my previous article on using SQL Server 2008's new spatial data support to create the Beanie Tracker Web application. The application tracks the travels of a small white bear named Beanie, using Virtual Earth to show photos of him at various locations around the world. The idea for creating the application came from a whitepaper and a hands-on lab I created for the SQL Server marketing team at Microsoft.

As part of the same project for the SQL Server marketing team, I wrote a whitepaper on managing unstructured data in SQL Server 2008, which includes a short description of another new feature: FILESTREAM data. The idea behind FILESTREAM is to help solve a common problem in database solutions that need to store binary large object (BLOB) data such as images or multimedia objects. In the past, you've basically had two choices for storing BLOBs; you can store them in the database in varbinary columns, or you can store them on the file system and include a file path reference in the database so that applications can find them. The debate about which of these approaches is the best one is well documented, and usually boils down to a tradeoff between the performance advantages and data access flexibility of storing the data on the file system versus the manageability benefits of storing the data in the database. FILESTREAM support in SQL Server 2008 is designed to offer a best of both worlds approach in which the data is stored physically on the file system, but managed as if it were stored in the database. Additionally, a dual programming model enables the data to be accessed through Transact-SQL statements or through the Win32 file streaming API. FILESTREAM is supported in all editions of SQL Server 2008 (other than Compact Edition) - including SQL Server Express; and it's worth noting that because the data is physically stored in the file system, your binary data is not counted when restricting the size of a SQL Server Express database to 4 GB. To try all this for myself, I decided to modify the Beanie Tracker application to store the photos of Beanie as FILESTREAM data. This proved remarkably simple.

The first step was to configure FILESTREAM support in the SQL Server instance. To do this, you view the properties of the SQL Server instance in SQL Server Configuration Manager and set the appropriate values on the FILESTREAM tab as shown here:


Next you need to use sp_configure to enable FILESTREAM as shown here:

EXEC sp_configure filestream_access_level, 2

Now that FILESTREAM is enabled, you can create a database that can store FILESTREAM data:

ON PRIMARY(NAME=BeanieData, FILENAME='c:\data\BeanieData.mdf'),
LOG ON(NAME=BeanieLog, FILENAME='c:\data\BeanieLog.ldf')

Note the inclusion of a filegroup that contains FILESTREAM data. The FILENAME parameter for this filegroup is the folder on the file system where the FILESTREAM data will be stored. If you take a look at this location in Windows Explorer, you'll see the files and folders that SQL Server uses to store the data as shown here (you need to give yourself permission to view the folder):


Now you can create a table with a varbinary(max) column for your FILESTREAM data. By specifying the FILESTREAM attribute for the column, you ensure that the data is stored in the FILESTREAM filegroup (and therefore on the file system) rather than in the database data pages:

([PhotoID] uniqueidentifier ROWGUIDCOL NOT NULL PRIMARY KEY,
[Description] nvarchar(200),
[Photo] varbinary(max) FILESTREAM NULL,
[Location] geography)

Note that you also need to include a ROWGUID column when storing FILESTREAM data. In the previous version of the Beanie Tracker database, the PhotoID column was an integer. I changed it to a uniqueidentifier column in this version.

After the table has been created, you can treat the FILESTREAM column just as if it were an ordinary varbinary column. For example, here's a Transact-SQL statement to insert a photo from an existing .jpg file:

([PhotoID], [Description], Photo, Location)
(Newid(),'Beanie in Paris',
geography::STPointFromText('POINT (2.328 48.8661)', 4326))

To access the data from a client application, you can use the Win32 streaming API to read the data from the file system, or you can use Transact-SQL. The client application does not need to do anything special for FILESTREAM data. Here's the Visual Basic .NET code in the Beanie Tracker Web application to retrieve the photo for a specific location:

Dim PhotoID As String = Request.QueryString("PhotoID")
'Connect to the database and bring back the image contents & MIME type for the specified picture
Using myConnection As New SqlConnection(ConfigurationManager.ConnectionStrings("BeanieTracker").ConnectionString)
Const SQL As String = "SELECT [Photo] FROM [Photos] WHERE [PhotoID] = @PhotoID"
Dim myCommand As New SqlCommand(SQL, myConnection)
myCommand.Parameters.AddWithValue("@PhotoID", PhotoID)
Dim myReader As SqlDataReader = myCommand.ExecuteReader
If myReader.Read Then
  Response.ContentType = "image/jpeg"
End If
End Using

The page containing this code is requested by the "tootip" in the Virtual Earth control when the user hovers the mouse over a pin on the map, so the appropriate image of Beanie is displayed in as shown here:







So all in all, it's pretty easy to use FILESTREAM data in a SQL Server database. In this example, I get the advantages of storing my images on the file system, freeing up valuable space in a SQL Server Express database and enabling access via Transact-SQL or the Win32 streaming API. However, I also get the manageability advantages of storing the data in a the database, so it's included when I perform a backup for example.