<h1>Graeme's Place</h1> <p><em>Graeme Malcolm</em></p>
<h2>Auditing for Microsoft Azure SQL Database</h2> <p><em>10 September 2014</em></p>
<p>I was recently asked to contribute a module on Azure SQL Database to a new course on Azure for IT Professionals. This gave me an excuse to experiment with the new Auditing feature, which is currently in preview. Auditing enables you to record events that occur in your Azure database, such as logins, data access, data updates, and schema changes; and is an important addition to Azure SQL Database that supports security and compliance requirements in many organizations.</p> <p>Setting up Auditing is pretty straightforward, as you can see in this video.</p> <p align="center"><iframe height="480" src="//www.youtube.com/embed/VJehdC9SCDQ" frameborder="0" width="853" allowfullscreen></iframe></p> <p>To learn more about auditing in Azure SQL Database, see <a title="http://azure.microsoft.com/en-us/documentation/articles/sql-database-auditing-get-started/" href="http://azure.microsoft.com/en-us/documentation/articles/sql-database-auditing-get-started/">http://azure.microsoft.com/en-us/documentation/articles/sql-database-auditing-get-started/</a>.</p>
<h2>Microsoft Azure SQL Database Self-Service Restore</h2> <p><em>10 May 2014</em></p>
<p>You may have missed the recent enhancements to Azure SQL Database service tiers, which Microsoft announced last month. There are three new service tiers (or “editions” if you prefer) of SQL Database that are currently available in preview – Basic, Standard, and Premium. These are in addition to the Web and Business service tiers that were available previously, though those older tiers are being retired over the next year. The performance and scalability options that go with the new tiers are pretty compelling, and may make a good subject for a future post. However, the other key addition that comes with the new tiers is a self-service restore feature that provides a basic disaster recovery capability.</p> <p><a href="http://lh5.ggpht.com/-EnPzlgdNE4M/U23-otrdvCI/AAAAAAAAAoo/O--6j9jo7tk/s1600-h/AzureSQLRestore%25255B5%25255D.png"><img title="Restoring an Azure SQL Database" style="border-top: 0px; border-right: 0px; border-bottom: 0px; margin-left: 0px; border-left: 0px; display: inline; margin-right: 0px" border="0" alt="Restoring an Azure SQL Database" src="http://lh3.ggpht.com/-PGxNANX7VkI/U23-pSQIBEI/AAAAAAAAAow/tqqiXfJSg3g/AzureSQLRestore_thumb%25255B3%25255D.png?imgmax=800" width="644" height="424"></a></p> <p>When you create a database with one of the new service tiers, Azure automatically maintains backups that you can use to back out unintentional changes or recover an accidentally deleted database. The specific options for restoring a database depend on the service tier:</p> <ul> <li>Basic tier databases can be restored to the most recent daily backup. Backups are retained for 24 hours.</li> <li>Standard tier databases can be restored to a specific point in time, and backups are retained for 7 days.</li> <li>Premium tier databases can be restored to a specific point in time in the last 35 days.</li></ul> <p>You can restore databases using the Azure management portal, or with PowerShell. Restoring an existing database creates a new database of the same service tier with a name that reflects the date and time to which the database has been recovered. For example, suppose you executed a Transact-SQL command that accidentally deleted the contents of a table in your database. Depending on the service tier, you can restore the database to the most recently available recovery point before the data was deleted. After you’ve verified that the recovered database contains the required data, you can delete the original database and then use the ALTER DATABASE statement to rename the restored database to match the original name.</p>
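<p>The rename step is a single Transact-SQL statement, executed in the master database. A minimal sketch (the restored database name here is hypothetical; the service generates one based on the original name and the recovery point):</p>
<pre>
-- After verifying the restored copy, remove the damaged original...
DROP DATABASE SalesDB;

-- ...and give the restored copy the original name.
ALTER DATABASE [SalesDB_2014-05-09T22-00Z] MODIFY NAME = SalesDB;
</pre>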
<p>If you delete an entire database, it remains listed in the portal until its retention period has expired. If you accidentally delete a database, you can immediately restore it to the most recently available recovery point (again, depending on the service tier).</p> <p>Of course, this self-service recovery feature isn’t a replacement for a properly planned disaster recovery solution. However, it’s a nice addition to SQL Database that could potentially save you a lot of pain and stress one day!</p>
<h2>Using the Buffer Pool Extension in SQL Server 2014</h2> <p><em>14 August 2013</em></p>
<p>SQL Server makes use of physical memory to cache frequently accessed data pages in a buffer pool. This reduces disk I/O and optimizes overall performance. An easy way to improve the performance of I/O-bound workloads is therefore to simply add more physical memory. However, in some cases, adding memory to a server is not possible – for example, because of hardware limitations in the motherboard. Additionally, although the cost of memory continues to drop, on a per-megabyte basis RAM is significantly more expensive than disk devices – including solid state disk (SSD) devices, which provide significant performance improvements over mechanical disks.</p> <p>SQL Server 2014 introduces the buffer pool extension: a feature that enables you to take advantage of non-volatile storage like SSD devices, and use them to extend the buffer pool. In some scenarios, this can be a cost-effective way to improve the performance of database workloads when adding more memory to the server is not an option. With the buffer pool extension enabled, SQL Server 2014 uses the non-volatile storage for clean pages (that is, pages that have not been modified since they were last written to disk), making them faster to retrieve than if they had been paged out of the buffer to their disk storage location. By using the buffer pool extension for only clean pages, the risk of data loss in the event of a server or storage device failure is avoided (in the case of storage device failure, the buffer pool extension is automatically disabled).</p> <p>The following video demonstrates how to enable and disable the buffer pool extension in SQL Server 2014.</p>
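<p>For reference, the core commands are simple. This is a sketch under assumed names: the file path and size are examples, and the extension file should live on fast local SSD storage:</p>
<pre>
-- Enable the buffer pool extension, backed by a file on an SSD volume.
ALTER SERVER CONFIGURATION
SET BUFFER POOL EXTENSION ON
    (FILENAME = 'E:\SSDCACHE\ExtensionFile.BPE', SIZE = 50 GB);

-- Check the current state and size of the extension.
SELECT path, current_size_in_kb, state_description
FROM sys.dm_os_buffer_pool_extension_configuration;

-- Disable it again (this also removes the extension file).
ALTER SERVER CONFIGURATION SET BUFFER POOL EXTENSION OFF;
</pre>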
<p><iframe height="600" src="//www.youtube.com/embed/Y1E97p75q6g?rel=0" frameborder="0" width="800" allowfullscreen></iframe></p> <p>This article is based on the CTP 1 release of SQL Server 2014, and details are subject to change between now and the eventual release of the product. For more information about the buffer pool extension in SQL Server 2014, visit <a title="http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/25/buffer-pool-extension-to-ssds-in-sql-server-2014.aspx" href="http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/25/buffer-pool-extension-to-ssds-in-sql-server-2014.aspx">http://blogs.technet.com/b/dataplatforminsider/archive/2013/07/25/buffer-pool-extension-to-ssds-in-sql-server-2014.aspx</a>.</p>
<h2>Power Query for Excel Demo</h2> <p><em>25 July 2013</em></p>
<p>A couple of weeks ago, I posted a <a href="http://graemesplaceblog.blogspot.co.uk/2013/07/data-explorer-demo-for-course-20467b.html" target="_blank">demo that instructors of Microsoft Learning course 20467B can use to demonstrate the Data Explorer add-in for Excel 2013</a>. Since then, Microsoft has rebranded Data Explorer as “Power Query”, and announced that it will form part of the <a href="http://office.microsoft.com/en-us/excel/power-bi-FX104080667.aspx" target="_blank">Power BI</a> capabilities in Office 365.</p> <p>A new version of the add-in is now available <a href="http://www.microsoft.com/en-us/download/details.aspx?id=39379" target="_blank">here</a>, so I’ve updated the demo steps, which you can download from <a href="http://sdrv.ms/10x7UlN" target="_blank">my SkyDrive folder</a>. Other than the renaming of the <strong>Data Explorer</strong> tab on the ribbon to <strong>Power Query</strong>, the steps are much the same as they were before, so the following video is still a reasonably good guide to the tool.</p><iframe height="600" src="//www.youtube.com/embed/WCsdC9dgqBc?rel=0" frameborder="0" width="800" allowfullscreen></iframe>
<h2>Migrating SQL Server Databases to Windows Azure</h2> <p><em>12 July 2013</em></p>
When the IT industry started getting excited about this thing called “the cloud” a few years ago, there were many (myself included) who were sceptical about the willingness of organizations to abandon their own on-premises infrastructure and start moving business applications to the web. At first, the idea of relying on IT services hosted and managed by some Internet provider seemed dubious at best – sure, individual consumers could use an email server provided by their ISP, but businesses need to manage their own IT. Don’t they?<br />
Then, gradually, we started to make concessions.<br />
…OK, maybe a hosted Exchange Server would reduce some administrative overheads.<br />
…Yes, maybe using a hosting service can provide better availability for some Web applications.<br />
…Alright, using a software-as-a-service solution for CRM might reduce licensing and hardware costs.<br />
Fast forward to today, and we’ve come a long way – to the point where even that stalwart of enterprise software packages, Microsoft Office, is now delivered to 1 in 4 of Microsoft’s enterprise customers in the form of the Office 365 cloud service rather than as on-premises desktop software. Despite the initial doubts of the naysayers, it’s beginning to look like this cloud thing might just catch on after all! Most of the applications we use every day, as consumers and increasingly as employees, are delivered as cloud services that we can consume anywhere and on an increasing array of mobile devices. Organizations are seeing this extended reach, while at the same time reducing overheads for hardware, software licensing, maintenance, and other costs – a win-win scenario if ever I saw one.<br />
However, there’s always been one section of the IT community that is even more conservative than the Finance department. One group of IT professionals that regards even the smallest change with deep suspicion. One last bastion of fierce resistance to newfangled trends. I’m talking, of course, about database administrators. You can move my Exchange Server to the cloud. You can store all of our documents in SharePoint Online. You can even deliver Office applications through a browser. But you’ll have to take my on-premises database from my cold, dead hands!<br />
But even that resistance is beginning to crumble. It makes sense for web-hosted applications to store their data on the web, and as more and more IT services are moved to the cloud, it also makes sense to include traditional business application data stores as a part of that migration. The key issues that need to be addressed are:<br />
<ul>
<li>Can the data be moved to the cloud without compromising security or compliance requirements?</li>
<li>Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?</li>
<li>Can a hosted database meet our availability and disaster recovery requirements?</li>
<li>Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?</li>
</ul>
Then, assuming that the answer to those four questions is “yes” (and in many cases, it is), the only remaining question is:<br />
<ul>
<li>Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?</li>
</ul>
Well, let’s see how Windows Azure shapes up in terms of these critical questions.<br />
Windows Azure is Microsoft’s cloud platform, and it underpins many of the cloud services that the company offers. It also provides Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) solutions that organizations can use to build new solutions, and migrate existing IT services to the cloud. In terms of database offerings, there are two primary options to consider:<br />
<ul>
<li>Windows Azure SQL Database – a PaaS offering that enables you to host data in a (mostly) SQL Server-compatible service without having to worry about configuring and managing hardware or the operating system.</li>
<li>SQL Server in a Windows Azure Virtual Machine – an IaaS offering that is exactly what it sounds like – a virtual machine running Windows with SQL Server installed in it.</li>
</ul>
<h3>
Windows Azure SQL Database</h3>
OK, let’s start with Windows Azure SQL Database. This was formerly known as SQL Azure, and provides many of the data storage and management features of SQL Server without the need to manage the operating system or SQL Server instance. If your database consists of traditional relational data in tables, with views and stored procedures used to provide a layer of abstraction, then Windows Azure SQL Database may well be a good option for you. It can’t support “special” data, such as spatial or XML data types, and there are a few other limitations (see <a href="http://msdn.microsoft.com/en-us/library/windowsazure/ee336245.aspx" target="_blank">General Guidelines and Limitations (Windows Azure SQL Database)</a> on MSDN for details); but it supports all of the functionality required by a large percentage of typical business application databases.<br />
So, how does it stack up against the questions we asked earlier?<br />
<strong>Can the data be moved to the cloud without compromising security or compliance requirements?</strong><br />
It depends on your specific requirements; but network connectivity to the database can be restricted to a specific range of IP addresses, and can be performed over SSL connections. Client requests are authenticated using SQL Server native authentication based on a login and password, and the same permissions-based authorization scheme used in SQL Server is used to control access to tables and other database objects. In terms of compliance policies, you have control over the geographic region in which the data center hosting your Windows Azure SQL Database server is located.<br />
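<p>As a quick illustration, server-level firewall rules can also be managed with Transact-SQL against the master database, not just through the portal. A sketch (the rule name and the documentation-reserved address range are placeholders):</p>
<pre>
-- Allow connections from a specific office IP range.
EXEC sp_set_firewall_rule
    @name = N'OfficeNetwork',
    @start_ip_address = '203.0.113.1',
    @end_ip_address = '203.0.113.254';

-- List the current rules.
SELECT * FROM sys.firewall_rules;
</pre>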
<strong>Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?</strong><br />
You specify the size of your database as you create it, but it can grow to a maximum of 150 GB. Additionally, you can use federations to partition data across multiple databases to increase scalability and performance.<br />
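<p>For illustration, both the edition and the maximum size are declared in Transact-SQL when you create the database (the names here are examples, and the set of available editions and sizes has changed as the service has evolved):</p>
<pre>
-- Create a Business edition database that can grow to 150 GB.
CREATE DATABASE SalesDB (EDITION = 'business', MAXSIZE = 150 GB);

-- An existing database can be resized later.
ALTER DATABASE SalesDB MODIFY (MAXSIZE = 50 GB);
</pre>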
<strong>Can a hosted database meet our availability and disaster recovery requirements?</strong><br />
Windows Azure provides built-in resiliency by internally replicating your database across three redundant storage locations within the data center where your server is hosted. You can back up a SQL Database by copying it to another SQL Database.<br />
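<p>The database copy itself is a single Transact-SQL statement, run in the master database of the destination server (server and database names here are examples):</p>
<pre>
-- Start an asynchronous copy of SalesDB.
CREATE DATABASE SalesDB_copy AS COPY OF myserver.SalesDB;

-- Monitor the progress of the copy.
SELECT * FROM sys.dm_database_copies;
</pre>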
<strong>Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?</strong><br />
Because SQL Database is a PaaS offering, much of the physical server administration you would typically need to manage for a SQL Server instance is handled for you by Windows Azure. For logical administration tasks, such as managing users or creating database objects, you can use SQL Server Management Studio to connect to Windows Azure SQL Database – enabling you to manage on-premises SQL Server instances and cloud-based Windows Azure SQL Database instances in the same tool.<br />
<strong>Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?</strong><br />
Well, let’s take a look at this demonstration and find out:<br />
<iframe allowfullscreen="" frameborder="0" height="480" src="//www.youtube.com/embed/Vwy9F647kWM?rel=0" width="640"></iframe><br />
As you can see, it’s really pretty straightforward to migrate an on-premises SQL Server database to Windows Azure SQL Database. If your database doesn’t depend on any features of SQL Server that aren’t supported by Windows Azure SQL Database, and you’re happy to let Windows Azure look after the physical configuration of your database server, then Windows Azure SQL Database is a good option for you.<br />
<h3>
SQL Server in a Windows Azure Virtual Machine</h3>
So, what about applications where you need to support SQL Server capabilities that aren’t available in Windows Azure SQL Database? Or where you specifically want control over the operating system and server-level configuration of your database server?<br />
In this case, provisioning a virtual machine in Windows Azure might be a better option. There are a number of pre-defined virtual machine images, some of which include an installation of SQL Server; and if none of them suits you, there’s always the option to create your own and install whatever software you require. So how does this option meet our data migration requirements?<br />
<strong>Can the data be moved to the cloud without compromising security or compliance requirements?</strong><br />
As with Windows Azure SQL Database, you can choose the geographic region of the data center where your virtual machine will be hosted. Access to the server is controlled through an endpoint that you must define for your Windows Azure virtual machine, and all network connectivity to the virtual machine can be restricted by using Windows Firewall. You can use Windows Azure virtual networking to integrate virtual machines in Windows Azure with your corporate Active Directory infrastructure, and use Windows authentication to connect to SQL Server. Alternatively, you can use SQL Server native authentication through logins and passwords, or even set up certificate-based authentication – exactly the same as with an on-premises instance of SQL Server. Additionally, you can make use of SQL Server’s security-related capabilities, such as transparent data encryption and auditing.<br />
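<p>As a simple illustration, transparent data encryption is enabled in a Windows Azure VM exactly as it is on-premises (the database, certificate name, and password below are placeholders):</p>
<pre>
USE master;
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Pa$$w0rd-placeholder';
CREATE CERTIFICATE TDECert WITH SUBJECT = 'TDE certificate';
GO
USE SalesDB;
CREATE DATABASE ENCRYPTION KEY
WITH ALGORITHM = AES_256
ENCRYPTION BY SERVER CERTIFICATE TDECert;
GO
ALTER DATABASE SalesDB SET ENCRYPTION ON;
</pre>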
<strong>Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?</strong><br />
When you provision a virtual machine in Windows Azure, you can specify the number of virtual cores and the amount of memory allocated to the VM. At the moment, the largest VM available has 8 cores and 56 GB of memory, but I’d expect that to get larger over time. The VM uses Windows Azure storage for its virtual hard disks, and you can add multiple VHDs and use filegroups to stripe data across them. This technique has been shown to improve IOPS performance.<br />
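<p>The striping technique is just standard files and filegroups, with each file placed on a different data disk (drive letters, names, and sizes are examples):</p>
<pre>
-- Spread a filegroup across two Windows Azure data disks.
ALTER DATABASE SalesDB ADD FILEGROUP StripedData;
ALTER DATABASE SalesDB
ADD FILE (NAME = 'SalesStripe1', FILENAME = 'F:\Data\SalesStripe1.ndf', SIZE = 10GB)
TO FILEGROUP StripedData;
ALTER DATABASE SalesDB
ADD FILE (NAME = 'SalesStripe2', FILENAME = 'G:\Data\SalesStripe2.ndf', SIZE = 10GB)
TO FILEGROUP StripedData;
</pre>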
<strong>Can a hosted database meet our availability and disaster recovery requirements?</strong><br />
As with all Windows Azure storage blobs, the VHDs for the VM are replicated across three redundant physical data storage devices in the data center. Additionally, you can use SQL Server HA capabilities, such as AlwaysOn Availability Groups, to protect against failure of a VM. You can back up databases in a Windows Azure VM just as you would for an on-premises instance of SQL Server, and use SQL Server Agent to automate backup tasks on a scheduled basis.<br />
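<p>Backup is likewise unchanged; a scheduled SQL Server Agent job might simply run something like this (the path is illustrative):</p>
<pre>
BACKUP DATABASE SalesDB
TO DISK = 'F:\Backups\SalesDB.bak'
WITH COMPRESSION, CHECKSUM, INIT;
</pre>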
<strong>Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?</strong><br />
SQL Server in a virtual machine in Windows Azure is still just SQL Server. You can use SQL Server Management Studio to connect to it, and you can use all of the same management tools and agents you use for your on-premises database servers.<br />
<strong>Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?</strong><br />
Once again, here’s a demonstration:<br />
<iframe allowfullscreen="" frameborder="0" height="480" src="//www.youtube.com/embed/BfnKWL7UZp0?rel=0" width="640"></iframe><br />
Note that this demonstration is based on pre-release software, and may not reflect what actually ships with SQL Server 2014. However, it’s clear that the intention is to include a simple, wizard-based tool that will help you easily migrate on-premises SQL Server databases to Windows Azure virtual machines.<br />
<h3>
Conclusion</h3>
Migration of IT services to the cloud is inevitable. There are simply too many cost, scalability, and mobility advantages to justify not doing it. However, I don’t think it will happen in one big mass movement – and in particular, I think corporate databases will be among the last elements to be migrated. For at least a while, probably many years, we’ll be living in a hybrid world where some data is managed on-premises, and other data is moved to the cloud. To support that scenario, we need tools and technologies that make it easy to move data from one place to the other, and to manage it consistently wherever it’s hosted.<br />
The combination of SQL Server on-premises, Windows Azure SQL Database, and SQL Server in a Windows Azure virtual machine manages to pull this trick off well. With similar merging of private and public cloud network infrastructure support in Windows Server and System Center, the lines between “the cloud” and “the enterprise” are blurring to the point where, from an IT management perspective, it really doesn’t matter where a service is physically located.<br />
If you want to learn more about Windows Azure database options, visit <a href="http://www.windowsazure.com/en-us/solutions/data-management/" title="http://www.windowsazure.com/en-us/solutions/data-management/">http://www.windowsazure.com/en-us/solutions/data-management/</a>.
<h2>What the Heck is Hekaton?</h2> <p><em>5 July 2013</em></p>
<p>SQL Server 2014 introduces a new in-memory OLTP capability that was previously known by its codename, “Hekaton”. The technology introduces two new concepts to SQL Server: memory-optimized tables and native stored procedures. This article explores these features and provides a simple demonstration of how to use them.</p> <p>The idea of optimizing data access performance by using in-memory storage is not new. SQL Server has always used caching to keep recently accessed data in memory, and recent releases have seen the addition of in-memory technology for large volume data analytics (PowerPivot and tabular models in Analysis Services) and high-performance table indexes that primarily benefit data warehouse workloads (columnstore indexes). What’s new in SQL Server 2014 is the ability to optimize an entire table for in-memory storage, effectively eliminating disk I/O for CRUD operations and massively improving query performance.</p> <p><strong>Memory-Optimized Tables</strong></p> <p>Memory-optimized tables are tables that you define using CREATE TABLE statements, in a similar fashion to traditional disk-based tables. However, memory-optimized tables are different from disk-based tables in the following ways:</p> <ul> <li>The CREATE TABLE statement is used to generate a C struct, which is in turn compiled into a DLL and loaded into memory.</li> <li>All data for the table is stored in memory, and all operations on the data occur in memory. By default, memory-optimized tables are durable (so they’re persisted to disk in order to survive restarts and support high availability); but when the database is online, the table is always accessed directly in memory with no need to read pages from disk.</li> <li>Columns in memory-optimized tables are indexed using hash indexes (range indexes may be supported in a later build), in which the result of hashing the indexed value determines the in-memory “bucket” in which the row is stored. Rows with the same hashed value are stored as a linked list within the bucket.</li> <li>Table data is persisted to disk as a stream, not in 8K pages like a traditional table. The data must be stored in a filegroup that is created with the CONTAINS MEMORY_OPTIMIZED_DATA option. Indexes are not persisted, and will be regenerated in the event of a restart.</li> <li>Some data types – notably text, image, and nvarchar(max) – are not supported. Similarly, some features such as identity columns and foreign-key constraints cannot be used in memory-optimized tables.</li></ul>
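<p>To make that concrete, here’s a minimal sketch based on the CTP 1 syntax (the database name, file path, and bucket count are illustrative):</p>
<pre>
-- A database needs a filegroup for memory-optimized data...
ALTER DATABASE SalesDB ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;
ALTER DATABASE SalesDB
ADD FILE (NAME = 'imoltp_dir', FILENAME = 'C:\Data\imoltp_dir')
TO FILEGROUP imoltp_fg;
GO

-- ...and then a table can be declared as memory-optimized and durable.
CREATE TABLE dbo.OrderEvents
(
    OrderID int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    EventTime datetime2 NOT NULL,
    Details nvarchar(200) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
</pre>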
<p><strong>Native Stored Procedures</strong></p> <p>Memory-optimized tables can co-exist with disk-based tables, and you can execute Transact-SQL queries that contain joins between disk-based tables and memory-optimized tables. In fact, you can use Transact-SQL to query memory-optimized tables just like any other table, so you can improve the performance of some workloads by changing existing disk-based tables to memory-optimized tables without breaking existing applications that query them. The ability to use regular Transact-SQL to query memory-optimized tables is provided by an interop layer in the SQL Server engine that does the necessary work to convert Transact-SQL statements into C code that can access the in-memory data structures in the compiled DLL for the table.</p> <p>However, if your application code only needs to access data in memory-optimized tables, you can further improve performance by using native stored procedures. Native stored procedures are created using the familiar CREATE PROCEDURE statement to define the Transact-SQL statements you want to execute. The code is then translated into C and compiled into a DLL, just like a memory-optimized table. The DLL is then loaded into memory, and since the instructions it contains are now compiled as native machine code, execution performance is greatly improved. There are some limitations in this release, and only the most commonly used Transact-SQL statements and functions are supported in native stored procedures; but for a large percentage of common database workloads, you should find that using memory-optimized tables and native stored procedures can significantly improve application performance.</p>
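<p>Here’s what such a procedure might look like for the hypothetical table sketched above; the NATIVE_COMPILATION, SCHEMABINDING, and EXECUTE AS options, and the atomic block, are required for native compilation:</p>
<pre>
CREATE PROCEDURE dbo.InsertOrderEvent
    @OrderID int, @EventTime datetime2, @Details nvarchar(200)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC
WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    -- The body is compiled to native machine code when the procedure is created.
    INSERT INTO dbo.OrderEvents (OrderID, EventTime, Details)
    VALUES (@OrderID, @EventTime, @Details);
END;
</pre>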
<p>The following demonstration shows how to use memory-optimized tables and native stored procedures.</p> <p align="center"><iframe height="600" src="http://www.youtube.com/embed/uYuQNgEmvvI?feature=player_detailpage&ap=%2526fmt%3D18" frameborder="0" width="800" allowfullscreen></iframe></p> <p>So, should you convert all of your tables and stored procedures to take advantage of this new technology? Probably not (at least, not yet). There are some workloads where the new in-memory capabilities will bring enormous benefits in terms of improved performance; but there are also some cases where current limitations prevent them from being used. Even when an existing disk-based table is fully compatible with a memory-optimized schema, you may find minimal improvement for some I/O workloads.</p> <p>The important thing to understand when planning to use (or not use) memory-optimized tables is that the performance benefit is not purely a result of storing the data in memory. After all, SQL Server does a pretty good job of caching commonly accessed data in disk-based tables anyway. The crucial difference in the way data in a memory-optimized table is accessed is that no locks or latches are used to support concurrency. In a disk-based table, if multiple transactions need to access the data concurrently, locks are used to ensure consistency and avoid one transaction’s results being affected by the data modifications of another transaction. Although SQL Server does support row-level locking, transactions that affect multiple rows can quickly escalate locking to page level – causing concurrency issues that affect query performance. This can be especially acute in tables with so-called “hotspots” – for example, a table with a clustered index on an incrementing key value, where all new rows are inserted at the end of the table. Memory-optimized tables do not use locks to manage concurrency. Instead, a form of row-versioning is used to track modifications to rows by multiple transactions; which in any case usually happen so quickly (sub-millisecond) that concurrency clashes are extremely rare. If the I/O pattern for your table typically incurs a lot of locking, then making the table memory-optimized will probably improve performance. If not, then you may not benefit significantly from changing the table. As an example, in the video demo, the 500,000 inserts were wrapped in a single transaction – which, when executed against the disk-based table, incurred locking to support isolation for 500,000 atomic INSERT statements. When creating the demo, I noticed that removing the BEGIN TRAN and COMMIT statements that enclose the loop (so that the inserts were done as 500,000 independent INSERT statements) resulted in a much less significant difference between the time taken to load the disk-based table and the time taken to load the memory-optimized table (typically, the memory-optimized table was around 5-6 seconds quicker).</p> <p>This article is based on the CTP 1 release of SQL Server 2014, and details are liable to change between now and the eventual release of the product. You can download the preview of SQL Server 2014 from <a title="http://www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx" href="http://www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx">http://www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx</a>.</p> <p>The Transact-SQL code used in this demonstration is available from <a href="http://sdrv.ms/16iBxKp" target="_blank">here</a>.</p>
<h2>GeoFlow Demo for Course 20467B</h2> <p><em>3 July 2013</em></p>
<p>Yesterday <a href="http://graemesplaceblog.blogspot.co.uk/2013/07/data-explorer-demo-for-course-20467b.html" target="_blank">I posted a demo</a> that Microsoft Certified Trainers can use in course <a href="http://www.microsoft.com/learning/en/us/course.aspx?ID=20467B">20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012</a> to show students how to use the Data Explorer add-in for Excel 2013. GeoFlow is another new Excel add-in that you might want to demonstrate in class. It enables users to visualize data that includes geographic and temporal dimensions on an animated map, showing how data points in specific geographic locations change over time. You can get the GeoFlow add-in from the <a href="http://office.microsoft.com/en-us/download-geoflow-for-excel-FX104036784.aspx" target="_blank">Microsoft Office web site</a>, and you can <a href="http://sdrv.ms/10x7UlN" target="_blank">download the demo steps in a PDF document from here</a>.</p> <p>Click the thumbnail below to view the demonstration in a new window/tab.</p> <p><iframe height="276" src="https://skydrive.live.com/embed?cid=DDAD9079CFF45619&resid=DDAD9079CFF45619%21735&authkey=AKxtSyGo5gByXcE" frameborder="0" width="319" scrolling="no"></iframe></p> <p>Note that GeoFlow is a preview release at the moment, and is subject to change in the future. The demo is provided as-is, and no support will be provided for it by Microsoft Learning or Content Master.</p> <p>Enjoy!</p> <p>Update: GeoFlow has been renamed <strong>Power Map</strong>, and forms part of the Power BI capability being added to Microsoft Office 365.</p>
<h2>Data Explorer Demo for Course 20467B</h2> <p><em>2 July 2013</em></p>
<p>Earlier this year we released Microsoft Official Curriculum course <a href="http://www.microsoft.com/learning/en/us/course.aspx?ID=20467B" target="_blank">20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012</a>.
Since then, Microsoft has released a preview of <a href="http://office.microsoft.com/en-us/excel/download-data-explorer-for-excel-FX104018616.aspx" target="_blank">Data Explorer</a>, an add-in for Excel that enables users to browse and query data in a variety of sources. Data Explorer builds on the self-service BI techniques taught in course 20467B, and if you are an instructor delivering the course, you can add value by demonstrating how to use it within the context of a Microsoft-based BI solution. To help you, I’ve put together a simple demo that should be easy to set up and perform in the virtual machine environment for the course. It will probably work best towards the end of module 8 (<em>Designing a Microsoft Excel-Based Reporting Solution</em>), and you can <a href="http://sdrv.ms/10x7UlN" target="_blank">download the steps in a PDF document from here</a>.</p> <p>Click the thumbnail below to view the demo.</p> <p><iframe style="height: 286px; width: 324px" height="120" src="https://skydrive.live.com/embed?cid=DDAD9079CFF45619&resid=DDAD9079CFF45619%21733&authkey=ABuYOg4aJbkICV4" frameborder="0" width="98" scrolling="no"></iframe></p> <p>Of course, bear in mind that Data Explorer is a preview release at the moment, and is subject to change in the future. The demo is provided as-is, and no support will be provided for it by Microsoft Learning or Content Master. Nevertheless, I hope you find it useful!</p> <p>Update: Data Explorer has been renamed <strong>Power Query</strong>, and forms part of the Power BI capability being added to Microsoft Office 365.</p>
<h2>Role Playing Games with SQL Server 2012 Analysis Services</h2> <p><em>5 October 2012</em></p>
<p>I’m currently working with Microsoft Learning, writing a course on designing BI solutions with SQL Server 2012. Obviously, this is a huge subject to try to cover, and raises a whole bunch of really interesting design considerations. One of the new things BI developers need to consider with SQL Server 2012 is whether to use a “traditional” multidimensional data model, or whether to use the new-fangled tabular model. In most cases, from an end-user’s perspective (no pun intended), there is little to pick between the two. In fact, in an Excel PivotTable, most users will struggle to spot any difference. However, for the cube developer, there are some significant differences. There are some things you can do easily in multidimensional projects (or indeed, things that are done automatically for you by SQL Server Data Tools) which require (sometimes extremely complex) custom development in a tabular model. Other things are relatively straightforward to accomplish in both models, but require different implementations. An example of the latter is the implementation of role-playing dimensions. You can do this in both models, but there are some differences.</p> <p>Role-playing dimensions are used to create multiple cube dimensions that are based on the same underlying dimension in the database. The classic example is a date dimension in which each member represents a calendar date.
In your cube, you may have a <em>Sales Order </em>measure group that is related to the <em>Date</em> dimension by multiple keys, for example an <em>Order Date</em> and a <em>Delivery Date.</em> Another example might be an <em>Address</em> dimension that is related to a <em>Shipment</em> measure group by both an <em>Origin</em> key and a <em>Destination</em> key. This multi-use of the same underlying dimension means that the dimension table is defined only once, but users can use it to slice the data by different keys – so, for example, a user could view sales by order date or by delivery date (or both).</p> <p>OK, so first, let’s see how a role-playing dimension is implemented in a multidimensional model. I’m using the AdventureWorksDW2012 sample database, which contains a <em>FactResellerSales </em>table that is related to a <em>DimDate</em> table using three key columns – <em>OrderDateKey</em>, <em>ShipDateKey</em>, and <em>DueDateKey</em>. When I create a data source view from the data warehouse tables in the multidimensional project, all three of the relationships are detected as shown here.</p> <p><a href="http://lh4.ggpht.com/-NNCXGze9DdQ/UG74SY8I12I/AAAAAAAAAYg/LQO8c7jMo-0/s1600-h/image%25255B19%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-R6EVrKMPzEA/UG74UW7cHJI/AAAAAAAAAYo/emC5lS2OY9I/image_thumb%25255B13%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>Using the wizard to create a cube automatically detects the multiple relationships, and results in a single <em>DimDate</em> dimension in the database but three role-playing dimensions in the cube (<em>Order Date</em>, <em>Ship Date</em>, and <em>Due Date</em>) as shown here.</p> <p><a href="http://lh5.ggpht.com/-Px67bKSe_tQ/UG74VdrRlaI/AAAAAAAAAYw/Vpw9bL3rdYQ/s1600-h/image%25255B18%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-1P_LAd1bZm8/UG74XLlH1NI/AAAAAAAAAY4/bdBS5WBWbJM/image_thumb%25255B12%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>The role-playing dimensions are really just references to the same <em>DimDate</em> dimension, but aggregations will be calculated based on each relationship.
I’ll go ahead and add a hierarchy to the <em>DimDate</em> dimension:</p> <p><a href="http://lh4.ggpht.com/-QxrlwNV65gE/UG74YNbdLmI/AAAAAAAAAZA/j6q7O5Gln6U/s1600-h/image%25255B24%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-PvqG9VdFoEM/UG74Z2RwxYI/AAAAAAAAAZI/E2J46A_fEjg/image_thumb%25255B16%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>When a user browses the cube in Excel, each of the three role-playing dimensions is available for them to slice the sales data, and all three of these dimensions have the same <em>Calendar Date</em> hierarchy that I defined for the base <em>DimDate</em> dimension:</p> <p><a href="http://lh5.ggpht.com/-2jaBjOPuPlw/UG74bPW2s_I/AAAAAAAAAZQ/HijRS-WSTEA/s1600-h/image%25255B29%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-GcwrGONdJMs/UG74cl7CnqI/AAAAAAAAAZY/FN7LnoyF2FQ/image_thumb%25255B19%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>Now let’s compare the experience with a tabular model. When I import the same tables into a tabular model project, the relationships are detected, and I can create the same hierarchy as before in the <em>DimDate</em> table. However, notice that two of the relationships are shown as dotted lines, while one is shown as a solid line.</p> <p><a href="http://lh5.ggpht.com/-zKePvXgVks4/UG74eEROkEI/AAAAAAAAAZg/gXw2NboReSU/s1600-h/image%25255B34%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-t_z2kfTDc7Q/UG74gNsmpbI/AAAAAAAAAZo/phMdWi6m18o/image_thumb%25255B22%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p> This indicates that although the relationships have all been detected, only one of them is <em>active</em> at any one time. When a user browses the model in Excel, they only see one <em>DimDate</em> dimension, which will show aggregations for the active relationship (in this case, <em>Order Date</em>, but there’s no easy way for the user to tell that from the user interface):</p> <p><a href="http://lh4.ggpht.com/-5zZQznIy9Zc/UG74hVC87YI/AAAAAAAAAZw/12-szZ_JbME/s1600-h/image%25255B39%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-S-t_cWI88j8/UG74ihj7t9I/AAAAAAAAAZ4/kGsvBwe7iSM/image_thumb%25255B25%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>The solution to this problem is obvious. So obvious in fact, that it took me a while to figure it out! 
The answer is to import the same table multiple times, and rename it appropriately:</p> <p><a href="http://lh4.ggpht.com/-oL2o1QmlvPw/UG74j1U-WnI/AAAAAAAAAaA/fHCL7sBAG7E/s1600-h/image%25255B44%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-R74DiWLNjlM/UG74lr98tlI/AAAAAAAAAaI/yvoU6LzVg1I/image_thumb%25255B28%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>After you’ve imported one copy of the table for each role-playing dimension, you simply delete the <em>inactive</em> relationships from the original table, and create new ones to join the relevant keys in the fact table to the new dimension tables. You’ll also need to create duplicates of any hierarchies you want to appear in all of the dimensions.</p> <p><a href="http://lh6.ggpht.com/-bTFYapUBcyw/UG74m2uAjoI/AAAAAAAAAaQ/ciWC35rTwDk/s1600-h/image%25255B49%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/-IVb2pzQgF94/UG74oVUVLqI/AAAAAAAAAaY/rB4Q5Sk0bVk/image_thumb%25255B31%25255D.png?imgmax=800" width="916" height="772" /></a></p> <p>Now when users browse the model, they’ll see all three dimensions, and as long as you’ve assigned appropriate names to each copy of the table, it should be obvious what each dimension represents.</p> <p><a href="http://lh4.ggpht.com/-4JSuExV2d3w/UG74pkbflYI/AAAAAAAAAag/gLD5qtDG_mY/s1600-h/image%25255B54%25255D.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-Q2ev18oBGFE/UG74qzxcUSI/AAAAAAAAAao/6JKCpfH_J90/image_thumb%25255B34%25255D.png?imgmax=800" width="916" height="772" /></a></p>
<h2>PowerPivot and Power View in Excel 2013</h2> <p><em>23 July 2012</em></p>
<p>It’s just typical of my job that, a few short weeks after the publication of some Microsoft Official Curriculum courses I’ve spent months working on, Microsoft should choose to make a preview of the next release of the software on which they are based available! As you may know, we recently published courses <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10778A" target="_blank">10778A</a> and <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=40009A" target="_blank">40009A</a>, both of which make use of the PowerPivot and Power View features in Excel and SharePoint 2010; so it was with a certain amount of trepidation that I installed the preview of Office 2013 to get a first look at the enhancements that have been made.</p> <p>The first, and most obvious, change is that the PowerPivot add-in for Excel no longer needs to be installed from a separate package.
It’s built into Excel and only needs to be enabled, which you do by configuring the COM Add-ins in Excel’s options as shown here.</p> <p><a href="http://lh4.ggpht.com/-l2FUnrsWScQ/UA00_eCVycI/AAAAAAAAAWQ/Ds9C3V41v2U/s1600-h/Picture1%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture1" border="0" alt="Picture1" src="http://lh4.ggpht.com/-ZQpFWNltfok/UA01AtfkIdI/AAAAAAAAAWY/n_XDqK-t7HE/Picture1_thumb%25255B2%25255D.png?imgmax=800" width="640" height="413" /></a></p> <p>Note that there’s also a Power View add-in – more about that later!</p> <p>After the PowerPivot add-in has been enabled, users will see the <strong>POWERPIVOT</strong> tab on the ribbon, as shown here.</p> <p><a href="http://lh3.ggpht.com/-cNUgWr_X6p0/UA01B_eWjMI/AAAAAAAAAWg/3HBcXtuyUHc/s1600-h/Picture2%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture2" border="0" alt="Picture2" src="http://lh4.ggpht.com/-4ZDk3F6vpYw/UA01CoNoDGI/AAAAAAAAAWo/zefA1S8FcPI/Picture2_thumb%25255B2%25255D.png?imgmax=800" width="640" height="456" /></a></p> <p>With this ribbon, you can not only manage a PowerPivot tabular data model for the workbook as you can in Excel 2010, but you can also create calculated fields and KPIs without having to directly edit the model – making the process a little bit more intuitive for information workers.</p> <p>Clicking <strong>Manage</strong> opens the PowerPivot window, which is similar to that of the previous release. There are a few enhancements, of course, but anyone familiar with PowerPivot in Excel 2010 will find themselves in familiar territory. In this case, I’ve opened a PowerPivot workbook I created with Excel 2010 based on data in the <strong>AdventureWorksDW</strong> SQL Server sample database. The changes to this release meant that I was prompted to allow Excel to update the data model and re-save the workbook, so one thing to be aware of is that you can open (and update) Excel 2010 PowerPivot workbooks in Excel 2013, but after they’ve been updated you won’t be able to open them in Excel 2010. You can see the diagram view of my PowerPivot data model below – note that it includes a hierarchy in the <strong>Sales Territory</strong> table.</p> <p><a href="http://lh3.ggpht.com/-Y4qW2VOTtoE/UA01D9mGu3I/AAAAAAAAAWw/K471qJHKZLc/s1600-h/Picture3%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture3" border="0" alt="Picture3" src="http://lh4.ggpht.com/-8NLc4trXIFo/UA01FLpiL_I/AAAAAAAAAW4/NVPm59pJDaQ/Picture3_thumb%25255B2%25255D.png?imgmax=800" width="640" height="460" /></a></p> <p>After you’ve created the data model in your workbook, you can use it as a source for PivotTables, just as you could in Excel 2010. There are, however, one or two nice enhancements on a new <strong>ANALYZE</strong> tab of the ribbon that make it easier to do things like create slicers.
Another new feature is the ability to create timeline filters that make it easier to analyse data based on chronological periods. To add a timeline, just click <strong>Insert Timeline</strong> and specify any of the time-based attributes that Excel identifies as having a suitable relationship in the model.</p> <p><a href="http://lh3.ggpht.com/-6Ooi9m-1Krk/UA01Gj97OlI/AAAAAAAAAXA/PowRweiJfkc/s1600-h/Picture4%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture4" border="0" alt="Picture4" src="http://lh3.ggpht.com/-2GO-dxHXP7Q/UA01H91eRKI/AAAAAAAAAXI/_oFZ2aDKVpM/Picture4_thumb%25255B2%25255D.png?imgmax=800" width="640" height="456" /></a></p> <p>After you’ve inserted a timeline, you can use it to filter the data in the PivotTable as shown here.</p> <p><a href="http://lh5.ggpht.com/-9EYv7_u3Cqo/UA01I7ag1II/AAAAAAAAAXQ/8mJboMbGsEM/s1600-h/Picture5%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture5" border="0" alt="Picture5" src="http://lh5.ggpht.com/-RtL5sN0ZLkM/UA01J-b8t_I/AAAAAAAAAXY/BrGge3mr4qA/Picture5_thumb%25255B2%25255D.png?imgmax=800" width="640" height="456" /></a></p> <p>Earlier, I mentioned that Excel 2013 includes a Power View add-in. This enables information workers to create Power View reports from the data model in the workbook (and external data sources). Previously, Power View was only available in SharePoint Server 2010, but in Office 2013 you can use it right there in an Excel workbook.</p> <p>To create a Power View report from the data model in the workbook, just click Power View on the <strong>INSERT</strong> tab of the ribbon.</p> <p><a href="http://lh3.ggpht.com/-NxuMeSZauf4/UA01K-tWL_I/AAAAAAAAAXg/cx3dsYceFJM/s1600-h/Picture6%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture6" border="0" alt="Picture6" src="http://lh5.ggpht.com/-xYwD_FWZ3WE/UA01LgGJ-GI/AAAAAAAAAXo/K-gaHJk1yqc/Picture6_thumb%25255B2%25255D.png?imgmax=800" width="640" height="114" /></a></p> <p>If necessary, you’ll be prompted to install Silverlight (which is required by the Power View add-in), and after doing so you’ll be able to create a Power View report from the data in your PowerPivot data model as shown here.</p> <p><a href="http://lh5.ggpht.com/-0KJHHHtw5go/UA01MkgZKNI/AAAAAAAAAXw/MvGUaCE4u_M/s1600-h/Picture7%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture7" border="0" alt="Picture7" src="http://lh6.ggpht.com/-S9PTjciOano/UA01N0CtrkI/AAAAAAAAAX4/8558EuqmvHk/Picture7_thumb%25255B2%25255D.png?imgmax=800" width="640" height="456" /></a></p> <p>Note that you can include hierarchies in a Power View report, which wasn’t supported in the previous release. 
There are several other enhancements in this release, including support for new data visualizations (such as pie charts), and even visualization of geographical data on a Bing Maps map, as shown here.</p> <p><a href="http://lh5.ggpht.com/-oaHxzJNQXCw/UA01PYRWMJI/AAAAAAAAAYA/x9IzAX6hI8o/s1600-h/Picture8%25255B4%25255D.png"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture8" border="0" alt="Picture8" src="http://lh6.ggpht.com/-WEKZ5wb3OWI/UA01Qn6KEuI/AAAAAAAAAYI/kNPY0KuV2FM/Picture8_thumb%25255B2%25255D.png?imgmax=800" width="640" height="456" /></a></p> <p>This short article just highlights a few of the improvements to PowerPivot and Power View in Excel 2013. There are many more new features in Excel, as well as greater ability to share BI capabilities across the enterprise through enhancements in SharePoint 2013 and SQL Server 2012 SP1, which I look forward to exploring in more depth.</p>
<h2>Matching Data with SQL Server 2012 Data Quality Services</h2> <p><em>3 July 2012</em></p>
<p>In a <a href="http://graemesplaceblog.blogspot.co.uk/2012/04/cleansing-data-with-sql-server-2012.html" target="_blank">previous post</a>, I described how you can use Data Quality Services (DQS) to create a knowledge base for the domains (data columns) used in your business data and use it to cleanse data by correcting invalid or inconsistent values. Data cleansing is, however, only one side of the coin when it comes to DQS. You can also use DQS to perform data matching – in other words, finding records that are potential duplicates of one another and consolidating them to a single surviving record.</p> <p>When you think about it, the potential for duplicate data entry in most complex business environments is enormous. For example, let’s imagine an e-commerce site where customers need to register before placing orders. It’s perfectly conceivable that a customer who only uses the site occasionally might re-register with slightly different details because they’ve forgotten that they had registered previously or can’t remember their login credentials. Even if the site applies a policy that demands a unique email address for each registration, there’s nothing to stop the same customer registering multiple times with different email addresses. For an individual sales order, the fact that the customer is registered multiple times is inconsequential – as long as the payment and delivery address details are correct, the order can be processed successfully. However, when the company wants to use its data to perform any kind of business intelligence (BI) reporting or analysis that aggregates information per customer, the duplicate entries can lead to misleading results.</p> <p>To use DQS to match data, you must first add a <em>matching policy</em> to a knowledge base. You can use an existing knowledge base that is also used for data cleansing, or you can create a knowledge base specifically for data matching.
In this example, I’m opening an existing knowledge base that contains domains for customer records for the <strong>Matching Policy</strong> activity.</p> <p><a href="http://lh4.ggpht.com/-bBkcdCLAQy4/T_MNITE_uNI/AAAAAAAAATc/E67tCK8bMxk/s1600-h/Picture1%25255B5%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture1" border="0" alt="Picture1" src="http://lh4.ggpht.com/-WvGC9CD6EU4/T_MNJGzMLeI/AAAAAAAAATk/ThkzI0osE1Y/Picture1_thumb%25255B3%25255D.jpg?imgmax=800" width="644" height="458" /></a></p> <p>Just as when performing knowledge discovery, I need to map some sample data to the domains defined in the knowledge base. This enables me to test the matching policy against a known data set as I build it, and therefore verify that it successfully identifies known duplicate records. In this case, I’m using data in an Excel workbook as the source for my sample data, but you can also use a table in a SQL Server database.</p> <p><a href="http://lh5.ggpht.com/-vUxFzZb6q8U/T_MNJ_IaLDI/AAAAAAAAATs/brR9ck4RCyY/s1600-h/Picture2%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture2" border="0" alt="Picture2" src="http://lh4.ggpht.com/-7Mjfc_-IBZg/T_MNKohGrDI/AAAAAAAAAT0/_WEm4hc82Oc/Picture2_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="455" /></a></p> <p>Having mapped sample data to the domains, I can now define the matching rules for my matching policy. You can include multiple rules, and each one uses a set of weighted comparisons of domain values to identify clusters of records that are potential duplicates of one another.</p> <p><a href="http://lh5.ggpht.com/-10DYrWyzInM/T_MNLWsUedI/AAAAAAAAAT8/LbN124ANic4/s1600-h/Picture3%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture3" border="0" alt="Picture3" src="http://lh6.ggpht.com/-jJ-7xoy8jbM/T_MNMKqCqaI/AAAAAAAAAUE/5qQFPzwB-fE/Picture3_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="456" /></a></p> <p>Potential matches are determined based on a score that is calculated from the weighted comparisons you define in the rule.
Here are the comparisons I’ve used in my <strong>Match Customer</strong> rule:</p> <div align="left"> <table border="0" cellspacing="0" cellpadding="2" width="651" align="center"><tbody> <tr> <td valign="top" width="245"><strong>Domain</strong></td> <td valign="top" width="122"><strong>Similarity</strong></td> <td valign="top" width="164"><strong>Weight</strong></td> <td valign="top" width="118"><strong>Prerequisite</strong></td> </tr> <tr> <td valign="top" width="245">Birth Date</td> <td valign="top" width="122">Exact</td> <td valign="top" width="164"> </td> <td valign="top" width="118"> <p align="center">X</p> </td> </tr> <tr> <td valign="top" width="245">Email Address</td> <td valign="top" width="122">Exact</td> <td valign="top" width="164">20</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">Postal Code</td> <td valign="top" width="122">Exact</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">Country/Region</td> <td valign="top" width="122">Exact</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">First Name</td> <td valign="top" width="122">Similar</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">Last Name</td> <td valign="top" width="122">Similar</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">Street Address</td> <td valign="top" width="122">Similar</td> <td valign="top" width="164">20</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">City</td> <td valign="top" width="122">Similar</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> <tr> <td valign="top" width="245">State</td> <td valign="top" width="122">Similar</td> <td valign="top" width="164">10</td> <td valign="top" width="118"> </td> </tr> </tbody></table> </div> <p>Note that an exact match of the <strong>Birth Date</strong> domain is specified as a prerequisite. In other words, only records where the birth date is an exact match will be considered as candidates for a potential duplicate. Prerequisite domains in a matching rule must use the <strong>Exact</strong> similarity and have no weighting value. All of the other domains are compared for an exact or similar match and have weightings, which add up to a total of 100. </p> <p>Assuming the birth date for the records being compared is a match, DQS then makes the other comparisons defined in the matching rule and adds the specified weighting value for each comparison that is true to produce an overall score. For example, consider two records with identical <strong>Birth Date</strong> values being compared using the <strong>Match Customer</strong> rule defined above. If the <strong>Email Address</strong> domains for both records are an exact match, 20 is added to the score. If the <strong>First Name</strong> domains are similar (for example, “Rob” and “Robert”), another 10 is added to the score, and so on until all of the comparisons in the rule have been made. The resulting score is then compared to the minimum matching score defined for the matching rule (in this case 80). If the score exceeds the minimum matching score, then the records are considered a match. 
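</p> <p>DQS performs this scoring internally, but to make the arithmetic concrete, here’s a Transact-SQL sketch of the same logic for one pair of records. This is purely an illustration – the variables and sample values are hypothetical, and I’m using SOUNDEX as a crude stand-in for the similarity algorithm DQS actually uses, which isn’t exposed:</p> <blockquote> <p><font face="Courier New">-- Hypothetical values for two customer records</font></p> <p><font face="Courier New">DECLARE @BirthDate1 date = '1970-06-15', @BirthDate2 date = '1970-06-15';</font></p> <p><font face="Courier New">DECLARE @Email1 nvarchar(100) = N'dgarcia@contoso.com', @Email2 nvarchar(100) = N'dgarcia@contoso.com';</font></p> <p><font face="Courier New">DECLARE @Postal1 nvarchar(10) = N'98052', @Postal2 nvarchar(10) = N'98052';</font></p> <p><font face="Courier New">DECLARE @Country1 nvarchar(50) = N'United States', @Country2 nvarchar(50) = N'United States';</font></p> <p><font face="Courier New">DECLARE @First1 nvarchar(50) = N'Daniel', @First2 nvarchar(50) = N'Dan';</font></p> <p><font face="Courier New">DECLARE @Last1 nvarchar(50) = N'Garcia', @Last2 nvarchar(50) = N'Garcia';</font></p> <p><font face="Courier New">DECLARE @Street1 nvarchar(100) = N'7156 Rose Drive', @Street2 nvarchar(100) = N'7156 Rose Drive';</font></p> <p><font face="Courier New">DECLARE @City1 nvarchar(50) = N'Seattle', @City2 nvarchar(50) = N'Seattle';</font></p> <p><font face="Courier New">DECLARE @State1 nvarchar(50) = N'Washington', @State2 nvarchar(50) = N'Washington';</font></p> <p><font face="Courier New">DECLARE @score int = 0;</font></p> <p><font face="Courier New">-- Prerequisite: no score is calculated at all unless the birth dates match exactly</font></p> <p><font face="Courier New">IF @BirthDate1 = @BirthDate2</font></p> <p><font face="Courier New">BEGIN</font></p> <p><font face="Courier New">   SET @score += CASE WHEN @Email1 = @Email2 THEN 20 ELSE 0 END;     -- Exact, weight 20</font></p> <p><font face="Courier New">   SET @score += CASE WHEN @Postal1 = @Postal2 THEN 10 ELSE 0 END;   -- Exact, weight 10</font></p> <p><font face="Courier New">   SET @score += CASE WHEN @Country1 = @Country2 THEN 10 ELSE 0 END; -- Exact, weight 10</font></p> <p><font face="Courier New">   SET @score += CASE WHEN SOUNDEX(@First1) = SOUNDEX(@First2) THEN 10 ELSE 0 END;   -- Similar, weight 10</font></p> <p><font face="Courier New">   SET @score += CASE WHEN SOUNDEX(@Last1) = SOUNDEX(@Last2) THEN 10 ELSE 0 END;     -- Similar, weight 10</font></p> <p><font face="Courier New">   SET @score += CASE WHEN SOUNDEX(@Street1) = SOUNDEX(@Street2) THEN 20 ELSE 0 END; -- Similar, weight 20</font></p> <p><font face="Courier New">   SET @score += CASE WHEN SOUNDEX(@City1) = SOUNDEX(@City2) THEN 10 ELSE 0 END;     -- Similar, weight 10</font></p> <p><font face="Courier New">   SET @score += CASE WHEN SOUNDEX(@State1) = SOUNDEX(@State2) THEN 10 ELSE 0 END;   -- Similar, weight 10</font></p> <p><font face="Courier New">END</font></p> <p><font face="Courier New">-- Compare the total to the minimum matching score for the rule (80)</font></p> <p><font face="Courier New">SELECT @score AS Score, CASE WHEN @score >= 80 THEN 'Match' ELSE 'No match' END AS Result;</font></p> </blockquote> <p>Run as written, only the <strong>First Name</strong> comparison fails (SOUNDEX treats “Daniel” and “Dan” as different), so the total is 90 – which still clears the minimum matching score of 80, and the pair is reported as a match.</p> <p>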
Multiple records that are considered matches for one another are grouped into a cluster.</p> <p>After you have defined the matching rules, you can use them to find matches in the sample data you mapped earlier. This gives you the opportunity to verify that the rules behave as expected against a known dataset. In this case, the dataset results in a single cluster of matches that includes two records – one for Daniel Garcia and another for Dan Garcia.</p> <p><a href="http://lh4.ggpht.com/-tT8M53JKd18/T_MNMxy9hWI/AAAAAAAAAUM/t-J_VpJ0Kvk/s1600-h/Picture4%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture4" border="0" alt="Picture4" src="http://lh5.ggpht.com/-Sw5FLDfEPAI/T_MNNj5DyRI/AAAAAAAAAUU/Ff1vPFwnVZU/Picture4_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="456" /></a></p> <p>Now that I’ve defined my matching policy, I can publish the knowledge base and allow the data stewards in my organization to use it for data matching.</p> <p>To use a knowledge base to perform data matching, create a new data quality project, specify the knowledge base, and specify the <strong>Matching</strong> activity as shown here.</p> <p><a href="http://lh4.ggpht.com/-tu3BoUvc9hU/T_MNObpTaTI/AAAAAAAAAUc/8zopj2Xs4Z4/s1600-h/Picture5%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture5" border="0" alt="Picture5" src="http://lh3.ggpht.com/-Ti405r-y5Uo/T_MNO8tYn6I/AAAAAAAAAUk/vNC99u-Wkvw/Picture5_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="456" /></a></p> <p>The first step, as it is in any DQS project, is to map the fields in your data source to the domains in the knowledge base. Just as before, the data source can be a table in a SQL Server database or an Excel file. This time, I’m using the <strong>Customers</strong> table in the <strong>Staging</strong> SQL Server database.</p> <p><a href="http://lh6.ggpht.com/-fLP0Dwq06PI/T_MNPgqoo_I/AAAAAAAAAUs/7wj7e6bmy7w/s1600-h/Picture6%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture6" border="0" alt="Picture6" src="http://lh5.ggpht.com/-wr1YXf3I9bs/T_MNQKMdp6I/AAAAAAAAAU0/GKr-w8VwvX8/Picture6_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="455" /></a></p> <p>After you’ve mapped the domains, you can start the matching process. When the process is complete, the clusters of matched records are displayed. In this case, there are two clusters, each containing two matches. 
At this stage, you can choose to reject any matches that you know aren’t duplicates.</p> <p><a href="http://lh3.ggpht.com/-9i6oNO-vPio/T_MNRBD_IuI/AAAAAAAAAU8/4H9ITAgTvL0/s1600-h/Picture7%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture7" border="0" alt="Picture7" src="http://lh6.ggpht.com/-4HcCk-iEu7Y/T_MNRvGAKQI/AAAAAAAAAVE/tUdDUMkUFfY/Picture7_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="458" /></a></p> <p>When the matches have all been identified, you can export the results to a SQL Server table or an Excel file. You can also export survivors (one record from each cluster that is chosen as the correct one) based on one of the following survivorship rules:</p> <ul> <li><strong>Pivot record</strong> – A record in the cluster that is chosen arbitrarily by DQS.</li> <li><strong>Most complete and longest record</strong> – The record that has the fewest null field values and the longest overall data length.</li> <li><strong>Most complete record</strong> – The record that has the fewest null fields.</li> <li><strong>Longest record</strong> – The record that has the longest overall data length.</li> </ul> <p><a href="http://lh4.ggpht.com/-Ar5Kje9-Nx8/T_MNSCR0cHI/AAAAAAAAAVM/8Bwq_ylxFhI/s1600-h/Picture8%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture8" border="0" alt="Picture8" src="http://lh6.ggpht.com/-yUXXDjmcktw/T_MNShcEsJI/AAAAAAAAAVU/7g-mu1YrTdI/Picture8_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="454" /></a></p> <p>The exported results include all of the source data, with additional columns that indicate the cluster to which each matched record belongs, the matching rule used, the score calculated for each match, and the pivot record for each cluster.</p> <p><a href="http://lh3.ggpht.com/-640ekxHTtYQ/T_MNTn5V8YI/AAAAAAAAAVc/ziYQ2LIk_IU/s1600-h/Picture9%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture9" border="0" alt="Picture9" src="http://lh4.ggpht.com/-P2o09z5m5Ic/T_MNUUvjEzI/AAAAAAAAAVk/rOToG-7Sxvo/Picture9_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="455" /></a></p> <p>The exported survivors contain all of the non-matching records from the original data source and one version of each matched record based on the survivorship rule you selected. 
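</p> <p>To make the survivorship options concrete, here’s a Transact-SQL sketch of how a rule like <strong>Most complete record</strong> could be expressed against exported matching results. The table and column names are hypothetical – they depend on how you export the results – so treat this as an illustration of the logic rather than what DQS itself executes:</p> <blockquote> <p><font face="Courier New">-- Hypothetical: choose the record with the fewest NULL fields in each cluster</font></p> <p><font face="Courier New">WITH ClusteredRecords AS</font></p> <p><font face="Courier New">(</font></p> <p><font face="Courier New">   SELECT CLUSTER_ID, RECORD_ID, FirstName, LastName, EmailAddress, StreetAddress,</font></p> <p><font face="Courier New">      (CASE WHEN FirstName IS NULL THEN 1 ELSE 0 END</font></p> <p><font face="Courier New">      + CASE WHEN EmailAddress IS NULL THEN 1 ELSE 0 END</font></p> <p><font face="Courier New">      + CASE WHEN StreetAddress IS NULL THEN 1 ELSE 0 END) AS NullCount</font></p> <p><font face="Courier New">   FROM dbo.MatchingResults -- assumed export table</font></p> <p><font face="Courier New">)</font></p> <p><font face="Courier New">SELECT CLUSTER_ID, RECORD_ID, FirstName, LastName, EmailAddress, StreetAddress</font></p> <p><font face="Courier New">FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY CLUSTER_ID ORDER BY NullCount) AS rn</font></p> <p><font face="Courier New">      FROM ClusteredRecords) AS ranked</font></p> <p><font face="Courier New">WHERE rn = 1; -- one survivor per cluster</font></p> </blockquote> <p>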
In the following example, I’ve highlighted the surviving records from my matching process.</p> <p><a href="http://lh5.ggpht.com/-GfD5xZMCErs/T_MNVXA0LhI/AAAAAAAAAVs/MFTmboLnKMw/s1600-h/Picture10%25255B4%25255D.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture10" border="0" alt="Picture10" src="http://lh4.ggpht.com/-1B0BoHNqZF0/T_MNWW7nh3I/AAAAAAAAAV0/WCorkrSuCzk/Picture10_thumb%25255B2%25255D.jpg?imgmax=800" width="644" height="459" /></a></p> <p>In some cases, you can simply replace the original data set with the survivor records to create a de-duplicated set of records. However, in most business scenarios you’ll need to apply some logic (manual or automated) to handle relationships between duplicate records and other tables. For example, before I eliminate the duplicate customer records identified by the matching process in the above example, I would need to reassign any sales orders that are currently related to those customer records to the surviving records.</p> <p>Hopefully you’ve found this brief introduction to data matching with DQS useful. To learn more about DQS and its use in a data warehousing solution, you can attend Microsoft Official Curriculum (MOC) course <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10777A&Locale=en-us">10777A: Implementing a Data Warehouse with SQL Server 2012</a>.</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com1tag:blogger.com,1999:blog-2066281766861287065.post-5502322640371369642012-04-07T15:40:00.001+01:002012-04-07T15:40:55.475+01:00Cleansing Data with SQL Server 2012 Data Quality Services<p>I’ve been a bit quiet on the blogging side of things for a while, and in my defence I’ve been pretty heads-down working as a vendor for Microsoft as the lead author for a couple of new courses on SQL Server 2012 data warehousing and BI (courses <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10777A&Locale=en-us" target="_blank">10777A</a> and <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10778A&Locale=en-us" target="_blank">10778A</a> if you’re interested). As part of this work, I’ve been exploring the new data cleansing capabilities in SQL Server 2012 Data Quality Services (DQS). This article is a simple walkthrough of how to use DQS to cleanse data as part of an Enterprise Information Management (EIM) or Extract, Transform, and Load (ETL) solution.</p> <p>So, what is data cleansing all about then? Well, most people involved in building or managing data-driven applications and BI solutions will have come across the problem of inconsistent or invalid data values for columns (or “domains”) that are used for business analysis. For example, let’s suppose your database stores customer data, including the customer’s address; and you want to count customer sales by country. When customers or sales employees enter customer data into the system, it’s perfectly possible (and actually quite likely given a large enough volume of customers) that some values will be either entered incorrectly (for example “Unted States” instead of “United States”) or inconsistently (for example, some customers may enter “United States”, some others may enter “USA”, and others still may enter “America”). 
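</p> <p>A simple query makes the problem visible. Assuming a <strong>Customers</strong> table with a <strong>Country</strong> column containing the values above, counting customers per country splits the US figure across three rows:</p> <blockquote> <p><font face="Courier New">-- Contrived example: the same country recorded in three different ways</font></p> <p><font face="Courier New">SELECT Country, COUNT(*) AS Customers</font></p> <p><font face="Courier New">FROM dbo.Customers</font></p> <p><font face="Courier New">GROUP BY Country</font></p> <p><font face="Courier New">ORDER BY Customers DESC;</font></p> <p><font face="Courier New">-- Returns separate rows for 'United States', 'USA', and 'America',</font></p> <p><font face="Courier New">-- so no single row reflects the true number of US customers</font></p> </blockquote> <p>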
When you try to aggregate sales, you’ll end up with inaccurate counts because there are several values in use for the same country.</p> <p>Here’s an Excel workbook containing a subset of data extracted from a SQL Server database table to show some typical data quality problems.</p> <p><a href="http://lh4.ggpht.com/-4rPa6O9plMU/T4BQUD27mdI/AAAAAAAAAMk/nfSPXijF0ZA/s1600-h/Picture1%25255B3%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture1" border="0" alt="Picture1" src="http://lh5.ggpht.com/-ynUGFIHUy0w/T4BQVQkrq4I/AAAAAAAAAMs/J8gZeT6ReYQ/Picture1_thumb%25255B1%25255D.jpg?imgmax=800" width="554" height="484" /></a></p> <p>Note that the data contains a number of problems, including:</p> <ul> <li>The <strong>City</strong> column contains “New York” and “NYC” for New York City.</li> <li>The <strong>Country</strong> column contains “United States” and “USA” for the US.</li> <li>The <strong>Country</strong> column also contains “United Kingdom” and “Great Britain” for the UK.</li> </ul> <p>DQS enables you to address this problem by cleansing the data based on a known set of values and rules for the key domains (columns) that exist in your datasets. The way that DQS does this is by enabling you to create and maintain a knowledge base that contains the known valid values for a related set of domains, along with validation rules (for example, an <strong>EmailAddress</strong> value must include a “@” character) and common synonyms that can be corrected to a leading value (for example, by correcting “USA” and “America” to the leading value “United States”). After you have created a knowledge base, you can use it to cleanse any data that includes the same domains (so for example, if you create a knowledge base for geographical domains such as <strong>City</strong>, <strong>State</strong>, and <strong>Country</strong>, you can use it to cleanse any data that includes these fields – such as customer data or employee address data). SQL Server 2012 includes the Data Quality Services Client tool (shown below), which you can use to create, maintain, and use DQS knowledge bases.</p> <p><a href="http://lh5.ggpht.com/-dSMzfJhcLjY/T4BQW6GxV8I/AAAAAAAAAM0/Eb1Sogr7eeM/s1600-h/Picture2%25255B5%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture2" border="0" alt="Picture2" src="http://lh3.ggpht.com/-K3YrjY_qgnM/T4BQXitQkFI/AAAAAAAAAM8/AN2vEJkS5bw/Picture2_thumb%25255B3%25255D.jpg?imgmax=800" width="563" height="394" /></a></p> <p>When you create a new knowledge base, you can do so from scratch, or you can use an existing knowledge base as a starting point. 
SQL Server 2012 ships with a pre-existing knowledge base for US-based demographic data named <strong>DQS Data</strong>, and in this example, I’ll use it as the basis for my own <strong>CustomerKB</strong> knowledge base as shown below.</p> <p><a href="http://lh6.ggpht.com/-7WctFvGn7Bo/T4BQZPRxfeI/AAAAAAAAANE/sLpk9HvWpTk/s1600-h/Picture3%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture3" border="0" alt="Picture3" src="http://lh6.ggpht.com/-GoKwz9jclic/T4BQZ7ySIXI/AAAAAAAAANI/r6YXOpt3aTw/Picture3_thumb%25255B2%25255D.jpg?imgmax=800" width="559" height="396" /></a></p> <p>The <strong>DQS Data</strong> knowledge base includes a number of pre-defined domains, as shown in the image above. I only need some of these domains, and I’ll need to add some additional ones that are specific to my own data; so I’ve initially selected the <strong>Domain Management</strong> activity as I create the <strong>CustomerKB</strong> knowledge base. I only intend to use the <strong>Country/Region</strong>, <strong>US – Last Name</strong>, and <strong>US – State</strong> domains from the DQS Data knowledge base, so I’ll delete the others. The domains I’m retaining contain official values for country and US state names, and common last names (surnames) based on US demographic data such as the 2000 US census.</p> <p><a href="http://lh4.ggpht.com/-_1brmX-vo28/T4BQa9xcGSI/AAAAAAAAANU/3FLJ9jUNsgY/s1600-h/Picture4%25255B5%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture4" border="0" alt="Picture4" src="http://lh6.ggpht.com/-zsZFnb7BwUA/T4BQb4Van_I/AAAAAAAAANc/zZ5U8TH1wzE/Picture4_thumb%25255B3%25255D.jpg?imgmax=800" width="562" height="408" /></a></p> <p>Since my customer data includes records for customers all over the world, I’ll rename the domains in my knowledge base to remove the “US” prefix. I’ll also add a new domain named <strong>City</strong> so that I can validate city names in the data.</p> <p><a href="http://lh6.ggpht.com/-Rmg_g1l0mKg/T4BQdGpFBsI/AAAAAAAAANk/Ta-QL-bDqQ0/s1600-h/Picture5%25255B5%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture5" border="0" alt="Picture5" src="http://lh6.ggpht.com/-8JuxwUCHVLU/T4BRLXik1qI/AAAAAAAAAN0/J6X9rdBP-TE/Picture5_thumb%25255B3%25255D.jpg?imgmax=800" width="557" height="387" /></a></p> <p>Note that I can select each domain and view the known values that are currently defined in the knowledge base as shown below. The <strong>City</strong> domain has no known values (because I’ve just created it), and the others have inherited values from the <strong>DQS Data</strong> knowledge base. The image below shows the known values for the <strong>Country/Region</strong> domain. 
Note that the knowledge base defines <em>leading</em> values for each country (such as “Afghanistan”) and synonyms that, while valid in their own right, should be corrected to the leading value to ensure consistency.</p> <p><a href="http://lh5.ggpht.com/-uz_uGhE3-SM/T4BRMlciMiI/AAAAAAAAAN8/f4zPby2aBdU/s1600-h/Picture6%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture6" border="0" alt="Picture6" src="http://lh3.ggpht.com/-5U0LgoWT50s/T4BRNmxc0zI/AAAAAAAAAOA/jtlgsRC-6_M/Picture6_thumb%25255B2%25255D.jpg?imgmax=800" width="559" height="390" /></a></p> <p>I’ve now completed my initial knowledge base, so I’m ready to finish the domain management activity. Clicking <strong>Finish</strong> produces a prompt to publish the knowledge base as shown below, but before I’m ready to use it I want to populate the known values for the <strong>City</strong> domain from my existing data by performing some knowledge discovery; so I’ll click <strong>No</strong>.</p> <p><a href="http://lh5.ggpht.com/-XO7CxR1oQyE/T4BROt02iLI/AAAAAAAAAOM/RNs0AAt8mA4/s1600-h/Picture7%25255B7%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture7" border="0" alt="Picture7" src="http://lh4.ggpht.com/-H594K_i_2zg/T4BRPp6NzVI/AAAAAAAAAOU/0eTCom9DbBU/Picture7_thumb%25255B5%25255D.jpg?imgmax=800" width="486" height="150" /></a></p> <p>Knowledge Discovery is an activity in which you connect to a data source and map fields in the source to domains in the knowledge base. DQS can then use the data source to discover new values for the domains defined in the knowledge base. The first step in this process is to open the knowledge base for the <strong>Knowledge Discovery</strong> activity as shown here. Note that the activity is performed using a wizard interface, with a sequence of steps.</p> <p><a href="http://lh5.ggpht.com/-1GKGw8c8sxw/T4BRRN1o-fI/AAAAAAAAAOc/vit7qE3LGNQ/s1600-h/Picture8%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture8" border="0" alt="Picture8" src="http://lh6.ggpht.com/-eefvwyK0EVc/T4BRSI-JVdI/AAAAAAAAAOg/wqxMmzwDX_Q/Picture8_thumb%25255B2%25255D.jpg?imgmax=800" width="562" height="408" /></a></p> <p>After opening the knowledge base, I need to select a data source (I’m using the Excel workbook we saw earlier), and map the columns in the data source to the domains in the knowledge base as shown below. Note that the data source can include columns that are not mapped to domains, and does not need to include a column for every domain in the knowledge base. 
However, only the mapped domains will be included in the knowledge discovery process.</p> <p><a href="http://lh5.ggpht.com/-D47IztK3J5g/T4BRTIMYQtI/AAAAAAAAAOs/mWMIp2GA_fQ/s1600-h/Picture9%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture9" border="0" alt="Picture9" src="http://lh6.ggpht.com/-2jNP6pBUWWM/T4BRUDUxr6I/AAAAAAAAAO0/JtBRCjOVQME/Picture9_thumb%25255B2%25255D.jpg?imgmax=800" width="559" height="407" /></a></p> <p>On the next page, I can start the data discovery analysis. DQS will read the source data and identify new values for the domains in the knowledge base, as shown here.</p> <p><a href="http://lh6.ggpht.com/--A1nuVaZRqM/T4BRVR3U39I/AAAAAAAAAO8/RwAkGLs3lOI/s1600-h/Picture10%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture10" border="0" alt="Picture10" src="http://lh3.ggpht.com/-lyM3I_pa01s/T4BRWb79RwI/AAAAAAAAAPE/VF5vut4Mveg/Picture10_thumb%25255B2%25255D.jpg?imgmax=800" width="560" height="422" /></a></p> <p>On the final page of the wizard, you can view the values that have been discovered for each domain. In this example, the values discovered for the City domain include New York and NYC, as shown below. I can identify these as synonyms by selecting them both and clicking the Set selected domain values as synonyms button.</p> <p><a href="http://lh6.ggpht.com/-G3800uKQuzA/T4BRXXeoCMI/AAAAAAAAAPM/ybxfQdgYbfE/s1600-h/Picture11%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture11" border="0" alt="Picture11" src="http://lh5.ggpht.com/-aCQ7n1ZCUTI/T4BRYtmCg-I/AAAAAAAAAPU/OfCFWFaMTqs/Picture11_thumb%25255B2%25255D.jpg?imgmax=800" width="561" height="402" /></a></p> <p>The value I selected first becomes the leading value, as shown here.</p> <p><a href="http://lh3.ggpht.com/-zTuyV3aR1CE/T4BRZzlLsQI/AAAAAAAAAPc/cZvNC9qyq7E/s1600-h/Picture12%25255B6%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture12" border="0" alt="Picture12" src="http://lh4.ggpht.com/-6Y3EZITwV2U/T4BRa9DcHBI/AAAAAAAAAPk/snc1gADKeos/Picture12_thumb%25255B4%25255D.jpg?imgmax=800" width="561" height="402" /></a></p> <p>For the <strong>Country/Region</strong> domain, DQS has discovered a new “Great Britain” value. 
I can mark this as invalid and specify an existing value to which it should be corrected (in this case, “United Kingdom”).</p> <p><a href="http://lh4.ggpht.com/-7Cdzfm4FiRw/T4BRbz9Qy7I/AAAAAAAAAPs/PPSA1OXLx8o/s1600-h/Picture13%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture13" border="0" alt="Picture13" src="http://lh6.ggpht.com/-R_gfyfFJzhU/T4BRc9XaPiI/AAAAAAAAAP0/SIHuA_9sp8k/Picture13_thumb%25255B2%25255D.jpg?imgmax=800" width="561" height="416" /></a></p> <p>Clearing the <strong>Show Only New</strong> checkbox reveals the values that already existed before knowledge discovery, and I can see that “Great Britain” is now under the “United Kingdom” leading value. I can also see that there were 151 instances of the existing “United States” value found, along with a further 42 instances of “USA”, which was already specified as a synonym for “United States”.</p> <p><a href="http://lh3.ggpht.com/-HenLz84mkkk/T4BReCumXII/AAAAAAAAAP8/wuXMpD5VFlY/s1600-h/Picture14%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture14" border="0" alt="Picture14" src="http://lh4.ggpht.com/-x3AVv8GVhCc/T4BRfFbfLHI/AAAAAAAAAQE/zcxGEiqcO2U/Picture14_thumb%25255B2%25255D.jpg?imgmax=800" width="552" height="408" /></a></p> <p>Now I’m ready to finish the knowledge discovery activity and publish the knowledge base.</p> <p>After you have published a knowledge base, you can use it to cleanse data from any data source containing columns that can be mapped to the domains defined in it. The simplest way to do this is to create a new data quality project based on the knowledge base and specify the <strong>Cleansing</strong> activity, as shown here.</p> <p><a href="http://lh5.ggpht.com/-qHJercrcmZI/T4BRgWXntVI/AAAAAAAAAQM/PMXSV6xc5q0/s1600-h/Picture15%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture15" border="0" alt="Picture15" src="http://lh4.ggpht.com/-ATeWVjVgqRg/T4BRhl9hpPI/AAAAAAAAAQU/ippXXa0vlAg/Picture15_thumb%25255B2%25255D.jpg?imgmax=800" width="558" height="407" /></a></p> <p>Again, the activity takes the form of a wizard with sequential steps. 
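</p> <p>Before stepping through it, it’s worth picturing what the correction step ultimately does to a domain like <strong>Country/Region</strong>. Conceptually, it behaves like a synonym-to-leading-value lookup – something like this simplified Transact-SQL sketch (an illustration only; DQS layers confidence scoring and spelling heuristics on top, and the inline values simply mirror the examples above):</p> <blockquote> <p><font face="Courier New">-- Simplified illustration: correcting synonyms to their leading values</font></p> <p><font face="Courier New">SELECT c.Country AS SourceValue,</font></p> <p><font face="Courier New">   COALESCE(s.LeadingValue, c.Country) AS OutputValue</font></p> <p><font face="Courier New">FROM dbo.Customers AS c</font></p> <p><font face="Courier New">LEFT JOIN (VALUES (N'USA', N'United States'),</font></p> <p><font face="Courier New">   (N'America', N'United States'),</font></p> <p><font face="Courier New">   (N'Great Britain', N'United Kingdom')) AS s(Synonym, LeadingValue)</font></p> <p><font face="Courier New">   ON c.Country = s.Synonym;</font></p> </blockquote> <p>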
The first step is to map the columns in the data source to the domains in the knowledge base, just as I did when performing the knowledge discovery activity previously; only this time I’m using the full <strong>Customers</strong> table in my <strong>CustomerDB</strong> SQL Server database instead of the sample data I had extracted to Excel.</p> <p><a href="http://lh5.ggpht.com/-xxh-2zquTIs/T4BRjMoQOYI/AAAAAAAAAQc/IEhQaZGlmjI/s1600-h/Picture16%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture16" border="0" alt="Picture16" src="http://lh3.ggpht.com/-iHtlqnninHE/T4BRkXsxSNI/AAAAAAAAAQk/Y1wBPjzvNnM/Picture16_thumb%25255B2%25255D.jpg?imgmax=800" width="561" height="408" /></a></p> <p>Next, I run the cleansing process and DQS applies the knowledge base to the source data to identify corrected and suggested values. Corrected values are corrections DQS makes to the data based on known rules and synonyms. Suggested values are further possible corrections or new values that are generated based on a number of data quality heuristics that DQS uses when analyzing data.</p> <p><a href="http://lh6.ggpht.com/-sxovd3KuKvY/T4BRlY1AycI/AAAAAAAAAQs/tu55NQ2eIDk/s1600-h/Picture17%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture17" border="0" alt="Picture17" src="http://lh3.ggpht.com/-Xncr1JYsq9Q/T4BRmupEjhI/AAAAAAAAAQ0/HLxHW2MxlhA/Picture17_thumb%25255B2%25255D.jpg?imgmax=800" width="561" height="414" /></a></p> <p>On the next page, on the <strong>Suggestions</strong> tab for each domain, I can view the suggestions identified by DQS. Here, DQS has identified a <strong>City</strong> domain value of “W. York”, which is sufficiently similar to the known value “York” for a correction to be suggested. Note that I can select the value and view the records that contain it to verify that “W. York” is commonly being used to denote “York” in England (as opposed, for example, to “New York” in the United States). I can then choose to approve or reject individual instances of the correction, or accept/reject the suggestion that “W. York” should be considered a synonym of “York” (if I approve the suggestion) or added as a new known value in its own right (if I reject the suggestion).</p> <p><a href="http://lh4.ggpht.com/-eznXrWq0MRk/T4BRoaM-nuI/AAAAAAAAAQ8/Cyypkk_L28Y/s1600-h/Picture18%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture18" border="0" alt="Picture18" src="http://lh3.ggpht.com/-UkR7l1fUvUM/T4BRpNPqg9I/AAAAAAAAARE/pPOYB-NvPIE/Picture18_thumb%25255B2%25255D.jpg?imgmax=800" width="563" height="411" /></a></p> <p>On the <strong>New</strong> tab, I can view the new values that were discovered for the domain. 
In this case, a number of new values were identified for the <strong>City</strong> domain, including <strong>Bracknell</strong> in England.</p> <p><a href="http://lh6.ggpht.com/-8db3sXYeR2g/T4BRqXP8JqI/AAAAAAAAARM/t7mNW2IsoFU/s1600-h/Picture19%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture19" border="0" alt="Picture19" src="http://lh4.ggpht.com/-Obu8AA2ea8o/T4BRrBVvBvI/AAAAAAAAARU/OFdvoKikgE8/Picture19_thumb%25255B2%25255D.jpg?imgmax=800" width="561" height="399" /></a></p> <p>On the <strong>Corrected</strong> tab, I can view the values that were corrected based on pre-existing known synonyms or suggestions that I have approved.</p> <p><a href="http://lh3.ggpht.com/-6z5FzJC6PGo/T4BRsszEIXI/AAAAAAAAARc/NDWD7d9lmt4/s1600-h/Picture20%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture20" border="0" alt="Picture20" src="http://lh3.ggpht.com/-Ig3-IVGxxWA/T4BRtbdJptI/AAAAAAAAARk/qoTIUFyA20I/Picture20_thumb%25255B2%25255D.jpg?imgmax=800" width="559" height="398" /></a></p> <p>After reviewing the results of the cleansing activity, I can export the cleansed data to a SQL Server database table, a .csv file, or an Excel workbook. Note that I can choose to export just the cleansed data values or I can include the cleansing information for further analysis.</p> <p><a href="http://lh3.ggpht.com/-EDQPco4UIss/T4BRuxl_ZjI/AAAAAAAAARs/gEYlWi2SZxg/s1600-h/Picture21%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture21" border="0" alt="Picture21" src="http://lh5.ggpht.com/-koZFxeX99QA/T4BRv-Rx0ZI/AAAAAAAAAR0/jgvzAgE7pig/Picture21_thumb%25255B2%25255D.jpg?imgmax=800" width="559" height="391" /></a></p> <p>The exported results are shown in the following image. Note that the results include all of the source columns, and that for each of the columns that was mapped to a domain there are five columns in the results: the source value, the output value, the reason for any corrections, the level of confidence (between 0 and 1) for the correction, and the status of the column (correct or corrected).</p> <p><a href="http://lh6.ggpht.com/-VAi8XnNqP_I/T4BRxF5xv1I/AAAAAAAAAR8/lEtbjazaL7c/s1600-h/Picture22%25255B3%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture22" border="0" alt="Picture22" src="http://lh3.ggpht.com/-anq-5LeJqzg/T4BRyoX4_SI/AAAAAAAAASE/TzjCprzfekk/Picture22_thumb%25255B1%25255D.jpg?imgmax=800" width="552" height="484" /></a></p> <p>By creating a data cleansing project, a business user who understands the data domains can act as a “data steward” and enforce the quality of the data in application databases or analytical and reporting systems. 
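</p> <p>Because the export follows that five-column pattern for every mapped domain, it’s easy to triage the results afterwards. For example, if the cleansed data was exported to a table, a query along these lines (the table name is hypothetical, and the column names simply follow the Source/Output/Reason/Confidence/Status pattern described above) would list the corrections the engine was least sure about:</p> <blockquote> <p><font face="Courier New">-- Illustrative: review lower-confidence corrections to the Country domain</font></p> <p><font face="Courier New">SELECT Country_Source, Country_Output, Country_Reason, Country_Confidence</font></p> <p><font face="Courier New">FROM dbo.CleansingResults -- assumed export table</font></p> <p><font face="Courier New">WHERE Country_Status = N'Corrected'</font></p> <p><font face="Courier New">AND Country_Confidence &lt; 0.8</font></p> <p><font face="Courier New">ORDER BY Country_Confidence;</font></p> </blockquote> <p>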
Additionally, when you are confident in the ability of your knowledge base to cleanse data, you can incorporate DQS data cleansing into a SQL Server Integration Services (SSIS) data flow that extracts data from a source as part of an ETL process for data warehousing or EIM. The following image shows an SSIS data flow that includes the DQS Cleansing transformation.</p> <p><a href="http://lh6.ggpht.com/-4WeZhiQd_Po/T4BRzXV8E6I/AAAAAAAAASM/ojUD8i2nJ-E/s1600-h/Picture23%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture23" border="0" alt="Picture23" src="http://lh4.ggpht.com/-I-mH5LfnERI/T4BR0wgcENI/AAAAAAAAASU/DdxByBgZGcM/Picture23_thumb%25255B2%25255D.jpg?imgmax=800" width="368" height="251" /></a></p> <p>In this example, the <strong>CustomerDB</strong> data source uses an OLE DB connection to extract data from the <strong>Customers</strong> table in SQL Server. The DQS Cleansing transformation is then configured to use the <strong>CustomerKB</strong> knowledge base and map the appropriate columns from the data source to domains for cleansing, as shown here.</p> <p align="center"><a href="http://lh6.ggpht.com/-hSLTobJjXeI/T4BR174irII/AAAAAAAAASc/GCylWOJZDPY/s1600-h/Picture24%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="Picture24" border="0" alt="Picture24" src="http://lh6.ggpht.com/-F8X9jdWHAcg/T4BR2oYvx4I/AAAAAAAAASg/bVIRFAn54nA/Picture24_thumb%25255B2%25255D.jpg?imgmax=800" width="427" height="398" /></a>     <a href="http://lh4.ggpht.com/-GWVI9KXz9Xs/T4BR3a52P9I/AAAAAAAAASo/HEMy-5Mf5CA/s1600-h/Picture25%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="Picture25" border="0" alt="Picture25" src="http://lh6.ggpht.com/-8VwR5yweGvA/T4BR4vOnK5I/AAAAAAAAAS0/vtEp8uQGQkY/Picture25_thumb%25255B2%25255D.jpg?imgmax=800" width="427" height="395" /></a></p> <p>The Staging DB destination uses an OLE DB connection to load data from the data flow into a staging table as part of an ETL process for a data warehousing solution. 
The output columns for the mapped domains are used to load the cleansed values into the staging table, as shown here.</p> <p><a href="http://lh6.ggpht.com/-XOEF1UAzuio/T4BR6VdQErI/AAAAAAAAAS8/ozvoxr6nTbw/s1600-h/Picture26%25255B4%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture26" border="0" alt="Picture26" src="http://lh4.ggpht.com/-hREcANgyUDQ/T4BR7OfHQMI/AAAAAAAAATE/yLsx-OJ7vHE/Picture26_thumb%25255B2%25255D.jpg?imgmax=800" width="444" height="437" /></a></p> <p>Running the SSIS package extracts the source data, applies the DQS knowledge base to cleanse the mapped columns, and loads the cleansed data into the staging database as shown here.</p> <p><a href="http://lh5.ggpht.com/-y0F75jFyh2I/T4BR8B0sdvI/AAAAAAAAATM/TpSFRLhnXvo/s1600-h/Picture27%25255B5%25255D.jpg"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="Picture27" border="0" alt="Picture27" src="http://lh5.ggpht.com/-0Q9pdICJhrs/T4BR9B48YkI/AAAAAAAAATU/rHvo23WYeyA/Picture27_thumb%25255B3%25255D.jpg?imgmax=800" width="349" height="281" /></a></p> <p>This walkthrough provides a simple example of how you can use DQS to cleanse data and improve data quality for reporting and analysis. There are a number of additional features of DQS that are not shown here, including the ability to define composite domains that consist of multiple columns and the ability to include external reference cleansing data from the Windows Azure Data Market in your knowledge base (for example to apply post code validation and correction rules based on standard data from a postal service authority). You can learn more about using DQS to cleanse data by attending course <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10777A&Locale=en-us" target="_blank">10777A: Implementing a Data Warehouse with SQL Server 2012</a>.</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com17tag:blogger.com,1999:blog-2066281766861287065.post-31730318978843466052011-02-22T18:19:00.001+00:002012-04-07T15:43:06.746+01:00SQL Server “Denali” Integration Services – Projects and Parameters<p>Some previous posts in this blog have discussed new features in the SQL Server “Denali” database engine. For this post however, I want to focus on some of the key enhancements in SQL Server “Denali” Integration Services (SSIS). SSIS first appeared in SQL Server 2005 as an evolution of the Data Transformation Services (DTS) component in previous releases, and has steadily become a core element of Extract, Transform, and Load (ETL) operations for many data warehousing implementations.</p> <p>The big news in the “Denali” release of SQL Server Integration Services is a whole new deployment model for SSIS solutions. In previous releases, the only available unit of deployment was the package (a .dtsx file), which could be deployed either to the file system or to the MSDB database in a SQL Server instance. This single-package deployment model is at odds with the development model for SSIS solutions, in which a developer can create a single project that contains multiple packages. 
Prior to “Denali”, each package had to be deployed individually, and any variables that needed to be set at runtime had to be managed through a package configuration for each individual package. SSIS in “Denali” still supports this “legacy” deployment model, but now also supports project-level deployment to the new Integration Services Catalog, and project-level parameters that can be used to set variables across multiple packages within a project.</p> <p>The first thing you need to do to take advantage of this new deployment model is to create an Integration Services catalog on an instance of SQL Server. The Integration Services catalog is a central database in which SSIS projects can be stored and managed, and you can have one catalog per SQL Server instance. The Integration Services catalog uses the SQLCLR (the .NET common language runtime hosted within SQL Server), so you need to enable this first by using the following Transact-SQL:</p> <blockquote> <p><font face="Courier New">sp_configure 'show advanced options', 1; </font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">RECONFIGURE;</font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">sp_configure 'clr enabled', 1;</font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">RECONFIGURE;</font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p>Now you’re ready to create an Integration Services catalog, which you can do in SQL Server Management Studio as shown here.</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9aOaTshI/AAAAAAAAAJQ/f0L1zXiulrs/s1600-h/Picture1%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture1" border="0" alt="Picture1" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9a9DTy6I/AAAAAAAAAJU/fZowtfnB2o0/Picture1_thumb%5B4%5D.png?imgmax=800" width="528" height="484" /></a> </p> <p>When you create the Integration Services catalog, you’re prompted for a password that can be used to protect the database master key used to encrypt the data.</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9cgEFtaI/AAAAAAAAAJY/fXB-fVq2iS0/s1600-h/Picture2%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture2" border="0" alt="Picture2" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9dTzJ4WI/AAAAAAAAAJc/71hclY5nEoE/Picture2_thumb%5B4%5D.png?imgmax=800" width="644" height="440" /></a></p> <p> <a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9eLhiVuI/AAAAAAAAAJg/wmVrqRNVeZc/s1600-h/Picture3%5B12%5D.png" target="_blank"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="Picture3" border="0" alt="Picture3" align="left" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9ekuAzqI/AAAAAAAAAJk/QxhykTzPIGA/Picture3_thumb%5B10%5D.png?imgmax=800" width="246" height="327" /></a></p> <p>After clicking OK, refreshing the Object Explorer view reveals two new items as shown here. The first is a database named <strong>SSISDB</strong>, and the second is an <strong>SSISDB</strong> node beneath the <strong>Integration Services</strong> folder. 
The database is a regular SQL Server database that contains a number of tables, views, and stored procedures that you can use to manage and run SSIS projects and packages stored in the catalog. It is also where the projects and  packages in your catalog are physically stored. The <strong>SSISDB</strong> node under the <strong>Integration Services</strong> folder provides a management interface for the catalog and enables you to define a logical folder structure for your catalog.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9fFqu6eI/AAAAAAAAAJo/LL1kKNleIdE/s1600-h/Picture4%5B7%5D.png" target="_blank"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="Picture4" border="0" alt="Picture4" align="right" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9fZ05NlI/AAAAAAAAAJs/0YGpN7U_OwE/Picture4_thumb%5B5%5D.png?imgmax=800" width="195" height="90" /></a></p> <p>To create a folder in your catalog, simply right-click the <strong>SSISDB</strong> node under the <strong>Integration </strong><strong>Services</strong> folder, and click Create Folder.  Here I’ve created a folder with the imaginative name <strong>My Folder</strong>. Note that subfolders named <strong>Projects</strong> and <strong>Environments</strong> have automatically been created – we’ll return to these later.</p> <p>OK, so now we have an Integration Services catalog that contains a folder to which we can deploy an SSIS project; so I guess it’s time we went ahead and created a project to deploy. For our purposes, we’ll create a simple SSIS project that includes a data flow task that extracts the list of database names from the <strong>sysdatabases</strong> system view in the <strong>master</strong> database and copies it to a table in another database. I’m going to copy the database list to a table in a database called <strong>CmSampleDB</strong>, and to make matters a little more interesting, I’m going to create two tables that can act as the destination for the list – one to be used when testing the SSIS solution, and another to be used in production. We’ll design the SSIS project to support a project-level parameter so you can specify which table to use at runtime. Here’s my Transact-SQL code to create the destination tables:</p> <blockquote> <p><font face="Courier New">USE CmSampleDB</font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New"></font></p> <p><font face="Courier New">CREATE TABLE TestDBList</font></p> <p><font face="Courier New">(name nvarchar(250))</font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New"></font></p> <p><font face="Courier New">CREATE TABLE DBList</font></p> <p><font face="Courier New">(name nvarchar(250))</font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p>Now we can go ahead and create the SSIS project using SQL Server Business Intelligence Development Studio (BIDS). 
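</p> <p>Before we do, it’s worth noting that because <strong>SSISDB</strong> is an ordinary database, the catalog metadata can be queried directly. For example, this query (using the <strong>catalog.folders</strong> view as it appears in the build I’m working with – names could change before release) confirms the folder we just created:</p> <blockquote> <p><font face="Courier New">SELECT name, created_by_name, created_time</font></p> <p><font face="Courier New">FROM SSISDB.catalog.folders;</font></p> </blockquote> <p>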
Creating an SSIS project in “Denali” is exactly the same as in previous versions – just select the <strong>Integration Services Project</strong> template as shown here:</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP9jaQTkvI/AAAAAAAAAJ4/B7UV0VD-UjA/s1600-h/Picture5%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture5" border="0" alt="Picture5" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP9kOZFp9I/AAAAAAAAAJ8/i1uKZ_Df-Ec/Picture5_thumb%5B4%5D.png?imgmax=800" width="644" height="466" /></a> </p> <p>When the new project is created, it contains a single package named <strong>Package.dtsx</strong>, which you can rename to suit your own requirements – I’m going<a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9kpz993I/AAAAAAAAAKA/HX31E46HEcg/s1600-h/Picture6%5B5%5D.png" target="_blank"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="Picture6" border="0" alt="Picture6" align="right" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP9kyu5-xI/AAAAAAAAAKE/T_8WddMqh-s/Picture6_thumb%5B3%5D.png?imgmax=800" width="244" height="176" /></a> to name mine <strong>My Package.dtsx</strong>. You can add more packages to the project as required, so for example, I’ll add a second package which I’ll name, um, <strong>My Other Package.dtsx</strong>. In Solution Explorer, my project now looks like this.</p> <p>So far, nothing is very different from how you would create an SSIS project in previous releases of SQL Server, but here’s where we’re going to use a new feature – <strong>Project Parameters</strong>. Project parameters are, as the name suggests, parameters that can be used to pass variable values to the project at runtime. Because these parameters are scoped at the project level, they can be used by any package in the project. To add a project parameter, right-click the project in Solution Explorer and click <strong>Project Parameters</strong>, or click <strong>Project Parameters</strong> on the <strong>Project</strong> menu. Either of these actions displays the <strong>Parameters</strong> pane as shown here:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9luAQ60I/AAAAAAAAAKI/JEZVEdUbPYk/s1600-h/Picture7%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture7" border="0" alt="Picture7" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9mGi7BMI/AAAAAAAAAKM/POWpeS5QLQ0/Picture7_thumb%5B4%5D.png?imgmax=800" width="644" height="186" /></a> </p> <p>As you can see, I’ve used this pane to create a project-level parameter named <strong>TableName</strong> with a default value of <strong>TestDBList</strong>. This default value is more correctly known as the <em>Design</em> default value, since it’s used when I run the project within BIDS. When I deploy the project, I can set a <em>Server</em> default value that will override this one when packages in this project are run on the server.</p> <p>Now I need to create the data flow task that copies the database names from <strong>sysdatabases</strong> in the <strong>master</strong> database to the table indicated by the <strong>TableName</strong> parameter in the <strong>CmSampleDB</strong> database. 
To do this I just need to drag a <strong>Data Flow</strong> task to the design surface of <strong>My Package.dtsx</strong> as shown here:</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP9n7yatQI/AAAAAAAAAKQ/OYl_eaJYOco/s1600-h/Picture8%5B7%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture8" border="0" alt="Picture8" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9ovmIuHI/AAAAAAAAAKY/6h6rHxbfDGE/Picture8_thumb%5B5%5D.png?imgmax=800" width="608" height="497" /></a> </p> <p>Next, I’ll double-click the data flow task to view the data flow design surface, and use the <strong>Source Assistant</strong> item on the SSIS Toolbox to create a new connection to the master database on my SQL Server instance. Then I can configure the OLE DB source that gets created to extract the <strong>name</strong> column from the <strong>sysdatabases</strong> system view by using the following SQL command:</p> <blockquote> <p><font face="Courier New">SELECT name FROM sysdatabases</font></p> </blockquote> <p>The data flow surface now looks like this:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9qUTSqiI/AAAAAAAAAKc/EDURiX0IYw0/s1600-h/Picture9%5B7%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture9" border="0" alt="Picture9" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP9rdk4T2I/AAAAAAAAAKg/DlT7HqSDdeo/Picture9_thumb%5B5%5D.png?imgmax=800" width="608" height="497" /></a> </p> <p>Next I’ll use the <strong>Destination Assistant</strong> to add a connection to the <strong>CmSampleDB</strong> database on my SQL Server instance, and connect the output from the source to the destination as shown here:</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP9tsV3MxI/AAAAAAAAAKk/hji8tutY0nc/s1600-h/Picture10%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture10" border="0" alt="Picture10" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP9uSNQdfI/AAAAAAAAAKo/NdiKy_TwCEI/Picture10_thumb%5B4%5D.png?imgmax=800" width="583" height="484" /></a> </p> <p>To complete the data flow, I need to configure the destination to insert the output from the source into the table specified in the project-level <strong>TableName</strong> parameter, as shown here:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9vFC3ZMI/AAAAAAAAAKs/CRCGBOLaKRo/s1600-h/Picture11%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture11" border="0" alt="Picture11" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9vsP43JI/AAAAAAAAAK0/zyEiFRRIbJ8/Picture11_thumb%5B4%5D.png?imgmax=800" width="579" height="484" /></a></p> <p>Now I’m ready to build and deploy the project to the Integration Services catalog I created earlier. 
Building the project in BIDS creates a .ispac file, which you can then import into the catalog using SQL Server Management Studio, or deploy directly to the catalog from BIDS by clicking <strong>Deploy</strong> on the <strong>Project</strong> menu (or by right-clicking the project in Solution Explorer and clicking <strong>Deploy</strong>). Whichever approach you use, deployment to the catalog is accomplished via the Integration Services Deployment Wizard. After the Welcome screen, the wizard prompts you to select the project you want to deploy – in this case, the .ispac file I just built.</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9wqmpnRI/AAAAAAAAAK4/fmWUhEesPvI/s1600-h/Picture12%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture12" border="0" alt="Picture12" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9xL5OsRI/AAAAAAAAAK8/wtP-h74QtGA/Picture12_thumb%5B4%5D.png?imgmax=800" width="554" height="484" /></a> </p> <p>Next, the wizard loads and validates the project before prompting you for the destination. This consists of the server where the Integration Services catalog is hosted, and the path to the folder where you want to deploy the project. Here, I’ve selected the <strong>My Folder</strong> folder I created earlier.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP9yBs8BdI/AAAAAAAAALA/YqgHO6RELWo/s1600-h/Picture13%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture13" border="0" alt="Picture13" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9ysWAEoI/AAAAAAAAALE/tjKgEK0Ordk/Picture13_thumb%5B4%5D.png?imgmax=800" width="554" height="484" /></a> </p> <p>Finally, the wizard prompts you to set Server default values for any project parameters. You can use the Design default value, specify a new value, or use an environment variable. We’ll look at environment variables shortly, but for now I’ve set the Server default value for the <strong>TableName</strong> parameter to <strong>DBList</strong>.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP90I2Qj7I/AAAAAAAAALI/eDKyeT75aZw/s1600-h/Picture14%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture14" border="0" alt="Picture14" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP90hiocQI/AAAAAAAAALM/j-aDVr62lQ8/Picture14_thumb%5B4%5D.png?imgmax=800" width="549" height="484" /></a></p> <p>Completing the wizard deploys the project to the catalog, which you can verify in SQL Server Management Studio. 
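</p> <p>The wizard is the easiest route, but deployment can also be scripted, which is useful for automated builds. Here’s a sketch based on the <strong>catalog.deploy_project</strong> stored procedure as it appears in the build I’m using (check the name and parameters in your own version); the .ispac path and project name are hypothetical:</p> <blockquote> <p><font face="Courier New">-- Read the built .ispac file into a binary variable (hypothetical path)</font></p> <p><font face="Courier New">DECLARE @ProjectBinary varbinary(max) =</font></p> <p><font face="Courier New">   (SELECT BulkColumn FROM OPENROWSET(</font></p> <p><font face="Courier New">      BULK 'C:\Projects\DenaliSSIS\bin\Development\DenaliSSIS.ispac',</font></p> <p><font face="Courier New">      SINGLE_BLOB) AS ispac);</font></p> <p><font face="Courier New">-- Deploy it to the folder created earlier</font></p> <p><font face="Courier New">EXEC SSISDB.catalog.deploy_project</font></p> <p><font face="Courier New">   @folder_name = N'My Folder',</font></p> <p><font face="Courier New">   @project_name = N'DenaliSSIS', -- hypothetical project name</font></p> <p><font face="Courier New">   @project_stream = @ProjectBinary;</font></p> </blockquote> <p>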
Note that the project is actually saved to the <strong>Projects</strong> sub-folder of the path specified in the wizard, and that all packages within the project are deployed as a single unit.</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP903y2YNI/AAAAAAAAALQ/48mHq8B--_A/s1600-h/Picture15%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture15" border="0" alt="Picture15" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP91c39W0I/AAAAAAAAALU/ZrYY31oLES0/Picture15_thumb%5B3%5D.png?imgmax=800" width="244" height="138" /></a></p> <p>The final thing I want to do is to define a test environment and a production environment that can be used to control the execution context for the project. To do this, I’ll right-click the <strong>Environments</strong> folder and click <strong>Create Environment</strong>. Using this approach I’ve created two environments called <strong>Test</strong> and <strong>Production</strong>.</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP91yOKV1I/AAAAAAAAALY/NuS8gcfHvjI/s1600-h/Picture16%5B5%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture16" border="0" alt="Picture16" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP92K_Y09I/AAAAAAAAALc/UZFRU1q2kUU/Picture16_thumb%5B3%5D.png?imgmax=800" width="244" height="168" /></a> </p> <p>You can edit the properties of each environment to create environment variables, which in turn can be used to set project parameters when project packages are run in the context of the environment. For example, here I’m creating an environment variable named <strong>tName</strong> in the <strong>Test</strong> environment with a value of <strong>TestDBList</strong>. 
I’ve also created an environment variable with the same name in the <strong>Production</strong> environment and assigned the value <strong>DBList</strong>.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP93cWpg7I/AAAAAAAAALg/4iYFtbP4E7g/s1600-h/Picture17%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture17" border="0" alt="Picture17" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP93-2KHPI/AAAAAAAAALk/hW3i360Qb6U/Picture17_thumb%5B4%5D.png?imgmax=800" width="644" height="441" /></a> </p> <p>Finally, I can hook the environments up to the project by editing the properties of the project in the Integration Services catalog and adding environment references, as shown here…</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP95NBZFgI/AAAAAAAAALo/1Z6Y6j7AESg/s1600-h/Picture18%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture18" border="0" alt="Picture18" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP95yMOckI/AAAAAAAAALs/-sBIm9bPhZc/Picture18_thumb%5B4%5D.png?imgmax=800" width="644" height="440" /></a> </p> <p>… and setting parameters to get their values from environment variables as shown here (note that in the CTP release, you must click <strong>OK</strong> after adding the environment references on the <strong>References</strong> page before re-opening the Properties window and changing the parameter value on the <strong>Parameters</strong> page):</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP97GUDOcI/AAAAAAAAALw/kTt3rrtZJ-I/s1600-h/Picture19%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture19" border="0" alt="Picture19" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP97w-mGiI/AAAAAAAAAL0/oQ7fKmjShY8/Picture19_thumb%5B4%5D.png?imgmax=800" width="644" height="440" /></a> </p> <p>So now we have a project deployed to our Integration Services catalog. The project contains two packages – one of which doesn’t actually do anything, and another that copies the list of database names from the <strong>sysdatabases</strong> system view in the <strong>master</strong> database to a table in the <strong>CmSampleDB</strong> database. There is a project-level parameter that is used to indicate which table the database names should be copied to, and this is set to <strong>TestDBList</strong> or <strong>DBList</strong> depending on the environment that the package is executed in.</p>
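<p>Incidentally, everything I’ve just done through the UI – creating an environment, adding a variable, referencing the environment from the project, and binding the parameter – can also be scripted with stored procedures in the <strong>SSISDB</strong> catalog. Here’s a rough sketch for the <strong>Test</strong> environment (the procedure names come from the documented catalog schema and the project name is a placeholder, so verify the exact signatures in Books Online for your build):</p> <blockquote> <p><font face="Courier New">DECLARE @ref_id bigint <br />-- Create the Test environment and its tName variable <br />EXEC SSISDB.catalog.create_environment <br />     @folder_name = N'My folder', @environment_name = N'Test' <br />EXEC SSISDB.catalog.create_environment_variable <br />     @folder_name = N'My folder', @environment_name = N'Test', <br />     @variable_name = N'tName', @data_type = N'String', <br />     @sensitive = 0, @value = N'TestDBList', @description = N'' <br />-- Reference the environment from the project <br />EXEC SSISDB.catalog.create_environment_reference <br />     @folder_name = N'My folder', @project_name = N'My Project', <br />     @environment_name = N'Test', @reference_type = 'R', <br />     @reference_id = @ref_id OUTPUT <br />-- Bind the TableName project parameter to the tName variable <br />EXEC SSISDB.catalog.set_object_parameter_value <br />     @object_type = 20, @folder_name = N'My folder', <br />     @project_name = N'My Project', @parameter_name = N'TableName', <br />     @parameter_value = N'tName', @value_type = 'R'</font></p> </blockquote>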
<p>To test this, I can right-click <strong>My Package.dtsx</strong> in Object Explorer and click <strong>Run</strong>, which produces the following dialog box:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP98_6EgDI/AAAAAAAAAL4/AL0Fuj2pJf8/s1600-h/Picture20%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture20" border="0" alt="Picture20" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP99YKCUSI/AAAAAAAAAL8/-GbvnB28Fwk/Picture20_thumb%5B4%5D.png?imgmax=800" width="644" height="440" /></a> </p> <p>Note that I can select the environment reference I want to use, which will determine the value of the <strong>tName</strong> environment variable, which will in turn set the value for the <strong>TableName</strong> project parameter and ultimately determine which table the data is copied to. For this example, I’ll select the Test environment and run the package, and the data is copied to the <strong>TestDBList</strong> table as shown below:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP9-9dEjxI/AAAAAAAAAMA/g7U6PLmFbm8/s1600-h/Picture21%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture21" border="0" alt="Picture21" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP9_0vLekI/AAAAAAAAAME/yhmtR5eb5bw/Picture21_thumb%5B4%5D.png?imgmax=800" width="644" height="463" /></a> </p> <p>To review past operations in the catalog, you can right-click SSISDB under the Integration Services folder in Object Explorer and click Operations. This shows a list of all operations that have been performed in the catalog in reverse order of occurrence, so in this example you can see that a project was deployed and then a package was executed.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP-Aq3q2uI/AAAAAAAAAMM/eWolcdEC_cM/s1600-h/Picture22%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture22" border="0" alt="Picture22" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TWP-BHgdtFI/AAAAAAAAAMQ/s7BuR4YJRDQ/Picture22_thumb%5B4%5D.png?imgmax=800" width="582" height="484" /></a></p> <p>Double-clicking an entry in the list reveals more information about the operation. 
For example, here are the details for the package execution:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP-CgukcbI/AAAAAAAAAMU/ipkHqbtuTBY/s1600-h/Picture23%5B6%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture23" border="0" alt="Picture23" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TWP-DQBeiKI/AAAAAAAAAMY/muyHDbdbcwc/Picture23_thumb%5B4%5D.png?imgmax=800" width="581" height="484" /></a> </p> <p>Note that clicking the Parameters tab shows the parameter values that were used for this particular execution:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TWP-EKLCi5I/AAAAAAAAAMc/gjLzyVc_wrQ/s1600-h/Picture24%5B5%5D.png" target="_blank"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture24" border="0" alt="Picture24" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TWP-EuSm2II/AAAAAAAAAMg/DEvvLLotKF4/Picture24_thumb%5B3%5D.png?imgmax=800" width="609" height="484" /></a> </p> <p>I’ve used this article to give you a quick tour of the new deployment model for SSIS and how you can use project parameters and environment variables to create a more flexible but manageable ETL solution with SSIS in SQL Server “Denali”. For more information about what’s new in SSIS in “Denali”, see <a href="http://msdn.microsoft.com/en-us/library/bb522534(v=SQL.110).aspx" target="_blank">SQL Server Books Online</a>.</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-48986014307130255742011-01-27T13:43:00.001+00:002012-04-07T15:43:06.746+01:00Paging With SQL Server “Denali”<p>One of the more common tasks in a data-bound application is to display data in a “page-able” user interface – often some sort of data grid. So for example, you might want to create a Web application that shows a list of available products, but which limits the list to ten products at a time. The user then “pages” through the data by clicking “Next Page” or something similar. There are loads of user interface design and implementation patterns that enable this kind of functionality, but they usually either involve fetching all of the data from the database and caching it locally in the application, or implementing some sort of “current page” tracking functionality to adjust the query used to retrieve the data on each page.</p> <p>SQL Server “Denali” introduces a new way to manage paging within the SELECT statement itself. More specifically, you can use the new OFFSET and FETCH keywords in the ORDER BY clause to limit the query results to a specific page of data. The OFFSET keyword is used to indicate the “starting row” of the results (i.e. the number of rows to skip before this page), and the FETCH keyword is used to indicate the number of rows to be returned (i.e. the page size). 
For example, the following query skips 20 rows in the underlying dataset and then returns the next 10 rows:</p> <blockquote> <p><font face="Courier New">SELECT so.SalesOrderID, so.OrderDate, c.CustomerName <br />FROM SalesOrder so <br />JOIN Customer c ON so.Customer = c.CustomerID <br />ORDER BY SalesOrderID ASC <br />OFFSET 20 ROWS <br />FETCH NEXT 10 ROWS ONLY</font></p> </blockquote> <p>For this to work predictably, the ORDER BY clause must include a unique column (or combination of columns) and the underlying dataset must not change between queries.</p> <p>With this functionality, you can implement an effective paging solution that tracks the position of the first row in the current page, the first row in the next page, and the first row in the previous page. For example, the following stored procedure retrieves the requested page of data based on page size and offset parameter values, and then returns the first row positions for the next and previous pages:</p> <blockquote> <p><font face="Courier New">CREATE PROCEDURE GetSalesOrders(@PageSize int, @Offset int, <br />                                @NextPage int OUTPUT, @PrevPage int OUTPUT) <br />AS <br />-- Retrieve the requested page of data <br />SELECT so.SalesOrderID, so.OrderDate, c.CustomerName <br />FROM SalesOrder so <br />JOIN Customer c ON so.Customer = c.CustomerID <br />ORDER BY SalesOrderID ASC <br />OFFSET @Offset ROWS <br />FETCH NEXT @PageSize ROWS ONLY </font></p> <p><font face="Courier New">-- Set the row position markers <br />SET @NextPage = @@ROWCOUNT + @Offset <br />SET @PrevPage = @Offset - @PageSize <br />GO</font> </p> <p><font face="Courier New"></font></p> </blockquote> <p>You can then call this stored procedure to navigate forward and backward through the data like this:</p> <blockquote> <p><font face="Courier New">DECLARE @StartRow int = 0, @Next int, @Prev int <br />EXECUTE GetSalesOrders 10, @StartRow, @Next OUTPUT, @Prev OUTPUT <br />SET @StartRow = @Next <br />EXECUTE GetSalesOrders 10, @StartRow, @Next OUTPUT, @Prev OUTPUT <br />SET @StartRow = @Next <br />EXECUTE GetSalesOrders 10, @StartRow, @Next OUTPUT, @Prev OUTPUT <br />SET @StartRow = @Prev <br />EXECUTE GetSalesOrders 10, @StartRow, @Next OUTPUT, @Prev OUTPUT</font></p> </blockquote> <p>This code calls the stored procedure 4 times, retrieving the initial page (with an offset of zero), the next two pages, and then the second page again, as shown in the results here:</p> <blockquote> <p><font face="Courier New">SalesOrderID OrderDate  CustomerName <br />------------ ---------- ------------------------------ <br />1            2010-01-20 Kasumi Fernandez <br />2            2010-01-21 Rod Dechamps <br />3            2010-01-22 Jane Dechamps <br />4            2010-01-23 Freddy Dechamps <br />5            2010-01-24 Pierre Dechamps <br />6            2010-01-25 Kasumi Dechamps <br />7            2010-02-01 Rod Smith <br />8            2010-02-01 Jane Smith <br />9            2010-02-01 Freddy Smith <br />10           2010-02-01 Pierre Smith </font></p> <p><font face="Courier New">(10 row(s) affected) </font></p> <p><font face="Courier New">SalesOrderID OrderDate  CustomerName <br />------------ ---------- ------------------------------ <br />11           2010-02-01 Kasumi Smith <br />12           2010-02-01 Rod Jones <br />13           2010-02-01 Jane Jones <br />14           2010-02-01 Freddy Jones <br />15           2010-02-01 Pierre Jones <br />16           2010-02-01 Kasumi Jones <br />17           2010-02-01 Rod Yamamoto <br />18           
2010-02-01 Jane Yamamoto <br />19           2010-02-01 Freddy Yamamoto <br />20           2010-02-01 Pierre Yamamoto </font></p> <p><font face="Courier New">(10 row(s) affected) </font></p> <p><font face="Courier New">SalesOrderID OrderDate  CustomerName <br />------------ ---------- ------------------------------ <br />21           2010-02-01 Kasumi Yamamoto <br />22           2010-02-01 Rod Fernandez <br />23           2010-02-01 Jane Fernandez <br />24           2010-02-01 Freddy Fernandez <br />25           2010-02-01 Pierre Fernandez <br />26           2010-02-01 Kasumi Fernandez <br />27           2010-02-01 Rod Dechamps <br />28           2010-02-01 Jane Dechamps <br />29           2010-02-01 Freddy Dechamps <br />30           2010-02-01 Pierre Dechamps </font></p> <p><font face="Courier New">(10 row(s) affected) </font></p> <p><font face="Courier New">SalesOrderID OrderDate  CustomerName <br />------------ ---------- ------------------------------ <br />11           2010-02-01 Kasumi Smith <br />12           2010-02-01 Rod Jones <br />13           2010-02-01 Jane Jones <br />14           2010-02-01 Freddy Jones <br />15           2010-02-01 Pierre Jones <br />16           2010-02-01 Kasumi Jones <br />17           2010-02-01 Rod Yamamoto <br />18           2010-02-01 Jane Yamamoto <br />19           2010-02-01 Freddy Yamamoto <br />20           2010-02-01 Pierre Yamamoto </font></p> <p><font face="Courier New">(10 row(s) affected)</font></p> </blockquote> <p>Note that this simple example doesn’t handle the issue of “falling off the end” of the underlying dataset, so an attempt to move forward beyond the last page will return an empty result set, and an attempt to move backward to a position before the first row in the dataset will result in an error (since the OFFSET value cannot be less than zero). 
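</p> <p>One way to make the procedure more robust is to validate the markers before returning them. Here’s a minimal sketch of what those checks might look like (hypothetical code based on the GetSalesOrders procedure above – adjust the clamping logic to suit your application):</p> <blockquote> <p><font face="Courier New">ALTER PROCEDURE GetSalesOrders(@PageSize int, @Offset int, <br />                               @NextPage int OUTPUT, @PrevPage int OUTPUT) <br />AS <br />-- Never start before the first row <br />IF @Offset < 0 SET @Offset = 0 <br /> <br />-- Never start beyond the last row <br />DECLARE @TotalRows int <br />SELECT @TotalRows = COUNT(*) FROM SalesOrder <br />IF @Offset >= @TotalRows <br />   SET @Offset = CASE WHEN @TotalRows - @PageSize > 0 <br />                      THEN @TotalRows - @PageSize ELSE 0 END <br /> <br />-- Retrieve the requested page of data <br />SELECT so.SalesOrderID, so.OrderDate, c.CustomerName <br />FROM SalesOrder so <br />JOIN Customer c ON so.Customer = c.CustomerID <br />ORDER BY SalesOrderID ASC <br />OFFSET @Offset ROWS <br />FETCH NEXT @PageSize ROWS ONLY <br /> <br />-- Set the row position markers, clamping at the lower boundary <br />SET @NextPage = @@ROWCOUNT + @Offset <br />SET @PrevPage = CASE WHEN @Offset - @PageSize < 0 <br />                     THEN 0 ELSE @Offset - @PageSize END <br />GO</font></p> </blockquote>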
<p>The sketch above shows one possible set of checks – clamping the offset at both ends of the dataset and resetting @PrevPage to 0 if it would become negative; alternatively, you could set @NextPage to COUNT(*) - @PageSize if it gets larger than the underlying dataset.</p> <p>You can download SQL Server “Denali” CTP1 from <a href="http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx" target="_blank">here</a>, a script to create the sample database I used for the above example from <a href="http://public.blu.livefilestore.com/y1pYMcclO4pIH2Y4AzVIdphzaKapkazjc9UErttcrfhK45NWmCw1Pyj3k1gfK7evzqrrBbKzIoAyus1QmPl5xYOlg/CreateSampleDB.sql?download&psid=1" target="_blank">here</a>, and the paging example shown in this article from <a href="http://public.blu.livefilestore.com/y1pYMcclO4pIH2aDVjZDR8aoiYrWImeOy5LPGMjpNaPO0_6MLpwPzx3Xjz49JldB-5qApnjDpgSpUniRlMvz1DSiw/Paging.sql?download&psid=1" target="_blank">here</a>.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:ce7a2fad-a596-4b2a-9cda-2b8310f485bd" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Server+Denali" rel="tag">SQL Server Denali</a></div> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-16765357101632634602011-01-20T17:50:00.001+00:002012-04-07T15:43:06.747+01:00SQL Server “Denali” – Promising the Earth!<p>I’ve previously posted several <a href="http://graemesplaceblog.blogspot.com/search/label/Spatial%20Data" target="_blank">articles about spatial data support in SQL Server</a>, and as I continue my exploration of SQL Server “Denali” CTP1, I’ve encountered a few interesting new enhancements to the <strong>geometry</strong> and <strong>geography</strong> data types. I won’t go into full details here, because Ed Katibah and Milan Stojic have already saved me the trouble by writing a <a href="http://sqlcat.com/whitepapers/archive/2010/11/09/new-spatial-features-in-sql-server-code-named-denali-community-technology-preview-1.aspx" target="_blank">whitepaper</a> that provides a comprehensive round-up of the changes in this release; however, I do want to make a few observations relating to the new enhanced support for curved lines and shapes.</p> <p>SQL Server “Denali” introduces a few new spatial shapes, including CIRCULARSTRING, COMPOUNDCURVE, and CURVEPOLYGON. A CIRCULARSTRING line is a sequence of an odd number of at least three points, which are connected to form a curved arc. 
For example, consider the following Transact-SQL:</p> <blockquote> <p><font face="Courier New">DECLARE @g geography = 'CIRCULARSTRING(-4.115 55.778, -3.399 56.990, -2.237 54.009)'</font></p> </blockquote> <p>This creates a <strong>geography</strong> instance that represents a curved line like this:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TTh13SyIKkI/AAAAAAAAAIc/s2bvXizMzis/s1600-h/CIRCULARSTRING%5B7%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="CIRCULARSTRING" border="0" alt="CIRCULARSTRING" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TTh13tI2RuI/AAAAAAAAAIg/I5L7yjLhx-U/CIRCULARSTRING_thumb%5B3%5D.png?imgmax=800" width="240" height="119" /></a> </p> <p>Compare this to the line produced by using the LINESTRING shape, as shown here:</p> <blockquote> <p><font face="Courier New">DECLARE @g geography = 'LINESTRING(-4.115 55.778, -3.399 56.990, -2.237 54.009)'</font></p> </blockquote> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TTh14Ve3JLI/AAAAAAAAAIk/xjmYRTWTxmc/s1600-h/LINESTRING%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="LINESTRING" border="0" alt="LINESTRING" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TTh14gM3QjI/AAAAAAAAAIo/7PckFNpFovA/LINESTRING_thumb%5B2%5D.png?imgmax=800" width="240" height="122" /></a> </p> <p>Of course, both of these lines are “open”. You can create a closed CIRCULARSTRING line by defining at least five points and making the final point in the line the same as the first, as shown here:</p> <blockquote> <p><font face="Courier New">DECLARE @g geography = 'CIRCULARSTRING(-4.115 55.778, -3.399 56.990, -2.237 54.009, -3.168 53.863, -4.115 55.778)'</font></p> </blockquote> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TTh15MSB2zI/AAAAAAAAAIs/nvG7UFFRT0I/s1600-h/ClosedCircleString%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="ClosedCircleString" border="0" alt="ClosedCircleString" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TTh15ioGs2I/AAAAAAAAAIw/0pw1HmCVwno/ClosedCircleString_thumb%5B2%5D.png?imgmax=800" width="240" height="161" /></a> </p> <p>A COMPOUNDCURVE is a curved shape that is composed of one or more CIRCULARSTRING arcs and linear sections that are contiguously joined by having the final point in each segment the same as the first point in the next segment. For example, the following Transact-SQL creates a COMPOUNDCURVE shape from two CIRCULARSTRING arcs and a linear section. 
Note that you do not specify a keyword for the linear sections.</p> <blockquote> <p><font face="Courier New">DECLARE @g geography = 'COMPOUNDCURVE( <br />                          CIRCULARSTRING(-4.000 55.000, -4.500 54.500, -4.000 54.000), <br />                          (-4.000 54.000, 1.000 54.000), <br />                          CIRCULARSTRING(1.000 54.000, 1.500 54.500, 1.000 55.000))'</font></p> </blockquote> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TTh16HmIWFI/AAAAAAAAAI0/nBFlTI5ZVj0/s1600-h/CompoundCurve%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="CompoundCurve" border="0" alt="CompoundCurve" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TTh16m2ktRI/AAAAAAAAAI4/ajdKNAhPq8s/CompoundCurve_thumb%5B2%5D.png?imgmax=800" width="240" height="121" /></a> </p> <p>A CURVEPOLYGON is a surface area that is formed by a closed curved line, which can be defined by a CIRCULARSTRING or a COMPOUNDCURVE. When working with the <strong>geometry</strong> data type, the points in the curved area can be defined in any order, but when using the <strong>geography</strong> type, you must observe the “left foot rule”, which dictates that you must describe the shape as if you were pacing it out on the ground and the “inside” of the shape is always on your left. For example, here’s a Transact-SQL statement that defines a CURVEPOLYGON based on a CIRCULARSTRING:</p> <blockquote> <p><font face="Courier New">SELECT geography::Parse('CURVEPOLYGON(</font></p> <p><font face="Courier New">                           CIRCULARSTRING(-4.889 55.844,</font></p> <p><font face="Courier New">                                          -3.924 55.738,</font></p> <p><font face="Courier New">                                          -2.731 56.058,</font></p> <p><font face="Courier New">                                          -4.201 56.134,</font></p> <p><font face="Courier New">                                          -4.889 55.844)</font></p> <p><font face="Courier New">                                       )')</font></p> </blockquote> <p>This defines an area in the UK within the so-called “central belt” of Scotland, as shown here. 
Note that the points describe the shaded area, which is what would be on your left if you paced out the area from point to point in the sequence in which they are specified.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TTh17_CeM5I/AAAAAAAAAI8/8AW_tvQkRyQ/s1600-h/CentralBelt%5B5%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="CentralBelt" border="0" alt="CentralBelt" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TTh19amlhZI/AAAAAAAAAJA/GdxHK3e-cV4/CentralBelt_thumb%5B3%5D.png?imgmax=800" width="465" height="480" /></a> </p> <p>Now, what happens if I reverse the order of the points as shown here?</p> <blockquote> <p><font face="Courier New">SELECT geography::Parse('CURVEPOLYGON(</font></p> <p><font face="Courier New">                           CIRCULARSTRING(-4.889 55.844,</font></p> <p><font face="Courier New">                                          -4.201 56.134,</font></p> <p><font face="Courier New">                                          -2.731 56.058,</font></p> <p><font face="Courier New">                                          -3.924 55.738,</font></p> <p><font face="Courier New">                                          -4.889 55.844)</font></p> <p><font face="Courier New">                                       )')</font></p> </blockquote> <p>The “left foot rule” clearly tells SQL Server to include everything on my left side as I pace out the shape, and since the Earth is a sphere, this shape actually describes the entire surface of the planet except for the “hole” defined by the points in the CIRCULARSTRING.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TTh1-xhEtgI/AAAAAAAAAJE/RPHCGbubnfg/s1600-h/EverythingElse%5B5%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="EverythingElse" border="0" alt="EverythingElse" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TTh2AXZZ0FI/AAAAAAAAAJI/mi1XGyCGuTc/EverythingElse_thumb%5B3%5D.png?imgmax=800" width="507" height="480" /></a> </p> <p>In previous releases of SQL Server, this would have caused an error because shapes larger than a hemisphere were not supported. However, in SQL Server “Denali”, you can create a shape that covers as much of the surface of the Earth as you like, so this code is perfectly valid. Clearly, the message here is that you need to be very careful when defining surface areas (be they CURVEPOLYGON or regular POLYGON shapes) to apply the “left foot rule” to include the surface area you actually intend to, and not the rest of the world!</p> <p>Speaking of which, one other new feature worth mentioning is the inclusion of a FULLGLOBE shape, which returns a surface area that covers, you guessed it, the full globe. 
For example, the following code returns the area of the planet’s surface in square kilometres:</p> <blockquote> <p><font face="Courier New">DECLARE @theEarth geography = geography::STGeomFromText('FULLGLOBE', 4326) <br />SELECT @theEarth.STArea()/1000000</font></p> </blockquote> <p>The Spatial Reference Identifier (SRID) 4326 specifies that the WGS84 standard model of the earth’s shape should be used, which means that the result of the call to the STArea method is returned in square metres, which we then divide by 1000000 to get the answer in square kilometres.</p> <p>And just in case you’re interested, the Earth’s surface is just a little over 510,065,621 km2!</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:9bd614e8-5b9e-4804-82da-6b301de3b6ab" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Server+Denali" rel="tag">SQL Server Denali</a></div> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-38207604027270210992011-01-11T16:54:00.001+00:002012-04-07T15:43:06.747+01:00Contained Databases in SQL Server “Denali”<p>OK, so here’s a common scenario. You’ve developed an application that uses a SQL Server database on a development machine, and now you need to deploy the application (and its database) to a staging or production environment. Of course, you can generate scripts to recreate the database on another server, or you could simply back it up and restore it to the other server. However, while that would successfully move the database, it would not take across any server-level objects that the application depends on, such as logins or Agent jobs. SQL Server 2008 R2 introduced Data-Tier Applications as a way to package up databases with their server-level dependencies and deploy them to another server (which I covered in <a href="http://graemesplaceblog.blogspot.com/2010/01/data-tier-applications-in-sql-server.html" target="_blank">a previous blog post</a>), and this is a great step forward, but to be honest, wouldn’t it be nice if the application database was simply self-contained with no need to rely on server-level objects at all?</p> <p>Well, in SQL Server “Denali”, you get your wish in the form of <em>Contained Databases</em>. When you create a database, you now have the option of setting its <strong>containment type</strong> property to <em>partial</em>, which enables you to create “contained” objects that would normally be defined at the server-level within the database itself – most notably logins. 
The “partial” containment type value specifies that the database can use a mixture of contained and non-contained objects (so for example, traditional server-level logins can be used to access the database as well as contained logins) – a further option of “full” containment (which disallows the use of non-contained objects) is promised but not supported in the current CTP 1 release.<a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TSyLKwyXITI/AAAAAAAAAH4/DNbeicTbOE0/s1600-h/Picture1%5B6%5D.png"><img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; margin-left: 0px; border-left-width: 0px; margin-right: 0px" title="Picture1" border="0" alt="Picture1" align="right" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TSyLLuQpO9I/AAAAAAAAAIA/fOwHmR6QYOA/Picture1_thumb%5B4%5D.png?imgmax=800" width="447" height="397" /></a></p> <p>So let’s see an example of how to create and use a contained database. First, you need to enable contained databases in the server. You can do this in the Server Properties dialog box as shown here, or you can use the following Transact-SQL code: </p> <blockquote> <p><font face="Courier New">sp_configure 'show advanced options', 1 ; </font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">RECONFIGURE ; </font></p> <p><font face="Courier New">GO </font></p> <p><font face="Courier New">sp_configure 'contained database authentication', 1; </font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">RECONFIGURE ; </font></p> <p><font face="Courier New">GO </font></p> <p><font face="Courier New">sp_configure 'show advanced options', 0 ; </font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">RECONFIGURE ; </font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TSyLNBBo7hI/AAAAAAAAAIE/jlyG-f2gthU/s1600-h/Picture2%5B6%5D.png"><img style="border-right-width: 0px; margin: 0px 15px 0px 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="Picture2" border="0" alt="Picture2" align="left" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TSyLNtRn5RI/AAAAAAAAAII/XsTOhrqXrb0/Picture2_thumb%5B4%5D.png?imgmax=800" width="446" height="394" /></a>After you’ve enabled contained databases, you can create one like this:</p> <blockquote> <p><font face="Courier New">CREATE DATABASE [MyContainedDB]</font></p> <p><font face="Courier New">CONTAINMENT = PARTIAL</font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p>Or alternatively, you can use the New Database dialog box in SQL Server Management Studio and set the <strong>Containment type</strong> property as shown here.</p> <p> </p> <p></p> <p> </p> <p> </p> <p> </p> <p> </p> <p>Now that you have a contained database, you can create users within it. The important point here is that you can create users that do not map to server-level logins, so the traditional dependency between a user object in a database and a login object in the server instance where that database is hosted is broken. 
To create a contained SQL Server user that has a password, you can use a Transact-SQL CREATE USER statement in the contained database like this one:</p> <blockquote> <p><font face="Courier New">USE [MyContainedDB]</font></p> <p><font face="Courier New">GO</font></p> <p><font face="Courier New">CREATE USER [MyContainedUser] WITH PASSWORD = '5up3r53cret'</font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p>Or, you can create a contained user for a Windows account like this:</p> <blockquote> <p><font face="Courier New">CREATE USER [DEVBOX\MyAppAccount]</font></p> <p><font face="Courier New">GO</font></p> </blockquote> <p>Of course, you can also use the graphical tools in SQL Server Management Studio to create a contained user, as shown here:</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TSyLOHirR_I/AAAAAAAAAIM/Et4I_vVXQbA/s1600-h/Picture3%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture3" border="0" alt="Picture3" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TSyLO9KF2CI/AAAAAAAAAIQ/5U9FA7XL6J8/Picture3_thumb%5B3%5D.png?imgmax=800" width="548" height="484" /></a> </p> <p>Note that if you want to use a SQL Server user, you need to enable so-called “mixed mode” authentication at the server-level – even though there are no server-level SQL Server logins.</p> <p>As stated before, contained users do not rely on server-level logins; they exist only within the contained database. This means that when you want to connect to the contained database from a client application, you need to specify the database name in the connection string or the connection will fail because SQL Server will attempt to default to the master database and try to authenticate your credentials as a server-level login. In a typical client application, you can specify the database in a connection string like this:</p> <blockquote> <p><font face="Courier New">"SERVER=mysqlserver; <strong>DATABASE=MyContainedDB</strong>; USER ID=MyContainedUser; PASSWORD=5up3r53cret"</font></p> </blockquote> <p>When connecting from tools like SQL Server Management Studio, you can specify the database name in the connection dialog box like this:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TSyLQeEVm7I/AAAAAAAAAIU/Btid_m7Qego/s1600-h/Picture4%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture4" border="0" alt="Picture4" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TSyLQ38qGOI/AAAAAAAAAIY/g17CAX_ofAY/Picture4_thumb%5B3%5D.png?imgmax=800" width="411" height="484" /></a> </p> <p>The ability to create contained databases that include users with no dependency on server-level logins can be extremely useful when you need to move the database from one server to another. However, you should be aware of the security implications of using database-specific authentication that effectively bypasses the usual security management architecture.</p>
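<p>Incidentally, if you want to check how “contained” a database really is, Denali is documented as exposing the containment setting through <strong>sys.databases</strong> and as providing a DMV that lists any entities that still depend on instance-level features. A quick sketch (I haven’t verified these names against CTP1, so treat them as indicative):</p> <blockquote> <p><font face="Courier New">-- Which databases on the instance allow contained authentication? <br />SELECT name, containment_desc FROM sys.databases <br />GO <br />USE [MyContainedDB] <br />GO <br />-- List anything in this database that relies on non-contained, instance-level features <br />SELECT * FROM sys.dm_db_uncontained_entities</font></p> </blockquote>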
<p>You can find out more about these security implications <a href="http://msdn.microsoft.com/en-us/library/ff929055(v=SQL.110).aspx" target="_blank">here</a>.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:2f21f020-7fcc-4e14-8769-0ae80eb4b26e" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Server+Denali" rel="tag">SQL Server Denali</a></div> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-64605565400035884762011-01-10T18:28:00.001+00:002011-01-10T18:28:42.170+00:00Getting Started with SQL Azure<p>Towards the end of last year, I was the lead author on Microsoft Learning course <a href="http://www.microsoft.com/learning/en/us/Course.aspx?ID=10337A&Locale=en-us" target="_blank">10337A: Updating Your Microsoft SQL Server 2008 BI Skills to SQL Server 2008 R2</a>. While this is a course primarily for BI developers, we included a module on SQL Azure; and to make the course work in a classroom, we created a Silverlight-based simulation that students can use to walk through the steps required to set up and use a SQL Azure database. Additionally, <a href="http://msdn.microsoft.com/en-gb/ff728568.aspx" target="_blank">Hilton Giesnow presents a useful video introduction to SQL Azure on the MSDN site</a>.</p> <p>However, the release cycle for cloud-based technology moves even faster than that of traditional software products, and predictably enough, the Azure team at Microsoft has created a new version of the Web portal used to manage Azure platform subscriptions – including SQL Azure. So, I thought it might be useful to provide a short walkthrough based on the latest portal (before they go and change it again!).</p> <p>To start with, you need to sign up for an Azure subscription at <a title="http://www.microsoft.com/windowsazure/offers/" href="http://www.microsoft.com/windowsazure/offers/">http://www.microsoft.com/windowsazure/offers/</a>. It may take as long as a couple of days to provision your account, but mine took less than half an hour. After you’ve signed up, you’ll be able to access the Azure portal using your Windows Live ID:</p> <p align="center"><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TStN-cKn0QI/AAAAAAAAAGI/lTr1KCp98MA/s1600-h/Picture1%5B5%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture1" border="0" alt="Picture1" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TStOB8CCbQI/AAAAAAAAAGM/eJxi6DeuIeI/Picture1_thumb%5B3%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>To create a SQL Azure server (which you can think of as a cloud-based computer running an instance of SQL Server – it’s not, but you can think of it like that!), click the New Database Server button in the toolbar (which I guess like everything else these days is probably more properly called the “ribbon”), which displays all of the Azure subscriptions associated with your Windows Live ID that include SQL Azure services. 
In this case, I have a single subscription named CM Azure Subscription, which currently has no SQL Azure database servers defined in it.</p> <p align="center"><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TStOHIr5zFI/AAAAAAAAAGQ/BmS5vH4P4Os/s1600-h/Picture2%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture2" border="0" alt="Picture2" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TStOKtfsL3I/AAAAAAAAAGU/EgljtAB2LGc/Picture2_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>Now you can select the subscription in which you want to create the SQL Azure server, and click the <strong>Create</strong> button in the toolbar/ribbon, which will start a wizard, the first step in which is to select the geographical location where you want the server to be hosted:</p> <p align="center"><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TStOQSoutzI/AAAAAAAAAGY/EDUYwhL3odM/s1600-h/Picture3%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture3" border="0" alt="Picture3" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TStOT2keiuI/AAAAAAAAAGc/lxxSCEM2mxg/Picture3_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>After selecting a location, you need to specify the Administrator credentials for the server. There are some restrictions on login name and password complexity (for example, you can’t create a login named <strong><em>Administrator</em></strong> with the password <strong><em>password</em></strong>).</p> <p align="center"><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TStOY4-MhoI/AAAAAAAAAGg/e6jgg8HhrDo/s1600-h/Picture4%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture4" border="0" alt="Picture4" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TStOcSrUhaI/AAAAAAAAAGk/h5vWEVLzIbM/Picture4_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>Next, the wizard prompts you to specify the firewall rules that control connectivity to your server. By default, nothing (including any other Azure services you may have) can access your server, so you’ll typically want to enable access for Windows Azure services as shown here:</p> <p align="center"><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TStOg-ij4fI/AAAAAAAAAGs/nOXSOvpVi70/s1600-h/Picture5%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture5" border="0" alt="Picture5" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TStOk00PnEI/AAAAAAAAAGw/zyyVTsWW2bU/Picture5_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>Additionally, you’ll probably want to allow at least some computers to connect to the server across the Internet – even if initially this is limited to your own development workstation. To do this, you’ll need to add a firewall rule that specifies a range of IP addresses from which you want to enable connectivity. 
In this example, I’ve created a rule that allows connections from any computer on the Internet.</p> <p align="center"><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TStOqW72_9I/AAAAAAAAAG0/j-x0CH-T6FA/s1600-h/Picture6%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture6" border="0" alt="Picture6" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TStOuaMrq5I/AAAAAAAAAG4/atl8NxNChAM/Picture6_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>After completing the wizard, your SQL Azure server is provisioned with a name that looks like a random stream of characters and shows up in the <strong>Database</strong> section of the Azure portal as shown here:</p> <p align="center"><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TStO0ITEWlI/AAAAAAAAAG8/qex-7Kg0Vvk/s1600-h/Picture7%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture7" border="0" alt="Picture7" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TStO4HDy3CI/AAAAAAAAAHA/l0qbpC5N7Fs/Picture7_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>Now that you have a SQL Azure server, you can select it in the Azure portal and create databases in it by clicking the <strong>Create</strong> button in the ribbon. Doing this results in a prompt for a name, edition, and size for your database. There are two editions available (Web and Business), each with their own range of possible sizes (and corresponding prices), so when you select an edition, the available sizes will reflect the sizes supported by that edition. In this example, I’ve created a 1GB Web edition database.</p> <p align="center"><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TStO-qbPKgI/AAAAAAAAAHI/T3bYhJed5Rk/s1600-h/Picture8%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture8" border="0" alt="Picture8" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TStPBqki1GI/AAAAAAAAAHM/YFgIOYSWM1g/Picture8_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>After creating your database, you can view its properties in the portal as shown here:</p> <p align="center"><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TStPIGvcvHI/AAAAAAAAAHQ/uBxi9umLaSI/s1600-h/Picture9%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture9" border="0" alt="Picture9" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TStPL-WVJ0I/AAAAAAAAAHU/le26153PmnM/Picture9_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>The latest version of the Azure portal includes a management tool that you can use to manage your SQL Azure database. To launch this, simply select the database you want to manage in the Azure portal and click <strong>Manage</strong> in the ribbon. 
The first time you do this, you’ll be prompted to accept the terms and conditions for the management tool, and then the tool itself will open in a new browser window and prompt you to log in as shown here:</p> <p align="center"><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TStPRxv7MdI/AAAAAAAAAHY/WHa0MwMxkp4/s1600-h/Picture11%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture11" border="0" alt="Picture11" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TStPVBXzjAI/AAAAAAAAAHc/Rf_KEzxMxoQ/Picture11_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>You can log in using the administrator credentials you specified when you provisioned the SQL Azure server, and start managing the database by using the management user interface as shown here:</p> <p align="center"><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TStPbKtdpCI/AAAAAAAAAHg/nJV0dtvliA8/s1600-h/Picture12%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture12" border="0" alt="Picture12" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TStPfGkyyTI/AAAAAAAAAHk/WoZsHRB7Ywc/Picture12_thumb%5B2%5D.png?imgmax=800" width="609" height="480" /></a></p> <p>Additionally, if you have installed SQL Server Management Studio for any edition of SQL Server 2008 R2 (including the free Express edition), you can connect to your SQL Azure server and manage it by specifying the fully-qualified server name and SQL Server login credentials as shown here:</p> <p align="center"><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TStPjblRvcI/AAAAAAAAAHo/cq93z13Awfw/s1600-h/Picture13%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture13" border="0" alt="Picture13" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TStPoeKimtI/AAAAAAAAAHs/3xDfsmVUSh0/Picture13_thumb%5B2%5D.png?imgmax=800" width="615" height="480" /></a></p> <p>Note that the server name takes the form <em>YourSQLAzureServerName</em><strong>.database.windows.net</strong>, and the login name takes the form <em>YourAdminLoginName</em><strong>@</strong><em>YourSQLAzureServerName</em>.</p> <p>SQL Azure servers are displayed in SQL Server Management Studio like this:</p> <p align="center"><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TStPsEOgOXI/AAAAAAAAAHw/X6E_vF0VRVA/s1600-h/Picture14%5B4%5D.png"><img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="Picture14" border="0" alt="Picture14" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TStPuXSfPeI/AAAAAAAAAH0/JXPIrbW-t5I/Picture14_thumb%5B2%5D.png?imgmax=800" width="615" height="480" /></a></p> <p>From here, you can manage your SQL Azure database in a similar way to how you manage on-premise SQL Server instances, though you’ll find that there are some SQL Server features that are not supported by SQL Azure.</p>
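<p>Connecting from your own client applications works the same way. As a sketch, a typical ADO.NET connection string for a SQL Azure database looks something like this (the server name, database, and credentials are placeholders for your own values):</p> <blockquote> <p><font face="Courier New">"Server=tcp:YourSQLAzureServerName.database.windows.net;Database=MyAzureDB; <br />User ID=YourAdminLoginName@YourSQLAzureServerName;Password=Y0urP@ssw0rd;Encrypt=True;"</font></p> </blockquote>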
<p>In short, you connect from client applications in a similar fashion to on-premise SQL Server databases – you simply specify the fully-qualified SQL Azure server name in the connection string.</p> <p>For more information about SQL Azure, see <a title="http://msdn.microsoft.com/en-us/library/ff937661.aspx" href="http://msdn.microsoft.com/en-us/library/ff937661.aspx">http://msdn.microsoft.com/en-us/library/ff937661.aspx</a>.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:70df84d3-688a-4158-b717-217de5aaf82f" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Azure" rel="tag">SQL Azure</a></div> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-62837139760258491142010-12-28T15:56:00.001+00:002012-04-07T15:43:06.747+01:00Creating a User-Defined Server Role in SQL Server “Denali”<p>“Denali” is the code-name for the next release of Microsoft SQL Server, and a community technology preview (CTP) is available for download from <a href="http://www.microsoft.com/sqlserver/en/us/product-info/future-editions.aspx" target="_blank">here</a>. My colleague Geoff Allix has already posted a couple of articles about the enhancements Denali includes for debugging Transact-SQL scripts <a href="http://cm-bloggers.blogspot.com/2010/11/first-look-at-sql-server-denali.html" target="_blank">here</a> and <a href="http://cm-bloggers.blogspot.com/2010/11/first-look-at-sql-server-denali-part-2.html" target="_blank">here</a>, and as the Content Master data platform team continues to investigate the CTP, I’m sure more posts will appear. In this post, I want to discuss a new feature that makes it easier to delegate server-level administrative tasks – user-defined server roles.</p> <p>If you’re familiar with previous releases of SQL Server, you’ll know that there are essentially two levels of security principal within SQL Server (well alright, 3 if you include the operating system) – <em>server-level</em> principals, such as <em>logins</em>, and <em>database-level</em> principals, such as <em>users</em>. Permissions can be granted to these principals in order to allow them to use or manage resources (generally known as <em>securables</em>) at the relevant level. For example, you can grant permissions on server-level securables (such as endpoints and certificates) to server-level principals, and you can grant permissions on database-level securables (such as tables and views) to database-level principals. Obviously, managing permissions for individual principals can become complex (and error-prone) as the number of principals increases, so in common with most software systems, SQL Server supports the idea of grouping principals into roles, enabling you to grant the required permissions to the role, and simply add or remove principals from the role in order to allow or disallow them access to the securables.</p> <p>Previous releases of SQL Server included a pre-defined set of server-level roles and database-level roles that are already granted commonly required permissions, and to which you can simply add your principals (for example, logins at the server level or users at the database-level) in order to quickly enable people to access the resources they need while maintaining the principle of “least privilege” (i.e. 
not granting any permissions to anyone who doesn’t require them). Additionally, you can create your own user-defined database-level roles, but crucially, until SQL Server “Denali”, you could not create your own user-defined server-level roles.</p> <p>To understand how the ability to create and manage your own server-level roles is useful, let’s consider a scenario where a corporation uses a SQL Server instance to host multiple application databases. Many of these databases are used by internal “home grown” ASP.NET Web applications or client/server applications that use Windows integrated authentication, and to control access to these databases, the DBA has simply created logins in SQL Server for the appropriate Windows Active Directory groups. However, the environment also includes a couple of off-the-shelf applications that do not support Windows-integrated authentication, and therefore require their own SQL Server logins. Let’s also suppose that these applications are supported by a team of dedicated application administrators who need to be able to manage the SQL Server logins for the applications, for example to periodically change the password.</p> <p>To accomplish this, I can create a user-defined server role by right-clicking the <strong>Server Roles</strong> folder in SQL Server Management Studio and clicking <strong>New Server Role</strong>, as shown below. Alternatively, I can use the new CREATE SERVER ROLE Transact-SQL statement.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TRoIuKDDPVI/AAAAAAAAAFo/XlTAQk49ziE/s1600-h/Picture1%5B4%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture1" border="0" alt="Picture1" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TRoIuo6QIHI/AAAAAAAAAFs/CiluYApPvKA/Picture1_thumb%5B2%5D.png?imgmax=800" width="644" height="454" /></a> </p> <p>Using the SQL Server Management Studio UI reveals the <strong>New Server Role</strong> dialog box, enabling me to define the server role. In this case, I want to create a role named <strong>SQLAccountsAdmin</strong>, which will be owned by the built-in <strong>sa</strong> login. I can also specify the server-level securables I want to assign permissions for, and I can select each securable and set the required permissions. In this case, I’ve selected the <strong>AcctsPackage</strong> and <strong>AppSvcAccount</strong> logins (yes, principals can also be securables!) and granted the full set of available permissions on these logins to the <strong>SQLAccountsAdmin</strong> role.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TRoIvbc-lXI/AAAAAAAAAFw/RV3D2gpgjvQ/s1600-h/Picture2%5B14%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture2" border="0" alt="Picture2" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TRoIwJtca9I/AAAAAAAAAF0/TaTXIMEeSyo/Picture2_thumb%5B10%5D.png?imgmax=800" width="495" height="484" /></a> </p> <p>To grant permissions to a user-defined server role by using Transact-SQL, you can use the GRANT, DENY, and REVOKE Transact-SQL commands just like you would for any other server-level principal.</p> <p>Now I need to add some server-level principals to the role, so that they can use their role membership to gain the permissions required to manage the two SQL Server logins.</p>
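<p>For reference, the whole flow – creating the role, granting the permissions, and adding a member – can also be scripted. Here’s a minimal sketch using the Transact-SQL statements named in this post (the <strong>DOMAIN\AppAdmins</strong> login is a placeholder for whichever login your application administrators use, and CONTROL stands in for the full set of permissions granted in the dialog box above):</p> <blockquote> <p><font face="Courier New">-- Create the role, owned by the built-in sa login <br />CREATE SERVER ROLE SQLAccountsAdmin AUTHORIZATION sa <br />GO <br />-- Give the role control over the two application logins <br />GRANT CONTROL ON LOGIN::AcctsPackage TO SQLAccountsAdmin <br />GRANT CONTROL ON LOGIN::AppSvcAccount TO SQLAccountsAdmin <br />GO <br />-- Add the application administrators' login to the role <br />ALTER SERVER ROLE SQLAccountsAdmin ADD MEMBER [DOMAIN\AppAdmins] <br />GO</font></p> </blockquote>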
<p>You can do this on the Members tab of the dialog box or by using the ALTER SERVER ROLE Transact-SQL statement, as in the sketch above.</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TRoIwgmYjdI/AAAAAAAAAF4/75zzUQA8L3c/s1600-h/Picture3%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture3" border="0" alt="Picture3" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TRoIxDrIq7I/AAAAAAAAAF8/pyehia5hJOc/Picture3_thumb%5B3%5D.png?imgmax=800" width="495" height="484" /></a> </p> <p>Finally, it’s worth noting that you can nest user-defined server roles within other server-level principals, including the fixed server roles provided out-of-the-box by SQL Server. In general, I’d advise against this as you can often find yourself granting unnecessary and unintended permissions, but it’s shown here for completeness.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TRoIx4H6dcI/AAAAAAAAAGA/iRlXDnxtnJ0/s1600-h/Picture4%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture4" border="0" alt="Picture4" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TRoIyVelHCI/AAAAAAAAAGE/PutT14Phw3Y/Picture4_thumb%5B3%5D.png?imgmax=800" width="495" height="484" /></a> </p> <p>So, there you have it – user-defined server roles in SQL Server “Denali” provide a flexible way to delegate administrative tasks at the server-level.</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-78557877519768505842010-12-24T18:22:00.001+00:002010-12-24T18:24:31.960+00:00Installing SharePoint 2010 on Windows 7<p>I generally do most of my development and “technology exploration” in an environment that reflects the actual production environment as closely as possible – for example, by developing against multiple virtual servers running Windows Server 2008 in a domain configuration. This approach has the advantage of reducing the opportunity for “well, it works on my laptop” style configuration issues when trying to deploy the application into production, but, let’s be honest, it makes life difficult – especially when the “real world” configuration requirements are as onerous as those of SharePoint-based solutions.</p> <p>Microsoft has documented a way to deploy SharePoint 2010 on a single Windows 7 (or Vista if you prefer) development box, so when I recently needed to do some basic SharePoint development, I decided to ignore my existing virtualized, multi-server SharePoint development and testing environment, and try out <a href="http://msdn.microsoft.com/en-us/library/ee554869(office.14).aspx" target="_blank">Microsoft’s instructions for creating a single-box development environment</a>. For the most part, this went OK, but I did hit a few issues along the way, so I thought it might be useful to document my experience.</p> <p>First, I installed Windows 7 (64-bit, since SharePoint is 64-bit only!) and then downloaded <a href="http://www.microsoft.com/downloads/en/details.aspx?FamilyID=49c79a8a-4612-4e7d-a0b4-3bb429b46595&displaylang=en" target="_blank">Microsoft SharePoint Foundation 2010</a>. 
<p>So, there you have it – user-defined server roles in SQL Server “Denali” provide a flexible way to delegate administrative tasks at the server level.</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-78557877519768505842010-12-24T18:22:00.001+00:002010-12-24T18:24:31.960+00:00Installing SharePoint 2010 on Windows 7<p>I generally do most of my development and “technology exploration” in an environment that reflects the actual production environment as closely as possible – for example, by developing against multiple virtual servers running Windows Server 2008 in a domain configuration. This approach has the advantage of reducing the opportunity for “well, it works on my laptop” style configuration issues when trying to deploy the application into production, but, let’s be honest, it makes life difficult – especially when the “real world” configuration requirements are as onerous as those of SharePoint-based solutions.</p> <p>Microsoft has documented a way to deploy SharePoint 2010 on a single Windows 7 (or Vista if you prefer) development box, so when I recently needed to do some basic SharePoint development, I decided to ignore my existing virtualized, multi-server SharePoint development and testing environment, and try out <a href="http://msdn.microsoft.com/en-us/library/ee554869(office.14).aspx" target="_blank">Microsoft’s instructions for creating a single-box development environment</a>. For the most part, this went OK, but I did hit a few issues along the way, so I thought it might be useful to document my experience.</p> <p>First, I installed Windows 7 (64-bit, since SharePoint is 64-bit only!) and then downloaded <a href="http://www.microsoft.com/downloads/en/details.aspx?FamilyID=49c79a8a-4612-4e7d-a0b4-3bb429b46595&displaylang=en" target="_blank">Microsoft SharePoint Foundation 2010</a>. The download is an executable named <em>SharePointFoundation.exe</em>, which you can simply run if you intend to install on the supported Windows Server platform, but which you need to extract to the file system in order to install on Windows 7 (or Vista). For example, to extract the installation files to a folder named C:\SharePointFiles, I used the following command:</p> <blockquote> <p><font face="Courier New">SharePointFoundation /extract:c:\SharePointFiles</font></p> </blockquote> <p>Next, I needed to edit the config.xml file provided with the SharePoint files, and add a <Setting> entry to enable installation on a client OS, as shown below:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TRTkm5veiOI/AAAAAAAAAE0/katQELuRbKg/s1600-h/Picture1%5B6%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture1" border="0" alt="Picture1" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TRTkoQiAUFI/AAAAAAAAAE4/kIGaR_GBEbQ/Picture1_thumb%5B4%5D.png?imgmax=800" width="915" height="772" /></a> </p>
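<p>In case the screenshot is hard to read: the entry described in the Microsoft documentation is a single <Setting> element added inside the existing <Configuration> element of config.xml, along these lines (the surrounding elements are elided here):</p> <blockquote> <p><font face="Courier New"><Configuration> <br />  ... <br />  <Setting Id="AllowWindowsClientInstall" Value="True"/> <br /></Configuration></font></p> </blockquote>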
<p>The SharePoint installation files include a tool to automatically install and configure the SharePoint prerequisites, but this only works on the supported Windows Server OS – you can’t use it on Windows 7, so you need to install and configure the prerequisites manually. The first of these is the Microsoft Filter Pack, and it’s included in the extracted files, as shown here:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TRTkp8LXwTI/AAAAAAAAAE8/tx4o11evGm4/s1600-h/Picture2%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture2" border="0" alt="Picture2" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TRTksxWJLNI/AAAAAAAAAFA/Vr3zJ2bKeJM/Picture2_thumb%5B2%5D.png?imgmax=800" width="915" height="772" /></a> </p> <p>Links to the remaining prerequisites are in the <a href="http://msdn.microsoft.com/en-us/library/ee554869(office.14).aspx" target="_blank">Microsoft documentation</a>, and I simply downloaded and installed the ones I required for SharePoint Foundation on a Windows 7 machine (which included the <a href="http://go.microsoft.com/fwlink/?LinkID=141237" target="_blank">Sync Framework</a>, the <a href="http://go.microsoft.com/fwlink/?LinkId=123718" target="_blank">SQL Server 2008 Native Client</a>, and the <a href="http://support.microsoft.com/kb/974405" target="_blank">Windows Identity Foundation</a>).</p> <p>Next I needed to enable all of the IIS features that SharePoint requires. Microsoft provides the following command, which you can copy to a command prompt window (on a single line) and execute.</p> <blockquote> <p><font face="Courier New">start /w pkgmgr /iu:IIS-WebServerRole;IIS-WebServer;IIS-CommonHttpFeatures; <br />IIS-StaticContent;IIS-DefaultDocument;IIS-DirectoryBrowsing;IIS-HttpErrors; <br />IIS-ApplicationDevelopment;IIS-ASPNET;IIS-NetFxExtensibility; <br />IIS-ISAPIExtensions;IIS-ISAPIFilter;IIS-HealthAndDiagnostics; <br />IIS-HttpLogging;IIS-LoggingLibraries;IIS-RequestMonitor;IIS-HttpTracing;IIS-CustomLogging;IIS-ManagementScriptingTools; <br />IIS-Security;IIS-BasicAuthentication;IIS-WindowsAuthentication;IIS-DigestAuthentication; <br />IIS-RequestFiltering;IIS-Performance;IIS-HttpCompressionStatic;IIS-HttpCompressionDynamic; <br />IIS-WebServerManagementTools;IIS-ManagementConsole;IIS-IIS6ManagementCompatibility; <br />IIS-Metabase;IIS-WMICompatibility;WAS-WindowsActivationService;WAS-ProcessModel; <br />WAS-NetFxEnvironment;WAS-ConfigurationAPI;WCF-HTTP-Activation; <br />WCF-NonHTTP-Activation</font></p> </blockquote> <p>This enables the required features, which you can verify in the Windows Features Control Panel applet as shown below:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TRTkujKsH-I/AAAAAAAAAFE/CCqqKkPEbac/s1600-h/Picture3%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture3" border="0" alt="Picture3" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TRTkxSaDgpI/AAAAAAAAAFI/ghDvHKgTRc8/Picture3_thumb%5B2%5D.png?imgmax=800" width="915" height="772" /></a> </p> <p>Now I was ready to install SharePoint Foundation. I ran <em>Setup.exe</em> and chose the Standalone installation option:</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TRTky8-wbyI/AAAAAAAAAFQ/cDU1ddJE3PI/s1600-h/Picture4%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture4" border="0" alt="Picture4" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TRTk1dCKxLI/AAAAAAAAAFU/K4_7XK5Ly_c/Picture4_thumb%5B2%5D.png?imgmax=800" width="915" height="772" /></a> </p> <p>After the installation was complete, I was prompted to run the SharePoint Products Configuration Wizard, and this is where the wheels fell off! The Standalone installation of SharePoint includes the installation of a SQL Server 2008 Express database server instance (named SHAREPOINT) to host the configuration database, but somewhat annoyingly, you need to apply the <a href="http://support.microsoft.com/kb/970315" target="_blank">Microsoft SQL Server 2008 KB 970315 x64 hotfix</a> before you can run the configuration wizard. However, even after doing this, I still found that the SharePoint Products Configuration Wizard failed to connect to the database server in order to create the configuration database.
In desperation, I upgraded the SQL Server 2008 Express instance that had been installed to SQL Server 2008 R2 Express – still no luck.</p> <p>My investigations turned up a number of useful blog articles, which are listed below – none of these actually solved my specific problem, but they contain some really useful tips!</p> <ul> <li><a title="http://blah.winsmarts.com/2009-11-SharePoint_2010_Development_Environment_-and-ndash;_Practical_Tips.aspx" href="http://blah.winsmarts.com/2009-11-SharePoint_2010_Development_Environment_-and-ndash;_Practical_Tips.aspx">http://blah.winsmarts.com/2009-11-SharePoint_2010_Development_Environment_-and-ndash;_Practical_Tips.aspx</a></li> <li><a title="http://blogs.msdn.com/b/vesku/archive/2010/02/01/sharepoint-2010-team-development-environment.aspx" href="http://blogs.msdn.com/b/vesku/archive/2010/02/01/sharepoint-2010-team-development-environment.aspx">http://blogs.msdn.com/b/vesku/archive/2010/02/01/sharepoint-2010-team-development-environment.aspx</a></li> <li><a title="http://blogs.msdn.com/b/opal/archive/2009/11/16/installation-notice-for-sharepoint-2010-public-beta.aspx" href="http://blogs.msdn.com/b/opal/archive/2009/11/16/installation-notice-for-sharepoint-2010-public-beta.aspx">http://blogs.msdn.com/b/opal/archive/2009/11/16/installation-notice-for-sharepoint-2010-public-beta.aspx</a></li> <li><a title="http://www.codersbarn.com/post/SharePoint-Development-Environment.aspx" href="http://www.codersbarn.com/post/SharePoint-Development-Environment.aspx">http://www.codersbarn.com/post/SharePoint-Development-Environment.aspx</a></li> <li><a title="http://www.dev4side.com/community/blog/2010/3/1/principal-errors-during-sharepoint-2010-beta-2-installation.aspx" href="http://www.dev4side.com/community/blog/2010/3/1/principal-errors-during-sharepoint-2010-beta-2-installation.aspx">http://www.dev4side.com/community/blog/2010/3/1/principal-errors-during-sharepoint-2010-beta-2-installation.aspx</a></li> <li><a title="http://www.eggheadcafe.com/software/aspnet/28909552/failed-to-connect-to-database-configuration-wizard.aspx" href="http://www.eggheadcafe.com/software/aspnet/28909552/failed-to-connect-to-database-configuration-wizard.aspx">http://www.eggheadcafe.com/software/aspnet/28909552/failed-to-connect-to-database-configuration-wizard.aspx</a></li> <li><a title="http://www.myfriedmind.com/techBlog/2010/03/12/Sharepoint2010ConfigurationWizardFailedToRegisterSharepointServicesWithSystemSecurityCryptographyCryptographicExceptionObjectAlreadyExists.aspx" href="http://www.myfriedmind.com/techBlog/2010/03/12/Sharepoint2010ConfigurationWizardFailedToRegisterSharepointServicesWithSystemSecurityCryptographyCryptographicExceptionObjectAlreadyExists.aspx">http://www.myfriedmind.com/techBlog/2010/03/12/Sharepoint2010ConfigurationWizardFailedToRegisterSharepointServicesWithSystemSecurityCryptographyCryptographicExceptionObjectAlreadyExists.aspx</a></li> </ul> <p>After some poking around, I discovered a command-line version of the configuration wizard, <em>psconfig.exe</em>, in the <em>C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\BIN</em> folder, and by examining its parameter info I discovered a <strong>standaloneconfig</strong> value for the <strong>cmd</strong> parameter, as shown below:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TRTk2_5L_sI/AAAAAAAAAFY/_LHdUWBzmRw/s1600-h/Picture5%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture5" border="0" alt="Picture5" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TRTk5nFutwI/AAAAAAAAAFc/3LR11QYDfZs/Picture5_thumb%5B2%5D.png?imgmax=800" width="915" height="772" /></a> </p>
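<p>For the record, the command I ended up running from that folder was simply along these lines (from memory, so double-check the tool’s parameter info before relying on it):</p> <blockquote> <p><font face="Courier New">psconfig.exe -cmd standaloneconfig</font></p> </blockquote>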
<p>This seemed to solve my problem, and I now have a fully configured SharePoint Foundation 2010 environment on a Windows 7 virtual machine, as shown below.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TRTk6ux1u3I/AAAAAAAAAFg/q08ldsOhwk0/s1600-h/Picture6%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Picture6" border="0" alt="Picture6" src="http://lh4.ggpht.com/_WKO1IFE4fMA/TRTk9n9JNlI/AAAAAAAAAFk/O8npebQPH3A/Picture6_thumb%5B2%5D.png?imgmax=800" width="915" height="772" /></a> </p> <p>All told, it took me the best part of an afternoon to create my “simple” SharePoint development environment – but to be fair, a large percentage of that was spent scrabbling around trying to figure out how to get the configuration wizard to work. Hopefully, your installation will go a little more smoothly!</p> <p>Happy Holidays!</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:aaad46c3-4d77-4271-8aa1-2def1f9416c7" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SharePoint" rel="tag">SharePoint</a></div>Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-2302430015676917842010-11-23T14:41:00.001+00:002010-11-23T14:41:00.140+00:00Drupal and SQL Server<p>One of the more “out of left field” projects I’ve worked on recently involved creating a sample solution that demonstrates how you can integrate Drupal 7 with SQL Server. For those who don’t know, Drupal is an open source content management system that’s widely used on Linux/Apache sites. In recent times, Microsoft has sought to widen the appeal of its Web development platform by adding support for applications and technologies normally used by LAMP (Linux, Apache, MySQL, and PHP) developers, and a version of Drupal that runs in IIS on Windows and uses SQL Server as its underlying content database is one outcome of that initiative.</p> <p>Anyway, Content Master was asked to create two sample Drupal solutions to showcase the advantages of using SQL Server with Drupal – one that integrates Drupal with SQL Server Reporting Services, and another that incorporates location-based content with SQL Server’s spatial data support.
I ended up working on this project together with a couple of colleagues named David Miles and James Millar – I designed and implemented the SQL Server reports as well as the spatial data and Bing Maps functionality, and David and James handled the PHP programming and Drupal-specific scripting elements of the solutions.</p> <p>You can view more details of the <a href="http://cm-bloggers.blogspot.com/2010/11/drupal-7-and-sql-server-reporting.html" target="_blank">Reporting Services solution</a> and the <a href="http://cm-bloggers.blogspot.com/2010/11/drupal-7-and-sql-server-spatial-data.html" target="_blank">Spatial Data solution</a> on the <a href="http://cm-bloggers.blogspot.com/" target="_blank">Content Master blog</a>.</p> <p>Enjoy!</p> Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com2tag:blogger.com,1999:blog-2066281766861287065.post-68497840537003776762010-10-13T11:35:00.001+01:002010-10-13T11:37:18.625+01:00Using a Transparent Background in Reporting Services<p>While watching the Japanese Formula 1 Grand Prix on Sunday, it struck me that TV sports broadcasters make a lot of use of transparent overlays when showing scores, results, times, statistics, or whatever. In the case of the Grand Prix, the driver rankings in the world championship were displayed on a semi-transparent overlay, behind which the live footage of the race circuit could be seen.</p> <p>So, naturally I started to wonder how I could achieve a similar visual effect in a Reporting Services report, like this:</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TLWLRMSWnTI/AAAAAAAAAEA/3560BxN0hS4/s1600-h/Product%20Sales%20Report%5B6%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Product Sales Report" border="0" alt="Product Sales Report" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TLWLR0ko7pI/AAAAAAAAAEE/ZNrNsZ_43Ls/Product%20Sales%20Report_thumb%5B4%5D.jpg?imgmax=800" width="464" height="484" /></a> </p> <p>My first thought was to look at the <strong>BackgroundColor</strong> property of the Tablix data region and set the <strong>Transparency</strong> level. However, when I looked at the color picker control for the property, this is what I saw:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TLWLSfWLc_I/AAAAAAAAAEI/iQ_4PqYkWjE/s1600-h/ColorProperties%5B6%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="ColorProperties" border="0" alt="ColorProperties" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TLWLSzrp8PI/AAAAAAAAAEM/G1iJ-rxmz5s/ColorProperties_thumb%5B4%5D.jpg?imgmax=800" width="520" height="394" /></a> </p> <p>Note that the Transparency control is disabled. It turns out you can only set a transparency level for gauges and charts in Reporting Services – not for shapes or data regions. So, I needed to find an alternative approach.</p> <p>The answer I came up with was to create a semi-transparent .png graphic, and use it as the background image for the data region. I created this with PowerPoint, though of course you can use any graphics tool you like. I also used PowerPoint to find a suitable clipart image to use as the background for the report (on which the semi-transparent data region will be overlaid). 
In this case, I’m using the Adventure Works Cycles sample data, so a photo of a cyclist seems like a good choice.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TLWLTa9ZmsI/AAAAAAAAAEQ/HxHlBpoKAqM/s1600-h/PowerPoint%5B5%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="PowerPoint" border="0" alt="PowerPoint" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TLWLUM0JscI/AAAAAAAAAEU/ZpQL70w96b4/PowerPoint_thumb%5B3%5D.jpg?imgmax=800" width="605" height="484" /></a> </p> <p>You can take one of two approaches when it comes to sizing the semi-transparent image – you can make an extremely small image and then set the <strong>BackgroundRepeat</strong> property of the data region to <strong>Repeat</strong>, or you can make it bigger than the data region is ever likely to be and set the <strong>BackgroundRepeat</strong> property to <strong>Clip</strong> (or <strong>Repeat</strong> – it won’t matter since the image will be bigger than the data region anyway!). I found that PowerPoint tends to add some whitespace to the edge of a shape when you save it as a .png image, which showed up when repeating the background image, so I went with a large background image. Of course, had I used a more comprehensive graphics tool, I could have easily avoided this issue and got away with repeating a smaller image.</p> <p>To embed the images in the report, I added them to the <strong>Images</strong> folder in the <strong>Report Data</strong> pane in Report Designer.</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TLWLUlRP92I/AAAAAAAAAEY/etmi_sqkVSQ/s1600-h/ReportData%5B6%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="ReportData" border="0" alt="ReportData" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TLWLVHxISLI/AAAAAAAAAEc/8TL820DusEg/ReportData_thumb%5B4%5D.jpg?imgmax=800" width="256" height="349" /></a> </p> <p>Then I set the <strong>BackgroundImage</strong> property of the tablix data region in which the report data is displayed, like so:</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TLWLVwd8DWI/AAAAAAAAAEg/vJLOKS_WTlg/s1600-h/TablixProperties%5B5%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="TablixProperties" border="0" alt="TablixProperties" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TLWLWYYorxI/AAAAAAAAAEk/6MFowAnrJGg/TablixProperties_thumb%5B3%5D.jpg?imgmax=800" width="612" height="484" /></a> </p> <p>I’ve also used the semi-transparent image as the background for the report title textbox, which appears above the tablix data region.</p>
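<p>If you prefer to read the underlying RDL rather than the property grid, the same settings live in the data region’s <strong>Style</strong> element. Here’s a rough sketch – the embedded image name, <strong>TransparentOverlay</strong>, is just a placeholder, and the exact element layout may vary slightly between RDL schema versions:</p> <blockquote> <p><font face="Courier New"><Style> <br />  <BackgroundImage> <br />    <Source>Embedded</Source> <br />    <Value>TransparentOverlay</Value> <br />    <BackgroundRepeat>Clip</BackgroundRepeat> <br />  </BackgroundImage> <br /></Style></font></p> </blockquote>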
<p>The next challenge was to apply the cyclist image to the background of the report, and ensure that the layout of the report overlays the data neatly. If you have a small dataset with a known number of records (for example, in a “top 10 products” report), then this is relatively straightforward. However, for a dataset of unknown size, the data region will be resized dynamically, and automatic pagination may break the report into multiple pages. In my case, I want to ensure that the report title appears on all pages, and that the table of data has a suitable space above and below it on all pages.</p> <p>To accomplish this, I added a page header and footer to the report and put the report title in the header. This ensures that if the report is paginated, the table on the second (and all subsequent) pages doesn’t start right at the top of the page. Similarly, the page footer ensures that there’s always a space after the table – it never goes all the way to the bottom of the page. I set the <strong>BackgroundImage</strong> of the report to the cyclist picture (clipped so it doesn’t repeat), and I set the <strong>InteractiveSize</strong> property of the report so that when viewed in the browser, the report has a maximum size that will keep the tablix well within the background image area. This was made tricky by the fact that Report Designer does not show the background image of the report in design view, so I had to preview the report and assess the right size through trial and error.</p> <p><a href="http://lh4.ggpht.com/_WKO1IFE4fMA/TLWLXB0-FrI/AAAAAAAAAEo/ImXjms0Dx3c/s1600-h/Report%20Designer%5B6%5D.jpg"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Report Designer" border="0" alt="Report Designer" src="http://lh6.ggpht.com/_WKO1IFE4fMA/TLWLXi6yi7I/AAAAAAAAAEs/fW2AylnLDaY/Report%20Designer_thumb%5B4%5D.jpg?imgmax=800" width="644" height="435" /></a> </p> <p>Obviously, the report size is optimized for interactive viewing, and though you can set the <strong>PageSize</strong> property of the report to an appropriate size for any other renderers you plan to use, my experience is that using background images and contrived layouts in reports you intend to render to a different format can result in some pretty horrible-looking exported reports. One solution I have used in the past for this is to create the version that’s tailored for online viewing, and include a link to an offline version that has more conventional formatting for printing or exporting.</p> <p>You can download the sample report I created from <a href="http://cid-ddad9079cff45619.office.live.com/self.aspx/Public/Report%20Project.zip" target="_blank">here</a>.
You’ll also need SQL Server 2008 R2 with Reporting Services (you can get the free Express edition from <a href="http://www.microsoft.com/express/Database/InstallOptions.aspx" target="_blank">here</a>) and the AdventureWorksDW2008R2 sample database (which you can get from <a href="http://msftdbprodsamples.codeplex.com/" target="_blank">here</a>).</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:41603c29-f652-49ec-9aef-cb35be4c2bb5" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Server" rel="tag">SQL Server</a>,<a href="http://del.icio.us/popular/Reporting+Services" rel="tag">Reporting Services</a></div>Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-70010258089651836152010-08-21T19:10:00.001+01:002010-08-21T19:30:21.986+01:00Geocoding (and reverse-geocoding) with Bing Maps<p>In <a href="http://graemesplaceblog.blogspot.com/search/label/Bing%20Maps" target="_blank">previous posts</a>, I’ve explored how to use Bing Maps with SQL Server spatial data. In this post, I want to explore the Bing Maps control a little more. Specifically, I want to look at how to use the Bing Maps control to geocode a street address (that is, find the latitude and longitude coordinates of the address), and how to reverse-geocode a spatial location to find the corresponding street address.</p> <p>The first step in building a Web page that uses the Bing Maps control to geocode an address is to add a <div> tag to host the map control and use the page body’s <strong>onload</strong> event to call a JavaScript function that loads the map – like this:</p> <blockquote> <p><font face="Courier New"><head></font></p> <p><font face="Courier New"><!-- add a reference to the Virtual Earth map control --> <br /><script type="text/javascript" <br />        src="http://dev.virtualearth.net/mapcontrol/mapcontrol.ashx?v=6.3"> <br /></script></font></p> <p><font face="Courier New"><script type="text/javascript"> <br />    function GetMap() { <br />        map = new VEMap('mapDiv'); <br />        map.LoadMap(); <br />    } <br /></script></font></p> <p><font face="Courier New"></head> <br /><body onload="GetMap()"> </font></p> <p><font face="Courier New">    <div id="mapDiv" style="position:relative; width:600px; height:400px;"> <br />    </div> <br /></body></font></p> </blockquote> <p>Next, you need a textbox so that users can enter the address they want to geocode, a button they can click to geocode the address, and two textboxes to show the resulting latitude and longitude coordinates.</p> <blockquote> <p><font face="Courier New">Address:<input id="txtAddress" type="text" style="width:340px" /> <br />        <input style="width:60px" id="btnFind" type="button" value="Find" onclick="return btnFind_onclick()" /> <br />Latitude:<input id="txtLat" type="text" style="width:400px" /> <br />Longitude:<input id="txtLong" type="text" style="width:400px" /></font></p> </blockquote> <p>Note the <strong>onclick</strong> property of the button control; this calls the function that uses Bing Maps to geocode the address.
Here’s the code to do that:</p> <blockquote> <p><font face="Courier New">function btnFind_onclick() { <br />    //Geocode the address to find the Lat/Long location <br />    map.Geocode(document.getElementById("txtAddress").value, onGeoCode, new VEGeocodeOptions()) <br />} </font></p> </blockquote> <p>Note that the code in the btnFind_onclick function calls the <strong>Geocode</strong> method of the map control, specifying the address to be geocoded, the name of the callback function to use to process the results (<strong>onGeoCode</strong>), and a <strong>VEGeocodeOptions</strong> object that ensures the user is shown a list of options when the address has multiple possible matches. The callback function looks like this:</p> <blockquote> <p><font face="Courier New">function onGeoCode(layer, resultsArray, places, hasMore, veErrorMessage) { <br />    var findPlaceResults = null; </font></p> <p><font face="Courier New">    // verify the search location was found <br />    if (places == null || places.length < 1) { <br />        alert("The address was not found"); <br />    } <br />    else { <br />        // we've successfully geocoded the address, so add a pin <br />        findPlaceResults = places[0].LatLong; <br />        addPinToMap(findPlaceResults); <br />    } <br />}</font></p> </blockquote> <p>The callback function is called when the geocode method returns, and assuming a location has been found, the JavaScript calls the following <strong>addPinToMap</strong> function to display the results:</p> <blockquote> <p><font face="Courier New">function addPinToMap(LatLon) { <br />    // clear all shapes and add a pin <br />    map.Clear() <br />    var pushpoint = new VEShape(VEShapeType.Pushpin, LatLon); <br />    map.AddShape(pushpoint); </font></p> <p><font face="Courier New">    // center and zoom on the pin <br />    map.SetCenterAndZoom(LatLon, 13); </font></p> <p><font face="Courier New">    // display the Lat and Long coordinates <br />    document.getElementById("txtLat").value = LatLon.Latitude; <br />    document.getElementById("txtLong").value = LatLon.Longitude; <br />}</font></p> </blockquote> <p>This adds a pin to the map and centers and zooms to ensure it can be seen clearly. It then displays the latitude and longitude in the textboxes defined earlier.</p> <p>We now have all the code required to geocode an address, but what about the opposite? Ideally, we also want the user to be able to click a location on the map and reverse-geocode the point that was clicked to find the address.</p> <p>Of course, the Bing Maps map control already responds to user clicks, so our application will use right-clicks to enable users to specify a location.
We’ll do this by attaching an <strong>onclick</strong> event handler to the map control in the GetMap function (which you will recall is called when the page loads to display the map), and then checking for a right-click before reverse-geocoding the clicked location:</p> <blockquote> <p><font face="Courier New">// added to the GetMap function <br />map.AttachEvent("onclick", map_click);</font></p> <p><font face="Courier New"></font></p> </blockquote> <blockquote> <p><font face="Courier New">function map_click(e) { <br />    // check for right-click <br />    if (e.rightMouseButton) { <br />        var clickPnt = null; </font></p> <p><font face="Courier New">        // some map views return pixel XY coordinates, some Lat Long <br />        // We need to convert XY to LatLong <br />        if (e.latLong) { <br />            clickPnt = e.latLong; <br />        } else { <br />            var clickPixel = new VEPixel(e.mapX, e.mapY); <br />            clickPnt = map.PixelToLatLong(clickPixel); <br />        } </font></p> <p><font face="Courier New">        // add a pin to the map <br />        addPinToMap(clickPnt) </font></p> <p><font face="Courier New">        //reverse-geocode the point the user clicked to find the street address <br />        map.FindLocations(clickPnt, onReverseGeoCode); <br />    } <br />}</font></p> <p><font face="Courier New"></font></p> </blockquote> <p>This code finds the latitude and longitude of the clicked location (in some views, the map control uses X and Y pixel coordinates so we need to check for that), displays a pin on the map at the clicked location, and then uses the <strong>FindLocations</strong> method of the map control to find the address. A callback function named <strong>onReverseGeoCode</strong> is used to process the results:</p> <blockquote> <p><font face="Courier New">function onReverseGeoCode(locations) { <br />    // verify the search location was found <br />    if (locations == null || locations.length < 1) { <br />        document.getElementById("txtAddress").value = "Address not found"; <br />    } <br />    else { <br />        // we've successfully found the address, so update the Address textbox <br />        document.getElementById("txtAddress").value = locations[0].Name; <br />    } <br />}</font></p> </blockquote> <p>The completed application looks like this:</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/THAWnPRwMTI/AAAAAAAAADw/UfBzPUR943E/s1600-h/BingGeocoder%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="BingGeocoder" border="0" alt="BingGeocoder" src="http://lh3.ggpht.com/_WKO1IFE4fMA/THAWn4NHd0I/AAAAAAAAAD0/COQb0hLEVIg/BingGeocoder_thumb%5B3%5D.png?imgmax=800" width="581" height="484" /></a> </p> <p>You can try the page out for yourself <a href="http://www.graemesplace.com/BingGeocoder.htm" target="_blank">here</a>, and you can download the source code from <a href="http://cid-ddad9079cff45619.office.live.com/self.aspx/Public/BingGeocoder.zip" target="_blank">here</a>.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:e1696512-1b2a-4cf9-a397-3be83a25efb2" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/Bing+Maps" rel="tag">Bing Maps</a></div> Graeme 
Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-35631950257840826932010-06-08T13:42:00.001+01:002010-10-08T15:55:19.585+01:00Creating Multi-Sheet Workbooks with SQL Server 2008 R2 Reporting Services<p>One thing I’ve learned in over ten years of creating database and reporting solutions is that no matter how dynamic and interactive you make online reports, no matter how much you embed live reporting into the user interface of applications, and no matter how funky a dashboard you design, many executives don’t believe data is real unless it’s in a spreadsheet. That’s why one of the most-used features of Reporting Services is the ability to render reports in Excel format.</p> <p>However, I recently ran into a problem: my company hosts a <a href="http://www.cm-luminosity.co.uk/" target="_blank">Luminosity</a> learning management system, and uses SQL Server Reporting Services to generate reports of student activity in Excel format. The number of students has grown substantially over time, and we hit an unforeseen limit – the Excel 2003 format that Reporting Services renders the reports in supports a maximum of 65,536 rows per worksheet, and the report (which shows students and all training they have completed) has grown to exceed that limit.</p> <p>After some head scratching, I investigated the enhanced page-break support in SQL Server 2008 R2 and came up with a solution that works, and which can enhance the ability to create complex reports in Excel format for those pesky executives – so I thought I’d share it here.</p> <p>Let’s imagine your executives want a report in Excel format that lists every customer, along with their contact details. If you have fewer than 65,537 customers, you could design a report that simply lists them in a worksheet, but if you have more customers than that (or you want to include headers, spaces, or other elements in your report that will use rows when rendered to Excel), then you’ll need a better solution.
Ideally, you might want to create something like this – an Excel workbook with multiple worksheets, consisting of a generic “cover page” and a tab for each letter of the alphabet so that you can view customers by last name.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TA46r7sAlrI/AAAAAAAAAC4/NhmClQo2d1k/s1600-h/Workbook%5B7%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Workbook" border="0" alt="Workbook" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TA46s2o_X_I/AAAAAAAAAC8/-VO0gm7xd5I/Workbook_thumb%5B5%5D.png?imgmax=800" width="554" height="484" /></a> </p> <p>You can download a copy of this workbook from <a href="http://cid-ddad9079cff45619.office.live.com/view.aspx/Public/Customers.xls" target="_blank">here</a>.</p> <p>Each worksheet in the workbook lists customers with a last name that begins with the letter on the corresponding worksheet tab, as shown here:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TA46ttuhwfI/AAAAAAAAADA/rfwqHHZqbmQ/s1600-h/Customers-A%5B6%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Customers-A" border="0" alt="Customers-A" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TA46uls6aII/AAAAAAAAADE/RwNWNKurju8/Customers-A_thumb%5B4%5D.png?imgmax=800" width="554" height="484" /></a> </p> <p>To create this report, I used the AdventureWorks2008R2 sample database (which you can download from <a href="http://msftdbprodsamples.codeplex.com/" target="_blank">here</a>) and the following Transact-SQL query:</p> <p><font face="Courier New">SELECT Title, FirstName, LastName, AddressLine1, City, StateProvinceName, PostalCode, CountryRegionName <br />FROM Sales.vIndividualCustomer</font></p> <p>The report includes a tablix data region with a details grouping (in which all fields are displayed) and a grouping based on the following expression (which returns the first character of the <strong>LastName</strong> field in upper-case):</p> <p><font face="Courier New">=ucase(left(Fields!LastName.Value, 1))</font></p> <p>I also added an image and a textbox to the report, and placed them above the tablix data region as shown here:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/TA46vX1_r7I/AAAAAAAAADI/YI6xKnogzpA/s1600-h/ReportDesign%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="ReportDesign" border="0" alt="ReportDesign" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TA46wWOv8mI/AAAAAAAAADM/UCo-42Lo8zw/ReportDesign_thumb%5B2%5D.png?imgmax=800" width="567" height="484" /></a> </p> <p>To create the page breaks that generate the worksheets when rendered to Excel, I’ve used some of the new page-break support in SQL Server 2008 R2.
First of all, I’ve set the report’s <strong>InitialPageName</strong> property to <em>Customer Addresses</em>, as shown here:</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TA46wywwQpI/AAAAAAAAADQ/2c06Bp0hlFY/s1600-h/ReportProperties%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="ReportProperties" border="0" alt="ReportProperties" src="http://lh3.ggpht.com/_WKO1IFE4fMA/TA46xfvHnrI/AAAAAAAAADU/JaYAUXT-Y_8/ReportProperties_thumb%5B2%5D.png?imgmax=800" width="280" height="484" /></a> </p> <p>This property defines the default name for the first page of the report (or for all pages if no explicit page breaks with page names are defined). That’s why in the Excel workbook, the “cover page” has this name on its worksheet tab (if the <strong>InitialPageName</strong> property wasn’t set, the worksheet tab would show the report name).</p> <p>Next, I created a page break at the start of the tablix as shown here:</p> <p><a href="http://lh5.ggpht.com/_WKO1IFE4fMA/TA46x2gnQLI/AAAAAAAAADY/eC1RHCRCpBw/s1600-h/TablixProperties%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="TablixProperties" border="0" alt="TablixProperties" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TA46ymC0xpI/AAAAAAAAADc/Y4vn0crupIE/TablixProperties_thumb%5B2%5D.png?imgmax=800" width="280" height="484" /></a></p> <p>This causes the data in the table to be displayed on a new page, effectively defining the “cover page” as “everything before this”.</p> <p>Finally, I used the properties of the grouping I defined earlier to create a page break between each instance of the grouping, and apply a page name based on the same expression used to define the grouping. In other words, there will be a page for each first character of the <strong>LastName</strong> field, and the page name for this page will be the grouping character.</p> <p><a href="http://lh6.ggpht.com/_WKO1IFE4fMA/TA46zCMvZeI/AAAAAAAAADg/9Wt6DqKoR2o/s1600-h/GroupProperties%5B4%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="GroupProperties" border="0" alt="GroupProperties" src="http://lh5.ggpht.com/_WKO1IFE4fMA/TA46ziFIaxI/AAAAAAAAADk/TJJKlpgUruc/GroupProperties_thumb%5B2%5D.png?imgmax=800" width="392" height="484" /></a> </p>
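<p>For anyone who likes to read the underlying RDL, the grouping ends up looking roughly like this. This is a sketch from memory of the SQL Server 2008 R2 schema (the <strong>LastNameInitial</strong> group name is just a placeholder, and the exact nesting of the <strong>PageName</strong> element may differ), so treat it as a guide rather than a copy-and-paste fragment:</p> <blockquote> <p><font face="Courier New"><Group Name="LastNameInitial"> <br />  <GroupExpressions> <br />    <GroupExpression>=ucase(left(Fields!LastName.Value, 1))</GroupExpression> <br />  </GroupExpressions> <br />  <PageBreak> <br />    <BreakLocation>Between</BreakLocation> <br />  </PageBreak> <br />  <PageName>=ucase(left(Fields!LastName.Value, 1))</PageName> <br /></Group></font></p> </blockquote>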
<p>You can download the complete solution from <a href="http://cid-ddad9079cff45619.office.live.com/self.aspx/Public/My%20Reports.zip" target="_blank">here</a>. You’ll need to have an instance of SQL Server 2008 R2 with the AdventureWorks2008R2 database (the DataSet in the report assumes that this is in the default instance of SQL Server 2008 R2 on the local computer).</p> <p>Exporting this report to Excel creates the desired multi-sheet workbook, with a tab for each initial character of the last name, and a “cover page”.</p> <p>Hopefully, you can see from this article how easy it is to create multi-sheet workbook reports that will add value to your reporting solutions.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:6ac998c7-4586-4549-9466-8900d5d76a22" class="wlWriterEditableSmartContent">Technorati Tags: <a href="http://technorati.com/tags/SQL+Server+2008+R2" rel="tag">SQL Server 2008 R2</a>,<a href="http://technorati.com/tags/Reporting+Services" rel="tag">Reporting Services</a>,<a href="http://technorati.com/tags/Excel" rel="tag">Excel</a></div>Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-36565296634179387352010-01-23T22:53:00.001+00:002010-01-25T09:09:04.255+00:00First Steps with the Silverlight Bing Maps Control<p>A while back, I posted an <a href="http://graemesplaceblog.blogspot.com/2009/12/adventures-in-spatial-data-part-2.html" target="_blank">article about displaying spatial data from SQL Server with what was then called the Virtual Earth map control</a>. The article demonstrated an application that retrieves information about locations visited by a toy stuffed bear named Beanie, and displays those locations on a map. Since then, the Virtual Earth map control has been renamed Bing Maps, and a Silverlight version of the map control is now available – so naturally, the time has come to update the Beanie Tracker application.</p> <p>Unlike the JavaScript version of the Bing Maps control, to use the Silverlight Bing Maps control, you need to sign up at the <a href="https://www.bingmapsportal.com/" target="_blank">Bing Maps Account Center</a> and obtain a key. However, this is a straightforward process (and free!). Once you have a key, you can create Silverlight applications that display and manipulate the Bing Maps control. To do this, download and install the Bing Maps control.
Then create a new Silverlight application and add a reference to the assemblies provided with the control as shown here:</p> <p><a href="http://lh3.ggpht.com/_WKO1IFE4fMA/S1t9-HrD3QI/AAAAAAAAACo/jAU6FEQ1ylo/s1600-h/Ref%5B5%5D.png"><img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="Ref" border="0" alt="Ref" src="http://lh4.ggpht.com/_WKO1IFE4fMA/S1t9-x2NPlI/AAAAAAAAACs/5EtSLPT74ro/Ref_thumb%5B3%5D.png?imgmax=800" width="488" height="414" /></a> </p> <p>Now that you have a reference to the Map control, you can add its namespace to a XAML UserControl and include a map object in the Silverlight user interface as shown here, referencing the key you obtained from the Bing Maps Account Center:</p> <p><font face="Arial"><UserControl x:Class="BeanieTracker.MainPage" <br />    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" <br />    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" <br />    xmlns:d="http://schemas.microsoft.com/expression/blend/2008" <br />    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" <br />    <font color="#ff0080">xmlns:m="clr-namespace:Microsoft.Maps.MapControl;assembly=Microsoft.Maps.MapControl" <br /></font>    mc:Ignorable="d" d:DesignWidth="400" d:DesignHeight="500" Width="700" Height="400"> <br />  <Grid x:Name="LayoutRoot"> </font></p> <p><font face="Arial">        <Grid.RowDefinitions> <br />            <RowDefinition Height="Auto"/> <br />        </Grid.RowDefinitions> <br />        <Grid.ColumnDefinitions> <br />            <ColumnDefinition Width="200" /> <br />            <ColumnDefinition Width="*"/> <br />        </Grid.ColumnDefinitions> </font></p> <p><font face="Arial">        <StackPanel Grid.Column="0" Grid.Row="0" Orientation="Vertical"> <br />            <Image Name="imgBeanie" Source="Beanie.jpg"></Image> <br />            <Button Cursor="Hand" Width="195" Height="25" HorizontalAlignment="Left" Content="Show Locations" x:Name="b1" Margin="2,10,0,1" Click="b1_Click"></Button> <br />       </StackPanel> </font></p> <p><font face="Arial">        <font color="#ff0080"><m:Map Name="map" Grid.Column="1" Grid.Row="0" CredentialsProvider="YOUR_KEY" Width="475" Height="300" /></font> </font></p> <p><font face="Arial">    </Grid> <br /></UserControl></font></p> <p>Adding the map control displays a Bing Maps map in the Silverlight user interface, enabling users to view the map and interact with it through its built-in controls for changing the zoom level or view, and moving around the map. However, to add custom functionality, you need to write some code to manipulate the map control.</p> <p>The Silverlight map control exposes a number of objects with properties and methods you can control programmatically, though some of the functionality in the JavaScript version of the control has not been implemented in the Silverlight version. Unfortunately, the functionality that enables you to import a GeoRSS feed as a ShapeLayer onto the map is not implemented in the Silverlight control, so a simpler version of the Beanie Tracker application is required.
In this version, I’ve written code to retrieve the GeoRSS feed, and then parse the XML feed and create a pushpin for each GML <strong>pos</strong> element, as shown here:</p> <p><font face="Arial">private void b1_Click(object sender, RoutedEventArgs e) <br /> { <br />     Uri url = new Uri("../Feed.aspx?data=locations", UriKind.Relative); <br />     WebClient client = new WebClient(); <br />     client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted); <br />     client.DownloadStringAsync(url); <br /> } </font></p> <p><font face="Arial"> void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e) <br /> { <br />     if (e.Error == null) <br />     { <br />         StringReader stream = new StringReader(e.Result); <br />         XmlReader reader = XmlReader.Create(stream); <br />         string gmlURI = "http://www.opengis.net/gml";</font></p> <p><font face="Arial">         while (reader.Read()) <br />         { <br />             if (reader.NodeType == XmlNodeType.Element) <br />             { <br />                 if (reader.NamespaceURI == gmlURI && reader.Name == reader.Prefix + ":pos") <br />                 { <br />                     string[] loc = reader.ReadInnerXml().Split(" ".ToCharArray()); <br />                     double lat = Double.Parse(loc[0]); <br />                     double lon = double.Parse(loc[1]); <br />                     <font color="#ff0080">Pushpin p = new Pushpin(); <br />                     p.Location = new Location(lat, lon); <br />                     map.Children.Add(p); <br /></font>                 } <br />             } <br />         } <br />     } <br /> }</font></p> <p>You can see the resulting application at <a href="http://www.graemesplace.com/beanietracker.aspx" target="_blank">http://www.graemesplace.com/beanietracker.aspx</a>. </p>Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0tag:blogger.com,1999:blog-2066281766861287065.post-48611142006756066502010-01-04T16:33:00.001+00:002010-01-04T16:35:24.696+00:00Data-Tier Applications in SQL Server 2008 R2<p>In <a href="http://graemesplaceblog.blogspot.com/2009/12/multi-server-management-with-sql-server.html" target="_blank">a previous post</a>, I discussed some of the new multi-server management capabilities in SQL Server 2008 R2. One of the new features I conspicuously side-stepped covering in that post is the concept of a data-tier application – and that’s what I want to describe in this post.</p> <p>Data-tier applications provide a useful way to encapsulate all of the logical and physical components of an application that need to be deployed and managed as a unit on a SQL Server instance. For example, consider a typical business application. It probably consists of a number of tiers, including a presentation tier (which might be a Windows Form application or an ASP.NET Web application), a middle-tier (for example a library of .NET assemblies that provide objects to manage the business logic of the application), and a data-tier. 
The data-tier consists primarily of a logical database (and all the schemas, tables, views, and so on that it contains), but it also includes server-level objects (such as any logins that the middle-tier uses to connect to the database server) and the physical database and log files used to store the database.</p> <p>In the past, deploying or migrating the data-tier of an application has involved examining the database to find its server-level dependencies and physical storage properties, moving the database from its test/staging server to the production server (via backup and restore, SSIS, or a Transact-SQL script to recreate the database schema and data – taking into account any differences in physical storage media), and creating a script to recreate any server-level objects used by the database.</p> <p>In SQL Server 2008 R2, this task has been simplified through the concept of data-tier applications. Software developers using Visual Studio 2010 will be able to create data-tier applications that encapsulate the entire data tier, or alternatively you can use new wizards in SQL Server Management Studio to create a data-tier application from an existing database, and deploy a data-tier application to a new database.</p> <p>To create a data-tier application from an existing database, right-click the database you want to package and start the data-tier extraction wizard as shown in the following screenshot.</p> <p><a href="http://lh3.ggpht.com/_SfPmeMl-v20/S0IWL0SJ3qI/AAAAAAAAAH4/nNvG0gflZ7w/s1600-h/Picture1%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture1" border="0" alt="Picture1" src="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWMiSDD1I/AAAAAAAAAH8/1FCy8J1fSL4/Picture1_thumb%5B1%5D.png?imgmax=800" width="624" height="484" /></a> </p> <p>This opens the following wizard screen:</p> <p><a href="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWNe9A--I/AAAAAAAAAIA/sW79xwv8gvk/s1600-h/Picture2%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture2" border="0" alt="Picture2" src="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWONO3RgI/AAAAAAAAAIE/fEr_TmPaPVk/Picture2_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>The first step is to set the properties of the data-tier application (note that the wizard uses the abbreviation “DAC” – technically, this stands for “Data Tier Application Component”, which you can think of as a unit of deployment, or a deployable package for a data-tier application.
The term “data-tier application” is usually taken to mean a deployed instance of a DAC.)</p> <p><a href="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWOlddlFI/AAAAAAAAAII/9wYeiL-WQuQ/s1600-h/Picture3%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture3" border="0" alt="Picture3" src="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWPKTiriI/AAAAAAAAAIM/Fmw-XIEA2LQ/Picture3_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>As well as standard properties such as a name, version, and description for your data-tier application, you specify the file location where the DAC package should be created.</p> <p><a href="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWP1Y__JI/AAAAAAAAAIQ/62F9YDYkDdY/s1600-h/Picture4%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture4" border="0" alt="Picture4" src="http://lh3.ggpht.com/_SfPmeMl-v20/S0IWQmxaeNI/AAAAAAAAAIU/phoCQshYFfQ/Picture4_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>The wizard then examines the database and its dependencies, and lists the objects that will be included in the DAC. In the November CTP, not all database objects are supported in DACs – for example, you can’t include columns with spatial data types such as <strong>geometry</strong> or <strong>geography</strong>. The list of supported objects will no doubt expand over time. In this example, the wizard has identified the database objects included in the database, and also the users and associated logins that are required.</p> <p><a href="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWRC4egaI/AAAAAAAAAIY/hKzQrJaCVSY/s1600-h/Picture5%5B5%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture5" border="0" alt="Picture5" src="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWRrCPU3I/AAAAAAAAAIc/4sbTaSrEjGY/Picture5_thumb%5B3%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>Finally, the wizard builds the package for the DAC. The package itself is a single file with the extension .dacpac, as shown here:</p> <p><a href="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWSH8JVnI/AAAAAAAAAIg/J9Tuxzu1_g8/s1600-h/Picture6%5B4%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture6" border="0" alt="Picture6" src="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWSioVh8I/AAAAAAAAAIk/j4pbs3ySXfc/Picture6_thumb%5B2%5D.png?imgmax=800" width="540" height="392" /></a> </p> <p>This file is actually a zip archive that contains a number of XML files describing the components of the DAC.
If you append a .zip extension to the filename, you can examine these files as shown here:</p> <p><a href="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWTbzyBMI/AAAAAAAAAIo/UOM2mo-GjNI/s1600-h/Picture7%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture7" border="0" alt="Picture7" src="http://lh3.ggpht.com/_SfPmeMl-v20/S0IWT_OB-1I/AAAAAAAAAIs/3laW2iaGES4/Picture7_thumb%5B1%5D.png?imgmax=800" width="566" height="392" /></a> </p> <p>To deploy the data-tier application defined in the DAC, right-click the server you want to deploy it to and click <strong>Deploy Data-tier Application</strong>:</p> <p><a href="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWUb--djI/AAAAAAAAAIw/f2BalRzVDSg/s1600-h/Picture8%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture8" border="0" alt="Picture8" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj90Bs388tLA0aiZsPL2v_9RCpOUhlipamKso_MoSa3kHkTQBAL_poReOiDb0h2hNCdkGoD975hOhnig0vw7xQTPPBDVoHDTr5KYq6lsCdesCtVPELHc7DZu92xm_JoDXW1guSdLLbYp_NF/?imgmax=800" width="624" height="484" /></a> </p> <p>This starts another wizard, as shown here:</p> <p><a href="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWVsjzmAI/AAAAAAAAAI4/MoWwSwzHrmo/s1600-h/Picture9%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture9" border="0" alt="Picture9" src="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWWagJ0BI/AAAAAAAAAI8/UgJuuUD6CXo/Picture9_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>The first step is to select the DAC package file you want to deploy:</p> <p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgI7vM7Iw7zDrDjPp55LcElkxjFy-0d7OHrrPvI1axq1MDqL8IFr1-PX-PebB2avU6VJgU_kQH0VBAETH3BdtxnXANQyP3z9J1Fcm27mtQWpTRLImgfN6-hQcxmwXZyUHjXe_xR_RKJxhRL/s1600-h/Picture10%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture10" border="0" alt="Picture10" src="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWXSs4VTI/AAAAAAAAAJE/tztjxA-uprM/Picture10_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a></p> <p>Then you can change the database name and file locations if desired.</p> <p><a href="http://lh3.ggpht.com/_SfPmeMl-v20/S0IWYI88pmI/AAAAAAAAAJI/itF0VZGnxWI/s1600-h/Picture11%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture11" border="0" alt="Picture11" src="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWYpDzmlI/AAAAAAAAAJM/vollwm4e_Q8/Picture11_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>The wizard summarizes the settings, …</p> <p><a href="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWZPaJLsI/AAAAAAAAAJQ/JkhQ70ugo1Y/s1600-h/Picture12%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture12" border="0" alt="Picture12" src="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWZtrA6gI/AAAAAAAAAJU/yYJAPQRo4rM/Picture12_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a>  </p>
src="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWZtrA6gI/AAAAAAAAAJU/yYJAPQRo4rM/Picture12_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a>  </p> <p>…,and then deploys the data-tier application to the server.</p> <p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiaYtC5uJdpwxt14tUXO7631wepVirEvHHXxb8mFPmN7gUvQt03lzeUPbnzrEK-blKqSlP7zU3mvRGjFEA-RMyYfKa75VVnJ3smgrutfA5EPDKxytl-Hhj5OfUtsAIpiQm6M2B7nGw9U_8U/s1600-h/Picture13%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture13" border="0" alt="Picture13" src="http://lh6.ggpht.com/_SfPmeMl-v20/S0IWa9OIMwI/AAAAAAAAAJc/szC0rvWIQ8Q/Picture13_thumb%5B1%5D.png?imgmax=800" width="524" height="484" /></a> </p> <p>You can then use SQL Server Management Studio to confirm that the database and any dependent objects has been deployed. In this case, you can see that the <strong>MyAppLogin</strong> login has been recreated on the target server along with the database.</p> <p><a href="http://lh4.ggpht.com/_SfPmeMl-v20/S0IWbYLISGI/AAAAAAAAAJg/Mt0HcVBBRDA/s1600-h/Picture14%5B3%5D.png"><img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Picture14" border="0" alt="Picture14" src="http://lh5.ggpht.com/_SfPmeMl-v20/S0IWcE1ya0I/AAAAAAAAAJk/0FPPCHTbhag/Picture14_thumb%5B1%5D.png?imgmax=800" width="644" height="482" /></a> </p> <p>This ability to treat the entire data-tier as a single, encapsulated package should simplify database application deployment and management significantly.</p> <div style="padding-bottom: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; float: none; padding-top: 0px" id="scid:0767317B-992E-4b12-91E0-4F059A8CECA8:8e4c1c9a-f3f9-4043-a879-60c0dead5537" class="wlWriterEditableSmartContent">del.icio.us Tags: <a href="http://del.icio.us/popular/SQL+Server+2008+R2" rel="tag">SQL Server 2008 R2</a></div>Graeme Malcolmhttp://www.blogger.com/profile/02246562883877200692noreply@blogger.com0