Thursday, 25 July 2013

Power Query for Excel Demo

A couple of weeks ago, I posted a demo that instructors of Microsoft Learning course 20467B can use to demonstrate the Data Explorer add-in for Excel 2013. Since then, Microsoft has rebranded Data Explorer as “Power Query”, and announced that it will form part of the Power BI capabilities in Office 365.

A new version of the add-in is now available here so I’ve updated the demo steps, which you can download from my SkyDrive folder. Other than the renaming of the Data Explorer tab on the ribbon to Power Query, the steps are much the same as they were before, so the following video is still a reasonably good guide to the tool.

Friday, 12 July 2013

Migrating SQL Server Databases to Windows Azure

When the IT industry started getting excited about this thing called “the cloud” a few years ago, there were many (myself included) who were sceptical about the willingness of organizations to abandon their own on-premises infrastructure and start moving business applications to the web. At first, the idea of relying on IT services hosted and managed by some Internet provider seemed  dubious at best – sure, individual consumers could use an email server provided by their ISP, but businesses need to manage their own IT. Don’t they?
Then, gradually, we started to make concessions.
…OK, maybe a a hosted Exchange Server would reduce some administrative overheads.
…Yes, maybe using a hosting service can provide better availability for some Web applications.
…Alright, using a software-as-a-service solution for CRM might reduce licensing and hardware costs.
Fast forward to today, and we’ve come a long way – to the point where even that stalwart of enterprise software packages, Microsoft Office, is now delivered to 1 in 4 of Microsoft’s enterprise customers in the form of the Office 365 cloud service rather than as on-premises desktop software. Despite the initial doubts of the naysayers, it’s beginning to look like this cloud thing might just catch on after all! Most of the applications we use every day, as consumers and increasingly as employees, are delivered as cloud services that we can consume anywhere and on an increasing array of mobile devices. Organizations are seeing this extended reach, while at the same time reducing overheads for hardware, software licensing, maintenance, and other costs – A win-win scenario if ever I saw one.
However, there’s always been one section of the IT community that is even more conservative than the Finance department. One group of IT professionals that that regards even the smallest change with deep suspicion. One last bastion of fierce resistance to new fangled trends. I’m talking of course, about database administrators. You can move my Exchange Server to the cloud. You can store all of our documents in SharePoint Online. You can even deliver Office applications through a browser. But you’ll have to take my on-premises database from my cold, dead hands!
But even that resistance is beginning to crumble. It makes sense for web-hosted applications to store their data on the web, and as more and more IT services are moved to the cloud, it also makes sense to include traditional business application data stores as a part of that migration. They key issues that need to be addressed are:
  • Can the data be moved to the cloud without compromising security or compliance requirements?
  • Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?
  • Can a hosted database meet our availability and disaster recovery requirements?
  • Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?
Then, assuming that the answer to those three questions is “yes” (and in many cases, it is), the only remaining question is:
  • Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?
Well, let’s see how Windows Azure shapes up in terms of these critical questions.
Windows Azure is Microsoft’s cloud platform, and it underpins many of the cloud services that the company offers. It also provides Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service(IaaS) solutions that organizations can use to build new solutions, and migrate existing IT services to the cloud. In terms of database offerings, there are two primary options to consider:
  • Windows Azure SQL Database – a PaaS offering that enables you to host data in a (mostly) SQL Server-compatible service without having to worry about configuring and managing hardware or the operating system.
  • SQL Server in a Windows Azure Virtual Machine – an IaaS offering that is exactly what it sounds like – a virtual machine running Windows with SQL Server installed in it.

Windows Azure SQL Database

OK, let’s start with Windows Azure SQL Database. This was formerly known as SQL Azure, and provides many of the data storage and management features of SQL Server without the need to manage the operating system or SQL Server instance. If your database consists of traditional relational data in tables, with views and stored procedures used to provide a layer of abstraction, then Windows Azure SQL Database may well be a good option for you. It can’t support “special” data, such as spatial or XML data types, and there are a few other limitations see General Guidelines and Limitations (Windows Azure SQL Database) on MSDN for details; but it supports all of the functionality required by a large percentage of typical business application databases.
So, how does it stack up against the questions we asked earlier?
Can the data be moved to the cloud without compromising security or compliance requirements?
It depends on your specific requirements; but network connectivity to the database can be restricted to a specific range of IP addresses, and can be performed over SSL connections. Client requests are authenticated using SQL Server native  authentication based on a login and password, and the same permissions-based authorization scheme used in SQL Server is used to control access to tables and other database objects. In terms of compliance policies, you have control over which geographic region in which the data center hosting your Windows Azure SQL database server is located.
Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?
You specify the size of your database as you create it, but it can grow to a maximum of 150 GB. Additionally, you can use federations to partition data across multiple databases to increase scalability and performance.
Can a hosted database meet our availability and disaster recovery requirements?
Windows Azure provides built-in resiliency by internally replicating your database across three redundant storage locations within the data center where your server is hosted. You can back up a SQL Database by copying it to another SQL Database.
Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?
Because SQL database is a PaaS offering, much of the physical server administration you would typically need to manage for a SQL Server instance is handled for you by Windows Azure. For logical administration tasks, such as managing users or creating database objects, you can use SQL Server Management Studio to connect to Windows Azure SQL database – enabling you to manage on-premises SQL Server instances and cloud-based Windows Azure SQL Database instances in the same tool.
Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?
Well, let’s take a look at this demonstration and find out:

As you can see, it’s really pretty straightforward to migrate an on-premises SQL Server database to Windows Azure SQL Database. If your database doesn’t depend on any features of SQL Server that aren’t supported by Windows Azure SQL Database, and you’re happy to let Windows Azure look after the physical configuration of your database server, then Windows Azure SQL Database is a good option for you.

SQL Server in a Windows Azure Virtual Machine

So, what about applications where you need to support SQL Server capabilities that aren’t available in Windows Azure SQL Database? Or where you specifically want control over the operating system and server-level configuration of your database server?
In this case, provisioning a virtual machine in Windows Azure might be a better option. There are a number of pre-defined virtual machine images, some of which include an installation of SQL Server; and if none of them suits you, there’s always the option to create your own and install whatever software you require. So how does this option meet our data migration requirements?
Can the data be moved to the cloud without compromising security or compliance requirements?
As with Windows Azure SQL Database, you can choose the geographic region of the data center where your virtual machine will be hosted. Access to the server is controlled through an endpoint that you must define for your Windows Azure virtual machine, and all network connectivity to the virtual machine can be restricted by using Windows firewall. You can use Windows Azure virtual networking to integrate virtual machines in Windows Azure with your corporate Active Directory infrastructure, and use Windows authentication to connect to SQL Server. Or alternatively, you can use SQL Server native authentication through logins and passwords, or even set up certificate-based authentication – exactly the same as with an on-premises instance of SQL Server. Additionally, you can make use of SQL Server’s security-related capabilities such as transparent database encryption and auditing.
Can a hosted solution cope with the volume of data in our databases, and support our transactions without compromising performance?
When you provision a virtual machine in Windows Azure, you can specify the number of virtual cores and the amount of memory allocated to the VM. At the moment, the largest VM available has 8 cores and 56 GB of memory, but I’d expect that to get larger over time. The VM uses Windows Azure storage  for its virtual hard disks, and you can add multiple VHDs and use filegroups to stripe data across them. This technique has been shown to improve IOPS performance.
Can a hosted database meet our availability and disaster recovery requirements?
As with all Windows Azure storage blobs, the VHDs for the VM are replicated across three redundant physical data storage devices in the data center. Additionally, you can use SQL Server HA capabilities, such as AlwaysOn Availability Groups to protect against failure of a VM. You can back up databases in a Windows Azure VM just as you would for an on premises instance of SQL Server, and use the SQL Agent to automate backup tasks on a scheduled basis.
Can we migrate some, but not all, of our databases – and still retain centralized, consistent administration of all data stores?
SQL Server in a virtual machine in Windows Azure is still just SQL Server. You can use SQL Server Management Studio to connect to it, and you can use all of the same management tools and agents you use for your on-premises database servers.
Can we migrate our databases with minimal interruption of service, and without breaking our existing applications?
Once again, here’s a demonstration:

Note that this demonstration is based on pre-release software, and may not reflect what actually ships with SQL Server 2014. However, it’s clear that the intention is to include a simple, wizard-based tool that will help you easily migrate on-premises SQL Server databases to Windows Azure virtual machines.

Conclusion

Migration of IT services to the cloud is inevitable. There are simply too many cost, scalability, and mobility advantages to justify not doing it. However, I don’t think it will happen in one big mass movement – and in particular, I think corporate databases will be among the last elements to be migrated. For at least a while, probably many years, we’ll be living in a hybrid world where some data is managed on-premises, and other data is moved to the cloud. To support that scenario, we need tools and technologies that make it easy to move data from one place to the other, and to manage it consistently wherever it’s hosted.
The combination of SQL Server on-premises, Windows Azure SQL Database, and SQL Server in a Windows Azure virtual machine manages to pull this trick off well. With similar merging of private and public cloud network infrastructure support in Windows Server and System Center, the lines between “the cloud” and “the enterprise” are blurring to the point where, from an IT management perspective, it really doesn’t matter where a service is physically located.
If you want to learn more about Windows Azure database options, visit http://www.windowsazure.com/en-us/solutions/data-management/.

Friday, 5 July 2013

What the Heck is Hekaton?

SQL Server 2014 introduces a new in-memory OLTP capability that was previously known by its codename, “Hekaton”. The technology introduces two new concepts to SQL Server: memory-optimized tables and native stored procedures. This article explores these features and provides a simple demonstration of how to use them.

The idea of optimizing data access performance by using in-memory storage is not new. SQL Server has always used caching to keep recently accessed data in memory, and recent releases have seen the addition of in-memory technology for large volume data analytics (PowerPivot and tabular models in Analysis Services) and high-performance table indexes that primarily benefits data warehouse workloads (columnstore indexes). What’s new in SQL Server 2014 is the ability to optimize an entire table for in-memory storage, effectively eliminating disk i/o for CRUD operations and massively improving query performance.

Memory-Optimized Tables

Memory-optimized tables are tables that you define using CREATE TABLE statements, in a similar fashion to traditional disk-based tables. However, memory-optimized tables are different from disk-based tables in the following ways:

  • The CREATE TABLE statement is used to generate a C struct, which is in turn compiled into a DLL and loaded into memory.
  • All data for the table is stored in memory, and all operations on the data occur in memory. By default, memory-optimized tables are durable (so they’re persisted to disk in order to survive restarts and support high-availability); but when the database is online, the table is always accessed directly in memory with no need to read pages from disk.
  • Columns in memory-optimized tables are indexed using  hash indexes (range indexes may be supported in a later build), in which the result of hashing the  indexed value determines the in-memory “bucket” in which the row is stored. Rows with the same hashed value are stored as a linked list within the bucket.
  • Table data is persisted to disk as a stream, not in 8K pages like a traditional table. The data must be stored in a filegroup that is created with the CONTAINS MEMORY_OPTIMIZED_DATA option. Indexes are not persisted, and will be regenerated in the event of a restart.
  • Some data types -  notably text, image, and nvarchar(max) - are not supported. Similarly some features such as identity columns and foreign-key constraints cannot be used in memory-optimized tables.

Native Stored Procedures

Memory-optimized tables can co-exist with disk-based tables, and you can execute Transact-SQL queries that contain joins between disk-based tables and memory-optimized tables. In fact, you can use Transact-SQL to query memory optimized tables just like any other table, so you can improve the performance of some workloads by changing existing disk-based tables to memory-optimized tables without breaking existing applications that query them. The ability to use regular Transact-SQL to query memory optimized tables is provided by an interop layer in the SQL Server engine that does the necessary work to convert Transact-SQL statements into C code that can access the in-memory data structures in the compiled DLL for the table.

However, if your application code only needs to access data in memory optimized tables, you can further improve performance by using native stored procedures. Native stored procedures are created using the familiar CREATE STORED PROCEDURE statement to define the Transact-SQL statements you want to execute. The code is then translated into C and compiled into a DLL, just like a memory optimized table. The DLL is then loaded into memory, and since the instructions it contains are now compiled as native machine code, execution performance is greatly improved. There are some limitations in this release, and only the most commonly used Transact-SQL statements and functions are supported in native stored procedures; but for a large percentage of common database workloads, you should find that using memory optimized tables and native stored procedures can significantly improve application performance.

The following demonstration shows how to use memory optimized tables and native stored procedures.

So, should you convert all of your tables and stored procedures to take advantage of this new technology? Probably not (at least, not yet). There are some workloads where the new in-memory capabilities will bring enormous benefits in terms of improved performance; but there are also some cases where current limitations prevent them from being used. Even when an existing disk-based table is fully compatible with a memory optimized schema, you may find minimal improvement for some i/o workloads.

The important thing to understand when planning to use (or not use) memory optimized tables, is that the performance benefit is not purely a result of storing the data in memory. After all, SQL Server does a pretty good job of caching commonly accessed data in disk-based tables anyway. The crucial difference to the way data in a memory optimized table is accessed is that no locks or latches are used to support concurrency. In a disk-based table, if multiple transactions need to access the data concurrently, locks are used to ensure consistency and avoid one transaction’s results being affected by the data modifications of another transaction. Although SQL Server does support row-level locking, transactions that affect multiple rows can quickly escalate locking to page-level – causing concurrency issues that affect query performance. This can be especially acute in tables with so-called “hotspots” – for example a table with a clustered index on an incrementing key value, where all new rows are inserted at the end of the table. Memory optimized tables do not use locks to manage concurrency. Instead, a form of row-versioning is used to track modifications to rows by multiple transactions; which in any case usually happen so quickly (sub-millisecond) that concurrency clashes are extremely rare. If the i/o pattern for your table typically incurs a lot of locking, then making the table memory optimized will probably improve performance. If not, then you may not benefit significantly from changing the table. As an example, in the video demo, the 500,000 inserts were wrapped in a single transaction – which when executed against the disk-based table incurred locking to support isolation for 500,000 atomic INSERT statements. When creating the demo, I noticed that removing the BEGIN TRAN and COMMIT statements that enclose the loop (so that the inserts were done as 500,000 independent INSERT statements) resulted in a much less significant difference in the time taken to load the disk-based table and the time taken to load the memory optimized table (typically, the memory optimized table was around 5-6 seconds quicker).

This article is based on the CTP 1 release of SQL Server 2014, and details are liable to change between now and the eventual release of the product. You can download the preview of SQL Server 2014 from http://www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx.

The Transact-SQL code used in this demonstration is available from here.

Wednesday, 3 July 2013

GeoFlow Demo for Course 20467B

Yesterday I posted a demo that Microsoft Certified Trainers can use in course 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012 to show students how to use the Data Explorer add-in for Excel 2013. GeoFlow is another new Excel add-in that you might want to demonstrate in class. It enables users to visualize data that includes geographic and temporal dimensions on an animated map, showing how data points in specific geographic locations change over time. You can get the GeoFlow add-in from the Microsoft Office web site, and you can download the demo steps in a PDF document from here.

Click the thumbnail below to view the demonstration in a new window/tab.

Note that GeoFlow is a preview release at the moment, and is subject to change in the future. The demo is provided as-is, and no support will be provided for it by Microsoft Learning or Content Master.

Enjoy!

 

Update: GeoFlow has been renamed Power Map, and forms part of the Power BI capability being added to Microsoft Office 365.

Tuesday, 2 July 2013

Data Explorer Demo for Course 20467B

Earlier this year we released Microsoft Official Curriculum course 20467B: Designing Business Intelligence Solutions with Microsoft SQL Server 2012. Since then, Microsoft has released a preview of Data Explorer, an add-in for Excel that enables users to browse and query data in a variety of sources. Data Explorer builds on the self-service BI techniques taught in course 20467B, and if you are an instructor delivering the course, you can add value by demonstrating how to use it within the context of a Microsoft-based BI solution. To help you, I’ve put together a simple demo that should be easy to set up and perform in the virtual machine environment for the course. It will probably work best towards the end of module 8 (Designing a Microsoft Excel-Based Reporting Solution), and you can download the steps in a PDF document from here.

Click the thumbnail below to view the demo.

Of course, bear in mind that Data Explorer is a preview release at the moment, and is subject to change in the future. The demo is provided as-is, and no support will be provided for it by Microsoft Learning or Content Master. Nevertheless, I hope you find it useful!

Update: Data Explorer has been renamed Power Query, and forms part of the Power BI capability being added to Microsoft Office 365.