Friday 5 July 2013

What the Heck is Hekaton?

SQL Server 2014 introduces a new in-memory OLTP capability that was previously known by its codename, “Hekaton”. The technology brings two new concepts to SQL Server: memory-optimized tables and native stored procedures. This article explores these features and provides a simple demonstration of how to use them.

The idea of optimizing data access performance by using in-memory storage is not new. SQL Server has always used caching to keep recently accessed data in memory, and recent releases have seen the addition of in-memory technology for large-volume data analytics (PowerPivot and tabular models in Analysis Services) and high-performance table indexes that primarily benefit data warehouse workloads (columnstore indexes). What’s new in SQL Server 2014 is the ability to optimize an entire table for in-memory storage, effectively eliminating disk I/O for CRUD operations and massively improving query performance.

Memory-Optimized Tables

Memory-optimized tables are tables that you define using CREATE TABLE statements, in a similar fashion to traditional disk-based tables. However, memory-optimized tables are different from disk-based tables in the following ways:

  • The CREATE TABLE statement is used to generate a C struct, which is in turn compiled into a DLL and loaded into memory.
  • All data for the table is stored in memory, and all operations on the data occur in memory. By default, memory-optimized tables are durable (so they’re persisted to disk in order to survive restarts and support high-availability); but when the database is online, the table is always accessed directly in memory with no need to read pages from disk.
  • Columns in memory-optimized tables are indexed using hash indexes (range indexes may be supported in a later build), in which the result of hashing the indexed value determines the in-memory “bucket” in which the row is stored. Rows with the same hashed value are stored as a linked list within the bucket.
  • Table data is persisted to disk as a stream, not in 8K pages like a traditional table. The data must be stored in a filegroup that is created with the CONTAINS MEMORY_OPTIMIZED_DATA option. Indexes are not persisted, and will be regenerated in the event of a restart.
  • Some data types - notably text, image, and nvarchar(max) - are not supported. Similarly, some features such as identity columns and foreign key constraints cannot be used in memory-optimized tables. (A minimal example of creating a memory-optimized table follows this list.)
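As a concrete illustration, here is a minimal sketch of creating a memory-optimized data filegroup and table. The database, filegroup, file, table, and column names are hypothetical, and the bucket count is an arbitrary example value:

    -- Create a filegroup for memory-optimized data and add a container to it
    ALTER DATABASE MyDatabase
    ADD FILEGROUP MemFG CONTAINS MEMORY_OPTIMIZED_DATA;

    ALTER DATABASE MyDatabase
    ADD FILE (NAME = 'MemData', FILENAME = 'C:\Data\MemData')
    TO FILEGROUP MemFG;

    -- Create a durable memory-optimized table with a hash-indexed primary key.
    -- Note: no text/image/nvarchar(max) columns, identity columns, or foreign keys.
    CREATE TABLE dbo.OrderItems
    (
      OrderItemID INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
      ProductID INT NOT NULL,
      Quantity INT NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

The DURABILITY = SCHEMA_AND_DATA option is the default, and makes the table durable as described above; SCHEMA_ONLY tables lose their data on restart.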

Native Stored Procedures

Memory-optimized tables can co-exist with disk-based tables, and you can execute Transact-SQL queries that contain joins between disk-based tables and memory-optimized tables. In fact, you can use Transact-SQL to query memory-optimized tables just like any other table, so you can improve the performance of some workloads by changing existing disk-based tables to memory-optimized tables without breaking the existing applications that query them. The ability to use regular Transact-SQL to query memory-optimized tables is provided by an interop layer in the SQL Server engine that does the necessary work to convert Transact-SQL statements into C code that can access the in-memory data structures in the compiled DLL for the table.
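For example, an ordinary interpreted query can join a disk-based table to the memory-optimized table sketched earlier (dbo.Products here is a hypothetical disk-based table):

    -- Interpreted Transact-SQL joining a disk-based table (dbo.Products)
    -- to a memory-optimized table (dbo.OrderItems); the interop layer
    -- handles access to the in-memory rows.
    SELECT p.ProductName, SUM(o.Quantity) AS TotalQuantity
    FROM dbo.Products AS p
    JOIN dbo.OrderItems AS o ON o.ProductID = p.ProductID
    GROUP BY p.ProductName;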

However, if your application code only needs to access data in memory-optimized tables, you can further improve performance by using native stored procedures. Native stored procedures are created using the familiar CREATE PROCEDURE statement to define the Transact-SQL statements you want to execute. The code is then translated into C and compiled into a DLL, just like a memory-optimized table. The DLL is then loaded into memory, and since the instructions it contains are now compiled as native machine code, execution performance is greatly improved. There are some limitations in this release, and only the most commonly used Transact-SQL statements and functions are supported in native stored procedures; but for a large percentage of common database workloads, you should find that using memory-optimized tables and native stored procedures can significantly improve application performance.
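A minimal sketch of a native stored procedure against the hypothetical table above might look like this (native compilation requires SCHEMABINDING, an execution context, and an atomic block that specifies an isolation level and language):

    CREATE PROCEDURE dbo.usp_InsertOrderItem
      @OrderItemID INT, @ProductID INT, @Quantity INT
    WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
    AS
    BEGIN ATOMIC WITH
      (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
      -- The body is translated to C and compiled to native machine code
      INSERT INTO dbo.OrderItems (OrderItemID, ProductID, Quantity)
      VALUES (@OrderItemID, @ProductID, @Quantity);
    END;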

The following demonstration shows how to use memory-optimized tables and native stored procedures.
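The original demo script isn’t reproduced here, but the following sketch shows the kind of timing test discussed below: inserting 500,000 rows inside a single transaction and measuring the elapsed time. The same loop can be run against an equivalent disk-based table for comparison. Accessing a memory-optimized table inside an explicit transaction like this assumes the database option MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT has been set to ON:

    -- Time 500,000 inserts wrapped in a single transaction
    DECLARE @i INT = 0, @start DATETIME2 = SYSDATETIME();
    BEGIN TRAN;
    WHILE @i < 500000
    BEGIN
      INSERT INTO dbo.OrderItems (OrderItemID, ProductID, Quantity)
      VALUES (@i, 1, 1);
      SET @i += 1;
    END;
    COMMIT;
    SELECT DATEDIFF(ms, @start, SYSDATETIME()) AS ElapsedMs;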

So, should you convert all of your tables and stored procedures to take advantage of this new technology? Probably not (at least, not yet). There are some workloads where the new in-memory capabilities will bring enormous benefits in terms of improved performance; but there are also some cases where current limitations prevent them from being used. Even when an existing disk-based table is fully compatible with a memory-optimized schema, you may find minimal improvement for some I/O workloads.

The important thing to understand when planning to use (or not use) memory-optimized tables is that the performance benefit is not purely a result of storing the data in memory. After all, SQL Server does a pretty good job of caching commonly accessed data in disk-based tables anyway. The crucial difference in the way data in a memory-optimized table is accessed is that no locks or latches are used to support concurrency.

In a disk-based table, if multiple transactions need to access the data concurrently, locks are used to ensure consistency and avoid one transaction’s results being affected by the data modifications of another transaction. Although SQL Server does support row-level locking, transactions that affect multiple rows can quickly escalate locking to page level, causing concurrency issues that affect query performance. This can be especially acute in tables with so-called “hotspots” - for example, a table with a clustered index on an incrementing key value, where all new rows are inserted at the end of the table.

Memory-optimized tables do not use locks to manage concurrency. Instead, a form of row-versioning is used to track modifications to rows by multiple transactions - which in any case usually happen so quickly (sub-millisecond) that concurrency clashes are extremely rare. If the I/O pattern for your table typically incurs a lot of locking, then making the table memory-optimized will probably improve performance. If not, then you may not benefit significantly from changing the table.

As an example, in the demo the 500,000 inserts were wrapped in a single transaction, which when executed against the disk-based table incurred locking to support isolation for 500,000 atomic INSERT statements. When creating the demo, I noticed that removing the BEGIN TRAN and COMMIT statements that enclose the loop (so that the inserts were done as 500,000 independent INSERT statements) resulted in a much less significant difference between the time taken to load the disk-based table and the time taken to load the memory-optimized table (typically, the memory-optimized table was around 5-6 seconds quicker).

This article is based on the CTP 1 release of SQL Server 2014, and details are liable to change between now and the eventual release of the product. You can download the preview of SQL Server 2014 from http://www.microsoft.com/en-us/sqlserver/sql-server-2014.aspx.

The Transact-SQL code used in this demonstration is available from here.
