Type to search

Share

How Snowflake’s Built-in Caching Mechanisms Improve Query Performance 

Introduction:

In cloud data warehousing environments, the cost of operations is a critical factor. Data storage, data transfer, and computational resources all contribute to expenses. Efficient resource usage is essential to keep costs in check.  

Query performance refers to the time it takes for a query to execute and return results. In traditional data warehousing systems, complex queries, especially those involving large datasets, can take a significant amount of time to complete. 

Running complex and resource-intensive queries can lead to extended query execution times, increased resource consumption, and higher costs. This can be particularly challenging when dealing with large volumes of data. 

Caching is like having a fast-access memory for your data. It allows you to store and retrieve frequently accessed data quickly, which can significantly speed up query performance. Think of it as keeping a smaller copy of the most important pages from a big book right next to you for quick reference. 

Snowflake caching is a powerful feature that helps enhance query performance by reducing the time it takes to access frequently used data. Snowflake cache mechanism automatically accelerates query performance without requiring manual intervention. By reusing cached query results, Snowflake reduces the need for redundant query execution. This optimization not only speeds up results but also minimizes the consumption of compute resources, resulting in potential cost savings.  

Default Caches in Snowflake:

In Snowflake, there are three primary types of caches that enhance query performance:  

  1. Query Result Cache
  2. Metadata Cache
  3. Virtual Warehouse Cache (or Local Disk Cache)

These caches come pre-enabled in your Snowflake environment, ready to optimize your data processing tasks. 

Cache Locations in Snowflake Architecture:

Cache location

The Query Result Cache and Metadata Cache are integral components of the Cloud Services Layer within Snowflake’s architecture. Due to their placement in this layer, they are accessible to all virtual warehouses in your Snowflake environment. This means that these caches can serve their purpose without the need for an active virtual warehouse, offering consistent benefits across the platform.  

On the other hand, the Virtual Warehouse Cache is specifically housed within the Query Processing Layer. This location makes it local to each individual virtual warehouse, ensuring that the cache’s benefits are tailored to the specific needs and workloads of that warehouse. 

1. Query Result Cache: 

In Snowflake, the “Result Cache” is one of the types of caching mechanisms used to improve query performance. It stores the results of previously executed queries, making them readily available for future queries.  

The following conditions should be met for the query results to be reused: 

  • The data within the tables used in the query should remain unchanged. If there have been any modifications to the data in any of the tables involved in the query, the Query Result Cache cannot be utilized. 
  • The new query must be a syntactic match with the previously executed query. Any variation in the syntax, such as changes in the order of columns, the introduction of table aliases, or the use of lowercase keywords, will prevent full cache reuse. 

For example, let’s use the following base query.

Query Result

Cache Locations

Now, let’s consider different scenarios: 

Identical Query:

Re-running the exact same query as the base query will benefit from 100% cache reuse. For example:

Identical Query

Identical query

Column Order Difference: 

If you change the order of columns in the SELECT statement, even if it retrieves the same data, it will not trigger cache reuse. For example:

Column Order

column order

Table Alias Introduction: 

Introducing a table alias to the query will prevent cache reuse, as the query structure is different. For example: 

table alias

table alias

Lowercase Keywords: 

Using lowercase keywords in the query, even if the table and column names are the same, will inhibit cache reuse due to syntax differences. For example:

Lowercase Keywords

lowercase keyword

Points to Keep in Mind: 

  • The query should not include functions that are evaluated at execution time, such as CURRENT_TIMESTAMP(), user-defined functions (UDFs), or external functions. 
  • The micro-partitions of the table should remain stable, meaning they have not been reclustered or modified. Any changes to the micro-partitions will affect the ability to reuse the cache. 
  • To benefit from cache reuse, the persisted result for the previously executed query must still be available in the Query Result Cache. 
  • The Query Result Cache is retained for 24 hours. Each time the cache is used by a subsequent query within that 24-hour period, the cache’s expiry is extended by another 24 hours and the maximum duration the cache can be retained is 31 days from the initial query execution. 
  • The Query Result Cache is enabled by default and can be disabled as shown below 

Alter Session

2. Metadata Cache

The Metadata Cache stores metadata information about tables, views, and databases. When you query metadata about your Snowflake objects, the Metadata Cache can speed up response times by reducing the need to access the underlying metadata tables. This table information stored in metadata is referred as Metadata Cache in Snowflake. 

Snowflake’s metadata stores valuable information about each table, including: 

  • The total number of rows in a table. 
  • The minimum (MIN) and maximum (MAX) values for columns. 
  • The count of distinct values in each column. 

Demonstration: 

Run the below SQL query which queries the total row count from a table. 

Demonstration

The query profile will show a single node with the message METADATA-BASED RESULT indicating that the query result was returned from metadata and time taken would be only few milliseconds.

Metadata and Time

3. Virtual Warehouse Cache / Local Disk Cache

Whenever a virtual warehouse retrieves data from a table, it creates a local cache containing that data. Subsequent queries have the option to reuse this cached data rather than fetching it from the cloud storage where the table resides. This local cache, stored on the virtual warehouse’s local disk, is known as the Virtual Warehouse Cache or Local Disk Cache in Snowflake. 

Accessing data from a local cache is notably more efficient than retrieving it from remote cloud storage. As a result, the warehouse cache significantly enhances the performance of queries that can leverage this cache, resulting in faster query execution and improved efficiency. 

Expiry Of Virtual Warehouse: 

The Virtual Warehouse Cache is automatically cleared when a virtual warehouse is suspended. 

Upon resuming the virtual warehouse, the cache gradually rebuilds as queries are executed. This dynamic behavior prompts consideration: whether suspending a virtual warehouse for cost savings outweighs maintaining it running for the cache’s performance enhancement. 

It’s important to note that the cache’s size is directly related to the virtual warehouse’s size. A larger virtual warehouse results in a larger cache. 

Demonstration: 

you initially run the query.  

Demonstration

After the query is complete, review the query plan of the query. The query profile indicates that the Percentage of data scanned from the cache is zero indicating that the virtual warehouse cache had no data in it initially. 

Run the query

Subsequently, when you run the below query.

Virtual Warehouse Cache

Note that the condition in the query is changed so that the query is not answered using Query Result Cache. But it is kept close enough that it can make use of the Virtual Warehouse Cache.

 Virtual Warehouse Cache

After the query is complete, review the query plan of the query. The query profile indicates that almost 78% of data is scanned from cache indicating that the virtual warehouse cache is reused. 

Let us suspend the virtual warehouse and run a slightly different query. Note that suspending the warehouse results in the existing virtual warehouse cache being purged. 

Suspend

After the query is complete, review the query plan of the query. The query profile indicates that the Percentage of data scanned from the cache is zero indicating that no virtual warehouse cache is used after suspending the warehouse. 

After Suspending

In summary, the performance of your queries depends on whether the data is cached in memory or stored in permanent storage, with cached data being retrieved much faster than data that needs to be fetched from permanent storage.