Skip to content

March 20, 2009

MySQL – Steps to break down large time series data & queries (Part 1/2)

by Joe Kuan

 databaseIn my previous article, I have experimented and optimized query using range data. In this article, I will discuss two approaches to improve the scalability and the performance of continuous large time series data. The scalability is achieved by first separating the time series data into multiple tables in fixed time span and the performance (also scalability) is improved by breaking a long range query into multiple short fix time range queries to take advantage of the query cache. In fact, this is quite a common approach on optimizing time series data.

Suppose your application uses a very large database table with time series data and the main method for querying is via specifying the time range. Using indices may not necessary speed up the query. The reason is that the table is continuously being updated with new entries which makes the associated query cache dirty. Effectively, each time a query is launched with the table, it always becomes a full search. There are things that we can do to improve the table and change the query to take advantage of query cache.

Basically, two steps are needed to improve the scalability of time series data:

  1. automatically create table with time suffix and then insert/update/delete the entries according to the appropriate table based on the date and time of the data.
  2. break the long range query into a sequence of smaller fixed time queries except the boundary case

Step 2 is really for the daily tables that are still too big for a single query. In some cases, our application can receive over 1GB of daily table. Hence breaking down into smaller steps is necessary to help the scalability and also the responsiveness of the application. This has been investigated in my previous article.

The following two graphs summarise the approach 

Long range query over large quantity of time series data

Long range query over large quantity of time series data

 

  
Short range queries

Divide the time series data into daily tables and break the long query into a sequence of fixed short time range queries

In the next part, a more detailed description for each step is provided.

 

I work for iTrinegy and here are my other technical blogs

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Note: HTML is allowed. Your email address will never be published.

Subscribe to comments

%d bloggers like this: