MySQL – Steps to break down large time series data & queries (Part 1/2)
In my previous article, I experimented with and optimized queries over range data. In this article, I will discuss two approaches to improving the scalability and performance of continuously growing, large time series data. Scalability is achieved by splitting the time series data into multiple tables, each covering a fixed time span; performance (and scalability) is improved by breaking a long range query into a sequence of short, fixed-time-range queries that can take advantage of the query cache. In fact, this is quite a common approach to optimizing time series data.
Suppose your application uses a very large database table of time series data, and the main access pattern is querying by time range. Adding indices may not necessarily speed up the queries. The reason is that the table is continuously updated with new entries, and every write invalidates the associated query cache entries. Effectively, every query against the table must be executed from scratch. There are things we can do to restructure the table and the queries to take advantage of the query cache.
Basically, two steps are needed to improve the scalability of time series data:
- automatically create tables with a time suffix, and route each insert/update/delete to the appropriate table based on the date and time of the data
- break a long range query into a sequence of smaller, fixed-time queries, except for the boundary cases
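The first step can be sketched as follows. This is a minimal illustration, not the article's actual implementation: the `metrics` table prefix, the `(ts, value)` columns, and the `metrics_template` table used with `CREATE TABLE ... LIKE` are all assumed names.

```python
from datetime import datetime

TABLE_PREFIX = "metrics"  # hypothetical base name for the daily tables

def table_for(ts: datetime) -> str:
    """Return the per-day table name for a timestamp, e.g. metrics_20240115."""
    return f"{TABLE_PREFIX}_{ts:%Y%m%d}"

def insert_statements(ts: datetime, value: float) -> list[str]:
    """Build the SQL to route one row to its daily table. The table is
    created on demand from an assumed template table, so new days need
    no manual setup. (In real code, use parameterized queries.)"""
    table = table_for(ts)
    return [
        f"CREATE TABLE IF NOT EXISTS {table} LIKE {TABLE_PREFIX}_template",
        f"INSERT INTO {table} (ts, value) "
        f"VALUES ('{ts:%Y-%m-%d %H:%M:%S}', {value})",
    ]
```

Because the table name is derived purely from the row's timestamp, late-arriving data still lands in the correct day's table.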
Step 2 matters when even the daily tables are too big for a single query. In some cases, our application receives over 1 GB of data in a single daily table, so breaking queries down into smaller steps is necessary for scalability and for the responsiveness of the application. This was investigated in my previous article.
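The query-splitting step can be sketched as below. This is a hedged example of the general technique, assuming day-sized chunks: interior chunks cover whole days, so the same SQL text repeats across calls and can be served from the query cache, while only the two boundary chunks are partial.

```python
from datetime import datetime, timedelta

def split_range(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split the half-open range [start, end) into day-aligned sub-ranges.
    Interior sub-ranges are full days; the first and last may be partial."""
    chunks = []
    cur = start
    while cur < end:
        # Midnight following the current position marks the next day boundary.
        next_day = datetime(cur.year, cur.month, cur.day) + timedelta(days=1)
        chunk_end = min(next_day, end)
        chunks.append((cur, chunk_end))
        cur = chunk_end
    return chunks
```

Each sub-range is then issued as its own query against the corresponding daily table; historical days never change, so their cached results stay valid even while today's table keeps receiving writes.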
The following two graphs summarise the approach:
In the next part, a more detailed description for each step is provided.