We have now reclaimed a lot of unused space at the filesystem level for these tables. However, in production, a zero-downtime deployment method should be used to avoid an outage. When doing this type of optimization in a replication ring, I will typically turn off binary logging of these statements before doing the alter or optimize table. Which version of MySQL is your command for? My version is 5. Agreed that it will be a big deal if you are doing this on a production machine where people are trying to reduce downtime.
It gives a rough idea of the duration required and can be added into planned production maintenance downtime, such as patching. Defragmentation will not only recover space, it will also help the queries run faster. Using partitions is a better way if you want to avoid frequent downtime for optimization.
If your application is performing a lot of deletes and updates on a MySQL database, then there is a high possibility that your MySQL data files are fragmented. An example is using summary tables discussed in the previous chapter. Rewrite a complicated query so the MySQL optimizer is able to execute it optimally. We discuss this later in this chapter. You can sometimes transform queries into equivalent forms and get better performance. However, you should also think about rewriting the query to retrieve different results, if that provides an efficiency benefit.
You may be able to ultimately do the same work by changing the application code as well as the query. In this section, we explain techniques that can help you restructure a wide range of queries and show you when to use each technique.
The traditional approach to database design emphasizes doing as much work as possible with as few queries as possible. This approach was historically better because of the cost of network communication and the overhead of the query parsing and optimization stages. However, this advice applies less to MySQL, which was designed to handle connecting and disconnecting very efficiently and to respond to small, simple queries very quickly. Modern networks are also significantly faster than they used to be, reducing network latency. Connection response is still slow compared to the number of rows MySQL can traverse per second internally, though, which is counted in millions per second for in-memory data.
We show some examples of this technique a little later in the chapter. That said, using too many queries is a common mistake in application design. For example, some applications perform 10 single-row queries to retrieve data from a table when they could use a single 10-row query. Purging old data is a great example of work that benefits from being split into smaller queries. Chopping up the DELETE statement and using medium-size queries can improve performance considerably, and reduce replication lag when a query is replicated.
For example, instead of running one monolithic DELETE that purges every old row at once, you can delete the rows in batches and repeat until no rows are left. Deleting 10,000 rows at a time is typically a large enough task to make each query efficient, and a short enough task to minimize the impact on the server (transactional storage engines may benefit from smaller transactions). It may also be a good idea to add some sleep time between the DELETE statements to spread the load over time and reduce the amount of time locks are held.
Many high-performance web sites use join decomposition. You can decompose a join by running multiple single-table queries instead of a multitable join, and then performing the join in the application.
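The batched purge described in this passage could be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, and the `messages` table, its `created` column, and the function names are hypothetical. On MySQL itself a `DELETE ... LIMIT` statement would normally play the role of the subquery used here.

```python
import sqlite3
import time


def purge_in_chunks(conn, cutoff, chunk_size=10_000, pause_seconds=0.0):
    """Delete rows older than `cutoff` in batches; return the total deleted."""
    total = 0
    while True:
        # Each statement touches at most `chunk_size` rows.
        cur = conn.execute(
            "DELETE FROM messages WHERE id IN ("
            "  SELECT id FROM messages WHERE created < ? LIMIT ?)",
            (cutoff, chunk_size),
        )
        conn.commit()  # keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to purge
        total += cur.rowcount
        if pause_seconds:
            time.sleep(pause_seconds)  # spread the load over time
    return total


def demo():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created INTEGER)")
    conn.executemany(
        "INSERT INTO messages (created) VALUES (?)",
        [(day,) for day in range(100)],
    )
    conn.commit()
    deleted = purge_in_chunks(conn, cutoff=60, chunk_size=25)
    remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
    return deleted, remaining
```

The chunk size and pause are tuning knobs: larger chunks mean fewer statements, while smaller chunks and longer pauses hold locks for less time.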
For example, instead of a single multitable join, you might run one query per table. This looks wasteful at first glance, because it increases the number of queries. However, such restructuring can actually give significant performance advantages. Caching can be more efficient.
In this example, if the object with the tag mysql is already cached, the application can skip the first query. If you find some of the posts you need in the cache, you can remove their ids from the IN list.
The query cache might also benefit from this strategy. If only one of the tables changes frequently, decomposing a join can reduce the number of cache invalidations. For MyISAM tables, performing one query per table uses table locks more efficiently: the queries will lock the tables individually and relatively briefly, instead of locking them all for a longer time.
Doing joins in the application makes it easier to scale the database by placing tables on different servers. The queries themselves can be more efficient. We explain this in more detail later. You can reduce redundant row accesses. Doing a join in the application means you retrieve each row only once, whereas a join in the query is essentially a denormalization that might repeatedly access the same data. For the same reason, such restructuring might also reduce the total network traffic and memory usage.
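A join decomposed in this way might look like the sketch below. It again uses sqlite3 in place of MySQL, and the `tag`, `tag_post`, and `post` tables, their columns, and the dictionary used as a cache are all assumptions made for the example.

```python
import sqlite3


def posts_for_tag(conn, tag_name, post_cache):
    """Application-side join: single-table lookups instead of one join."""
    tag_row = conn.execute(
        "SELECT id FROM tag WHERE tag = ?", (tag_name,)
    ).fetchone()
    if tag_row is None:
        return []
    post_ids = [
        row[0]
        for row in conn.execute(
            "SELECT post_id FROM tag_post WHERE tag_id = ? ORDER BY post_id",
            (tag_row[0],),
        )
    ]
    # Only ask the database for posts that are not already cached.
    missing = [pid for pid in post_ids if pid not in post_cache]
    if missing:
        placeholders = ",".join("?" * len(missing))
        query = f"SELECT id, title FROM post WHERE id IN ({placeholders})"
        for pid, title in conn.execute(query, missing):
            post_cache[pid] = title
    return [(pid, post_cache[pid]) for pid in post_ids if pid in post_cache]


def demo():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tag_post (tag_id INTEGER, post_id INTEGER);
        INSERT INTO tag VALUES (1, 'mysql'), (2, 'linux');
        INSERT INTO post VALUES (10, 'Indexes'), (11, 'Kernel'), (12, 'Joins');
        INSERT INTO tag_post VALUES (1, 10), (1, 12), (2, 11);
        """
    )
    cache = {10: "Indexes"}  # post 10 was cached by an earlier request
    return posts_for_tag(conn, "mysql", cache), cache
```

In the demo only post 12 is fetched from the `post` table, because post 10 is already in the cache. Each row is retrieved once and then reused.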
To some extent, you can view this technique as manually implementing a hash join instead of the nested loops algorithm MySQL uses to execute a join. A hash join may be more efficient. Doing joins in the application may also be more efficient when you cache and reuse a lot of data from earlier queries.
If you need to get high performance from your MySQL server, one of the best ways to invest your time is in learning how MySQL optimizes and executes queries. Once you understand this, much of query optimization is simply a matter of reasoning from principles, and query optimization becomes a very logical process.
Figure shows how MySQL generally executes queries. The server checks the query cache. The query execution engine executes the plan by making calls to the storage engine API.
Each of these steps has some extra complexity, which we discuss in the following sections. We also explain which states the query will be in during each step. The query optimization process is particularly complex and important to understand. The protocol is half-duplex, which means that at any given time the MySQL server can be either sending or receiving messages, but not both. It also means there is no way to cut a message short.
This protocol makes MySQL communication simple and fast, but it limits it in some ways too. The client sends a query to the server as a single packet of data. In contrast, the response from the server usually consists of many packets of data. When the server responds, the client has to receive the entire result set. It cannot simply fetch a few rows and then ask the server not to bother sending the rest.
It is tempting to think that a client fetching rows from the server is pulling them. But the truth is, the MySQL server is pushing the rows as it generates them. The client is only receiving the pushed rows; there is no way for it to tell the server to stop sending rows. Most libraries that connect to MySQL let you either fetch the whole result set and buffer it in memory, or fetch each row as you need it.
The default behavior is generally to fetch the whole result and buffer it in memory. This is important because until all the rows have been fetched, the MySQL server will not release the locks and other resources required by the query.
When the client library fetches the results all at once, it reduces the amount of work the server needs to do: the server can finish and clean up the query as quickly as possible.
You can use less memory, and start working on the result sooner, if you instruct the library not to buffer the result. The downside is that the locks and other resources on the server will remain open while your application is interacting with the library. Client code often seems to indicate that you fetch rows only when you need them, in a while loop over the result. With a buffered result, however, the library has already fetched everything, and the while loop simply iterates through the buffer. Programming languages have different ways to override buffering.
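The trade-off between buffered and unbuffered fetching can be illustrated with a toy model. Nothing here is a real client library API: `ToyServerQuery`, `fetch_buffered`, and `fetch_unbuffered` are hypothetical names, and the model only assumes what the text states, namely that the server releases its resources once all rows have been fetched.

```python
class ToyServerQuery:
    """Stands in for a server-side query that holds locks until fully read."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.resources_held = True

    def stream(self):
        for row in self._rows:
            yield row
        # Released only after the client has consumed the last row.
        self.resources_held = False


def fetch_buffered(query):
    """Pull every row into client memory immediately."""
    return list(query.stream())


def fetch_unbuffered(query):
    """Hand rows to the application one at a time."""
    return query.stream()


def demo():
    buffered_query = ToyServerQuery(range(3))
    rows = fetch_buffered(buffered_query)
    held_after_buffering = buffered_query.resources_held

    unbuffered_query = ToyServerQuery(range(3))
    cursor = fetch_unbuffered(unbuffered_query)
    first = next(cursor)
    held_mid_stream = unbuffered_query.resources_held
    rest = list(cursor)
    held_at_end = unbuffered_query.resources_held
    return rows, held_after_buffering, first, held_mid_stream, rest, held_at_end
```

In the buffered case the server-side resources are free before the application touches the first row, at the cost of holding the whole result in client memory. In the unbuffered case they stay held until the application finishes its loop.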
You can also specify this when connecting, which will make every statement unbuffered.
Each MySQL connection, or thread, has a state that shows what it is doing at any given time. As a query progresses through its lifecycle, its state changes many times, and there are dozens of states. The MySQL manual is the authoritative source of information for all the states, but we list a few here and explain what they mean:
Sleep: The thread is waiting for a new query from the client.
Query: The thread is either executing the query or sending the result back to the client.
Locked: The thread is waiting for a table lock to be granted at the server level.
Analyzing and statistics: The thread is checking storage engine statistics and optimizing the query.
Sending data: This can mean several things: the thread might be sending data between stages of the query, generating the result set, or returning the result set to the client.
On very busy servers, you might see an unusual or normally brief state, such as statistics, begin to take a significant amount of time. This usually indicates that something is wrong.
Before even parsing a query, MySQL checks for it in the query cache, if the cache is enabled. This operation is a case-sensitive hash lookup, so a query that differs from a cached one by even a single byte will not match. If MySQL does find a match in the query cache, it must check privileges before returning the cached result. This is possible without parsing the query, because MySQL stores table information with the cached query. If the privileges are OK, MySQL retrieves the stored result from the query cache and sends it to the client, bypassing every other stage in query execution. The query is never parsed, optimized, or executed. You can learn more about the query cache in Chapter 5.
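Why a lookup on the raw query text is so strict can be shown with a small sketch. `ToyQueryCache` is a hypothetical class, not MySQL's implementation: it keys results on a hash of the exact query bytes and leaves out the privilege check and everything else the real cache tracks.

```python
import hashlib


class ToyQueryCache:
    """Caches results keyed on the exact bytes of the query text."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def _key(sql_text):
        # Hash the raw text: no parsing, no normalization of case or spacing.
        return hashlib.sha256(sql_text.encode("utf-8")).hexdigest()

    def store(self, sql_text, result):
        self._results[self._key(sql_text)] = result

    def lookup(self, sql_text):
        """Return the cached result, or None on a miss."""
        return self._results.get(self._key(sql_text))


def demo():
    cache = ToyQueryCache()
    cache.store("SELECT name FROM city WHERE id = 1", [("Kabul",)])
    exact = cache.lookup("SELECT name FROM city WHERE id = 1")
    different_case = cache.lookup("select name from city where id = 1")
    extra_space = cache.lookup("SELECT name FROM city WHERE id = 1 ")
    return exact, different_case, extra_space
```

Only the byte-for-byte identical query hits the cache. The variants with different letter case or a trailing space miss, so applications that want cache hits need to send their queries in a consistent form.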
The next step in the query lifecycle turns a SQL query into an execution plan for the query execution engine. It has several sub-steps: parsing, preprocessing, and optimization. Errors (for example, syntax errors) can be raised at any point in the process. Our goal is simply to help you understand how MySQL executes queries so that you can write better ones.
Next, the preprocessor checks privileges. This is normally very fast unless your server has large numbers of privileges. See Chapter 12 for more on privileges and security. The parse tree is now valid and ready for the optimizer to turn it into a query execution plan. A query can often be executed many different ways and produce the same result. MySQL uses a cost-based optimizer, which means it tries to predict the cost of various execution plans and choose the least expensive.
The unit of cost is a single random four-kilobyte data page read. This result means that the optimizer estimated it would need to do about 1, random data page reads to execute the query. It bases the estimate on statistics: the number of pages per table or index, the cardinality (number of distinct values) of indexes, the length of rows and keys, and key distribution.
The statistics could be wrong. The server relies on storage engines to provide statistics, and they can range from exactly correct to wildly inaccurate. There are two basic types of optimizations, which we call static and dynamic. Static optimizations can be performed simply by inspecting the parse tree. For example, the optimizer can transform the WHERE clause into an equivalent form by applying algebraic rules.
They can be performed once and will always be valid, even when the query is reexecuted with different values. In contrast, dynamic optimizations are based on context and can depend on many factors, such as which value is in a WHERE clause or how many rows are in an index. They must be reevaluated each time the query is executed.
The difference is important in executing prepared statements or stored procedures. MySQL can do static optimizations once, but it must reevaluate dynamic optimizations every time it executes a query. MySQL sometimes even reoptimizes the query as it executes it. Here are some types of optimizations MySQL knows how to do.
An OUTER JOIN is sometimes equivalent to an INNER JOIN, for instance because of the WHERE clause or the table schema. MySQL can recognize this and rewrite the join, which makes it eligible for reordering.
MySQL applies algebraic transformations to simplify and canonicalize expressions. It can also fold and reduce constants, eliminating impossible constraints and constant conditions.
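Constant folding of this kind can be illustrated on a tiny expression tree. This is a toy, not how MySQL represents or rewrites expressions: the tuple encoding and the `simplify` function are assumptions made for the example.

```python
def simplify(expr):
    """Fold constants in a small boolean expression tree.

    An expression is True, False, an int, a string naming an opaque
    condition such as "a > 5", or a tuple (op, left, right) where op is
    "and", "or", or "eq".
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr[0], simplify(expr[1]), simplify(expr[2])
    if op == "eq" and type(left) is int and type(right) is int:
        return left == right  # comparison of two constants
    if op == "and":
        if left is False or right is False:
            return False  # impossible condition
        if left is True:
            return right
        if right is True:
            return left
    if op == "or":
        if left is True or right is True:
            return True  # always-true condition
        if left is False:
            return right
        if right is False:
            return left
    return (op, left, right)
```

For example, `simplify(("and", ("eq", 5, 5), "a > 5"))` reduces to `"a > 5"`, because the constant comparison is always true. `simplify(("and", ("eq", 5, 6), "a > 5"))` reduces to `False`, an impossible condition that lets the whole clause be discarded.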
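These rules are very useful for writing conditional queries, which we discuss later in the chapter. Indexes and column nullability can often help MySQL optimize away aggregate expressions such as MIN() and MAX(). For example, to find the minimum value of a column that is leftmost in a B-Tree index, MySQL can just request the first row in the index. It can even do this in the query optimization stage, and treat the value as a constant for the rest of the query.
Similarly, to find the maximum value in a B-Tree index, the server reads the last row. If the server uses this optimization, you will see "Select tables optimized away" in EXPLAIN. This literally means the optimizer has removed the table from the query plan and replaced it with a constant.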
When MySQL detects that an expression can be reduced to a constant, it will do so during optimization. Arithmetic expressions are another example. Perhaps surprisingly, even something you might consider to be a query can be reduced to a constant during the optimization phase. One example is a MIN on an index.
This can even be extended to a constant lookup on a primary key or unique index. It will then treat the value as a constant in the rest of the query. MySQL executes this query in two steps, which correspond to the two rows in the output. The first step is to find the desired row in the film table. It can do this because the optimizer knows that by the time the query reaches the second step, it will know all the values from the first step. MySQL can sometimes use an index to avoid reading row data, when the index contains all the columns the query needs.
We discussed covering indexes at length in Chapter 3. MySQL can convert some types of subqueries into more efficient alternative forms, reducing them to index lookups instead of separate queries. MySQL can stop processing a query or a step in a query as soon as it fulfills the query or step. For instance, if MySQL detects an impossible condition, it can abort the entire query during the optimization step. MySQL can also terminate execution sooner in some cases.
For example, consider a query that finds all movies without any actors. Such a query works by eliminating any films that have actors. Each film might have many actors, but as soon as MySQL finds one actor, it stops processing the current film and moves to the next one because it knows the WHERE clause prohibits outputting that film.
In many database servers, IN is just a synonym for multiple OR clauses, because the two are logically equivalent.
Not so in MySQL, which sorts the values in the IN list and uses a fast binary search to see whether a value is in the list. This is O(log n) in the size of the list, whereas an equivalent series of OR clauses is O(n) in the size of the list (i.e., much slower for large lists). If you try to outsmart the optimizer, you may end up just defeating it, or making your queries more complicated and harder to maintain for zero benefit.
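The difference between the two lookup strategies for an IN list can be sketched with Python's standard `bisect` module. This illustrates the complexity argument only and is not MySQL's code; the function names are hypothetical.

```python
from bisect import bisect_left


def make_in_list(values):
    """Sort the list once, as is done for the values in an IN list."""
    return sorted(set(values))


def in_list_contains(sorted_values, candidate):
    """Binary search: O(log n) comparisons per lookup."""
    i = bisect_left(sorted_values, candidate)
    return i < len(sorted_values) and sorted_values[i] == candidate


def or_chain_contains(values, candidate):
    """Chain of OR comparisons: up to n comparisons per lookup."""
    return any(candidate == value for value in values)
```

Both functions return the same answer for any input. The sorted version pays the sorting cost once and then answers each lookup in logarithmic time, which matters when the list is long and many rows are tested against it.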
In general, you should let the optimizer do its work. When it does not find a good plan, some of the options are to add a hint to the query, rewrite the query, redesign your schema, or add indexes. The engines may provide the optimizer with statistics such as the number of pages per table or index, the cardinality of tables and indexes, the length of rows and keys, and key distribution information.
The optimizer can use this information to help it decide on the best execution plan. In sum, MySQL considers every query a join: not just every query that matches rows from two tables, but every query, period (including subqueries, and even a SELECT against a single table).
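Consider a UNION query, which MySQL executes as a series of single queries whose results are spooled into a temporary table and then read out again. Each of the individual queries is a join, in MySQL terminology, and so is the act of reading from the resulting temporary table. MySQL treats every join as a nested-loop join. This means MySQL runs a loop to find a row from a table, then runs a nested loop to find a matching row in the next table.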
It continues until it has found a matching row in each table in the join, and then returns a row built from those matches. It tries to build the next row by looking for more matching rows in the last table. If it does not find any, it backtracks one table and looks for more rows there. It keeps backtracking until it finds another row in some table, at which point, it looks for a matching row in the next table, and so on. This query execution plan applies as easily to a single-table query as it does to a many-table query, which is why even a single-table query can be considered a join. The single-table join is the basic operation from which more complex joins are composed.
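The nested-loop strategy with backtracking can be sketched as a short recursive generator. This is an illustration of the idea rather than MySQL's execution engine: the function signature, the predicate convention, and the sample data are all assumptions.

```python
def nested_loop_join(tables, predicates, partial=()):
    """Yield tuples holding one matching row from each table.

    tables is a list of row lists, joined in the order given.
    predicates[i](partial, row) decides whether `row` from table i
    matches the rows already chosen from tables 0..i-1.
    """
    depth = len(partial)
    if depth == len(tables):
        yield partial  # found a matching row in every table
        return
    for row in tables[depth]:
        if predicates[depth](partial, row):
            # Descend to the next table. When that loop is exhausted,
            # control returns here and the next row is tried (backtracking).
            yield from nested_loop_join(tables, predicates, partial + (row,))


def demo():
    # Hypothetical sample data: (film_id, title) and (film_id, actor_id).
    films = [(1, "A"), (2, "B")]
    film_actor = [(1, 10), (1, 11), (2, 10)]
    joined = list(
        nested_loop_join(
            [films, film_actor],
            [lambda partial, row: True,
             lambda partial, row: row[0] == partial[0][0]],
        )
    )
    single = list(
        nested_loop_join([films], [lambda partial, row: row[0] == 2])
    )
    return joined, single
```

The same function handles the single-table case with one loop and no nesting, which mirrors the point that a single-table query is just the simplest join.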
One way to visualize such a query execution plan is a swim-lane diagram. Read it from left to right and top to bottom.
Figure: Swim-lane diagram illustrating retrieving rows using a join.
MySQL executes every kind of query in essentially the same way. In short, MySQL coerces every kind of query into this execution plan. Not every legal SQL query can be executed this way, and still other queries can be executed with nested loops, but perform very badly as a result. We look at some of those later.
MySQL does not generate byte-code to execute a query. Instead, the query execution plan is actually a tree of instructions that the query execution engine follows to produce the query results.
The final plan contains enough information to reconstruct the original query. Any multitable query can conceptually be represented as a tree. For example, it might be possible to execute a four-table join as shown in the figure. This is what computer scientists call a balanced tree. This is not how MySQL executes the query, though. As we described in the previous section, MySQL always begins with one table and finds matching rows in the next table.
The most important part of the MySQL query optimizer is the join optimizer, which decides the best order of execution for multitable queries. It is often possible to join the tables in several different orders and get the same results. The join optimizer estimates the cost for various plans and tries to choose the least expensive one that gives the same result. You can probably think of a few different query plans for any multitable query.
Knowing when to use each optimization function and how to apply it to your situation is key to viable table maintenance. This article provides practical tips and functions for MySQL table optimization. The main reason for an unoptimized MySQL table is frequently performed update and delete queries. In turn, this causes fragmentation, which wastes storage and slows queries. MySQL table optimization techniques address the arrangement of data inside a database. The result is clean storage without redundant, unused space, which helps speed up queries.
Note: Tuning is another technique for improving the query performance. Tables where information in a database continually updates, such as transactional databases , are the most likely candidates for optimization.
However, depending on the size of the database, the optimization query can take a long time to finish on larger tables. Locking a table for many hours is therefore not acceptable for transactional systems.
There are multiple ways to show tables and analyze them for optimization. Start by connecting to your MySQL database and displaying the status of a table. The output shows some general information about the table, and two of the numbers in it are important. The information schema stores metadata about a database schema. You can query it to check the allocated unused data for all tables in a selected schema.