13 Dec

MySQL Query Optimization For Faster and More Efficient Queries

MySQL is a popular open source database management system that stores data for millions of applications and enterprise systems. It’s easy to use and highly scalable, which makes it a common choice for companies.

Any organization using MySQL for its data storage needs fast and efficient queries. While that is easy to achieve with a simple application and a small amount of data, query responses can become slower as the database grows. It’s therefore crucial to follow query best practices for optimal database performance.

Improving query response times is a central part of MySQL performance optimization. In this article, we explore 15 MySQL query optimization strategies that help speed up data retrieval. Read on!

15 MySQL Query Optimizations For Faster and More Efficient Queries

1. Identify Slow Queries

Slow queries can result from hardware limitations, configuration and permissions settings, improper index usage, or poor schema design. Measure a query’s efficiency by its response time, that is, how long the query takes to execute.

Identify slow queries using tools like INFORMATION_SCHEMA.PROFILING and the slow query log (slow_query_log). INFORMATION_SCHEMA.PROFILING stores profiling information about statements run in the current session; it is deprecated in recent MySQL versions in favor of the Performance Schema. The slow query log, on the other hand, records queries that take longer than the long_query_time threshold. This helps pinpoint performance bottlenecks.
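
As a minimal sketch of the slow query log setup described above, the statements below turn the log on at runtime. The 1 second threshold is only an illustrative value, and the statements assume an account with the privileges needed to change global variables.

```sql
-- Check whether the slow query log is enabled and where it is written
SHOW VARIABLES LIKE 'slow_query_log%';

-- Enable the log at runtime (lost on restart unless also set in my.cnf)
SET GLOBAL slow_query_log = 'ON';

-- Log statements that run longer than 1 second (illustrative threshold);
-- the new global value applies to connections opened afterwards
SET GLOBAL long_query_time = 1;

-- Optionally also log statements that do not use an index
SET GLOBAL log_queries_not_using_indexes = 'ON';
```

Once the log has collected some traffic, the mysqldumpslow utility shipped with MySQL can summarize it by query pattern.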

MySQL GUI tools like MySQL Workbench also provide detailed insights into the time taken, rows scanned, and rows returned.

2. Use EXPLAIN and EXPLAIN ANALYZE Statements

The EXPLAIN statement is a powerful tool for understanding MySQL’s execution plan for a query. It helps you understand how tables are joined and the indexes used, providing a blueprint of the query’s execution path.

While EXPLAIN shows the plan the optimizer intends to use, based on estimates, EXPLAIN ANALYZE (available from MySQL 8.0.18) actually runs the query and reports measured statistics, such as the time spent and the rows read at each step of the plan. Because it executes the statement, be careful when running it against expensive queries on a production server. With insights from these two commands, you can decide how to tweak queries. This might involve restructuring joins, modifying query conditions, or refactoring certain SQL functions to enhance overall performance.

An example of how to use EXPLAIN ANALYZE (assume you have a table named uk_cities with relevant columns):

				
-- Query to select cities with a population over 500,000
SELECT name, population
FROM uk_cities
WHERE population > 500000;

-- Using EXPLAIN ANALYZE for a detailed query execution
EXPLAIN ANALYZE
SELECT name, population
FROM uk_cities
WHERE population > 500000;
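
Plain EXPLAIN can also present the estimated plan in other formats, which may be easier to read for complex queries. The sketch below reuses the uk_cities example; FORMAT=TREE assumes MySQL 8.0.16 or later.

```sql
-- Estimated plan as an indented tree (MySQL 8.0.16+); does not run the query
EXPLAIN FORMAT=TREE
SELECT name, population
FROM uk_cities
WHERE population > 500000;

-- Estimated plan as JSON, including the optimizer's cost estimates
EXPLAIN FORMAT=JSON
SELECT name, population
FROM uk_cities
WHERE population > 500000;
```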

3. Optimize Database Schema

When creating a database in MySQL, choose a schema design that supports query performance and efficient data retrieval.

How to optimize database schema:

Limit Columns and Normalize Data

Normalization strategies, such as the third normal form (3NF), reduce data duplication and ensure data integrity.

Choose Appropriate Data Types and NULL Values

When creating tables in MySQL, choosing the right data types and avoiding NULL values where possible can significantly impact performance. Smaller data types consume less space and improve processing speed. Declaring columns NOT NULL simplifies index usage and reduces the overhead associated with NULL value processing.

Nullable columns need extra storage and make indexes and comparisons slightly more complex for MySQL. Where it does not distort the meaning of the data, declare columns as NOT NULL and use a default such as an empty string, 0, or another special value.

Choose Smaller Data Types

Select data types that closely match the data’s nature and size, such as VARCHAR(12) for short strings or INT for integers. This way, the database requires less memory and disk space, which in turn improves CPU cache utilization and processing speed.

For instance, numeric operations on INT types are faster and less resource-intensive than equivalent operations on numeric strings. Similarly, using dedicated data types for date information enhances query performance, as it optimizes storage and enables built-in date functions.
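
The schema advice above could be sketched as follows. The customer_orders table and all of its columns are hypothetical and only illustrate compact, NOT NULL column definitions with dedicated numeric and date types.

```sql
-- Hypothetical table illustrating compact, NOT NULL column definitions
CREATE TABLE customer_orders (
    order_id     INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    customer_id  INT UNSIGNED     NOT NULL,
    status_code  TINYINT UNSIGNED NOT NULL DEFAULT 0,   -- small numeric code instead of a string
    postcode     VARCHAR(12)      NOT NULL DEFAULT '',  -- short string sized to the data
    order_total  DECIMAL(10,2)    NOT NULL DEFAULT 0.00,
    ordered_at   DATETIME         NOT NULL,             -- dedicated date type, not a string
    PRIMARY KEY (order_id)
) ENGINE=InnoDB;
```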

4. Use Indexes Effectively

Proper indexing is crucial for improving data retrieval times. Besides creating indexes, it’s also crucial to maintain them. Over time, indexes can become fragmented or less effective due to data changes. Regular maintenance ensures they continue to function optimally. This includes analyzing index usage patterns and removing redundant or unused indexes.

To optimize queries, add indexes to columns frequently used in GROUP BY, JOIN, ORDER BY, and WHERE clauses. This way, the MySQL server can fetch results significantly faster. Keep in mind that every index also has to be updated on writes, so index only the columns your queries actually use.

A simple demonstration of using indexes in MySQL:

				
-- Add an index to the 'population' column
CREATE INDEX idx_population ON uk_cities(population);

-- Query that filters on the indexed column
SELECT name, county FROM uk_cities WHERE population > 100000;

-- Check whether MySQL uses idx_population for this query
EXPLAIN SELECT name, county FROM uk_cities WHERE population > 100000;

This index should reduce the time MySQL takes to execute the query, as it lets MySQL avoid a full table scan when filtering by the population column. Instead, MySQL locates the relevant rows using the index, which is ideal with large datasets. Note that the optimizer may still choose a full scan when the condition matches a large share of the table.
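
For the index maintenance mentioned earlier in this section, a possible starting point is sketched below. It assumes the sys schema (MySQL 5.7 and later) and the Performance Schema are enabled; usage data is collected only since the last server restart, so treat the result as a hint rather than proof that an index is unused.

```sql
-- List the indexes that currently exist on the table
SHOW INDEX FROM uk_cities;

-- Indexes in the current database with no recorded use since server start
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = DATABASE();

-- After confirming an index is really redundant, it can be dropped, e.g.:
-- DROP INDEX idx_population ON uk_cities;
```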

5. Optimize SELECT Statements and Use Wildcards

Using SELECT * statements is resource-intensive, especially for tables with many columns. Avoid SELECT * where possible and specify only the necessary columns in the SELECT statement to reduce the amount of data processed.

Wildcards allow you to perform pattern-matching searches in a database. Use a wildcard with a LIKE query when defining search criteria for more flexibility. A pattern with a trailing wildcard, such as 'Jo%', can use an index on the column, whereas a pattern with a leading wildcard, such as '%son', cannot. For word searches inside longer text columns, consider a FULLTEXT index, which typically performs better than leading-wildcard LIKE patterns.

Example of an inefficient select statement:

				
					SELECT * FROM employees;

This query fetches all columns from the employees table, which is often unnecessary. Optimize it by specifying columns:

				
					SELECT first_name, last_name, department FROM employees;

For more flexible matching, use a wildcard pattern. A prefix pattern like the one below can still use an index on first_name:

				
					SELECT first_name, last_name FROM employees WHERE first_name LIKE 'Jo%';
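
To see how the position of the wildcard affects index usage, the plans of a prefix pattern and a leading-wildcard pattern can be compared. This is a sketch: the index name idx_first_name is an assumption, and the exact plan depends on your data and MySQL version.

```sql
-- Hypothetical index on the column used in the LIKE condition
CREATE INDEX idx_first_name ON employees(first_name);

-- Prefix pattern: the optimizer can use a range scan on idx_first_name
EXPLAIN SELECT first_name, last_name
FROM employees
WHERE first_name LIKE 'Jo%';

-- Leading wildcard: the index cannot narrow the search to a range of values
EXPLAIN SELECT first_name, last_name
FROM employees
WHERE first_name LIKE '%son';
```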

6. Avoid SELECT DISTINCT and Implement LIMIT Instead

The SELECT DISTINCT statement removes duplicate rows from your query result. However, it is costly when working with large datasets or multiple joins, because MySQL has to sort the result or build a temporary table to eliminate duplicates. In some cases, using GROUP BY achieves the same result with better performance, although MySQL often executes the two forms the same way, so compare them with EXPLAIN.

In scenarios where you only need a subset of data, use the LIMIT clause. LIMIT lets MySQL stop once it has enough rows, which can significantly reduce the workload on the database.

Example of an inefficient use of SELECT DISTINCT:

				
					SELECT DISTINCT first_name, last_name FROM employees;

This statement is resource-intensive if the table is large, as it removes duplicate first_name and last_name pairs from the employees table. An equivalent form uses GROUP BY, as shown below:

				
					SELECT first_name, last_name FROM employees GROUP BY first_name, last_name;

Use LIMIT to restrict the number of rows returned, and add an ORDER BY if you need a deterministic subset:

				
SELECT first_name, last_name FROM employees ORDER BY last_name, first_name LIMIT 10;

7. Cache Query to Boost Performance

MySQL’s query cache stores the text of a SELECT statement alongside its result in memory. If an identical query is issued later, the server returns the result from the cache rather than executing the statement again, so repeated queries are served much faster. The query cache is also shared among sessions. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this technique applies only to older versions. On newer versions, cache results in the application or in a dedicated caching layer instead.

Query caching is most effective in databases with frequent read operations and infrequent updates. However, in environments with frequent updates, caching may not be as beneficial due to the invalidation of cached data on table updates.
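
Before relying on the query cache, it is worth checking whether your server supports it at all. The statements below are a small sketch for that check; the status counters are only meaningful on MySQL 5.7 or earlier with the cache enabled.

```sql
-- Reports whether the query cache is available on this server
-- (on versions without query cache support this is NO or returns no rows)
SHOW VARIABLES LIKE 'have_query_cache';

-- MySQL 5.7 and earlier: current query cache settings
SHOW VARIABLES LIKE 'query_cache%';

-- MySQL 5.7 and earlier: hit, insert, and prune counters for the cache
SHOW STATUS LIKE 'Qcache%';
```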

8. Convert OUTER JOINs to INNER JOINs

INNER JOINs are generally more efficient than OUTER JOINs as they process less data. They return only rows that have a match in both tables, while OUTER JOINs also include rows from one table that have no match in the other. Choose the type of JOIN based on query requirements. If unmatched rows are not needed, using an INNER JOIN saves processing time and resources.

Outer join example:

				
					SELECT employees.name, departments.department_name
FROM employees
LEFT OUTER JOIN departments ON employees.department_id = departments.department_id;

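This query retrieves all employees, including those without an associated department, which may be more than you need. If you only want employees that have a department, use an inner join instead. Keep in mind that this changes the result set, as unmatched employees are no longer returned: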

				
					SELECT employees.name, departments.department_name
FROM employees
INNER JOIN departments ON employees.department_id = departments.department_id;
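
Because the two join types can return different rows, it may help to check how many rows would be lost before converting. The query below is a sketch using the same employees and departments tables; the alias employees_without_department is just an illustrative name.

```sql
-- Count employees that have no matching department.
-- If this returns 0, the LEFT OUTER JOIN and the INNER JOIN above
-- return the same rows for the current data.
SELECT COUNT(*) AS employees_without_department
FROM employees
LEFT OUTER JOIN departments
    ON employees.department_id = departments.department_id
WHERE departments.department_id IS NULL;
```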

9. Reduce the Use of OR Clauses

The use of OR clauses in MySQL queries can significantly affect performance, especially in large and complex databases. When you use OR, the database engine may have to scan a larger portion of the table, or several indexes, to satisfy each condition within the clause. When it falls back to a full table scan, the query is less efficient than one that uses an index to find the relevant rows, which increases execution time and resource usage.

To optimize queries containing the OR operator, consider restructuring them using UNION to break down complex conditions into smaller, more manageable parts. Each part of the query can then leverage indexes more efficiently. Additionally, MySQL may apply its index merge optimization when applicable. Finally, refactoring the conditions can simplify them, making it easier for the database to use indexes.

The following example shows LIKE conditions combined with OR. Let’s assume you have a products table and want products whose names contain 'apple', 'berry', or 'cherry':

				
					SELECT product_name
FROM products
WHERE product_name LIKE '%apple%' 
   OR product_name LIKE '%berry%' 
   OR product_name LIKE '%cherry%';

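This query is inefficient: because every pattern starts with a wildcard, MySQL cannot use a regular index on product_name and has to scan the table. The UNION form below separates the conditions, which pays off when each branch can use its own index (for example, prefix patterns or conditions on different indexed columns). With leading wildcards like these, each branch still scans the table, so verify the rewrite with EXPLAIN: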

				
					SELECT product_name
FROM products
WHERE product_name LIKE '%apple%'
UNION
SELECT product_name
FROM products
WHERE product_name LIKE '%berry%'
UNION
SELECT product_name
FROM products
WHERE product_name LIKE '%cherry%';
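
Two variations on this idea are sketched below. The category column and the index names are assumptions that are not part of the original example. The first part shows an OR across two different indexed columns rewritten as a UNION; the second shows a FULLTEXT search, which matches whole words or word prefixes (so 'apple pie' matches, but 'pineapple' does not).

```sql
-- Hypothetical indexes, one per column used in the conditions
CREATE INDEX idx_product_name ON products(product_name);
CREATE INDEX idx_category ON products(category);

-- OR across two columns rewritten as a UNION: each branch can use its own index.
-- UNION also removes rows that satisfy both conditions, as the OR form would.
SELECT product_name FROM products WHERE product_name LIKE 'apple%'
UNION
SELECT product_name FROM products WHERE category = 'fruit';

-- Word-based alternative to the '%...%' patterns: a FULLTEXT index
CREATE FULLTEXT INDEX ft_product_name ON products(product_name);

-- In boolean mode, terms without operators are optional, so a row matches
-- if it contains a word starting with any of the three terms
SELECT product_name
FROM products
WHERE MATCH(product_name) AGAINST('apple* berry* cherry*' IN BOOLEAN MODE);
```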

10. Analyze and Optimize Tables

The OPTIMIZE TABLE statement in MySQL helps maintain the efficiency of database tables for optimal query performance. This command reorganizes the physical storage of table data and associated index data. It helps reduce storage space and improve input/output (I/O) efficiency when accessing the table.

This optimization is most useful after extensive insert, update, or delete operations. It is especially useful for InnoDB tables that have their own .ibd files (innodb_file_per_table enabled) and for tables with FULLTEXT indexes. Running OPTIMIZE TABLE requires SELECT and INSERT privileges on the table. The statement is supported for InnoDB, MyISAM, and ARCHIVE tables, and its effect varies between these storage engines. The related ANALYZE TABLE statement refreshes the index statistics that the optimizer relies on when choosing an execution plan.
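
A minimal sketch of both statements, reusing the uk_cities table from the earlier examples. On large tables OPTIMIZE TABLE can take a long time because the table is rebuilt, so it is usually scheduled for quiet periods.

```sql
-- Estimate how much space is allocated but unused in the table
SELECT table_name, data_length, index_length, data_free
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'uk_cities';

-- Refresh the index statistics used by the optimizer
ANALYZE TABLE uk_cities;

-- Reorganize the table and index storage
OPTIMIZE TABLE uk_cities;
```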

11. Use Covering Indexes

With covering indexes, the database engine can obtain all required information from the index itself. This reduces the need for accessing the table data. By avoiding extra table lookups, covering indexes minimize disk I/O operations.

Let’s assume we have a table us_cities with columns such as city_id, city_name, state, and population. A regular index looks like this:

				
					CREATE INDEX idx_state ON us_cities(state);

A covering index would look like this:

				
					CREATE INDEX idx_state_cityname_population ON us_cities(state, city_name, population);

With the covering index idx_state_cityname_population, a query that filters by state and selects only city_name and population gets all the necessary data directly from the index. Since MySQL retrieves the needed data from the index without additional table lookups, it reduces disk I/O operations.
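
A query that this index can cover might look like the sketch below; the state value 'Texas' is only an illustrative filter. When a query is served entirely from an index, the Extra column of the EXPLAIN output contains "Using index".

```sql
-- All referenced columns (state, city_name, population) are in the index
SELECT city_name, population
FROM us_cities
WHERE state = 'Texas';

-- Check the plan: look for "Using index" in the Extra column
EXPLAIN
SELECT city_name, population
FROM us_cities
WHERE state = 'Texas';
```

Selecting a column that is not part of the index, or using SELECT *, would force MySQL back to the table rows and the index would no longer be covering.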

12. Avoid Using Functions on Indexed Columns in WHERE Statements

Indexes speed up the retrieval of rows because they allow the database to quickly find the rows that match the query conditions on the indexed column. However, when a function transforms the data in the indexed column, MySQL can no longer compare the condition against the values stored in the index. This typically leads to a full table scan instead of an efficient index-based search.

Avoid putting functions on indexed columns in the WHERE clause to maximize index usage. Instead, modify the query so that the condition is expressed on the raw column value. For instance, given an index on a column named date, a query like SELECT * FROM table WHERE YEAR(date) = 2023 would not use the index efficiently, as the YEAR() function must be evaluated for every row. Rewriting the condition as a range on the column itself, such as SELECT * FROM table WHERE date BETWEEN '2023-01-01' AND '2023-12-31', lets MySQL use the index. If the column stores a time component (DATETIME or TIMESTAMP), use date >= '2023-01-01' AND date < '2024-01-01' so that rows from the last day of the year are not missed.

Let’s assume we have a table named events with a column event_date, and an index on the event_date column. The following query applies a function to the indexed column:

				
					SELECT *
FROM events
WHERE YEAR(event_date) = 2023;

To optimize the query, remove the function from the indexed column and filter on a date range instead:

				
-- Half-open range: also correct if event_date stores a time component
SELECT *
FROM events
WHERE event_date >= '2023-01-01'
  AND event_date < '2024-01-01';

This query directly uses the indexed event_date column without applying any function. It allows MySQL to efficiently utilize the index, as the query condition matches the data format of the indexed column.
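
If rewriting the query is not practical, MySQL 8.0.13 and later also support functional indexes, which index the result of an expression. The sketch below is an alternative to the range rewrite, not a replacement for it; the index name idx_event_year is an assumption.

```sql
-- Functional index on the year of event_date (MySQL 8.0.13+).
-- The expression must be wrapped in its own pair of parentheses.
CREATE INDEX idx_event_year ON events ((YEAR(event_date)));

-- The query must use the same expression for the index to be considered
EXPLAIN
SELECT *
FROM events
WHERE YEAR(event_date) = 2023;
```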

13. Optimize MySQL Configuration

Tune critical parameters such as buffer sizes, table cache, and sort buffer size. When optimized correctly, these parameters improve the efficiency of data retrieval. They determine how MySQL utilizes server resources and have a direct impact on how quickly and effectively it can process queries.

Here are some of the top parameters to optimize:

innodb_buffer_pool_size: On a dedicated database server, allocating roughly 60-80% of RAM to this buffer is a common starting point. It caches InnoDB data and indexes, reducing disk I/O.

table_open_cache: Increase this to keep more tables open simultaneously, reducing overhead in environments with frequent table accesses or joins.

sort_buffer_size: Adjust this to optimize memory for sorting rows. It is allocated for each session that needs a sort, so avoid excessive sizing to prevent high memory usage.

join_buffer_size: Increase this for joins that cannot use indexes. A buffer can be allocated for each such join in a query, so balance the size to avoid high memory use.

query_cache_size: On MySQL 5.7 and earlier, this sets the memory available to the query cache in read-heavy systems, with reduced effectiveness under frequent table updates. The query cache and this variable were removed in MySQL 8.0, where setting it prevents the server from starting.

Here is an example of how to set these parameters in your configuration file (/etc/mysql/my.cnf):

				
[mysqld]
innodb_buffer_pool_size = 8G
table_open_cache = 2000
sort_buffer_size = 2M
join_buffer_size = 4M
# MySQL 5.7 and earlier only; remove this line on MySQL 8.0 or later
query_cache_size = 32M

Then, restart the MySQL server with the systemctl restart mysql command to apply the changes. Base these adjustments on your specific workload patterns, server specifications, and performance metrics.
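
Before and after changing the configuration, the current values and some related counters can be inspected from a client session. This is a sketch; the value 4000 is purely illustrative, and SET GLOBAL changes are lost on restart unless they are also written to the configuration file.

```sql
-- Current values of the parameters discussed above
SHOW VARIABLES
WHERE Variable_name IN ('innodb_buffer_pool_size', 'table_open_cache',
                        'sort_buffer_size', 'join_buffer_size');

-- Buffer pool reads: compare Innodb_buffer_pool_reads (served from disk)
-- with Innodb_buffer_pool_read_requests (all logical reads)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

-- table_open_cache can be changed at runtime without a restart
SET GLOBAL table_open_cache = 4000;
```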

14. Use Batch Processing for Large Updates

When performing large-scale updates in MySQL, it’s more efficient to use batch processing rather than executing a single, large query. Large updates in a single query are resource-intensive and can hold locks for a long time, leading to slow performance. They also increase the risk of timeouts or failures. Batch processing helps mitigate these risks by breaking down the large update task into smaller pieces.

Therefore, divide the task into smaller batches instead of executing a single statement that affects millions of rows. Use loops in your code or script to implement batch updates. This reduces the load on the database, keeps each transaction short, and limits the size of the undo log and of any rollback.
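
One way to implement such a loop inside MySQL itself is a stored procedure that updates a limited number of rows per pass. The sketch below is hypothetical: the orders table, its status and order_date columns, the procedure name, and the batch size of 5000 are all assumptions for illustration. DELIMITER is a command of the mysql command-line client, and the example assumes autocommit is enabled so that each batch commits on its own.

```sql
DELIMITER //

CREATE PROCEDURE archive_old_orders_in_batches()
BEGIN
    DECLARE rows_changed INT DEFAULT 1;

    -- Repeat until a pass changes no rows
    WHILE rows_changed > 0 DO
        -- Each pass touches at most 5000 rows; updated rows no longer
        -- match status = 'completed', so the loop makes progress
        UPDATE orders
        SET status = 'archived'
        WHERE status = 'completed'
          AND order_date < '2023-01-01'
        LIMIT 5000;

        -- ROW_COUNT() returns the rows changed by the previous statement
        SET rows_changed = ROW_COUNT();
    END WHILE;
END //

DELIMITER ;

CALL archive_old_orders_in_batches();
```

The same pattern works from application code: run the limited UPDATE in a loop and stop when the affected row count reaches zero, optionally pausing between batches to leave room for other traffic.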

15. Optimize MySQL Server Hardware

Proper hardware optimizations lead to significant improvements in response times. This involves ensuring that the server has adequate memory, a powerful CPU, and fast storage solutions like Solid State Drives (SSDs).

Memory (RAM): Having sufficient RAM allows more data to be cached. This reduces the need for disk I/O operations and improves query execution time.

CPU: A fast CPU with many cores handles more simultaneous queries and complex calculations. MySQL serves each connection on its own thread and runs background work on separate threads, so multi-core systems mainly benefit concurrent workloads.

Storage: Replacing hard drives with solid state drives (SSDs) improves database performance. SSDs provide faster read/write speeds and reduce latency in transaction processing.

RAID configuration: Implementing a RAID (Redundant Array of Independent Disks) setup offers data redundancy and performance enhancement by distributing I/O operations across multiple disks.

Network bandwidth and latency: High bandwidth and low latency network connections increase data transfer speeds between the application and database server.

With these hardware optimization strategies, you create a robust and high performing environment for your MySQL server.

Thank you for reading MySQL Query Optimization For Faster and More Efficient Queries. Let’s summarize.
