Mental Model
Query optimization is about helping the database do as little work as possible. Every block of memory loaded, every disk seek, and every temporary table created is a bottleneck.
A perfectly designed schema will still crumble if the queries executed against it are written poorly. A Staff Engineer's job is to rewrite queries so the MySQL Optimizer can build the most efficient execution plan.
1. Avoid Correlated Subqueries
A correlated subquery is a subquery that depends on the outer query for its values. It executes once for every single row returned by the outer query.
BAD (O(N^2) Performance):
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary) FROM employees WHERE department_id = e.department_id
);
GOOD (O(N) Performance via CTE or JOIN): We can use a Common Table Expression (CTE) to pre-calculate the averages once.
WITH DeptAverages AS (
SELECT department_id, AVG(salary) as avg_salary
FROM employees
GROUP BY department_id
)
SELECT e.name, e.salary
FROM employees e
JOIN DeptAverages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
2. The Deep Pagination Trap (Offset vs Cursor)
When users click through pages of search results, you typically use OFFSET and LIMIT.
BAD:
SELECT * FROM users ORDER BY created_at DESC LIMIT 1000000, 10;
MySQL has to fetch 1,000,010 rows from the disk, sort them, discard the first 1,000,000, and return 10. This is horribly slow.
GOOD (Cursor-Based Pagination): Instead of an offset, use the last seen ID or timestamp as a cursor.
SELECT * FROM users
WHERE created_at < '2026-05-01 12:00:00'
ORDER BY created_at DESC
LIMIT 10;
This query jumps directly to the index node for that timestamp and reads the next 10 rows. It is $O(1)$ fast regardless of how many pages the user clicks.
3. Breaking SARGability
SARGable stands for Search ARGument ABLE. It means a query is written in a way that allows the database engine to use an index.
BAD (Forces a Full Table Scan):
SELECT * FROM orders WHERE YEAR(created_at) = 2026;
Because you wrapped the indexed created_at column in a function YEAR(), MySQL cannot use the B-Tree index. It must scan the entire table, apply the function to every row, and then filter.
GOOD (SARGable, Uses the Index):
SELECT * FROM orders
WHERE created_at >= '2026-01-01 00:00:00'
AND created_at < '2027-01-01 00:00:00';
4. The N+1 Problem in ORMs
If you are using an ORM like Hibernate (Java), Prisma (Node), or ActiveRecord, you must be hyper-aware of lazy loading.
Fetching 100 users, and then lazily fetching each user's profile picture triggers 101 separate SQL queries. Always configure your ORM to use Eager Loading (Batching or JOIN FETCH) to reduce this to a single query.
Practice Question
Scenario: You have a massive transactions table. The query SELECT * FROM transactions WHERE status != 'FAILED' is running very slowly, even though there is an index on status.
Question: Why is the index not helping?
View Answer
Indexes are designed to find specific values quickly. An inequality operator (`!=` or `<>`) forces the database engine to find everything *except* a specific value. If 95% of transactions are not 'FAILED', the optimizer realizes that scanning the index and then looking up 95% of the table rows is slower than just doing a Full Table Scan directly.Technical Trade-offs: Architectural Decision
| Strategy | Scalability | Complexity | Operational Cost | Performance |
|---|---|---|---|---|
| Monolithic | Low | Low | Low | Fast (Local) |
| Microservices | Very High | High | High | Slower (Network) |
| Serverless | Infinite | Medium | Variable | Variable (Cold Starts) |
Production Readiness Checklist
Before deploying this architecture to a production environment, ensure the following Staff-level criteria are met:
- High Availability: Have we eliminated single points of failure across all layers?
- Observability: Are we exporting structured JSON logs, custom Prometheus metrics, and OpenTelemetry traces?
- Circuit Breaking: Do all synchronous service-to-service calls have timeouts and fallbacks (e.g., via Resilience4j)?
- Idempotency: Can our APIs handle retries safely without causing duplicate side effects?
- Backpressure: Does the system gracefully degrade or return HTTP 429 when resources are saturated?
Verbal Interview Script
Interviewer: "How do you identify a slow query in a production MySQL database, and what steps do you take to fix it?"
Candidate: "First, I configure the MySQL Slow Query Log to capture any queries taking longer than a specific threshold, say 100ms. I use tools like pt-query-digest to aggregate and identify the most frequent offenders. Once identified, I prefix the query with EXPLAIN ANALYZE to inspect the execution plan. I look for red flags: type: ALL (Full Table Scans), Using filesort (sorting without an index), or Using temporary (building internal temp tables). My first step to fix it is ensuring SARGability and adding Covering Indexes. If that fails, I look at refactoring subqueries into CTEs or introducing application-level caching."