How to delete lots of rows from a MySQL database without indefinite locking

Interesting post over at the Flare blog, where they had to delete over 900 million records from a MySQL table. They couldn’t simply run a DELETE FROM table query, as that would lock it:

When a deletion query using a WHERE clause starts, it will lock that table. All other queries that are run against the table will have to wait.

Because of the massive size of the table, the deletion query would probably run for hours. The lock would be in place for hours, essentially halting the entire app.

What I learnt from the post is that you can add a LIMIT clause to a DELETE query.

# Will only delete the first 1000 rows that match
DELETE FROM `events`
WHERE `added_on` < "2020-12-13 23:00:00"
LIMIT 1000;

That way you can chunk your DELETEs, so that your table isn’t locked for an indefinite amount of time. Here’s a little script that I built to run locally:

<?php

$cutoffDate = new \DateTime('-1 month');
$batchSize = 1000;

$db = …;

echo "♺ Deleting events before " . $cutoffDate->format(\DateTime::RFC3339) . PHP_EOL;
$total = 0;
do {
    $res = $db-> executeStatement('DELETE FROM `events` WHERE `added_on` < ? LIMIT ' . $batchSize . ';', [ $cutoffDate->format('U') ]);
    echo "  - [" . date(\DateTime::RFC3339) . "] Deleted " . $res . " rows" . PHP_EOL;
    $total += $res;
} while ($res == $batchSize);

echo "\x07✔ Deleted a total of $total rows" . PHP_EOL . PHP_EOL;

In their blog post they mention other ways to do so, including Laravel’s built-in jobs mechanism.

How to safely delete records in massive tables on AWS using Laravel →

Need to Connect to a Local MySQL Server? Use Unix Domain Socket!

The folks at Percona have benchmarked TCP/IP vs. Unix Connections to a local MySQL server.

When connecting to a local MySQL instance, you have two commonly used methods: use TCP/IP protocol to connect to local address – localhost or 127.0.0.1 – or use Unix Domain Socket.

If you have a choice (if your application supports both methods), use Unix Domain Socket as this is both more secure and more efficient.

Not that the exact numbers really matter, but the message should be clear: Use a Unix Domain Socket to connect, if you have the chance.

Need to Connect to a Local MySQL Server? Use Unix Domain Socket! →

Connect to Remote MySQL Server with SSL Certificates from PHP: Fixing the error "Terminated due to signal: ABORT TRAP (6)"


Photo by Clem Onojeghuo on Unsplash

To connect to a MySQL Server that requires SSL from PHP with PDO, you can use this piece of code:

try {
	$db = new PDO('mysql:host=DB_HOST;dbname=DB_NAME', $user, $pass, [
		PDO::MYSQL_ATTR_SSL_KEY => 'path/to/client_private_key',
		PDO::MYSQL_ATTR_SSL_CERT => 'path/to/client_cert',
		PDO::MYSQL_ATTR_SSL_CA => 'path/to/server_ca_cert',
	]);
} catch (PDOException $e) {
	print "Error!: " . $e->getMessage() . "<br/>";
	die();
}

To my surprise I was greeted with an abort signal (not an Exception!) when running this piece of code when trying to connect to a MySQL 5.7 Server on Google Cloud SQL:

Terminated due to signal: ABORT TRAP (6)

Thinking it was a problem with the used certificates I turned to the mysql binary to verify their contents:

$ mysql \
	--ssl-cert='path/to/client_cert' \
	--ssl-key='path/to/client_private_key' \
	--ssl-ca='path/to/server_ca_cert' \
	--host=DB_HOST \
	--user=DB_USERNAME \
	--password

And guess what: that command, of course, worked just fine … but what is the problem then? Is it Google Cloud SQL? Is it PHP acting up?

At the root of the problem is the fact that I’m connecting to the host using an IPv4 address and that the Common Name in the certificate – which one needs to choose manually when generating an SSL Certificate for a Cloud SQL instance – does not match (ref).

To work around this issue you have two options:

  1. (recommended) Connect to your Cloud SQL instance using the Cloud SQL proxy
  2. (not recommended, but needed when solution 1 is not an option) Disable the verification of the server SSL certificate

For the first solution follow Google’s instructions on the Cloud SQL Proxy.

For the second solution, set the PDO::MYSQL_ATTR_SSL_VERIFY_SERVER_CERT option to false:

try {
	$db = new PDO('mysql:host=DB_HOST;dbname=DB_NAME', $user, $pass, [
		PDO::MYSQL_ATTR_SSL_KEY => 'path/to/client_private_key',
		PDO::MYSQL_ATTR_SSL_CERT => 'path/to/client_cert',
		PDO::MYSQL_ATTR_SSL_CA => 'path/to/server_ca_cert',
		PDO::MYSQL_ATTR_SSL_VERIFY_SERVER_CERT => false,
	]);
} catch (PDOException $e) {
	print "Error!: " . $e->getMessage() . "<br/>";
	die();
}

💁‍♂️ Do note that in this case your connection is encrypted, but you’re not validation the server SSL certificate, and therefore cannot be guaranteed that you’re connecting to the correct server …

With all these set, your PHP code should be able to connect to the MySQL instance without being terminated prematurely …

Did this help you out? Like what you see?
Thank me with a coffee.

I don't do this for profit but a small one-time donation would surely put a smile on my face. Thanks!

☕️ Buy me a Coffee (€3)

To stay in the loop you can follow @bramus or follow @bramusblog on Twitter.

Unsafe SQL functions in Laravel

Recently the folks from Spatie released a security update for their laravel-query-builder package. Turns out it was vulnerable to SQL Injection.

At the core of the vulnerability is the fact that Laravel offers a shorthand for querying only certain fields of JSON data, but that these do not get escaped when converted to a json_extract function.

Brent has a detailed writeup on this:

Instead of manually writing json_extract, we can use the simplified -> syntax, which Laravel will convert to the correct SQL statement.

Blog::query()
    ->addSelect('title->en');
SELECT json_extract(`title`, '$."en"') FROM blogs;

Be careful though: Laravel won’t do any escaping during this conversion.

If you were to change title->en – which could come from a URL or user input – to title->en'#, you’re in …

Thankfully by now a fix authored by Brent has landed in Laravel 5.8.11 🙂

Unsafe SQL functions in Laravel →
An important security release for laravel-query-builder

MySQL ST_Distance_Sphere Polyfill

One of the cool things about MySQL 5.7 is the fact that it sports a few spatial convenience functions (since 5.7.6), allowing one to do operations on geometric values.

One of those convenience functions is ST_Distance_Sphere, which allows one to calculate the (spherical) distance between two points.

Recently I was working an project where I needed said function, yet unfortunately they were running some version of MySQL 5.6 … on which ST_Distance_Sphere isn’t available.

Instead of requesting to upgrade the MySQL server, I decided to polyfill ST_Distance_Sphere instead. It’s really easy since it’s a function, and it basically is nothing more than the Haversine Formula which I, as a mapping aficionado, know how to juggle.

DELIMITER $$

DROP FUNCTION IF EXISTS `ST_Distance_Sphere`$$

CREATE FUNCTION `ST_Distance_Sphere` (point1 POINT, point2 POINT)

	RETURNS FLOAT
	no sql deterministic
	BEGIN
		declare R INTEGER DEFAULT 6371000;
		declare `φ1` float;
		declare `φ2` float;
		declare `Δφ` float;
		declare `Δλ` float;
		declare a float;
		declare c float;
		set `φ1` = radians(y(point1));
		set `φ2` = radians(y(point2));
		set `Δφ` = radians(y(point2) - y(point1));
		set `Δλ` = radians(x(point2) - x(point1));

		set a = sin(`Δφ` / 2) * sin(`Δφ` / 2) + cos(`φ1`) * cos(`φ2`) * sin(`Δλ` / 2) * sin(`Δλ` / 2);
		set c = 2 * atan2(sqrt(a), sqrt(1-a));

		return R * c;
	END$$

DELIMITER ;

Run the little snippet above on your server after having selected a database first. Once executed it’ll persistently be available (just like a Stored Procedure) for you to use. Usage is the same as if it were natively available:

SELECT ST_Distance_Sphere(
  POINT(-87.6770458, 41.9631174),
  POINT(-73.9898293, 40.7628267)
);

// ~> 1148978.6738241839 (in metres)

Happy Mapping! 🙂

Did this help you out? Like what you see?
Thank me with a coffee.

I don't do this for profit but a small one-time donation would surely put a smile on my face. Thanks!

☕️ Buy me a Coffee (€4)

To stay in the loop you can follow @bramus or follow @bramusblog on Twitter.

Context aware MySQL pools via HAProxy

At GitHub they use MySQL as their main datastore. The setup is a typical “single-writer-multiple-readers” design. They loadbalance between server pools using HAProxy, with some cleverness built in:

Instead [of checking whether a MySQL server is live with mysql-check], we make our HAProxy pools context aware. We let the backend MySQL hosts make an informed decision: “should I be included in a pool or should I not?”

The HAProxy config contains all known hosts. The hosts themselves can answer to checks running with either HTTP 200 (OK), HTTP 503 (Not OK), or HTTP 404 (Maintenance). Based upon those answers HAProxy will assess if there are enough hosts in the main mysql_ro_main pool to handle the load, and automatically switch to the failover mysql_ro_backup pool if need be.

frontend mysql_ro
  ...
  acl mysql_not_enough_capacity nbsrv(mysql_ro_main) lt 3
  use_backend mysql_ro_backup if mysql_not_enough_capacity
  default_backend mysql_ro_main

Clever.

As an extra, they’ve also integrated it all into their chatops.

github-mysql-haproxy-chatops

Context aware MySQL pools via HAProxy →

Why Uber Engineering Switched from Postgres to MySQL

MySQL_Index_Property_Header_

Postgres served us well in the early days of Uber, but we ran into significant problems scaling Postgres with our growth. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

Why Uber Engineering Switched from Postgres to MySQL →

The mailinglist post “Why we lost Uber as a user” that came as a response is also worth a read.

MySQL behaviour you should be aware of

I suggest changing the syntax for NOT NULL to “NOT” NULL.

If you’re not checking all values in your code logic (viz. PHP code or something like that) before attempting to perform the database manipulation you can fix it by (manually) add triggers to your tables to actually prevent a faulty insert/update:

mysql> CREATE TRIGGER check_constraint BEFORE INSERT ON table_name
    -> FOR EACH ROW
    -> BEGIN
    ->   DECLARE error_msg varchar(255);
    ->   IF NEW.column_name = '' THEN
    ->     SET error_msg = 'Constraint my_constraint violated: ...';
    ->     SIGNAL SQLSTATE '45000' SET message_text = error_msg;
    ->   END IF;
    -> END;

(via)

Storing MD5 values

A common occurrence I have noticed in MySQL apps is that MD5 values are stored as 32 byte values rather than 16. Just to ‘rehash’, an MD5 value is a 16 byte hexadecimal value, typically used as a unique fixed-length signature of a string, useful for identifying unique strings or one-way encryption of passwords. The binary representation takes 16 bytes (e.g. BINARY(16)), though a human readable hexadecimal version takes twice as many (e.g. CHAR(32)).

Same goes for IPv4 address, which can be stored as UNSIGNED INT (4 bytes) instead of VARCHAR(16) (16 bytes)

Tip: Storing MD5 values →

WebScaleSQL

WebScaleSQL is a collaboration among engineers from several companies that face similar challenges in running MySQL at scale, and seek greater performance from a database technology tailored for their needs.

Those “several companies” are Facebook, Google, LinkedIn, and Twitter. And the collaboration mentioned is a true collaboration:

For example, to introduce a code change, a WebScaleSQL engineer can propose a change. Then a WebScaleSQL engineer from another company will review the code and provide feedback. If both engineers agree the change makes sense and is functional, it will be pushed into the WebScaleSQL branch for everyone to use.

The engineers over at Facebook have posted a list of improvements that have already made it into WebScaleSQL.

WebScaleSQL: A collaboration to build upon the MySQL upstream →
WebScaleSQL →
WebScaleSQL (GitHub) →