How to optimize ORDER BY RANDOM()

Doing an ORDER BY RAND() in SQL is bad. Very bad. As Tobias Petry details (and as Bernard Grymonpon always used to tell at local meetups):

Ordering records in a random order involves these operations:

  1. Load all rows matching your conditions into memory
  2. Assign a random value (the result of RANDOM()) to each row
  3. Sort all the rows according to this random value
  4. Retain only the desired number of records from all the sorted records
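To see those four steps in action, here's a quick sketch using Python and SQLite (SQLite's RANDOM() plays the role of MySQL's RAND() here; the table and its contents are made up):

```python
import sqlite3

# In-memory stand-in for the repositories table
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE repositories (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany(
    "INSERT INTO repositories (name) VALUES (?)",
    [(f"repo-{i}",) for i in range(1000)],
)

# The naive version: all 1000 rows get loaded, tagged with a random
# value, and sorted, only for 997 of them to be thrown away again
rows = db.execute("SELECT * FROM repositories ORDER BY RANDOM() LIMIT 3").fetchall()
print(rows)  # three random rows
```

The cost grows with the size of the table, not with the size of the result.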

His solution is to pre-add randomness to each record, in an extra column. For this he uses the geometric POINT datatype. In Postgres he then uses the following query, which orders the records by their distance to a new random point.

SELECT * FROM repositories ORDER BY randomness <-> point(0.753,0.294) LIMIT 3;

~

In MySQL you also have a POINT type (ever since MySQL 5.7.6) that you can use. However, I don’t really see how that would work there, as you’d need to calculate the distance for each record using a call to ST_Distance:

SET @randomx = RAND();
SET @randomy = RAND();
SELECT *, ST_Distance(POINT(@randomx, @randomy), randomness) AS distance FROM repositories ORDER BY distance DESC LIMIT 0,3;

💁‍♂️ Using EXPLAIN on the query above verifies it doesn’t use an index, and thus goes over all records.

What I do see working instead is using a single float value to hold pre-computed randomness:

-- Add column + index
ALTER TABLE `repositories` ADD `randomness` FLOAT(17,16) UNSIGNED NOT NULL;
ALTER TABLE `repositories` ADD INDEX(`randomness`);

-- Update existing records. New records should have this number pre-generated before inserting
UPDATE `repositories` SET randomness = RAND() WHERE 1;

With that column in place, you could then do something like this:

SET @randomnumber = RAND(); -- This number would typically be generated by your PHP code, and then be injected as a query param
SELECT * FROM repositories WHERE randomness < @randomnumber ORDER BY randomness DESC LIMIT 0,3;

Unlike the query using POINT(), this last query will leverage the index created on the randomness column 🙂
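As a sanity check, here's that idea sketched in Python against SQLite (not the actual MySQL setup; names are illustrative). Note one edge case: when the random threshold lands below the smallest stored values, fewer rows than requested match, so you may want to top the result up:

```python
import random
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE repositories "
    "(id INTEGER PRIMARY KEY, name TEXT, randomness REAL NOT NULL)"
)
db.execute("CREATE INDEX idx_randomness ON repositories (randomness)")
db.executemany(
    "INSERT INTO repositories (name, randomness) VALUES (?, ?)",
    [(f"repo-{i}", random.random()) for i in range(1000)],
)

r = random.random()  # generated app-side, injected as a query param
rows = db.execute(
    "SELECT * FROM repositories WHERE randomness < ? "
    "ORDER BY randomness DESC LIMIT 3",
    (r,),
).fetchall()

# Edge case: if r lands below the 3 smallest stored values, fewer than
# 3 rows come back; wrap around and take the highest values instead
if len(rows) < 3:
    rows += db.execute(
        "SELECT * FROM repositories ORDER BY randomness DESC LIMIT ?",
        (3 - len(rows),),
    ).fetchall()
print(rows)
```

Because the query filters and orders on a single indexed column, the database can walk the index instead of scanning and sorting the whole table.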

~

How to optimize ORDER BY RANDOM()

Via Freek

You might as well timestamp it

Jerod Santo:

There are plenty of times in my career when I’ve stored a boolean and later wished I’d had a timestamp. There are zero times when I’ve stored a timestamp and regretted that decision.

Hear hear! Over the years I’ve come to include 9 meta fields in most of the tables I create: added_at, added_by, added_ip, edited_at, edited_by, edited_ip, deleted_at, deleted_by, and deleted_ip. Handy for whenever you receive a phone call saying that a record has disappeared: they let you pinpoint the change to a specific user and timestamp.
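Sketched as a schema (in Python/SQLite for illustration; the column types are my assumption, and in MySQL you'd typically reach for TIMESTAMP columns instead):

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The nine meta fields next to the table's own columns; deleted_at
# doubles as a soft-delete flag (NULL means "not deleted")
db.execute("""
    CREATE TABLE invoices (
        id         INTEGER PRIMARY KEY,
        amount     REAL,
        added_at   TEXT, added_by   INTEGER, added_ip   TEXT,
        edited_at  TEXT, edited_by  INTEGER, edited_ip  TEXT,
        deleted_at TEXT, deleted_by INTEGER, deleted_ip TEXT
    )
""")
db.execute(
    "INSERT INTO invoices (amount, added_at, added_by, added_ip)"
    " VALUES (?, datetime('now'), ?, ?)",
    (100.0, 42, "203.0.113.7"),
)
# "A record has disappeared": the soft delete keeps the who/when/where
db.execute(
    "UPDATE invoices SET deleted_at = datetime('now'),"
    " deleted_by = ?, deleted_ip = ? WHERE id = 1",
    (7, "198.51.100.23"),
)
who, when = db.execute(
    "SELECT deleted_by, deleted_at FROM invoices WHERE id = 1"
).fetchone()
print(who, when)
```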

You might as well timestamp it →

How to delete lots of rows from a MySQL database without indefinite locking

Interesting post over at the Flare blog, where they had to delete over 900 million records from a MySQL table. They couldn’t simply run a DELETE FROM table query, as that would lock it:

When a deletion query using a WHERE clause starts, it will lock that table. All other queries that are run against the table will have to wait.

Because of the massive size of the table, the deletion query would probably run for hours. The lock would be in place for hours, essentially halting the entire app.

What I learnt from the post is that you can add a LIMIT clause to a DELETE query.

# Will delete at most 1000 rows that match
DELETE FROM `events`
WHERE `added_on` < "2020-12-13 23:00:00"
LIMIT 1000;

That way you can chunk your DELETEs, so that your table isn’t locked for an indefinite amount of time. Here’s a little script that I built to run locally:

<?php

$cutoffDate = new \DateTime('-1 month');
$batchSize = 1000;

$db = …;

echo "♺ Deleting events before " . $cutoffDate->format(\DateTime::RFC3339) . PHP_EOL;
$total = 0;
do {
    $res = $db->executeStatement('DELETE FROM `events` WHERE `added_on` < ? LIMIT ' . $batchSize . ';', [$cutoffDate->format('Y-m-d H:i:s')]);
    echo "  - [" . date(\DateTime::RFC3339) . "] Deleted " . $res . " rows" . PHP_EOL;
    $total += $res;
} while ($res == $batchSize);

echo "\x07✔ Deleted a total of $total rows" . PHP_EOL . PHP_EOL;

In their blog post they mention other ways to do so, including Laravel’s built-in jobs mechanism.

How to safely delete records in massive tables on AWS using Laravel →

Need to Connect to a Local MySQL Server? Use Unix Domain Socket!

The folks at Percona have benchmarked TCP/IP vs. Unix Domain Socket connections to a local MySQL server.

When connecting to a local MySQL instance, you have two commonly used methods: use TCP/IP protocol to connect to local address – localhost or 127.0.0.1 – or use Unix Domain Socket.

If you have a choice (if your application supports both methods), use Unix Domain Socket as this is both more secure and more efficient.

Not that the exact numbers really matter, but the message should be clear: Use a Unix Domain Socket to connect, if you have the chance.
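In PHP's PDO, for instance, the difference is just the DSN (the socket path below is an assumption; check your server's socket_file setting):

```
TCP/IP (goes through the network stack, even for a local server):
mysql:host=127.0.0.1;dbname=DB_NAME

Unix Domain Socket (stays inside the kernel, skipping TCP overhead):
mysql:unix_socket=/var/run/mysqld/mysqld.sock;dbname=DB_NAME
```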

Need to Connect to a Local MySQL Server? Use Unix Domain Socket! →

Connect to Remote MySQL Server with SSL Certificates from PHP: Fixing the error "Terminated due to signal: ABORT TRAP (6)"



To connect to a MySQL Server that requires SSL from PHP with PDO, you can use this piece of code:

try {
	$db = new PDO('mysql:host=DB_HOST;dbname=DB_NAME', $user, $pass, [
		PDO::MYSQL_ATTR_SSL_KEY => 'path/to/client_private_key',
		PDO::MYSQL_ATTR_SSL_CERT => 'path/to/client_cert',
		PDO::MYSQL_ATTR_SSL_CA => 'path/to/server_ca_cert',
	]);
} catch (PDOException $e) {
	print "Error!: " . $e->getMessage() . "<br/>";
	die();
}

To my surprise I was greeted with an abort signal (not an Exception!) when running this piece of code against a MySQL 5.7 Server on Google Cloud SQL:

Terminated due to signal: ABORT TRAP (6)

Thinking it was a problem with the used certificates I turned to the mysql binary to verify their contents:

$ mysql \
	--ssl-cert='path/to/client_cert' \
	--ssl-key='path/to/client_private_key' \
	--ssl-ca='path/to/server_ca_cert' \
	--host=DB_HOST \
	--user=DB_USERNAME \
	--password

And guess what: that command, of course, worked just fine … but what is the problem then? Is it Google Cloud SQL? Is it PHP acting up?

At the root of the problem is the fact that I’m connecting to the host using an IPv4 address and that the Common Name in the certificate – which one needs to choose manually when generating an SSL Certificate for a Cloud SQL instance – does not match (ref).

To work around this issue you have two options:

  1. (recommended) Connect to your Cloud SQL instance using the Cloud SQL proxy
  2. (not recommended, but needed when solution 1 is not an option) Disable the verification of the server SSL certificate

For the first solution follow Google’s instructions on the Cloud SQL Proxy.

For the second solution, set the PDO::MYSQL_ATTR_SSL_VERIFY_SERVER_CERT option to false:

try {
	$db = new PDO('mysql:host=DB_HOST;dbname=DB_NAME', $user, $pass, [
		PDO::MYSQL_ATTR_SSL_KEY => 'path/to/client_private_key',
		PDO::MYSQL_ATTR_SSL_CERT => 'path/to/client_cert',
		PDO::MYSQL_ATTR_SSL_CA => 'path/to/server_ca_cert',
		PDO::MYSQL_ATTR_SSL_VERIFY_SERVER_CERT => false,
	]);
} catch (PDOException $e) {
	print "Error!: " . $e->getMessage() . "<br/>";
	die();
}

💁‍♂️ Do note that in this case your connection is encrypted, but you’re not validating the server SSL certificate, so you have no guarantee that you’re connecting to the correct server …

With all this set, your PHP code should be able to connect to the MySQL instance without being terminated prematurely …

Did this help you out? Like what you see?
Thank me with a coffee.

I don't do this for profit but a small one-time donation would surely put a smile on my face. Thanks!

☕️ Buy me a Coffee (€3)

To stay in the loop you can follow @bramus or follow @bramusblog on Twitter.

Unsafe SQL functions in Laravel

Recently the folks from Spatie released a security update for their laravel-query-builder package. Turns out it was vulnerable to SQL Injection.

At the core of the vulnerability is the fact that Laravel offers a shorthand for querying only certain fields of JSON data, but that these do not get escaped when converted to a json_extract function.

Brent has a detailed writeup on this:

Instead of manually writing json_extract, we can use the simplified -> syntax, which Laravel will convert to the correct SQL statement.

Blog::query()
    ->addSelect('title->en');

SELECT json_extract(`title`, '$."en"') FROM blogs;

Be careful though: Laravel won’t do any escaping during this conversion.

If you were to change title->en – which could come from a URL or user input – to title->en'#, you’re in …

Thankfully by now a fix authored by Brent has landed in Laravel 5.8.11 🙂

Unsafe SQL functions in Laravel →
An important security release for laravel-query-builder

MySQL ST_Distance_Sphere Polyfill

One of the cool things about MySQL 5.7 is the fact that it sports a few spatial convenience functions (since 5.7.6), allowing one to do operations on geometric values.

One of those convenience functions is ST_Distance_Sphere, which allows one to calculate the (spherical) distance between two points.

Recently I was working on a project where I needed said function, yet unfortunately it was running some version of MySQL 5.6 … on which ST_Distance_Sphere isn’t available.

Instead of requesting to upgrade the MySQL server, I decided to polyfill ST_Distance_Sphere. That’s really easy since it’s a function, and it basically is nothing more than the Haversine formula, which I, as a mapping aficionado, know how to juggle.

DELIMITER $$

DROP FUNCTION IF EXISTS `ST_Distance_Sphere`$$

CREATE FUNCTION `ST_Distance_Sphere` (point1 POINT, point2 POINT)

	RETURNS FLOAT
	no sql deterministic
	BEGIN
		declare R INTEGER DEFAULT 6371000;
		declare `φ1` float;
		declare `φ2` float;
		declare `Δφ` float;
		declare `Δλ` float;
		declare a float;
		declare c float;
		set `φ1` = radians(y(point1));
		set `φ2` = radians(y(point2));
		set `Δφ` = radians(y(point2) - y(point1));
		set `Δλ` = radians(x(point2) - x(point1));

		set a = sin(`Δφ` / 2) * sin(`Δφ` / 2) + cos(`φ1`) * cos(`φ2`) * sin(`Δλ` / 2) * sin(`Δλ` / 2);
		set c = 2 * atan2(sqrt(a), sqrt(1-a));

		return R * c;
	END$$

DELIMITER ;

Run the little snippet above on your server after selecting a database. Once executed, it’ll be persistently available (just like a Stored Procedure) for you to use. Usage is the same as if the function were natively available:

SELECT ST_Distance_Sphere(
  POINT(-87.6770458, 41.9631174),
  POINT(-73.9898293, 40.7628267)
);

-- ~> 1148978.6738241839 (in metres)
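For reference, and to sanity-check the stored function, here's the same Haversine computation as a Python sketch:

```python
from math import atan2, cos, radians, sin, sqrt

def st_distance_sphere(p1, p2, radius=6371000):
    """Spherical distance in metres between two (longitude, latitude) points."""
    (lon1, lat1), (lon2, lat2) = p1, p2
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return radius * 2 * atan2(sqrt(a), sqrt(1 - a))

# Chicago to New York, same points as above: roughly 1149 km
print(st_distance_sphere((-87.6770458, 41.9631174), (-73.9898293, 40.7628267)))
```

Small differences from the MySQL result are expected, as the stored function rounds through FLOAT variables while Python uses double precision.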

Happy Mapping! 🙂


Context aware MySQL pools via HAProxy

At GitHub they use MySQL as their main datastore. The setup is a typical “single-writer-multiple-readers” design. They load-balance between server pools using HAProxy, with some cleverness built in:

Instead [of checking whether a MySQL server is live with mysql-check], we make our HAProxy pools context aware. We let the backend MySQL hosts make an informed decision: “should I be included in a pool or should I not?”

The HAProxy config contains all known hosts. The hosts themselves can answer to checks running with either HTTP 200 (OK), HTTP 503 (Not OK), or HTTP 404 (Maintenance). Based upon those answers HAProxy will assess if there are enough hosts in the main mysql_ro_main pool to handle the load, and automatically switch to the failover mysql_ro_backup pool if need be.

frontend mysql_ro
  ...
  acl mysql_not_enough_capacity nbsrv(mysql_ro_main) lt 3
  use_backend mysql_ro_backup if mysql_not_enough_capacity
  default_backend mysql_ro_main

Clever.

As an extra, they’ve also integrated it all into their chatops.


Context aware MySQL pools via HAProxy →

Why Uber Engineering Switched from Postgres to MySQL


Postgres served us well in the early days of Uber, but we ran into significant problems scaling Postgres with our growth. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

Why Uber Engineering Switched from Postgres to MySQL →

The mailinglist post “Why we lost Uber as a user” that came as a response is also worth a read.

MySQL behaviour you should be aware of

I suggest changing the syntax for NOT NULL to “NOT” NULL.

MySQL, unless it runs in strict mode, will happily coerce missing or invalid values to implicit defaults (such as '' for a string column) instead of rejecting them. If you’re not checking all values in your code logic (viz. PHP code or something like that) before attempting the database manipulation, you can fix it by (manually) adding triggers to your tables to actually prevent a faulty insert/update:

DELIMITER $$

CREATE TRIGGER check_constraint BEFORE INSERT ON table_name
FOR EACH ROW
BEGIN
  DECLARE error_msg varchar(255);
  IF NEW.column_name = '' THEN
    SET error_msg = 'Constraint my_constraint violated: ...';
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = error_msg;
  END IF;
END$$

DELIMITER ;

(via)