How to optimize ORDER BY RANDOM()

Doing a ORDER BY RAND() in SQL is bad. Very bad. As Tobias Petry details (and Bernard Grymonpon always used to tell at local meetups):

Ordering records in a random order involves these operations:

  1. Load all rows into memory matching your conditions
  2. Assign a random value RANDOM() to each row in the database
  3. Sort all the rows according to this random value
  4. Retain only the desired number of records from all sorted records

His solution is to pre-add randomness to each record, in an extra column. For it he uses a the Geometric Datatype POINT type. In Postgres he then uses the following query that orders the records by distance measured against a new random point.

SELECT * FROM repositories ORDER BY randomness <-> point(0.753,0.294) LIMIT 3;

~

In MySQL you also have a POINT class (ever since MySQL 5.7.6) that you can use. However I don’t really see how that would work there, as you’d need to calculate the distance for each record using a call to ST_Distance:

SET @randomx = RAND();
SET @randomy = RAND();
SELECT *, ST_Distance(POINT(@randomx, @randomy), randomness) AS distance FROM repositories ORDER BY distance DESC LIMIT 0,3;

💁‍♂️ Using EXPLAIN on the query above verifies it doesn’t use an index, and thus goes over all records.

What I do see working instead, is use of a single float value to hold pre-randomness:

-- Add column + index
ALTER TABLE `repositories` ADD `randomness` FLOAT(17,16) UNSIGNED NOT NULL AFTER `randomness`;
ALTER TABLE `repositories` ADD INDEX(`randomness`);

-- Update existing records. New records should have this number pre-generated before inserting
UPDATE `repositories` SET randomness = RAND() WHERE 1;

With that column in place, you could then do something like this:

SET @randomnumber = RAND(); -- This number would typically be generated by your PHP code, and then be injected as a query param
SELECT * FROM repositories WHERE randomness < @randomnumber ORDER BY randomness DESC 0,3;

Unlike the query using POINT(), this last query will leverage the index created on the randomness column 🙂

~

How to optimize ORDER BY RANDOM()

Via Freek

Why Uber Engineering Switched from Postgres to MySQL

MySQL_Index_Property_Header_

Postgres served us well in the early days of Uber, but we ran into significant problems scaling Postgres with our growth. In this article, we’ll explore some of the drawbacks we found with Postgres and explain the decision to build Schemaless and other backend services on top of MySQL.

Why Uber Engineering Switched from Postgres to MySQL →

The mailinglist post “Why we lost Uber as a user” that came as a response is also worth a read.