How to optimize ORDER BY RANDOM()

Doing a ORDER BY RAND() in SQL is bad. Very bad. As Tobias Petry details (and Bernard Grymonpon always used to tell at local meetups):

Ordering records in a random order involves these operations:

  1. Load all rows into memory matching your conditions
  2. Assign a random value RANDOM() to each row in the database
  3. Sort all the rows according to this random value
  4. Retain only the desired number of records from all sorted records

His solution is to pre-add randomness to each record, in an extra column. For it he uses a the Geometric Datatype POINT type. In Postgres he then uses the following query that orders the records by distance measured against a new random point.

SELECT * FROM repositories ORDER BY randomness <-> point(0.753,0.294) LIMIT 3;

~

In MySQL you also have a POINT class (ever since MySQL 5.7.6) that you can use. However I don’t really see how that would work there, as you’d need to calculate the distance for each record using a call to ST_Distance:

SET @randomx = RAND();
SET @randomy = RAND();
SELECT *, ST_Distance(POINT(@randomx, @randomy), randomness) AS distance FROM repositories ORDER BY distance DESC LIMIT 0,3;

💁‍♂️ Using EXPLAIN on the query above verifies it doesn’t use an index, and thus goes over all records.

What I do see working instead, is use of a single float value to hold pre-randomness:

-- Add column + index
ALTER TABLE `repositories` ADD `randomness` FLOAT(17,16) UNSIGNED NOT NULL AFTER `randomness`;
ALTER TABLE `repositories` ADD INDEX(`randomness`);

-- Update existing records. New records should have this number pre-generated before inserting
UPDATE `repositories` SET randomness = RAND() WHERE 1;

With that column in place, you could then do something like this:

SET @randomnumber = RAND(); -- This number would typically be generated by your PHP code, and then be injected as a query param
SELECT * FROM repositories WHERE randomness < @randomnumber ORDER BY randomness DESC 0,3;

Unlike the query using POINT(), this last query will leverage the index created on the randomness column 🙂

~

How to optimize ORDER BY RANDOM()

Via Freek

Published by Bramus!

Bramus is a frontend web developer from Belgium, working as a Chrome Developer Relations Engineer at Google. From the moment he discovered view-source at the age of 14 (way back in 1997), he fell in love with the web and has been tinkering with it ever since (more …)

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.