Doing a ORDER BY RAND()
in SQL is bad. Very bad. As Tobias Petry details (and Bernard Grymonpon always used to tell at local meetups):
Ordering records in a random order involves these operations:
- Load all rows into memory matching your conditions
- Assign a random value
RANDOM()
to each row in the database- Sort all the rows according to this random value
- Retain only the desired number of records from all sorted records
His solution is to pre-add randomness to each record, in an extra column. For it he uses a the Geometric Datatype POINT type. In Postgres he then uses the following query that orders the records by distance measured against a new random point.
SELECT * FROM repositories ORDER BY randomness <-> point(0.753,0.294) LIMIT 3;
~
In MySQL you also have a POINT class (ever since MySQL 5.7.6) that you can use. However I don’t really see how that would work there, as you’d need to calculate the distance for each record using a call to ST_Distance
:
SET @randomx = RAND();
SET @randomy = RAND();
SELECT *, ST_Distance(POINT(@randomx, @randomy), randomness) AS distance FROM repositories ORDER BY distance DESC LIMIT 0,3;
💁♂️ Using EXPLAIN
on the query above verifies it doesn’t use an index, and thus goes over all records.
What I do see working instead, is use of a single float value to hold pre-randomness:
-- Add column + index
ALTER TABLE `repositories` ADD `randomness` FLOAT(17,16) UNSIGNED NOT NULL AFTER `randomness`;
ALTER TABLE `repositories` ADD INDEX(`randomness`);
-- Update existing records. New records should have this number pre-generated before inserting
UPDATE `repositories` SET randomness = RAND() WHERE 1;
With that column in place, you could then do something like this:
SET @randomnumber = RAND(); -- This number would typically be generated by your PHP code, and then be injected as a query param
SELECT * FROM repositories WHERE randomness < @randomnumber ORDER BY randomness DESC 0,3;
Unlike the query using POINT()
, this last query will leverage the index created on the randomness
column 🙂
~
How to optimize ORDER BY RANDOM()
→
Via Freek