Don’t use OFFSET

The heading says it already: don’t use LIMIT rowcount OFFSET offset or LIMIT offset, rowcount (replace offset and rowcount by integers) in your SQL queries on our system. You won’t get the results you expect.

OFFSET is usually used for doing pagination, i.e. for cases when you want to get everything, but piecewise. E.g. you could say: make a query and return the first 10 result rows, then in the next query set offset=10 and retrieve the next 10 result rows and so on:

SELECT mass FROM MDR1.FOF WHERE snapnum=85 ORDER BY mass DESC LIMIT 0,10;
SELECT mass FROM MDR1.FOF WHERE snapnum=85 ORDER BY mass DESC LIMIT 10,10;
SELECT mass FROM MDR1.FOF WHERE snapnum=85 ORDER BY mass DESC LIMIT 20,10;
...

This works fine on single node servers, but not on a sharded database systems like the one behind CosmoSim. Imagine you asked for LIMIT 100,10. What would happen? The query would be submitted as is to the shard nodes, which would do the ordering just on the piece of data set which they store locally, skip the first 100 and then return 10 rows back to the head node. In the end we would have 10×10 rows on the head node (a package of 10 rows from each of the 10 shard nodes).

Currently, PaQu applies any LIMIT condition also on the head node table in the final step, so in the end, nothing would be returned to you, which is not what you expected. Even if you got the full 100 rows from the head node, this is not really what you expected. We could only provide the same behaviour as on a single node server, if we copied all the data first tot he head node and then applied the OFFSET – which would involve too much data copying to be useful,

In fact, OFFSET is not as useful as it may seem: in order to apply the OFFSET, every time one would need to calculate the complete result set beforehand anyway, so there is no gain in query execution time.

If you need to do queries with complicated calculations or joins piece by piece (which may occur quite often, especially when avoiding the timeout limit), please use a (unique) identifier from the table for this. For example use fofId for FOF-tables, rockstarId for Rockstar-tables etc. Consult the table documentation to find the right one for you.

Previous post:     Next post:
Proudly powered by Daiquiri
©2016 The CosmoSim databaseImprint and Data Protection Statement