Limit on fetches vs performance

Hello!

I have gotten conflicting information, so thought I would clarify. Does adding a limit to a fetch in LBs improve performance?

  • In my use case I know there will only be 2 lines for a given fetch (core charge/core return), will adding a limit of 2 on the fetch improve performance or add any benefit?

The reason for conflicting information is likely because the answer is quite nuanced, with most nuance being based on the shape of the dataset.

Limits say “stop/return this query as soon as I find N records.” Generally, the performance of LIMITs is based on whether the query returned because of the limit or because it ran out of records to process.

That is to say: if we assume that your limit is less than the number of records on the table, and that it is less than or equal to the number of records you will get back from the fetch, then it is faster to use a limit in the fetch. The query stops due to hitting a limit.

If we assume the limit is greater than or equal to the number of records on the table, OR that it is greater than the number of records you will get back from the fetch, it is slower to use a limit in the fetch. The query stops due to running through all available records.

In your case, do you find 2 records because only 2 exist? Or do you find 2 records via a filter? If the former, using a LIMIT will not be faster. If the latter, using a LIMIT will be faster.

Remember that filters are applied behind the scenes for details and subtables. If you have 2 records in a detail table, and you fetch those, then assuming those aren’t the only 2 detail records that exist across all headers, it will be faster to use a limit; as it will return as soon as it finds the records you are looking for rather than scanning all detail records.

Let me know if want any more info!

5 Likes

@samuel.sovereign I too have gotten conflicting information in the past. I generally subscribe to what you’ve listed above. But I’ve heard something related to the Limit adding unnecessary query parameters being detrimental. Is that a myth?

1 Like

@Aaron.C

Nothing specifically comes to mind for “adding unnecessary query parameters.” I will say I neglected mentioning OFFSET, which adds to the general rules above (use LIMIT * OFFSET as your limit value rather than just LIMIT).

It is an extra parameter to bind (edit: add) in the prepared statement, but this should not be detrimental at all to the performance of the query. EDIT: There is no binding, which makes this even less of a concern

I am inclined to say that statement, “the Limit adding unnecessary query parameters being detrimental”, is a myth. But if you see, hear, or experience, differently, please let us know and we can look at the specific case. As I mentioned, query performance is quite nuanced and there’s a lot of factors that can affect it.

2 Likes

I agree with @Aaron.C . We were first told to use them all the time if we only expect a certain amount of recordsto be retrieved. Then we were told not to use them as it does not help with performance since it is another parameter. We do not have a specific use case but have been using NOT using them as a rule of thumb. Perhaps @paul.villanueva can speak more to it since he has been more involved on performance enhancements.

1 Like

If you know your fetch is only going to return only 1 record, adding the limit 1 does not improve performance. For instance, if you are fetching by nwid there is not much difference whether your add limit 1 or not.

3 Likes

That’s not strictly true, partially due to what @samuel.sovereign was hinting at above - adding the limit tells the server to stop looking after it’s found n records.

The most common query by nwId with a limit of 1 says to the server “look for a record in this table with this nwId. Stop when you’ve found 1 record.”
This is more performant than not including a limit because that statement says “look for all records in this table with this nwId.” Since the server does not implicitly know that nwId is unique, it will keep looking for more records even after finding the 1 record it will find. The performance improvement is likely insignificant, but it is there and will become more apparent at scale.

There are lots of nuances to these cases and the answer to the questions is very frequently situational. Typically, if you expect the query to return 1 record, you should always include a limit for two reasons:

  1. Performance (sometimes this is an improvement, sometimes it isn’t)
  2. Readability of code (at a glance, “Limit 1” tells you what the developer was expecting when they wrote that code and makes understanding the code or debugging simpler)
3 Likes

@lauren Yeah I can understand the frustration, which is why I avoided a straight yes/no in my answer. There’s just so much it can depend on and I’d be cautious to say “always/never use a limit.”

@ross hit exactly what I was going to say about @paul.villanueva ‘s post.

The only time it is equally performant to use a LIMIT as to not in the LIMIT 1 case is when you exactly match the Unique Index with your filters, AND Postgres decides to use the Unique Index, which is not a given (it uses whatever it decides is best at runtime based on data distribution). In this case, Postgres knows it can stop and return the first record it finds.

I would generally agree with Ross’s advice to use a limit in this case, for the same reasons he listed, and that it fits into Use Case 1 I listed in my original answer: The limit is LESS THAN the number of records in the table AND less than or EQUAL to the number of records found from the fetch.

Again this is nuanced. For example if you add a sort order/ORDER BY to the query, it is possible (nuanced) that a LIMIT degrades performance on an otherwise well-performing query since it might now have to scan the whole dataset to get the correct order.

3 Likes

If that ain’t a set of words to live by :rofl:

2 Likes

I asked ChatGPT about this - ChatGPT - When to use LIMIT. See in particular the follow up question on how LIMIT interacts with indexes.

Aligns with most of the above comments. I personally would not be recommending LIMITs when asked about this situation. The marginal performance benefit it provides, to me, is not worth the risk of the user defining a LIMIT 2 because they believe there will only ever be 2, then surprise surprise, somehow there’s actually 3 and they can’t figure out what’s going wrong.

2 Likes

With respect to the original question, I am in agreement with @ian.p . Unless you know for certain that there will be exactly 2 records, using specific fetch parameters but not a limit is probably the way to go.
Limits do also tend to hide the fact that extraneous / incorrect data exists because you don’t unintentionally get back more records than expected.

5 Likes

In my use case I do only have 2 records, so I guess no limit would be faster. Thank you for your detailed response!

Update: I wrote this before reading all of the other comments and now I am confused haha. So I guess I DO want to use a limit if I know for sure there will be 2 or less records? That is what I am gathering. @samuel.sovereign

That answer depends on context, so it isn’t a “one answer fits all.”

  • If your table only has 2 records in it and you are doing a blanket fetch → No Limit
  • If your table has many records but your fetch expressions will narrow the data set down to exactly 2 records → no limit required, but it can be good to implement one for readability
  • If your table has many records and your fetch might narrow down the data set but not to only 2 records → sort + limit

If you provide some more context about the table your fetching from, whether you have fetch expressions, how many records should return (or could return), and whether you have a sort, we can better answer whether your fetch should or should not include a limit (or if it doesn’t really matter).

Okay I am thinking this is a really doesn’t matter scenario, but to confirm basically we will have a sales order records with 1-2 other SO details fetched through the nwHeaderID/parentPartSequenceNumber on each detail. @ross

I would tend to say “No Limit” if you’re really looking for all of the associated details by that part number - if those systematically can only be 2 in number, the nwHeaderId and other filter criteria should guarantee you get both records back