I’m running into a scenario where a logic block’s Fetch Distinct, built over a join table, is timing out with an INEFFICIENT_QUERY_CANCELLED_INSIDE_AN_LB error. I’d like to better understand the performance considerations specific to Fetch Distinct, especially when joins are involved or when the underlying tables contain very large datasets.
I may be missing existing documentation, but I haven’t been able to find comprehensive guidance on performance criteria for fetch operations in general (chunk fetch, distinct fetch, etc.). If anyone can point me to relevant resources or share best practices, it would be incredibly helpful for making more performance-aware design decisions.
I don’t know of any documentation around performance optimization but we do have a debugging tool that may help you. “Application Monitoring” (under the grid menu in the top left → Monitoring → Application) will track all of the queries that execute while it is active. Once data is collected, you can view it from the Application Monitoring Workbench app. One of the tabs in there will actually allow you to run an explain analyze on the executed query’s SQL, which can be extremely informative when diagnosing query slowness. (here’s a quick article on reading explain analyzes)
The way I tend to digest explain analyzes these days is to ask AI to interpret the explain analyze output and make index recommendations for any performance issues.
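If it helps to see the mechanics, here’s a minimal sketch of pulling and reading a query plan programmatically. I’m using Python’s built-in sqlite3 purely as a stand-in (the platform’s actual database and the table/index names here are assumptions; on Postgres you’d run EXPLAIN ANALYZE through your SQL client and the output looks different, but the reading process is the same):

```python
import sqlite3

# In-memory stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (a TEXT, b TEXT)")
conn.executemany("INSERT INTO xyz VALUES (?, ?)",
                 [(str(i % 10), str(i)) for i in range(1000)])

def plan_for(sql):
    # Each returned row describes one step of the query plan; the
    # human-readable detail is in the last column.
    return [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

sql = "SELECT * FROM xyz WHERE a = '5'"

no_index = plan_for(sql)    # "SCAN xyz" -> every row gets read
conn.execute("CREATE INDEX idx_xyz_a ON xyz (a)")
with_index = plan_for(sql)  # "SEARCH xyz USING INDEX ..." -> direct lookup

print(no_index)
print(with_index)
```

The main thing to look for when diagnosing slowness is full-table SCAN steps over large tables, which usually means the engine couldn’t find a usable index for your filter.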
They do have some general fetch documentation there that is helpful, but it doesn’t really dive into the different types of fetches (like distinct, chunk, aggregate, etc.) and the performance considerations/best practices for those.
@cass the short answer is that there is no deep-dive documentation on the nuances of individual actions at this time. The different fetch actions all contain some unique components that make it hard to group them into general statements.
For a Fetch Distinct in particular, there’s a higher database load than for a typical fetch because the DB needs to find every record satisfying your filter criteria and then sort and filter those rows to remove duplicates, since the goal is to return one record per combination of distinct values. With no clean index to use, the DB must do this process manually. That grows in complexity when join tables are involved, since they inflate the overall number of records in the structure.
If you create an index that has all of the distinct fields, this can flip that process on its head and allow the DB to find your distinct set much more quickly. In this case, it’s worth trying the following.
Example:
Fetch Distinct on fields A, B, and C from table xyz
Fetch expressions over fields A, D, E, and F
Do not include a sort, if possible
Do not include a limit, if possible
Create an index over the table containing A, B, and C as its first three fields (any order).
The index can have additional fields, like a sorted DateTime; that might change which representative records you get back, but it will not change the distinct values you get back.
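To illustrate the index flipping the process on its head, here’s a small sketch, again using sqlite3 only as an illustration (the engine, table, and index names are assumptions; Postgres plan text will differ, but the idea carries over). Before the index, the engine does a full scan plus a manual sort/dedup pass; after creating an index whose first three fields are A, B, and C, it can read the distinct set straight off the index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE xyz (a TEXT, b TEXT, c TEXT, d TEXT)")
conn.executemany(
    "INSERT INTO xyz VALUES (?, ?, ?, ?)",
    [(str(i % 3), str(i % 5), str(i % 2), str(i)) for i in range(1000)],
)

def plan_for(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(r[3] for r in rows)

sql = "SELECT DISTINCT a, b, c FROM xyz"

before = plan_for(sql)  # full scan + "USE TEMP B-TREE FOR DISTINCT"
conn.execute("CREATE INDEX idx_xyz_abc ON xyz (a, b, c)")
after = plan_for(sql)   # covering index scan, no manual dedup pass

print("before:", before)
print("after: ", after)
```

The "TEMP B-TREE FOR DISTINCT" step in the before-plan is exactly the manual sort-and-dedup work described above, and it disappears once the index covers the distinct fields.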
I do not know whether joins will actually work within a fetch distinct, i.e. distinct over xyz_A, xyz_B, and abc_C. If that’s what you’re trying to do, it’s worth testing and ensuring that there are indices for xyz (A, B) and abc (C) individually.
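For what it’s worth, the equivalent raw SQL does behave sensibly: DISTINCT collapses the duplicates that the join inflates. A quick sqlite3 sketch (all table, column, and index names here are made up for illustration; your join-table layout will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (id INTEGER PRIMARY KEY, a TEXT, b TEXT);
    CREATE TABLE abc (id INTEGER PRIMARY KEY, xyz_id INTEGER, c TEXT);
    -- Individual indices per the suggestion above (names invented here)
    CREATE INDEX idx_xyz_a ON xyz (a);
    CREATE INDEX idx_xyz_b ON xyz (b);
    CREATE INDEX idx_abc_c ON abc (c);
    CREATE INDEX idx_abc_xyz_id ON abc (xyz_id);
""")
conn.executemany("INSERT INTO xyz VALUES (?, ?, ?)",
                 [(1, "a1", "b1"), (2, "a1", "b2")])
# Several child rows per parent, with duplicate c values, so the join
# inflates the row count before DISTINCT collapses it back down.
conn.executemany("INSERT INTO abc (xyz_id, c) VALUES (?, ?)",
                 [(1, "c1"), (1, "c1"), (1, "c2"), (2, "c1"), (2, "c1")])

rows = conn.execute("""
    SELECT DISTINCT x.a, x.b, j.c
    FROM xyz x
    JOIN abc j ON j.xyz_id = x.id
    ORDER BY x.a, x.b, j.c
""").fetchall()
print(rows)  # 5 joined rows collapse to 3 distinct combinations
```

Whether the platform’s Fetch Distinct generates SQL along these lines is the part worth verifying with Application Monitoring.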