Also, I'll add my perspective: I think "EXPLAIN (PLAN_ADVICE)" is a key piece to making this a plan stability feature, not (just) a hinting feature. The extensibility/framework pg_plan_advice adds is a foundation, that over time will over time address the age-old "Postgres doesn't have hints" problem, even if the initial release doesn't check all the boxes yet, e.g. no way to use advice for adjusting row/join estimates.
To give an example on extensibility: Some people that I've spoken to are asking "but why is it not a comment-style hint". There are reasons why Postgres didn't go that way for this release (comment parsing in core is non-existent today, and comments don't work correctly e.g. for functions), but its easy to write an extension that sets up an advisor hook to parse comments: https://github.com/pganalyze/pg_advice_comment
alexpotato 12 hours ago [-]
At a past job, I managed the DBA team that owned the books and records database for a major hedge fund.
Some quick stories about databases in general and type hints in particular:
1. Some of the joins involved 10+ tables AND had query hints
That sounds cool until you realize that the queries were designed for the planner from 3 versions ago
2. Was in multiple conversations with head of the DBA team and application owners that went something like this:
Developer: "This query hasn't changed in YEARS! Why is it not performing well now??"
Me: "Have you considered that maybe the data cardinality has changed?" and then I or the lead DBA would explain how database indices worked.
I mention the above only as a warning to folks newer to DBA land that some of these features can seem great but can also be a crutch. What should have happened is that the schema, indices or application code should have been reviewed by the app teams (with the DBAs) periodically to ensure assumptions were correct.
(This was several years ago so I'm curious what kind of impact the LLMs have had on this)
nijave 10 hours ago [-]
LLMs are an interesting call out. I've been trying with the idea of a query optimizer agent that analyzes PG slow query/auto explain logs.
I've only tried Claude Opus but it does a pretty good job interpreting the plan. It's also really powerful being able to bring in telemetry context and code to help make a more balanced tradeoff with usage metrics (much easier to callout anomolies, edge cases, and non performance sensitive contexts)
That said fixing the queries which usually originate from Django ORM is a bit hit or miss. Been multiple times the LLM wants to rewrite to something similar but not identical or wants to make large, structural changes instead of focusing on low hanging surgical improvements
Once it was even able to pull PG mailing lists to figure out I was hitting a planner predicate comparison limitation where switching from "bool = false" to "not bool" or maybe the other way around led to a massive 100x+ improvement (the planner was skipping using an index because the predicate in the query didn't "match" the predicate in the partial index despite the conditions being logically equivalent)
Would have taken days or weeks for me to figure that out on my own
Atotalnoob 9 hours ago [-]
You can tell the LLM to optimize the Django ORM. With ORMs, large structural changes are sometimes needed for small query text changes due to how ORMs generate the SQL.
I’m a SWE and at my previous job it fell to me to optimize some queries that used EF Core ORM. I gave opus a local db with a small export of anonymized prod data, had it generate a billion rows with similar cardinality.
Then I told opus how to get the raw SQL and told it to write unit tests with various optimizations.
It got the query down from 5+ minutes to a few seconds. I verified the final SQL and ORM structure.
All it needed was to modify some indices and fix up the ORM to properly generate the lateral join.
However, to get the join working properly required a full restructure of the ORM query.
nijave 8 hours ago [-]
>You can tell the LLM to optimize the Django ORM
I have and have come to the conclusion it's too risky. On too many attempts it hallucinates correctness and subtly breaks edge cases
>large structural changes are sometimes needed for small query text changes due to how ORMs generate the SQL
Sometimes that's true but a lot of times you can get 90% the benefit with significantly lower risk with a much smaller targeted changes.
It can check correctness against real data but I've still hit problems where you can just look at the rewrite and identify behavior differences that don't necessarily show up in 1 off execution comparison unless you brute force your way through tons of test data.
A lot of the cases I'm running into are somewhat older code (5+ years) with poor test coverage that few people understand or remember. Some of these probably have legitimate bugs but it's risky assuming and safer to ensure an optimization is identical.
hylaride 10 hours ago [-]
Having a good DB monitoring setup can also help a lot. We (over)pay for datadog's Database monitoring APM, but it is decent in letting you see how queries are run and change over time. It also checks query plans and lets you know when too many sequential scans are being performed, as well as indexes that are no longer used (in large DBs removing them can alleviate a lot of write IOPs).
It's not perfect and still requires thought and understanding (it'll recommend removing unused indexes on the primary keys, for example), but SOMETHING needs to continuously monitor anything but the smallest, simplest relational DBs.
nijave 8 hours ago [-]
Yeah slowly building up to that, we noped out of DD APM because leadership does not like seeing the $ amount of the bill (despite it being a small fraction of AWS and consisting of 10s of services in 1 platform)
crimsonnoodle58 1 days ago [-]
> How many of us have toggled enable_seqscan to off to force an index scan? Or thrown an OFFSET 0 into a subquery to prevent the planner from flattening it?
enable_nestloop = off here.
For us, joining many complex views quickly trips the planner up, so I'm really glad to see this.
> They break on upgrades.
The irony is so does the planner. I've seen queries working perfectly fine in older PG's suddenly run away in newer versions. So hints will actually bring stability.
da_chicken 21 hours ago [-]
The planner breaking on updates is common for almost all RDBMSs. They introduce optimizations that work great for 95% of customers, and some will just have queries that now act like cardinality is way off or covering indexes are missing.
mpyne 20 hours ago [-]
This issue was one of AWS's listed reasons for tending to prefer NoSQL style databases over "more performant" RDBMS, because of the more consistent worst-case performance, even if the result is worse average-case performance, which was important in their assumptions for scalability planning.
throwatdem12311 20 hours ago [-]
Every single time I’ve thought I’ve needed to try these it made it worse.
Every time Claude tries to tell me to try these, it made it worse.
Not once has it made it better for me. I’m doing materialized view refreshes with a billion rows, which is small enough maybe that this doesn’t come into play…but so far the planner knows best.
If the database can’t make it fast with just smart joins and filtering then it’s the architecture that’s a problem, not the database.
Usually the only thing I need to do is increase work_mem.
silon42 19 hours ago [-]
The hints are needed when you have a web scale / online transaction processing application, where you want to guarantee no table scan ever. Ideally, I'd want to rrun without statistics on the tables in that case (ie, no smart joins).
Someone 11 hours ago [-]
> where you want to guarantee no table scan ever.
If hints are what they say they are, they cannot guarantee anything.
And they indeed are hints. FTA: “The documentation is explicit: advice "can only produce plans the core planner considers viable." Advice only nudges the planner toward one it already considered.”
setr 16 hours ago [-]
honestly I dont want plan hinting so much as being able to ban table-scanning / nested loops / etc on specific tables, and be able to set that independent of actual table-size so I can coerce its failure in dev environments
silon42 15 hours ago [-]
+1 I agree... (except nested loops are rarely the problem, they are often the best/simplest alternative for join).
skywhopper 13 hours ago [-]
“I’ve never run into it so it can’t be a problem for other people” is an interesting take.
trollbridge 21 hours ago [-]
Shudder. Flashbacks to having to write optimiser hints in Oracle (and the resulting fun times when you'd upgrade the database, something would change, and your hints would make a query slower).
winrid 19 hours ago [-]
As opposed to not upgrading and the planner picking a plan at 3am that is slower?
16 hours ago [-]
jeffbee 20 hours ago [-]
I can't believe we're still doing this. You should be able to dictate the query execution without involving a planner.
timacles 20 hours ago [-]
I really don’t think you’re considering all the edge cases in such a scenario.
Most devs have problems writing decent queries (in some situations), now you want to introduce writing the query execution plan into the mix ?
akoboldfrying 19 hours ago [-]
No one is taking away the existing automatic planning that works well 95% of the time. You're welcome to continue using that.
The worst thing that could possibly happen is that you give it bad advice leading to slow queries, and then the obvious first step to fixing that is to drop the manual advice and see whether the automatic planner handles it better.
It baffles me that PostgreSQL, which is so deeply customisable in almost every other way, resisted this form of customisability for so long. This is great news.
zackmorris 8 hours ago [-]
I'm against this because hints should be a last resort in declarative programming languages like SQL.
Our productivity is proportional to our ability to recruit abstractions. The more we deal with pure concepts like relational algebra and data-driven development, the more bang we get for our buck.
If we get lost in the weeds having to worry about doing the planner's job, it's like we're paying a tax that doesn't need to exist.
This is why the syntactic sugar of Ruby, the async design patterns of JavaScript, the footgun avoidance best practices of C++, even the impure workarounds of functional languages, (all meant to improve developer and/or execution performance) don't really do it for me. They hint at avoidance of deeper understanding. Once we learn higher abstractions like copy-on-write, compare-and-swap, higher-order methods, etc, we start to see that languages pass the cost of their externalities on to us.
I'd prefer that Postgres move the opposite direction. For example, databases need a universal index that turns as many operations as possible into O(1) at the cost of memory, since resource prices tend to always fall on a long enough timescale. Stuff that works more like a content-addressable memory for ludicrous scaling. In other words, whatever it takes to make planner hints obsolete, is what Postgres maintainers should be putting their efforts into.
I guess a stopgap might be to have an automated way to profile an app during testing and generate planner hints for the main use cases. Or maybe be able to cache them to avoid cold start latency. But if my work ever requires me to deal with them directly, I'll be treating that as a code smell.
-
After writing this out, I realized that performance is an orthogonal concern to conceptual correctness. So a more appropriate phrasing might be that the planner is none of SQL's business. So technically, anyone's opinion on it is valid. In which case, we should choose the path of kindness. If allowing access to the planner saves someone's bacon, than we should allow it. But work to alleviate whatever pain necessitated its use in the first place.
robertlagrant 24 hours ago [-]
I'm not an expert in database hints, but the syntax looks very readable and composable. That's great thing to have got right.
cryptonector 19 hours ago [-]
> The advice language is surprisingly expressive for something the community resisted for decades.
FINALLY!
I like this design.
And yes, the community resisted this for way too long.
skywhopper 13 hours ago [-]
Sounds like they were waiting for a good design.
cryptonector 8 hours ago [-]
The rejections over the years that I saw were not about design.
tschellenbach 16 hours ago [-]
Imagine you have a SAAS app. Microsoft and Apple are customers. You have a table of devices. Type is either windows or osx. How does postgres know how to handle the query devices, where type = OSX properly? How does it know that this matches ~0 or ~100 of rows depending on the customer?
This is the main thing the planner doesn't handle well. Postgres was built before SAAS was as big. You have different distributions per customer, and thousands of customers. In most cases the query planner will guess right, but sometimes it will fail and scan millions of rows.
fabian2k 15 hours ago [-]
You can manually add more advanced statistics in PostgreSQL. That includes statistics over multiple columns.
Give the customer what they want, even if it sucks to do so. The alternative it be cast into irrelevance over time. You can run an OSS project however you want, but you can't avoid the consequences of doing so.
Principles driven development (we will never or always do X regardless of context) typically comes off as a petty ego trip. The point of the technology is to serve some kind of downstream business. Most people who download Postgres are seeking to solve a real world problem, not to demonstrate their ideological purity.
skywhopper 13 hours ago [-]
What are you talking about? This has nothing to do with the article, which describes a query plan advice provider that solves lots of problems with typical hint systems, and does so in a manner good enough to be worthy of including in core, rather than throwing in a half-baked broken clone of Oracle because people are scared to install an extension.
bob1029 11 hours ago [-]
> A Brief History of "Never"
> The Postgres community's position on query hints has been, shall we say, firm. The official wiki page on the subject states it plainly:
"We are not interested in implementing hints in the exact ways they are commonly implemented on other databases. Proposals based on "because they've got them" will not be welcomed."
Are we reading the same article?
simmschi 15 hours ago [-]
FINALLY!
I never understood the issues PG had with hints. Running a non-trivial DB with a non-trivial schema and scale is, well, non-trivial at all. At some point the DB stops being a black box and starts being a tool that you have to know inside and out to avoid performance issues.
>The optimizer is usually smarter than you think.
Except for when it isn't, and moves heavy calculation inside a nested loop inside a nested loop to avoid an index scan. Nothing is perfect.
williamdclt 14 hours ago [-]
> > The optimizer is usually smarter than you think.
> Except for when it isn't, and moves heavy calculation inside a nested loop inside a nested loop to avoid an index scan. Nothing is perfect.
Yeah that's also been my experience. It's true that Postgres is usually smarter than I think, when I try to figure out why it's not using a better query plan I eventually find out that it wouldn't be better at all. But from time to time, it genuinely is taking a bad decision and having no power over that at all is a problem
skywhopper 13 hours ago [-]
Yeah, this article explains the reasons why, and how this solution avoids them.
shay_ker 10 hours ago [-]
I have a naive question: why did this take 15 years? I understand that good APIs need time and thoughtful design, but I struggle to understand why we couldn’t get to the same (or better) solution faster.
andriy_koval 5 hours ago [-]
It was ideological goal: they wanted to build query optimizer which would always be smarter than human, but looks like they admitted defeat after many years of stubbornness.
aeontech 20 hours ago [-]
Very interesting - I just installed pg_hint_plan [0] extension a few months ago to get around a query that was confusing the planner too much. Edge case, but when you need it you really need it.
Haven't seen pg_plan_advice before, TIL!
jbellis 23 hours ago [-]
man, Tom Lane has hated query hints for literally decades
did he finally come around?
lfittl 22 hours ago [-]
I don't think Tom's perspective has necessarily changed (and there is certainly concern from others that this could cause less reports on planner bugs), but Tom is pretty good about not standing in the way of others (i.e. Robert Haas in this case) trying to make things work, and being open to new perspectives.
I do know that one of the important criteria for getting this in was that a bad advice can't cause the planner to fail, and that's something that was explicitly included in the design of pg_plan_advice.
ksec 11 hours ago [-]
OT: This reminded me of OrioleDB, what happened to it?
akorotkov 10 hours ago [-]
We keep going. Planning GA till the end of this year. Only a couple of missing features left. The main focus is stability.
ksec 7 hours ago [-]
Thanks for the update. Since there wasn't any new blog post I was a little worried.
Also, I'll add my perspective: I think "EXPLAIN (PLAN_ADVICE)" is a key piece to making this a plan stability feature, not (just) a hinting feature. The extensibility/framework pg_plan_advice adds is a foundation, that over time will over time address the age-old "Postgres doesn't have hints" problem, even if the initial release doesn't check all the boxes yet, e.g. no way to use advice for adjusting row/join estimates.
To give an example on extensibility: Some people that I've spoken to are asking "but why is it not a comment-style hint". There are reasons why Postgres didn't go that way for this release (comment parsing in core is non-existent today, and comments don't work correctly e.g. for functions), but its easy to write an extension that sets up an advisor hook to parse comments: https://github.com/pganalyze/pg_advice_comment
Some quick stories about databases in general and type hints in particular:
1. Some of the joins involved 10+ tables AND had query hints
That sounds cool until you realize that the queries were designed for the planner from 3 versions ago
2. Was in multiple conversations with head of the DBA team and application owners that went something like this:
Developer: "This query hasn't changed in YEARS! Why is it not performing well now??"
Me: "Have you considered that maybe the data cardinality has changed?" and then I or the lead DBA would explain how database indices worked.
I mention the above only as a warning to folks newer to DBA land that some of these features can seem great but can also be a crutch. What should have happened is that the schema, indices or application code should have been reviewed by the app teams (with the DBAs) periodically to ensure assumptions were correct.
(This was several years ago so I'm curious what kind of impact the LLMs have had on this)
I've only tried Claude Opus but it does a pretty good job interpreting the plan. It's also really powerful being able to bring in telemetry context and code to help make a more balanced tradeoff with usage metrics (much easier to callout anomolies, edge cases, and non performance sensitive contexts)
That said fixing the queries which usually originate from Django ORM is a bit hit or miss. Been multiple times the LLM wants to rewrite to something similar but not identical or wants to make large, structural changes instead of focusing on low hanging surgical improvements
Once it was even able to pull PG mailing lists to figure out I was hitting a planner predicate comparison limitation where switching from "bool = false" to "not bool" or maybe the other way around led to a massive 100x+ improvement (the planner was skipping using an index because the predicate in the query didn't "match" the predicate in the partial index despite the conditions being logically equivalent)
Would have taken days or weeks for me to figure that out on my own
I’m a SWE and at my previous job it fell to me to optimize some queries that used EF Core ORM. I gave opus a local db with a small export of anonymized prod data, had it generate a billion rows with similar cardinality.
Then I told opus how to get the raw SQL and told it to write unit tests with various optimizations.
It got the query down from 5+ minutes to a few seconds. I verified the final SQL and ORM structure.
All it needed was to modify some indices and fix up the ORM to properly generate the lateral join.
However, to get the join working properly required a full restructure of the ORM query.
I have and have come to the conclusion it's too risky. On too many attempts it hallucinates correctness and subtly breaks edge cases
>large structural changes are sometimes needed for small query text changes due to how ORMs generate the SQL
Sometimes that's true but a lot of times you can get 90% the benefit with significantly lower risk with a much smaller targeted changes.
It can check correctness against real data but I've still hit problems where you can just look at the rewrite and identify behavior differences that don't necessarily show up in 1 off execution comparison unless you brute force your way through tons of test data.
A lot of the cases I'm running into are somewhat older code (5+ years) with poor test coverage that few people understand or remember. Some of these probably have legitimate bugs but it's risky assuming and safer to ensure an optimization is identical.
It's not perfect and still requires thought and understanding (it'll recommend removing unused indexes on the primary keys, for example), but SOMETHING needs to continuously monitor anything but the smallest, simplest relational DBs.
enable_nestloop = off here.
For us, joining many complex views quickly trips the planner up, so I'm really glad to see this.
> They break on upgrades.
The irony is so does the planner. I've seen queries working perfectly fine in older PG's suddenly run away in newer versions. So hints will actually bring stability.
Every time Claude tries to tell me to try these, it made it worse.
Not once has it made it better for me. I’m doing materialized view refreshes with a billion rows, which is small enough maybe that this doesn’t come into play…but so far the planner knows best.
If the database can’t make it fast with just smart joins and filtering then it’s the architecture that’s a problem, not the database.
Usually the only thing I need to do is increase work_mem.
If hints are what they say they are, they cannot guarantee anything.
And they indeed are hints. FTA: “The documentation is explicit: advice "can only produce plans the core planner considers viable." Advice only nudges the planner toward one it already considered.”
Most devs have problems writing decent queries (in some situations), now you want to introduce writing the query execution plan into the mix ?
The worst thing that could possibly happen is that you give it bad advice leading to slow queries, and then the obvious first step to fixing that is to drop the manual advice and see whether the automatic planner handles it better.
It baffles me that PostgreSQL, which is so deeply customisable in almost every other way, resisted this form of customisability for so long. This is great news.
Our productivity is proportional to our ability to recruit abstractions. The more we deal with pure concepts like relational algebra and data-driven development, the more bang we get for our buck.
If we get lost in the weeds having to worry about doing the planner's job, it's like we're paying a tax that doesn't need to exist.
This is why the syntactic sugar of Ruby, the async design patterns of JavaScript, the footgun avoidance best practices of C++, even the impure workarounds of functional languages, (all meant to improve developer and/or execution performance) don't really do it for me. They hint at avoidance of deeper understanding. Once we learn higher abstractions like copy-on-write, compare-and-swap, higher-order methods, etc, we start to see that languages pass the cost of their externalities on to us.
I'd prefer that Postgres move the opposite direction. For example, databases need a universal index that turns as many operations as possible into O(1) at the cost of memory, since resource prices tend to always fall on a long enough timescale. Stuff that works more like a content-addressable memory for ludicrous scaling. In other words, whatever it takes to make planner hints obsolete, is what Postgres maintainers should be putting their efforts into.
I guess a stopgap might be to have an automated way to profile an app during testing and generate planner hints for the main use cases. Or maybe be able to cache them to avoid cold start latency. But if my work ever requires me to deal with them directly, I'll be treating that as a code smell.
-
After writing this out, I realized that performance is an orthogonal concern to conceptual correctness. So a more appropriate phrasing might be that the planner is none of SQL's business. So technically, anyone's opinion on it is valid. In which case, we should choose the path of kindness. If allowing access to the planner saves someone's bacon, than we should allow it. But work to alleviate whatever pain necessitated its use in the first place.
FINALLY!
I like this design.
And yes, the community resisted this for way too long.
This is the main thing the planner doesn't handle well. Postgres was built before SAAS was as big. You have different distributions per customer, and thousands of customers. In most cases the query planner will guess right, but sometimes it will fail and scan millions of rows.
https://www.postgresql.org/docs/current/planner-stats.html#P...
Principles driven development (we will never or always do X regardless of context) typically comes off as a petty ego trip. The point of the technology is to serve some kind of downstream business. Most people who download Postgres are seeking to solve a real world problem, not to demonstrate their ideological purity.
> The Postgres community's position on query hints has been, shall we say, firm. The official wiki page on the subject states it plainly: "We are not interested in implementing hints in the exact ways they are commonly implemented on other databases. Proposals based on "because they've got them" will not be welcomed."
Are we reading the same article?
I never understood the issues PG had with hints. Running a non-trivial DB with a non-trivial schema and scale is, well, non-trivial at all. At some point the DB stops being a black box and starts being a tool that you have to know inside and out to avoid performance issues.
>The optimizer is usually smarter than you think.
Except for when it isn't, and moves heavy calculation inside a nested loop inside a nested loop to avoid an index scan. Nothing is perfect.
> Except for when it isn't, and moves heavy calculation inside a nested loop inside a nested loop to avoid an index scan. Nothing is perfect.
Yeah that's also been my experience. It's true that Postgres is usually smarter than I think, when I try to figure out why it's not using a better query plan I eventually find out that it wouldn't be better at all. But from time to time, it genuinely is taking a bad decision and having no power over that at all is a problem
Haven't seen pg_plan_advice before, TIL!
did he finally come around?
I do know that one of the important criteria for getting this in was that a bad advice can't cause the planner to fail, and that's something that was explicitly included in the design of pg_plan_advice.