Be mindful of defaults – A Brief Tale of PostgreSQL

Default values are like the secret sauce in a recipe – they make everything taste better, until they don’t. Exposing a new configuration is like adding a new ingredient to the mix. Changing its default value later can cause a performance headache, as happened in PostgreSQL 14, where a changed vacuum default left indexes bloated. So be cautious with defaults: they can turn a sweet dish sour 😬.

The Dangers of Default Values in Software Development 🚫

I want to spend some time talking about default values, their value and their limitations, and at the end I’ll share an interesting story: a regression in PostgreSQL 14 that was caused by one of those defaults.

When you build an API or an application and it’s a newborn, you don’t really know much about how it will be used. So you make your best guesses, expose a limited number of configurations (or parameters, if it’s an API), and say, “I think that’s what the users want.” You build it and you ship it. Later, as the software is used in the wild, you learn more. Either users ask directly (“Can you allow me to change this thing?”) or you come to understand a certain performance bottleneck and decide that a parameter really needs to be exposed so users can choose for themselves.

Regardless of why you expose that parameter or configuration, the moment you do, you face a dilemma. One choice is to force users to set it explicitly. Guess what: users will start to freak out, because in the old release everything worked just fine and now it’s telling them they have to set this thing. Most of them, say 95%, will say, “I don’t know what to do, let me just set it to some value and move on.” So you’re breaking things, and to engineers this is scary. We don’t like to break users, especially by adding an extra parameter.

So we do the latter: we set a default value for the new configuration based on the old, existing behavior. In the existing behavior this configuration would have been X, so we set the default to X. Mostly this works, because nothing really changed. You just exposed something that was hardcoded at some point, kept it as the default, and gave users the choice to change it, which is powerful.

You can see this pattern of exposing configuration that didn’t exist before. Take Postgres, for example. If you go to the Postgres documentation (in the editing I’ll try to add that) and look at an early release such as 8.3, which is as far back as we can go, you’ll probably see two parameters in, I don’t know, the WAL configuration. Jump to version 16 and you’ll see pages and pages of configuration to scroll through. That’s a natural evolution pattern. We evolve this way, and we set default values to what we think is best. The danger comes when these default values change, and that’s what happened in Postgres 14.

If you know vacuum, it’s a Postgres utility that runs in phases and cleans up dead tuples in the database once they are no longer needed by older transactions, that is, once multiversion concurrency control (MVCC) no longer requires them. At that point they are safe to remove. You see, when you insert or update a row in Postgres you always get a new tuple for the row, and a delete leaves the old tuple behind, so a row can have multiple tuples representing its state over time, and you can still read the old ones. If I’m a transaction that started at a certain moment, and another transaction started after me, deleted a row, and committed, then as long as I’m pinned to my snapshot I should still see that deleted row. That’s how tuples and multiversion concurrency control work. I talk about all that in my database course; check it out at databases.win (I made it short so people can remember it).

So that’s how vacuum works: you specify a table and it vacuums the entire table. It also does many other things, such as updating the visibility map and freezing tuples to avoid transaction ID wraparound. And, I say optionally, but it also finds all the indexes (or indices, if you’re in the UK) and vacuums those too, because pointers to the tuples also live in the index, so those need to be cleaned up.

Until Postgres 14, vacuum by default always cleaned up all the indexes when it cleaned up their table. The Postgres maintainers found that this slows down the vacuum process: in certain cases you pay the additional cost of cleaning up those indexes for little gain. Say the index has only two dead tuples. The problem is that I have to scan the entire index to find those two dead tuples, so I took the hit of scanning the whole index for almost nothing. So they use a heuristic to decide when that work can be skipped.

That was the old behavior: always clean up the indexes. In Postgres 14 they changed that. The option is called INDEX_CLEANUP. I think it was already there before (it was introduced in PostgreSQL 12), but its default value was on, meaning always clean up the indexes. In Postgres 14 the default was changed to a value called auto, which means: let me, the database, decide; I think you’re good, so I’m going to skip cleaning up the index this time.

This created a performance regression, and Mark Callaghan, who is basically the performance guru when it comes to databases, found it. He said, in effect: something is happening in Postgres 14 and I have no idea why. Postgres 13 is fast, but on Postgres 14 my regression runs get slower. What happened? In Postgres 14 the default had changed to auto, which resulted in the indexes being extra bloated because vacuum didn’t clean them up. I’m going to repeat that: in Postgres 14 the INDEX_CLEANUP default was changed to auto, which resulted in the indexes being extra bloated. And what does bloat mean? More I/O, more work, more slowdown. Very, very interesting.

Of course, he didn’t realize that a default value had changed. It’s hard to keep track of all that stuff. It’s probably documented, and you can go through the documentation and see what changed, but nobody I know, not even the experts, knows every single configuration of every single database. It’s impossible. After a dialogue with the Postgres maintainers, which he documented on his blog, the answer was: actually, nothing else changed, we just changed the default value from on to auto. So he flipped it back to on, the old default, in his configuration, and guess what, everything went fine.

So that’s what I wanted to talk about. Default values are amazing for developer ergonomics (I butchered that word, but you know what I mean). But watch out, because every time we hide something we get bit eventually, and that’s what’s called a leaky abstraction. The abstraction here was the index cleanup, and it was hidden from us. At the end of the day you have to find a sweet spot between default values and explicit values. I don’t think there is one answer for this, but what do you guys think? See you in the next one. Bye.
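
To make the fix from the story concrete, here is a minimal sketch of pinning the index cleanup behavior explicitly instead of relying on whatever the installed version defaults to. It only builds SQL strings and never talks to a database. The helper names `vacuum_statement` and `pin_index_cleanup` and the table name `orders` are made up for illustration. The `INDEX_CLEANUP` option of `VACUUM` and the `vacuum_index_cleanup` storage parameter are real PostgreSQL features, but check the documentation for your version, since `AUTO` is only accepted from PostgreSQL 14 onward.

```python
import re

# Values accepted by VACUUM's INDEX_CLEANUP option (AUTO needs PostgreSQL 14+).
VALID_MODES = {"AUTO", "ON", "OFF"}

# Deliberately strict identifier check so this sketch never builds SQL from
# arbitrary input; real code should use its driver's identifier quoting.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _validate(table: str, index_cleanup: str) -> str:
    """Return the normalized (upper-case) mode, or raise ValueError."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsupported table name: {table!r}")
    mode = index_cleanup.upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported INDEX_CLEANUP value: {index_cleanup!r}")
    return mode


def vacuum_statement(table: str, index_cleanup: str) -> str:
    """Build a one-off VACUUM that states the index cleanup mode explicitly."""
    mode = _validate(table, index_cleanup)
    return f"VACUUM (INDEX_CLEANUP {mode}) {table};"


def pin_index_cleanup(table: str, index_cleanup: str) -> str:
    """Build an ALTER TABLE that pins the mode as a per-table storage parameter."""
    mode = _validate(table, index_cleanup)
    return f"ALTER TABLE {table} SET (vacuum_index_cleanup = {mode.lower()});"


if __name__ == "__main__":
    # Hypothetical table name; "ON" restores the pre-14 behavior from the story.
    print(vacuum_statement("orders", "ON"))
    print(pin_index_cleanup("orders", "ON"))
```

Note that `index_cleanup` has no default in these helpers on purpose: the caller has to say what they want, so a future change in the server's default cannot silently change what this code asks for.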

| Key Takeaways |
|---|
| Default values in software development |
| Dangers of changing default values in databases |
| Vacuum utility in PostgreSQL |
| Leaky abstractions in programming |
| Finding the balance between default and explicit values |
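
The trade-off between default and explicit values can be sketched in a few lines. This is a toy model, not how PostgreSQL or any real library resolves its settings: the release names `v1` and `v2`, the setting name `cleanup_mode`, and the functions `effective_config` and `changed_defaults` are all hypothetical. It shows that a caller who relies on the default sees different behavior after an upgrade, while a caller who pins the value does not.

```python
from typing import Dict, Optional, Tuple

# Hypothetical defaults shipped with two releases of some software.
# In "v1" the default mirrors the old hardcoded behavior; "v2" changes it.
RELEASE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "v1": {"cleanup_mode": "on"},
    "v2": {"cleanup_mode": "auto"},
}


def effective_config(release: str,
                     overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge a release's defaults with the user's explicit settings."""
    config = dict(RELEASE_DEFAULTS[release])
    overrides = overrides or {}
    unknown = set(overrides) - set(config)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    config.update(overrides)  # explicit values always win over defaults
    return config


def changed_defaults(old_release: str,
                     new_release: str) -> Dict[str, Tuple[str, str]]:
    """List settings whose default differs between two releases."""
    old = RELEASE_DEFAULTS[old_release]
    new = RELEASE_DEFAULTS[new_release]
    return {
        key: (old[key], new[key])
        for key in old.keys() & new.keys()
        if old[key] != new[key]
    }


if __name__ == "__main__":
    # Relying on the default: behavior changes silently across the upgrade.
    print(effective_config("v1"), effective_config("v2"))
    # Pinning the value: behavior is the same on both releases.
    print(effective_config("v2", {"cleanup_mode": "on"}))
    # What an upgrade would change for anyone who did not pin.
    print(changed_defaults("v1", "v2"))
```

A check like `changed_defaults` is one way to surface a changed default before an upgrade rather than after a benchmark slows down, assuming the defaults of both releases are available in machine-readable form.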

Conclusion

In the realm of software development, default values play a crucial role in providing users with a seamless experience. However, the dangers of altering default values, as demonstrated in the PostgreSQL 14 release, are a stark reminder of the potential pitfalls. It is imperative for developers and engineers to find a balance between the convenience of default values and the stability of explicit configurations. This cautionary tale serves as a powerful lesson in understanding the impact of seemingly innocuous changes on the performance of a database.

"Default values are amazing, but watch out because every time we hide something, we get bit eventually." – Anonymous

With this in mind, the practical goal is a balance between default and explicit values: keep defaults that match existing behavior, and treat any change to a default as a change that users need to know about.
