Please note that the following applies exclusively to PostgreSQL databases (this is not to say other database servers are immune to this issue, but we are not aware of it occurring in any of them, and the scope of this document is limited to PostgreSQL index bloating in any case). We use PostgreSQL databases in the Mendix Cloud.
If you ever run into the issue of your index growing much faster than your data, as seen in the following graph, your application might be the victim of index bloating, and this article is worth reading.
This was an actual graph we created using an application affected by index bloating.
So what happened? Did someone create thousands of indexes to cover every possible combination of attributes and create a lightning-fast table? No, not quite. And if that sounds like a good idea to you, it isn’t, so don’t do it (the “why not” is outside the scope of this document).
Instead of explaining what happened here, let’s answer four relevant questions that apply to index bloating as a whole:
What is index bloating exactly?
We can consult the PostgreSQL documentation and find the following explanation:
An index has become "bloated", that is, it contains many empty or nearly-empty pages. This can occur with B-tree indexes in PostgreSQL under certain uncommon access patterns. REINDEX provides a way to reduce the space consumption of the index by writing a new version of the index without the dead pages. See Section 23.2 for more information.
In PostgreSQL 7.4 and later, index pages that have become completely empty are reclaimed for re-use. There is still a possibility for inefficient use of space: if all but a few index keys on a page have been deleted, the page remains allocated. So a usage pattern in which all but a few keys in each range are eventually deleted will see poor use of space. For such usage patterns, periodic reindexing is recommended.
Why does it occur?
This was already described in the PostgreSQL documentation listed above, but in case that was not clear enough, an example:
Say we have a table called Invoice with (among others) day, month and year attributes and a combined index on these attributes. In SQL this index would be created with the following statement (simplified for readability): CREATE INDEX someIndex ON Invoice ("day", "month", "year").
On the first day of each month we retrieve all invoices of the previous month, combine them into a monthly invoice dated the last day of the previous month, commit it and delete all the retrieved daily invoices.
If we take January as an example, the 31 daily invoices of December get deleted and 1 new one remains (the monthly invoice dated December 31st, 2013). The index goes from 31 entries (01-12-13, 02-12-13, …, 31-12-13) to 1 entry (31-12-13). However, since not all entries in this key range (“month”) were deleted, the index pages are not reclaimed and all the allocated space remains in use (!). This is what causes index bloating.
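The pattern above can be sketched in plain SQL (table, column and index names are illustrative, not taken from an actual Mendix schema):

```sql
-- Hypothetical simplified Invoice table with the combined index.
CREATE TABLE invoice (
    id      serial PRIMARY KEY,
    "day"   integer,
    "month" integer,
    "year"  integer,
    amount  numeric
);
CREATE INDEX invoice_dmy_idx ON invoice ("day", "month", "year");

-- December 2013: one invoice per day.
INSERT INTO invoice ("day", "month", "year", amount)
SELECT d, 12, 2013, 100 FROM generate_series(1, 31) AS d;

-- January 1st: roll the daily invoices up into one monthly invoice...
INSERT INTO invoice ("day", "month", "year", amount)
SELECT 31, 12, 2013, sum(amount)
FROM invoice
WHERE "month" = 12 AND "year" = 2013;

-- ...and delete the daily ones (every December row except the one just added).
DELETE FROM invoice
WHERE "month" = 12 AND "year" = 2013
  AND id < (SELECT max(id) FROM invoice);
```

Repeated every month, almost all index entries in each month’s key range are deleted, but because one entry survives, the nearly-empty pages stay allocated and the index keeps growing.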
How can I detect it?
By looking at the graph “Database table vs. index size” in the Mendix cloud.
Or by running the following bloat-estimation query on your PostgreSQL database (for a Mendix Cloud app, first make a backup, download it and restore it locally): https://wiki.postgresql.org/wiki/Show_database_bloat
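As a quicker, rougher check, the pgstattuple extension (shipped with PostgreSQL’s contrib modules) can report how densely an individual B-tree index is packed; a low avg_leaf_density on a large index is a strong hint of bloat. The index name below is illustrative:

```sql
-- Requires the contrib extension and appropriate privileges.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Physical size of the index on disk.
SELECT pg_size_pretty(pg_relation_size('invoice_dmy_idx'));

-- An avg_leaf_density far below ~90% on a large index suggests bloating.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('invoice_dmy_idx');
```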
What can I do about it?
The model can be changed such that the issue will be prevented. For example, by creating a separate MonthlyInvoice entity which holds the monthly aggregates instead of keeping them in the original table.
If you discover you are already the victim of index bloating, you can make the same model change and then resolve the existing bloat in one of two ways:
- In the Mendix Cloud, by performing a full backup and restore of the database.
- In a local environment, by executing a REINDEX on all affected tables.
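For the local case, REINDEX rewrites the index without the dead pages (names are illustrative; note that REINDEX takes a lock that blocks writes to the table while it runs, so schedule it outside peak hours):

```sql
-- Rebuild one specific index...
REINDEX INDEX invoice_dmy_idx;

-- ...or rebuild all indexes on the table at once.
REINDEX TABLE invoice;
```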
And that is all there really is to know about PostgreSQL index bloating in a Mendix application.