You may also want to try check_postgres [1] and pg_insights [2]
[1] https://bucardo.org/check_postgres/
[2] https://github.com/lob/pg_insights
I would disagree that a table without a primary key is a critical problem.
There are multiple reasons for tables not having primary keys. Log tables are one example.
Excessive sequential scans are also not a problem for small tables.
Logical replication requires a primary key. We found that out the hard way. It _is_ a critical problem.
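A minimal sketch of where this bites, assuming a hypothetical app_log table: inserts into a published table keep working, but updates and deletes fail until the table gets a replica identity.

create table app_log (logged_at timestamptz not null, message text);
create publication pub_logs for table app_log;
-- inserts still work, but an update or delete now fails with an error along the lines of
-- "cannot update table app_log because it does not have a replica identity and publishes updates"
-- fix: add a primary key, or fall back to full-row identity
alter table app_log add column id bigint generated always as identity primary key;
-- or: alter table app_log replica identity full;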
There are many good reasons to always have a primary key, even if it is just an automatic serial number, but the one that hit me personally is that it is surprisingly difficult to deduplicate a relational database.
When I was first learning SQL I was pretty firmly in the "use natural keys" camp. And when the natural key was every single column I would shrug, ask "what's the point?", and have no primary key. Until I started getting duplicated rows:

insert into customer_email (name, address) values ('bob', 'bob@bobco.com');
insert into customer_email (name, address) values ('bob', 'bob@bobco.com');

Duplicate rows (a) tend to mess up your query results and (b) are surprisingly difficult to remove. If I remember correctly, after spending far too long trying to find a pure SQL solution, I ended up writing a program that would find the duplicates, delete them (all of them, since there is no way to delete all but one), re-insert them, and then add that missing primary key.

I still like natural keys more than I probably should (you still need a key to prevent functional duplicates even when using a surrogate key, so why not cut out the middleman?), but I am no longer so militant about it, mainly because it makes having dependent tables a pain.
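For what it's worth, Postgres specifically does offer an escape hatch here, because every row carries a hidden ctid system column; a sketch against the customer_email example above:

delete from customer_email a
using customer_email b
where a.ctid < b.ctid
  and a.name = b.name
  and a.address = b.address;
-- then add the missing key so it cannot recur
alter table customer_email add primary key (name, address);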
I'm a fan of always including unique indexes in the DB, even if they must exclude soft-deleted rows. At a minimum they can keep functional duplicates out. Those seem especially insidious when there are races.
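A minimal sketch of that kind of index, assuming a hypothetical customers table with a deleted_at soft-delete column; the partial unique index enforces uniqueness only among live rows:

create unique index customers_email_live_uniq
  on customers (email)
  where deleted_at is null;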
This looks like it's targeted at finding some obvious things, and if you know your table doesn't need a primary key, you could always exclude it from the report.
Log tables with a pkey on the date can be searched faster for the typical log-table use of "tell me what happened in this time range", though of course you then have to make the date unique.
I'd rather logs not fail because for some weird reason the date of 2 records is exactly the same.

Daylight-saving adjustments / clock drift correction sound like the most obvious candidates to produce such a case. Granted, the chances are not high, but I'd usually prefer knowing that the chance of it failing for this reason is 0.
1) But with quite a hefty penalty on writes... I'd think you would be better off without a primary key and just using a BRIN index for that use case?
2) Even if you did really want that B-tree, you can still have it without awkwardly having to make it unique, if you don't make it a "primary" key.
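Roughly what that looks like with a hypothetical events table: the BRIN index is tiny, cheap to maintain on inserts, and still prunes most of the table for time-range scans as long as rows arrive in roughly timestamp order.

create table events (logged_at timestamptz not null, payload text);
create index events_logged_at_brin on events using brin (logged_at);

-- typical time-range query
select * from events
where logged_at >= '2024-01-01' and logged_at < '2024-01-02';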
> Unused Large Indexes - Indexes consuming significant disk space but never used (>10MB, 0 scans)
Is this a typo? 10MB seems ridiculously small for a threshold here.
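For reference, the kind of catalog query such a check presumably boils down to; the threshold here is just the value quoted above, and a real report would probably also exclude indexes that back unique or primary key constraints:

select schemaname, relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where idx_scan = 0
  and pg_relation_size(indexrelid) > 10 * 1024 * 1024  -- the 10MB in question
order by pg_relation_size(indexrelid) desc;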
This is inspired by the first responder kit from Brent Ozar for SQL Server, which is an amazing tool for accidental DBAs that do not know where to start optimizing.
I'm looking forward to trying this out on my postgres databases.
Why are indexes on foreign keys required? If I'm doing a join, it's going to select the primary key of the other table; how will an index on the foreign key help?
If you care about performance, they are required.
https://dev.to/jbranchaud/beware-the-missing-foreign-key-ind...
Again, another thing we learnt the hard way. All FKs now require an index for us.
Referential integrity checks by the DB engine (e.g. when deleting from the referenced table) require reverse look-ups of foreign keys, which would necessarily become full table scans without an index. Apart from that, applications also often do look-ups like this.
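Concretely, with hypothetical tables: deleting a customer forces the engine to find the matching orders rows by customer_id, and without an index on that column every such delete is a full scan of orders.

create table customers (id bigint primary key);
create table orders (
  id bigint primary key,
  customer_id bigint not null references customers (id)
);
-- postgres does not index the referencing column automatically
create index orders_customer_id_idx on orders (customer_id);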
Very nice!
Did you consider making this a view instead? Just curious if there is a reason why you couldn't.
I'm not the author, but I think you could by using UNION ALL instead of temp tables. You could also make a view that just calls this function. I'm not sure why it would matter though.
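Assuming the report is exposed as a set-returning function (health_check() is a hypothetical name here), wrapping it in a view is a one-liner:

create view health_report as
select * from health_check();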