The short log is missing one of the most important and awesome new features for application writers: a bunch of new PQresultErrorFields[0] on constraint failure, providing access to the raw (context-dependent) schema name, table name, column name, constraint name and datatype involved.
Previously this information remained locked inside the database, and to get programmatically usable info (for logging, better error messages, or to translate in terms of e.g. your ORM) you had to parse error messages which were potentially localized and usually lacked half the necessary information.
And even better, for python developers, psycopg2 already supports these fields[1][2]
edit: the underlying connector might be a better idea to start with, as it'll need to expose the fields somehow before an ORM can take advantage of them.
Is that true? I might be wrong about this, but I think ActiveRecord checks for uniqueness (and other constraints) in application code, generating an extra database query if needed. If you _actually_ violate the unique index, a PGError is raised (if using postgres).
No. ActiveRecord checks for constraints on top of what the database offers. You are free to specify database constraints if you like. In older versions of ActiveRecord, uniqueness errors generate a generic database exception. In recent versions, uniqueness errors generate the specific exception RecordNotUnique.
Wow, that is awesome and is something I've been worrying about recently. However, there is a second part that is still missing that would allow for actual postgresql application development. There needs to be a way to check all constraints and report all failures, rather than just stopping at the first error. Without that, users submit data, get an error for one field, fix it and resubmit, get an error for the next field, etc, etc. Reporting all the errors the first time is really important for usability. So right now you have to duplicate all the database constraints in your app, which is a huge source of code duplication (and thus bugs).
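This is the pattern you end up duplicating in the app layer: collect every failure before returning, rather than bailing on the first one. A minimal sketch in Python; the field names and rules here are hypothetical stand-ins for whatever CHECK constraints the table actually has.

```python
def validate_user(record):
    """Collect every validation failure instead of stopping at the first.

    These checks mirror hypothetical database CHECK constraints; in a
    real app each one is duplicated logic that can drift out of sync
    with the schema, which is exactly the complaint above.
    """
    errors = []
    if not record.get("email") or "@" not in record["email"]:
        errors.append("email: must be a valid address")
    age = record.get("age")
    if age is None or not (0 <= age <= 150):
        errors.append("age: must be between 0 and 150")
    if not record.get("name", "").strip():
        errors.append("name: must not be blank")
    return errors

# All three problems reported in one pass:
print(validate_user({"email": "bad", "age": 200, "name": ""}))
```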
Interesting point. Foreign key failures are usually the result of a program bug. Unique and exclusion[1] constraints depend on concurrent actions, and can't be caught ahead of time as easily. So I assume you are mostly talking about CHECK constraints.
Also, we're talking about a single statement, which probably means a single table.
It does sound like a good idea to validate all of the check constraints on a table at once, and report all of the failures.
Yes and no. Yes in that I am talking specifically about check constraints when I say that we have to duplicate them all in our apps. You can't really duplicate the others. But no in that I don't mean to validate only check constraints and report all failures, ideally it would do unique and FK too. If there's an underlying reason that isn't feasible, then doing all the check constraints at once would certainly be a big improvement though. We're already in the position of having data validation type checks done separately from uniques now, so that would get users of our apps the current behaviour while letting us delete a bunch of code.
FWIW, we try very hard to not write any check constraints and do any such validation at the protocol level, before the request even hits the authn/authz layer. (Spyne makes it easy ! http://spyne.io)
>FWIW, we try very hard to not write any check constraints
I like check constraints. I would never consider removing them under any circumstances, I've dealt with far too many databases full of inconsistent data. I'd just rather not have to duplicate them (or triplicate, etc) in each of my apps that accesses the database.
create table items (id uuid primary key, attributes json);
I'd have a webapp using javascript for adding/editing new items. If the user wants to add a new item, the client code would generate the uuid.
Editing and adding new rows would both use the same UPSERT/MERGE code -- no need for separate insert or update statements -- if the uuid already exists, it's assumed to be an update, if it doesn't, it's inserted as a new row.
I believe both of these still have race conditions, and are not the same as truly atomic upserts. Granted the window is considerably smaller, but there is still a race condition.
Upsert is already possible in a single statement, though a bit unwieldy.
create table people (uuid varchar, name varchar);
with new_rows as (
    select (unnest(ARRAY[('xxx', 'John Smith'), ('yyy', 'Ben Bitdiddle')]::people[])).*
), updated as (
    UPDATE people SET name = new_rows.name
    FROM new_rows
    WHERE people.uuid = new_rows.uuid
    RETURNING people.uuid
)
insert into people
select new_rows.*
from new_rows
left join updated using (uuid)
where updated.uuid is null;
UPDATE Table SET ... WHERE id=$1
if (rows_updated == 0) {
    INSERT INTO Table SELECT ... WHERE NOT EXISTS (SELECT 1 FROM Table WHERE id=$1)
    if (rows_inserted == 0) {
        throw "Conflict!"
    }
}
It's a bit of a hack, but works well enough. I agree would be nice to have a more transparent way of doing it.
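For concreteness, here is a runnable sketch of that update-then-insert pattern. SQLite stands in for Postgres only because it ships with Python; the SQL shape (and the conflict window between the two statements) is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

def upsert(conn, id_, val):
    # Try the update first; rowcount tells us whether the row existed.
    cur = conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, id_))
    if cur.rowcount == 0:
        # Guarded insert: a concurrent insert between the two statements
        # can still sneak in, hence the conflict check below.
        cur = conn.execute(
            "INSERT INTO t (id, val) "
            "SELECT ?, ? WHERE NOT EXISTS (SELECT 1 FROM t WHERE id = ?)",
            (id_, val, id_))
        if cur.rowcount == 0:
            raise RuntimeError("Conflict!")

upsert(conn, 1, "first")   # inserts
upsert(conn, 1, "second")  # updates
print(conn.execute("SELECT val FROM t WHERE id = 1").fetchone()[0])  # → second
```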
> susceptible to race conditions, which is why many of us want true UPSERT semantics
The snippet provided in the postgres doc is the current best, but it still has a number of flaws:
> This coding assumes the unique_violation error is caused by the INSERT, and not by, say, an INSERT in a trigger function on the table. It might also misbehave if there is more than one unique index on the table, since it will retry the operation regardless of which index caused the error. More safety could be had by using the features discussed next to check that the trapped error was the one expected.
Last time I looked into this, a big thing stopping it from happening was that the SQL standard for this kind of sucks and there is no consensus for a better syntax among the databases that implement it. I would also love to see it, and I'd be amazed if it were as technically complex as, say, materialized views.
I am currently using PostgreSQL for all my work, but have always been missing a convenient way to do upserts. In some systems a large percentage of all write operations are upserts, which makes coding those against PostgreSQL a hassle. I am not convinced though of the SQL standard MERGE syntax. It looks way too messy when used in simple cases.
Another feature I would wish to see is ORDER BY + skip for update (or any equivalent solution) to easily implement queues with multiple consumers.
EDIT: Just ORDER BY (without skipping locked entries) for UPDATE would be excellent too to avoid deadlocks when doing multi row updates.
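The multi-consumer queue case can be approximated in application code today with a guarded claim-and-retry loop. A sketch below, with SQLite standing in for Postgres (purely because it ships with Python); the `jobs` table and `claimed` flag are hypothetical, and in Postgres the guarded UPDATE plays the role that ORDER BY + FOR UPDATE with skipping would play natively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, claimed INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO jobs (id) VALUES (?)", [(1,), (2,), (3,)])

def claim_next(conn):
    """Claim the oldest unclaimed job; retry if another consumer won the race."""
    while True:
        row = conn.execute(
            "SELECT id FROM jobs WHERE claimed = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue drained
        # Guarded update: rowcount 0 means a concurrent consumer
        # claimed this job first, so loop and pick the next one.
        cur = conn.execute(
            "UPDATE jobs SET claimed = 1 WHERE id = ? AND claimed = 0",
            (row[0],))
        if cur.rowcount == 1:
            return row[0]

print(claim_next(conn))  # → 1
```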
I'd love to see on-disk bitmap indexes. I noticed there was an attempt to add these around 2009 but it was abandoned after being (apparently) nearly completed. [1,2] I'd like to see these because it would make Postgres better at data warehouse applications. I've been thinking about volunteering to add them myself, but I've never looked at the Postgres codebase before, so it seems a bit ambitious.
A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks. If you have two multi-row UPDATEs that hit the same rows but in different order, you've got a classic deadlock setup. Apparently you can sometimes control UPDATE ordering by persuading Postgres to use certain query plans if you really know what you're doing, but that seems like an undesirable requirement for most users.
To answer your bullet points, Postgres is my #1 database choice, but I've also used Oracle and MySQL many times and sometimes still do. Neither of these features are things that are preventing me from using Postgres, but I've sure wished they were there. In particular bitmap indexes seem useful to many people.
"on-disk bitmap indexes ... thinking about volunteering to add them myself"
That would be reasonable to do even if you don't know the internals very well. The index types (called "index access methods") are well-isolated (API-wise) from most of the rest of the code. Technically, you could even add a new index access method as an extension without even touching the core.
The exception to both of those things is the WAL recovery code, which can't be written as an extension and also requires more knowledge of the rest of the system.
But the WAL recovery code is not too bad for those that know it. So if you did the rest, then I wouldn't be surprised if someone jumped in to write it or help you write it.
"A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks."
One solution here is to do a "SELECT ... FOR UPDATE" first. Unfortunately, that is redundant (and does redundant work).
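Another application-level workaround is to fix the update order yourself: if every caller touches rows in the same key order, two concurrent multi-row updates can never lock rows in opposite orders. A minimal sketch (SQLite stands in for Postgres; table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

def update_rows(conn, updates):
    # Sort the keys so every caller locks rows in the same order; this
    # is the application-side stand-in for ORDER BY on UPDATE.
    for id_ in sorted(updates):
        conn.execute("UPDATE t SET val = ? WHERE id = ?", (updates[id_], id_))

# Two callers passing {2: ..., 1: ...} and {1: ..., 2: ...} would both
# update row 1 first, so neither can deadlock against the other.
update_rows(conn, {2: "B", 1: "A"})
```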
Thank you for the encouragement! Odds are I'll be doing a lot of data warehouse work in the next 6 months, so I'll be motivated to get involved.
EDIT: One thing that intimidated me was browsing through the mailing list thread in my [1] above and seeing people ask about lock correctness. That seems like something where you'd need a broad view of the system. I might give it a try anyway, though. One thing about Postgres is your team is incredibly helpful, professional, and kind. Tom Lane has given me all kinds of answers on the mailing list in the past. I have enough good will for the project that I'm pretty much just looking for an excuse to contribute something. It would be an honor to give something back, and I feel safe asking for help from you all.
I'm curious about what the performance ramifications of that would look like. I recently had to disable bitmap scans in a data warehouse-type-thing I run because filesystem fragmentation was apparently misleading Postgres into thinking it would be cheaper than an index scan. The bitmap scan plan takes ~15 minutes to run cold, and the index scan takes ~20 seconds cold. (Really odd thing is that index scan takes ~1.5 seconds warm, versus ~0.4 seconds warm with the bitmap scan).
We are currently using MS SQL Server, but have looked at the possibility of using Postgres in the future.
One major stopping point for us is the fact that on SQL Server we use a Case Insensitive, Accent Insensitive collation. Postgres doesn't support this, and apparently has no plans to.
We have many stored procedures in TSQL which are built around the assumption of CI,AI. Converting these will be a lot of work, but the lack of CI,AI collation would be by far the biggest pain point.
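For anyone facing the same migration, the comparison a CI,AI collation performs can be approximated in application code: decompose, strip combining marks, then casefold. This is only a rough sketch of the idea; real collations handle far more (ligatures, locale tailoring), so don't treat it as equivalent.

```python
import unicodedata

def ci_ai_key(s):
    """Case-insensitive, accent-insensitive comparison key.

    NFD decomposition splits accented letters into base letter plus
    combining marks; dropping the marks and casefolding approximates
    a CI,AI collation for comparison purposes only.
    """
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

print(ci_ai_key("Café") == ci_ai_key("cafe"))  # → True
```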
I use Postgres for pretty much every project possible, as well as using Oracle for consulting clients.
The main feature which I'd like the PG development team to look at next is implementing a more comprehensive partitioning solution.
As far as remaining competitive with the commercial database vendors, the one thing that is the primary reason I have seen for enterprises choosing to reluctantly pay for Oracle over using Postgres for free is RAC (Real Application Clusters). Active/active clustering of DB nodes seems to hit a sweet spot for businesses wanting uptime and scalability without some of the downsides of physical replication.
HTTP APIs like http://wiki.postgresql.org/wiki/HTTP_API. Using a javascript framework like AngularJs would permit me to get rid of middleware (Java, PHP, dotNET, etc.) almost completely. I need a relational db and I'm not a big fan of NoSQL, so MongoDB or others are not a viable option. It would also permit to use only javascript from the front end to the back end (https://code.google.com/p/plv8js/).
Oracle-like partitioning syntax support (including composite) would be great - although possible through inheritance and triggers/rules, it is the only bigger feature missing in PostgreSQL in my opinion. The current form requires a lot of overhead but is definitely worth the effort when working with 1B+ rows.
I use postgres for every hobby project I have, and oracle at work.
What I really miss in postgres is row level security like it's implemented in oracle.
To implement RLS in postgres I need to either install veil and go through the pain of configuring it or create a lot of views and work on them (but some scenarios can not be implemented like that).
What I would find helpful is some sort of operator to merge two JSON values. I use a JSON field to collect various statistics relating to a table's rows, and it'd be nice to not have to worry about overwriting old values.
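In the meantime this merge has to happen client-side. A sketch of the desired semantics in Python (shallow merge, right side winning on conflicts, which is roughly what an hstore-style concatenation operator does):

```python
import json

def json_merge(a, b):
    """Shallow-merge two JSON objects, with the right side winning.

    A client-side stand-in for the wished-for merge operator; nested
    objects are replaced wholesale rather than merged recursively.
    """
    merged = dict(json.loads(a))
    merged.update(json.loads(b))
    return json.dumps(merged, sort_keys=True)

print(json_merge('{"clicks": 1, "views": 10}', '{"views": 11}'))
# → {"clicks": 1, "views": 11}
```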
Nothing missing per se, but I wish I could use Postgres as a client DB on iOS (like SQLite) so I could have a single codebase (as is possible with Firebird), even if that meant only a subset (what SQLite has, plus plpgsql stored procedures!).
I'm missing ts_rank support in the GIN index (part of the GIN generalization proposal[1]). I had to give up ranking by significance in a full-text search because it was too slow.
I wish there were more built in replication options. Setting up Master-Master (data existing in multiple data centers) with hot slaves for example- I don't think you can do this with Postgres alone currently. Need to use something like Bucardo (a perl program) to make it happen.
I'd love to see plan locking. I've read many horror stories where Postgres just by itself took an application down because its query planner decided to execute a particular query using seqscans because some statistic somewhere went above a certain threshold.
Temporal tables conforming closely to SQL:2011.
The ability to use temporary tables created with ON COMMIT DROP in non-serializable prepared transactions (so probably only READ-COMMITTED transactions).
* we currently use postgresql, oracle, mysql, teradata, mssql
Have strong consistency by default with the option to define eventual consistency per query/session. It's so useful with Cassandra to have that flexibility to favour performance over accuracy.
Also OOTB should support EC2, ZeroConf/Bonjour and make it understand racks/data centres.
A well supported, polished multi-master implementation would definitely bring PostgreSQL into the 21st century. It's been far too long not to have this. And look to databases like Cassandra for ease of setup and maintenance.
You can use PostgreSQL together with ZFS with lz4 compression enabled, and maybe gzip would improve performance even more. http://www.citusdata.com/blog/64-zfs-compression
Btrfs also supports compression.
This solution doesn't look production solid for linux server.
I am not sure about lz4, but in my Cassandra experience gzip significantly reduces performance; the DB needs to spend too much CPU on compression. Snappy works just fine.
Postgres's type system is so much better than any other RDBMS's that it's a shame the NoSQL guys didn't spend more time building on top of it. While its optimizer could use some work here and there, it's improving all the time.
We are doing some pretty obscure stuff to it and it's serving us well. Oracle on the other hand requires so much attention and special handling with its type system that, if there were an alternative, I wouldn't even bother with it.
I don't believe hstore supports nested structures, which would make it less than a drop in replacement for many NoSQL use cases. Native JSON support should fix that though, so good news everyone!
Postgres supports native nested structures. Well, as long as your structures are not circular.
JSON everywhere is really a bad choice for most data structures. It's like having no structure at all.
Well, I'm arguing that flexible schema is better than no schema. And in practice schemaless is only about schema defined outside of database. It's a solution to the wrong problem.
hstore is just based on a text format, so if it doesn't support it normally, you could just cast the results of each query to a new hstore, something like `select (data -> 'foo')::hstore -> 'bar'`.
It's probably not a good idea though ;) You don't get any benefit of indices in that case, and I'm not sure what would happen with nulls.
It seems you could easily get a NoSQL Postgres if you limit yourself to one table with two columns (id, json). Snark aside, it would work quite OK, no?
the json support is fairly recent and doesn't support operating directly on the data yet (e.g. you have to create functional indexes through plv8js and you can't directly query on the document's content as you can with array, xml or hstore columns)
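So at the moment the one-table approach means doing the document work client-side. A sketch of what that looks like, with SQLite standing in for Postgres (only because it ships with Python) and a hypothetical `docs` table; as noted above, filtering happens in the application because you can't yet query inside the json column, so no indexes help you.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO docs VALUES (?, ?)",
             ("a1", json.dumps({"kind": "user", "name": "Ada"})))
conn.execute("INSERT INTO docs VALUES (?, ?)",
             ("b2", json.dumps({"kind": "event", "name": "login"})))

def find(conn, **filters):
    # Full scan plus client-side matching: workable, but every query
    # reads and parses every document.
    out = []
    for id_, body in conn.execute("SELECT id, body FROM docs"):
        doc = json.loads(body)
        if all(doc.get(k) == v for k, v in filters.items()):
            out.append((id_, doc))
    return out

print(find(conn, kind="user"))
```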
Another intepretation of zapov's comment is that it's a shame the NoSQL projects didn't try achieving their goals by building postgres extensions (or even adding new features).
A reasonable question. MongoDB could have focused their energies on really good JSON support in postgres and making postgres sharding easier.
Postgresql has prepared queries - they give you the same type of optimization (skip the query parsing on each execution) with more flexibility and with SQL standard syntax.
Postgresql documentation is terrific, really [1]. Then you can also use the pgsql wiki [2]. Other useful resources are the searchable mailing list archives [3]. Have a look also at the pgcon.org web site; it's fully packed with presentation material (mainly pdf slides) gathered since 2007. Just to give you an example, look at Tom Lane's talk: http://www.pgcon.org/2011/schedule/events/350.en.html [4]
Speaking of books a few of the best I'm aware of are:
- PostgreSQL 9 Admin Cookbook (Simon Riggs, Hannu Krosing)
- PostgreSQL 9.0 High Performance (Gregory Smith)
- Instant PostgreSQL Backup and Restore How-to (Shaun M. Thomas)
The docs really are great. I once downloaded the whole lot as a PDF and read it from the beginning on my phone during downtime (like 15 mins on the train). Read all the way through to the parts that discuss internals then stopped. I now appreciate it as a relational database, not just as a storage backend.
Great idea, you've inspired me to do the same just now. I've been using PostgreSQL since ~2000 and love it, but I'm still finding new little corners of the docs worth reading.
If anyone else wants to do this and has Calibre installed, the commandline ebook-convert utility does a decent job. Change to the postgresql docs directory and then run
(or substitute postgresql_docs.epub). There are a lot of tweaks that'd make it better, but it's readable and properly handles the code font, etc. If you want much more than that using the Calibre gui might be a good idea. Biggest issue I'm seeing is that "Up" links & such are broken, so I'm working on that now.
The postgresql.org docs are a great starting point: http://www.postgresql.org/docs/9.1/static/ for example. (They also serve as a good reference when you're forgetting some finer point of syntax or admin later... the manual's nicely "phased" in terms of introductory and reference content.)
I'm lucky enough to work at a company that tends to use Pg as our default RDBMS; speaking strictly as a dev it's a heck of a lot more ergonomic and sophisticated than some of the things I've dealt with.
As far as open source documentation goes, I consider Postgres to easily be the standard. The official docs are so good that a book writer would have to try hard just to not plagiarize them, let alone surpass their content.
There is a really old O'Reilly book, I think, but there is also an O'Reilly book called "PostgreSQL: Up and Running" that came out last year that covers 9.2.
I just finished with it. It's actually quite good to get you started with Postgres. Besides installing and command line usage it introduces many Postgres features and how they differ from MySQL, Oracle and the like.
Materialized Views! Well, sort of; it appears they don't automatically refresh when data has been updated in the underlying tables a la Oracle's implementation, but this gets you rather close.
FWIW, since Postgres has hard version cycles (6 months I believe?) the point was to have a working base out and about; future versions will likely add auto-update:
> This is a minimal implementation, but should still be useful in many cases. Currently data is only populated "on demand" by the CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW statements.
> It is expected that future releases will add incremental updates with various timings, and that a more refined concept of defining what is "fresh" data will be developed. At some point it may even be possible to have queries use a materialized view in place of references to underlying tables, but that requires the other above-mentioned features to be working first.
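The "populated on demand" model described in that quote can be approximated in any database today with a plain table plus an explicit refresh step. A sketch below, with SQLite standing in for Postgres and made-up table names; the refresh rebuilds the whole snapshot, which is essentially what the initial REFRESH MATERIALIZED VIEW does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("apple", 3), ("apple", 2), ("pear", 1)])
# The "materialized view": a plain table holding a query snapshot.
conn.execute("CREATE TABLE totals_mv (product TEXT, total INTEGER)")

def refresh_totals(conn):
    # Equivalent in spirit to REFRESH MATERIALIZED VIEW: rebuild the
    # snapshot from the base table inside one transaction.
    with conn:
        conn.execute("DELETE FROM totals_mv")
        conn.execute("INSERT INTO totals_mv "
                     "SELECT product, SUM(qty) FROM orders GROUP BY product")

refresh_totals(conn)
print(conn.execute("SELECT * FROM totals_mv ORDER BY product").fetchall())
# → [('apple', 5), ('pear', 1)]
```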
Implementing efficient updates of materialized views is difficult. They have to support recursive queries, aggregation, and probably even recursive views that they list as new in "Additional Features."
Algorithms for updates of materialized views have existed for at least 20 years, but maybe they don't cover all these cases efficiently.
That is awesome. I think caching at the ends of your app is a lot more elegant and easier to maintain than littering your app with cache logic. I use Varnish for caching HTTP responses and have been waiting forever for more databases to support materialized views.
Ever since I heard Mike Krieger of Instagram's tech talk at Airbnb about how scalable Postgres was for them, I've become very interested in Postgres. Probably will use it for my next startup.
They're a little less than what you might be expecting since they require you to manually signal when they should be updated, but otherwise they sound pretty great.
I was working professionally with Oracle and MSSQL databases (with way bigger focus on the latter). I did some freelance projects with MySQL databases, heck, I managed to hack some small things on Pervasive once or twice.
PostgreSQL was always an outsider which I had ignored, but in my recent position PostgreSQL is the main database engine for our whole stack. It took me about a month to get used to the differences in the syntax and PostgreSQL-related stuff, and I'm amazed at the features and quality of it, especially when you remember that this is a free and open source database engine.
On the other hand, if the PostgreSQL devs would start a Kickstarter project for a PGSQL-specific version of Microsoft SQL Server Management Studio, both my hands (and feet) would pay. I have tried Database Visualiser (paid version), PGAdmin and a few others, but none of them is even close to the capabilities of Management Studio.
Does anybody maybe know any alternative or have any tips to share?
"I wonder if it would be possible to plug a python/node interpreter into that."
Sure, why not?
This is a great feature because it is a significant extensibility improvement. And that's what postgres is really about: the idea that any user/developer with an idea has many powerful APIs that can help them achieve that without waiting for anyone else to agree that the idea is good.
[0] http://www.postgresql.org/docs/devel/static/libpq-exec.html#...
[1] http://psycopg.lighthouseapp.com/projects/62710/tickets/149
[2] http://initd.org/psycopg/docs/extensions.html#psycopg2.exten...