Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

So, what's still missing in postgres?

To make the responses as constructive as possible, please also specify:

* Whether you currently use postgres, and what other systems you use (presumably ones that do have the feature that is missing in postgres)

* Whether you currently need the feature, or whether you anticipate the need in the future, or whether you expect that other people will need it

* Whether the missing feature is preventing you from using postgres, or just a significant pain point (which could include ease-of-use issues)



I could really use UPSERT.

I'm using postgresql right now.

My use case: I could have a table like:

    create table items (id uuid primary key, attributes json);
I'd have a webapp using javascript for adding/editing new items. If the user wants to add a new item, the client code would generate the uuid.

Editing and adding new rows would both use the same UPSERT/MERGE code -- no need for separate insert or update statements -- if the uuid already exists, it's assumed to be an update, if it doesn't, it's inserted as a new row.


Here's an UPSERT in postgresql: https://gist.github.com/paul/855efdecaaa2ec4deec7

While we're at it, here's a "find or create" in a single atomic statement: https://gist.github.com/paul/75ec84d131e36492b17b


I believe both of these still have race conditions, and are not the same as truly atomic upserts. Granted the window is considerably smaller, but their is still a race condition.


Upsert is already possible in a single statement, though a bit unwieldy.

    create table people (uuid varchar, name varchar);
    with new_rows as (select (unnest(ARRAY[('xxx','John Smith'), ('yyy', 'Ben Bitdiddle')]::people[])).*), updated as (UPDATE people SET name=new_rows.name FROM new_rows WHERE people.uuid=new_rows.uuid RETURNING people.uuid) insert into people select new_rows.* from new_rows left join updated using (uuid) where updated.uuid is null;


For this I currently use:

    UPDATE Table SET ... WHERE id=$1
    if (rows_updated == 0) {
        INSERT INTO Table ... WHERE NOT EXISTS (SELECT 1 FROM Table WHERE id=$1)
        if (rows_inserted == 0) {
            throw "Conflict!"
        }
    }
It's a bit of a hack, but works well enough. I agree would be nice to have a more transparent way of doing it.


This is very susceptible to race conditions, which is why many of us want true UPSERT semantics. http://www.postgresql.org/docs/current/static/plpgsql-contro... is the current best way to do UPSERT in PG.


So if there is a way to do UPSERT what is the problem? Are people just complaining about the lack of syntactic sugar?


> what is the problem?

Quoting ocharles:

> susceptible to race conditions, which is why many of us want true UPSERT semantics

The snippet provided in the postgres doc is the current best, but it's still has a number of flaws:

> This coding assumes the unique_violation error is caused by the INSERT, and not by, say, an INSERT in a trigger function on the table. It might also misbehave if there is more than one unique index on the table, since it will retry the operation regardless of which index caused the error. More safety could be had by using the features discussed next to check that the trapped error was the one expected.


Last time I looked into this, a big thing stopping it from happening was that the SQL standard for this kind of sucks and there is no consensus for a better syntax among the databases that implement it. I would also love to see it, and I'd be amazed if it were as technically complex as, say, materialized views.


I am currently using PostgreSQL for all my work, but have always been missing a convenient way to do upserts. In some systems are large precentage of all write operations are upserts which makes coding those against PostgreSQL a hassle. I am not convinced though of the SQL standard MERGE syntax. It looks way too messy when used in simple cases.

Another feature I would wish to see is ORDER BY + skip for update (or any equivalent solution) to easily implement queues with multiple consumers.

EDIT: Just ORDER BY (without skipping locked entries) for UPDATE would be excellent too to avoid deadlocks when doing multi row updates.


I'd love to see on-disk bitmap indexes. I noticed there was an attempt to add these around 2009 but it was abandoned after being (apparently) nearly completed. [1,2] I'd like to see these because it would make Postgres better at data warehouse applications. I've been thinking about volunteering to add them myself, but I've never looked at the Postgres codebase before, so it seems a bit ambitious.

A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks. If you have two multi-row UPDATEs that hit the same rows but in different order, you've got a classic deadlock setup. Apparently you can sometimes control UPDATE ordering by persuading Postgres to use certain query plans if you really know what you're doing, but that seems like an undesirable requirement for most users.

To answer your bullet points, Postgres is my #1 database choice, but I've also used Oracle and MySQL many times and sometimes still do. Neither of these features are things that are preventing me from using Postgres, but I've sure wished they were there. In particular bitmap indexes seem useful to many people.

[1] http://www.postgresql.org/message-id/20090111104442.GA3503@e... [2] http://www.postgresql.org/message-id/20081101000154.GO27872@...


"on-disk bitmap indexes ... thinking about volunteering to add them myself"

That would be reasonable to do even if you don't know the internals very well. The index types (called "index access methods") are well-isolated (API-wise) from most of the rest of the code. Technically, you could even add a new index access method as an extension without even touching the core.

The exception to both of those things is the WAL recovery code, which can't be written as an extension and also requires more knowledge of the rest of the system.

But the WAL recovery code is not too bad for those that know it. So if you did the rest, then I wouldn't be surprised if someone jumped in to write it or help you write it.

"A minor feature I'd like to see is to allow ORDER BY on UPDATE statements to avoid deadlocks."

One solution here is to do a "SELECT ... FOR UPDATE" first. Unfortunately, that is redundant (and does redundant work).


Thank you for the encouragement! Odds are I'll be doing a lot of data warehouse work in the next 6 months, so I'll be motivated to get involved.

EDIT: One thing that intimidated me was browsing through the mailing list thread in my [1] above and seeing people ask about lock correctness. That seems like something where you'd need a broad view of the system. I might give it a try anyway, though. One thing about Postgres is your team is incredibly helpful, professional, and kind. Tom Lane has given me all kinds of answers on the mailing list in the past. I have enough good will for the project that I'm pretty much just looking for an excuse to contribute something. It would be an honor to give something back, and I feel safe asking for help from you all.


> I'd love to see on-disk bitmap indexes.

I'm curious about what the performance ramifications of that would look like. I recently had to disable bitmap scans in a data warehouse-type-thing I run because filesystem fragmentation was apparently misleading Postgres into thinking it would be cheaper than an index scan. The bitmap scan plan takes ~15 minutes to run cold, and the index scan takes ~20 seconds cold. (Really odd thing is that index scan takes ~1.5 seconds warm, versus ~0.4 seconds warm with the bitmap scan).


We are currently using MS SQL Server, but have looked at the possibility of using Postgres in the future.

One major stopping point for us is the fact that on SQL Server we use a Case Insensitive, Accent Insensitive collation. Postgres doesn't support this, and apparently has no plans to.

We have many stored procedures in TSQL which are built around the assumption of CI,AI. Converting these will be a lot of work, but the lack of CI,AI collation would be by far the biggest pain point.



I use Postgres for pretty much every project possible, as well as using Oracle for consulting clients.

The main feature which I'd like the PG development team to look at next is implementing a more comprehensive partitioning solution.

As far as remaining competitive with the commercial database vendors, the one thing that is the primary reason I have seen for enterprises choosing to reluctantly pay for Oracle over using Postgres for free is RAC (Real Application Clusters). Active/active clustering of DB nodes seems to hit a sweet spot for businesses wanting uptime and scalability without some of the downsides of physical replication.


"Active/active clustering of DB nodes"

This is being developed as Postgres-XC. They released version 1.0 in June 2012.

"Postgres-XC is an open source project to provide a write-scalable, synchronous multi-master, transparent PostgreSQL cluster solution. "

http://postgres-xc.sourceforge.net/


HTTP APIs like http://wiki.postgresql.org/wiki/HTTP_API. Using a javascript framework like AngularJs would permit me to get rid of middleware (Java, PHP, dotNET, etc.) almost completely. I need a relational db and I'm not a big fan of NoSQL, so MongoDB or others are not a viable option. It would also permit to use only javascript from the front end to the back end (https://code.google.com/p/plv8js/).


Nginx + ngx_postgres + ngx_rds_json (or OpenResty bundle) is perfect for the task. See sample #5 at https://github.com/FRiCKLE/ngx_postgres

And it's fast.



[Current PostgreSQL user]

Oracle-like partitioning syntax support (including composite) would be great - although possible through inheritance and triggers/rules, it is the only bigger feature missing in PostgreSQL in my opinion. The current form requires a lot of overhead but is definitely worth the effort when working with 1B+ rows.


I use postgres for every hobby project I have, and oracle at work.

What I really miss in postgres is row level security like it's implemented in oracle.

To implement RLS in postgres I need to either install veil and go through the pain of configuring it or create a lot of views and work on them (but some scenarios can not be implemented like that).



Have you looked at the SE-Postgres work? Probably not exactly what you need, but there might be something there.

http://www.postgresql.org/docs/devel/static/sepgsql.html


What I would find helpful is some sort of operator to merge two JSON values. I use a JSON field to collect various statistics relating to a table's rows, and it'd be nice to not have to worry about overwriting old values. Something like:

json_merge('{"a":5,"b":7}', '{"b":6,"c":8}') -> '{"a":5,"b":6,"c":8}'


No missing per-se, but I wish I could use postgres as a client db in iOS (like sqlite) so I can have a single codebase (like is possible with firebird) even if that mean only a subset (what have sqlite, plus psql store procedures!)


I've been using postgres for a few years now, I also use MSSQL, MonetDB and MySQL.

Some of the things I miss most frequently are:

- Easy table partitioning.

- Support for "phrase search" in fts.

- MERGE/UPSERT support.

- Columnar storage (and related optimizations) for "OLAP like" workloads.

- EXPLAIN ANALYSE that traces functions.


We migrated recently from M$ $QL to postgres and are VERY happy with this move.


> M$ $QL

Really?

Heh, hopefully you're consistent and have similar antics for the likes of Apple, Google, et al


I'm missing ts_rank support in the GIN index (part of the GIN generalization proposal[1]). I had to give up ranking by significance in a full-text search because it was too slow.

[1] https://wiki.postgresql.org/wiki/GIN_generalization


Long time PostgreSQL user. Would like to use table inheritance in a couple places, but don't because of these caveats: http://www.postgresql.org/docs/9.2/static/ddl-inherit.html#D...


I wish there were more built in replication options. Setting up Master-Master (data existing in multiple data centers) with hot slaves for example- I don't think you can do this with Postgres alone currently. Need to use something like Bucardo (a perl program) to make it happen.


I'd love to see plan locking. I've read many horror stories where Postgres just by itself took an application down because its query planner decided to execute a particular query using seqscans because some statistic somewhere went above a certain threshold.


Temporal tables conforming closely to SQL:2011. The ability to use temporary tables created with ON COMMIT DROP in non-serializable prepared transactions (so probably only READ-COMMITTED transactions).

* we currently use postgresql, oracle, mysql, teradata, mssql


Multi-master in the core distribution.


What system do you currently use? Which databases got this right for your use cases, and which ones got it wrong and why?

There's active development on this feature, so now's the time to influence it to better suit your needs. A separate blog post might be a better forum.


Cassandra would be the one to look at.

Have strong consistency by default with the option to define eventual consistency per query/session. It's so useful with Cassandra to have that flexibility to favour performance over accuracy.

Also OOTB should support EC2, ZeroConf/Bonjour and make it understand racks/data centres.


Yes. Yes. Yes.

A well supported, polished multi-master implementation would definitely bring PostgreSQL into the 21st century. It's been far too long not to have this. And look to databases like Cassandra for ease of setup and maintenance.


Postgres-XC?


page level(not field level) snappy compression


You can use PostgreSQL together with ZFS and with lz4 compression enabled and maybe gzip will improve performance even more. http://www.citusdata.com/blog/64-zfs-compression Btrfs also supports compression.


This solution doesn't look production solid for linux server. I am not sure about lz4, but gzip in my cassandra experience is significantly reducing performance, Db needs spend too much CPU on compression. Snappy works just fine.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: