10 000 concurrent real-time connections to Django (github.com/aaugustin)
186 points by mYk on March 26, 2013 | 39 comments


I've been dabbling in realtime connections for a while now, and this is certainly a sexy option to avoid the fragmentation happening in the Python async world.


Well, I think that was Guido's goal in creating Tulip.


Why do people persist in mangling Django into doing persistent WebSocket concurrency when there are better tools, in Python or otherwise, well suited to such tasks?


Because it's a popular, well-written framework with a huge volume of existing code, and integrating WebSockets into Django means you don't have to maintain two codebases if you want to have a realtime Django app.

Also, it's fun.


Indeed, these are the exact reasons!


How much of that existing code has any application in a WebSocket context? If it uses the ORM layer, throw it out. If it uses the template layer, optionally throw it out for efficiency (Jinja and friends). What are you really left with? Routing (trivial) and a familiar API with the guts out, which is overrated and a hobgoblin of consistency.


This system doesn't replace Django; it complements it.

You could build 95% of an application with the traditional request-response model and add the 5% of real-time features with a system similar to my demo.

(Of course, given your opinions on Django, I don't recommend you build anything with it.)


Well said =) It's incredibly frustrating to build an entire app so easily with a "traditional" Django stack, and then be faced with standing up a whole new stack just to avoid wasteful XHR polling for some simple server-side event-driven UI updates.

We've implemented some SSE based solutions lately with gevent & nginx in front of django, and it's been great to keep it all in the django family.

Maybe switching to py3k has some value after all... =)
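For anyone curious what the SSE side looks like: here's a minimal sketch of the `text/event-stream` wire format. The function and event names are illustrative; in a setup like ours you'd return the generator as the body of a streaming Django response behind gevent, with the real event source being something like a Redis subscription.

```python
import json

def sse_format(data, event=None):
    """Format one payload as a Server-Sent Events message."""
    lines = []
    if event:
        lines.append("event: %s" % event)
    lines.append("data: %s" % json.dumps(data))
    return "\n".join(lines) + "\n\n"  # blank line terminates the message

def event_stream(source):
    """Generator usable as a streaming response body.

    `source` is any iterable of payloads (hypothetical; in a real app
    it would block on a queue or pub/sub channel between yields).
    """
    for payload in source:
        yield sse_format(payload, event="update")
```

The browser side is then just `new EventSource(url)` plus an `update` event listener.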


I concur.


In addition to templating, the ORM and routing, there is also a highly integrated i18n toolchain/workflow, built-in CSRF protection, good session handling, well-integrated caching and many more such 'details'.

All those things by themselves could be considered trivial and could be obtained from many individual libraries; the level of integration and polish you encounter with Django, though, is anything but trivial, and takes real time and effort.

I've recently had to make a choice for a Python based application platform/environment and have chosen Django (again) despite having no use for the ORM and ORM-using contrib modules. Simply because all the other things are there and work together beautifully without me spending any time on code that is not directly related to my goals.


In my opinion there's nothing wrong with making simple Django ORM calls even from a Tornado/gevent WebSocket handler. Async purists will cry foul, but if all you want is a realtime feature, you can block for a few milliseconds without getting into trouble.

In the long run they might build tulip-awareness into the ORM.

For that matter I start Tornado servers with management commands just for convenience...


Or you could do this ... https://gist.github.com/anonymous/5190528

This blocks in a separate thread leaving the event loop to handle other tasks.
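The general pattern, sketched here with the stdlib's concurrent.futures rather than the gist's exact code (the function name, pool size and return value are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# A small pool for blocking work, so the event-loop thread stays free.
executor = ThreadPoolExecutor(max_workers=4)

def unread_count(user_id):
    # Stand-in for a blocking Django ORM call, e.g.
    # Notification.objects.filter(user_id=user_id, read=False).count()
    return 3

# submit() returns immediately with a Future; the event loop can keep
# serving other connections while the query runs in a worker thread.
future = executor.submit(unread_count, 42)

# In Tornado you'd yield (or await) the future inside a coroutine;
# calling .result() here just demonstrates the round trip.
print(future.result())
```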


Trivial or not, working within an existing library provides a level of maintainability you just don't get when you toss in something like Tornado to handle the tiny bit of your app that requires a persistent connection and WebSockets. Even if all I got were Django's router and views, it'd save me the awful headache of grafting Tornado and Django together and coding against two entirely different library sets.


What would you recommend as the best tools for accomplishing this?


For Python 2.x, I think gevent.wsgi.WSGIServer could handle C10k or more, no?


In my experience -- yes. I've pushed it further than that.

In practice, debugging is an unpleasant experience. If you're doing anything slightly out of the ordinary, you should take care.


Potentially it is slower than Tulip, since it does magic stack slicing. Also, it only works on x86 CPUs.


Or just Twisted.


Very cool, well done.


The C10K problem[1] is about real clients accessing real web servers, not about writing short strings over WebSocket.

I'll associate "C10K" and "Django" when I see a dynamic web page being served using the template system, the standard middleware and the ORM (with the default behavior of creating and destroying a database connection for each request).

[1] http://www.kegel.com/c10k.html


The C10k problem was originally about serving static files -- quoting the page you linked to: "take four kilobytes from the disk and send them to the network".

The demo could certainly be extended to send larger amounts of data -- left as an exercise for the reader.

I'm not sure I understand your second paragraph -- if I used this system in a real application, I would serve the pages with the traditional handler (with template rendering, middleware, etc.) and then exchange messages over the WebSocket. These are different roles.

Regarding database connections, the default behavior isn't the one you're describing any more; I implemented persistent connections in Django a few weeks ago.


By "implemented", do you mean "contributed to trunk" or "used it in my application"? If the former, thanks; if the latter, I was under the impression that it was on by default in 1.5?


Looks like it will be available in the next (1.6?) release[1].

[1] https://docs.djangoproject.com/en/dev/ref/databases/#persist...
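For reference, the knob described there is CONN_MAX_AGE in the database settings (the engine and database name below are just examples):

```python
# settings.py fragment (Django 1.6+)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        # Reuse each connection for up to 10 minutes.
        # 0 restores the old close-after-every-request behavior;
        # None keeps connections open indefinitely.
        "CONN_MAX_AGE": 600,
    }
}
```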


Oh right, 1.5 and earlier. I misread that, thank you.


Static files are no longer a problem thanks to Varnish or nginx. And this is Django, after all, so please focus on dynamic content performance.

I'm glad to hear that Django managed to get persistent db connections after years of "just use PgPool/PgBouncer". Keep up the good work!


Yes, I'm just referring to C10k because this demo uses a technique originally created to solve C10k.

Reaching 10 000 connections wasn't difficult in this case; it was just a matter of tuning a few system parameters. Exploring the APIs and studying how they can fit together was much more interesting, and sometimes challenging.
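For anyone wanting to reproduce it, the tuning is roughly this kind of thing -- the exact knobs and values depend on your OS, and these numbers are illustrative, not taken from the demo:

```shell
# Raise the per-process file descriptor limit (one fd per connection)
ulimit -n 20000

# Widen the ephemeral port range for the benchmark client's outgoing sockets
sysctl -w net.ipv4.ip_local_port_range="10000 65000"

# Allow a longer queue of pending (not-yet-accepted) connections
sysctl -w net.core.somaxconn=1024
```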


Awesome stuff!


Well, regarding PgBouncer, I believe this says more about PostgreSQL than about Django and the Postgres client libraries.

(After all, it still opens and closes connections on the Django side; it just keeps them open to PostgreSQL.)

Now, serving 10k connections with the template libraries and middleware and ORM, out of the box? Serving dynamic content for each connection? Impossible =)

Not without some kind of caching (but you can do that with the help of a middleware)


It's not PostgreSQL's fault that Django explicitly closes the connection after each request (see close_connection() for what was done before 1.6):

[1] https://github.com/django/django/blob/master/django/db/__ini...


I understand that, and I'm not saying there aren't several issues in Django (some are getting better).

What I'm saying is that with PgBouncer you have Django <> PgBouncer <> PostgreSQL, right?

So what PgBouncer is doing (absorbing lots of connection opens and closes) could perhaps be done better inside PostgreSQL itself.


Why would you insist on creating and destroying a db connection on each request? We stopped doing that decades ago.


Django hasn't stopped insisting on that... v1.5 might have pooling, but it's taken them a long time to get connection pooling in.


Forgive my ignorance, but isn't this the point of something like pgpool? I didn't think connecting to a connection pool would be as expensive as connecting to the database itself (if it is, then my next question would be: What's the point of a connection pool? Because clearly I've missed something one way or the other).


Yes, but pgpool isn't always an option. If you use Heroku's Postgres service, you don't have sufficient privileges to set up software like pgpool in front of the database.

The only way we were able to get connection pooling working in django 1.3/1.4 was to use django-dbpool[0]. It works okay, but is still pretty sketchy compared to the connection pooling libraries available for the JVM like BoneCP or C3P0.

[0] https://github.com/gmcguire/django-db-pool
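The point of a pool, in toy form (illustrative stdlib code, not how pgpool or django-db-pool is actually implemented): connections are created once up front, so "connecting" later is just taking one off a queue instead of paying the TCP and auth handshake again.

```python
import queue

class SimplePool:
    """Toy connection pool; real pools add health checks, timeouts, etc."""

    def __init__(self, make_conn, size=5):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_conn())  # pay the connection cost once

    def acquire(self):
        # Cheap: just dequeue an existing connection
        # (blocks if all of them are checked out).
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)
```

Connecting through pgpool is the same trade: the client still does a cheap local connect to the pooler, while the expensive Postgres backend connections get reused.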


Understood, thank you for the clarification.


You can still use pgpool with Heroku.


Yes, connecting repeatedly to pgpool is less expensive than doing it directly to postgres. Not having to connect repeatedly to anything is even better.


Wow. I'd just have assumed that connection pooling was default in all these modern web frameworks. I guess this is one area where Enterprise means you get what you pay for? Example: ODBC connection pooling with ASP pages back in the late '90s.


Connection pooling in Postgres can actually be counterproductive with high concurrency: Postgres scales very badly to a high number of connections.




