10 000 concurrent real-time connections to Django (github.com/aaugustin)
186 points by mYk on March 26, 2013 | 39 comments


I've been dabbling in realtime connections for a while now, and this is certainly a sexy option to avoid the fragmentation happening in the Python async world.


Well, I think that was Guido's goal in creating Tulip.


Why do people persist in mangling Django into doing persistent WebSocket concurrency when there are better tools, in Python or otherwise, well suited to such tasks?


Because it's a popular, well-written framework with a huge volume of existing code, and integrating WebSockets into Django means you don't have to maintain two codebases if you want to have a realtime Django app.

Also, it's fun.


Indeed, these are the exact reasons!


How much of that existing code has any application in a WebSocket context? If it uses the ORM layer, throw it out. If it uses the template layer, optionally throw it out for efficiency (Jinja and friends). What are you really left with? Routing (trivial) and a familiar API with the guts out, which is overrated and a hobgoblin of consistency.


This system doesn't replace Django; it complements it.

You could build 95% of an application with the traditional request-response model and add the 5% of real-time features with a system similar to my demo.

(Of course, given your opinions on Django, I don't recommend you build anything with it.)


Well said =) It's incredibly frustrating to build an entire app so easily with a "traditional" Django stack, and then be faced with standing up a whole new stack just to avoid wasteful XHR polling for some simple server-side event-driven UI updates.

We've implemented some SSE based solutions lately with gevent & nginx in front of django, and it's been great to keep it all in the django family.

Maybe switching to py3k has some value after all... =)
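For anyone curious what the SSE side looks like: here's a minimal sketch of the `text/event-stream` wire format. The function and event names are illustrative; in a setup like ours you'd return the generator as the body of a streaming Django response behind gevent, with the real event source being something like a Redis subscription.

```python
import json

def sse_format(data, event=None):
    """Format one payload as a Server-Sent Events message."""
    lines = []
    if event:
        lines.append("event: %s" % event)
    lines.append("data: %s" % json.dumps(data))
    return "\n".join(lines) + "\n\n"  # blank line terminates the message

def event_stream(source):
    """Generator usable as a streaming response body.

    `source` is any iterable of payloads (hypothetical; in a real app
    it would block on a queue or pub/sub channel between yields).
    """
    for payload in source:
        yield sse_format(payload, event="update")
```

The browser side is then just `new EventSource(url)` plus an `update` event listener.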


I concur.


In addition to templating, the ORM and routing, there is also a highly integrated i18n toolchain/workflow, built-in CSRF protection, good session handling, well-integrated caching and many more such 'details'.

All those things by themselves could be considered trivial and could be obtained from many individual libraries; the level of integration and polish you encounter with Django, though, is anything but trivial, and takes real time and effort.

I've recently had to make a choice for a Python based application platform/environment and have chosen Django (again) despite having no use for the ORM and ORM-using contrib modules. Simply because all the other things are there and work together beautifully without me spending any time on code that is not directly related to my goals.


In my opinion there's nothing wrong with making simple Django ORM calls even from a Tornado/gevent WebSocket handler. Async purists will cry foul, but if all you want is a realtime feature, you can block for a few milliseconds without getting into trouble.

In the long run they might build tulip-awareness into the ORM.

For that matter I start Tornado servers with management commands just for convenience...


Or you could do this ... https://gist.github.com/anonymous/5190528

This blocks in a separate thread leaving the event loop to handle other tasks.
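The general pattern, sketched here with the stdlib's concurrent.futures rather than the gist's exact code (the function name, pool size and return value are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

# A small pool for blocking work, so the event-loop thread stays free.
executor = ThreadPoolExecutor(max_workers=4)

def unread_count(user_id):
    # Stand-in for a blocking Django ORM call, e.g.
    # Notification.objects.filter(user_id=user_id, read=False).count()
    return 3

# submit() returns immediately with a Future; the event loop can keep
# serving other connections while the query runs in a worker thread.
future = executor.submit(unread_count, 42)

# In Tornado you'd yield (or await) the future inside a coroutine;
# calling .result() here just demonstrates the round trip.
print(future.result())
```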


Trivial or not, working within an existing library provides a level of maintainability you just don't get when you toss in something like Tornado to handle the tiny bit of your app that requires a persistent connection and WebSockets. Even if all I got were Django's router and views, it'd save me the awful headache of grafting Tornado and Django together and coding against two entirely different library sets.


What would you recommend as the best tools for accomplishing this?


For Python 2.x, I think gevent.wsgi.WSGIServer could handle C10k or more, no?


In my experience -- yes. I've pushed it further than that.

In practice, debugging is an unpleasant experience. If you're doing anything slightly out of the ordinary, you should take care.


Potentially it is slower than Tulip, since it does magic stack slicing. Also, it only works on x86 CPUs.


Or just Twisted.


Very cool, well done.


The C10K problem[1] is about real clients accessing real web servers, not about writing short strings over WebSocket.

I'll associate "C10K" and "Django" when I see a dynamic web page being served using the template system, the standard middleware and the ORM (with the default behavior of creating and destroying a database connection for each request).

[1] http://www.kegel.com/c10k.html


The C10k problem was originally about serving static files -- quoting the page you linked to: "take four kilobytes from the disk and send them to the network".

The demo could certainly be extended to send larger amounts of data -- left as an exercise for the reader.

I'm not sure I understand your second paragraph -- if I used this system in a real application, I would serve the pages with the traditional handler (with template rendering, middleware, etc.) and then exchange messages over the WebSocket. These are different roles.

Regarding database connections, the default behavior isn't the one you're describing any more; I implemented persistent connections in Django a few weeks ago.


By "implemented", do you mean "contributed to trunk" or "used it in my application"? If the former, thanks; if the latter, I was under the impression that it was on by default in 1.5?


Looks like it will be available in the next (1.6?) release[1].

[1] https://docs.djangoproject.com/en/dev/ref/databases/#persist...
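For reference, the knob described there is CONN_MAX_AGE in the database settings (the engine and database name below are just examples):

```python
# settings.py fragment (Django 1.6+)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        # Reuse each connection for up to 10 minutes.
        # 0 restores the old close-after-every-request behavior;
        # None keeps connections open indefinitely.
        "CONN_MAX_AGE": 600,
    }
}
```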


Oh right, 1.5 and earlier. I misread that, thank you.


Static files are no longer a problem thanks to Varnish or nginx. And this is Django, after all, so please focus on dynamic content performance.

I'm glad to hear that Django managed to get persistent db connections after years of "just use PgPool/PgBouncer". Keep up the good work!


Yes, I'm just referring to C10k because this demo uses a technique originally created to solve C10k.

Reaching 10 000 connections wasn't difficult in this case; it was just a matter of tuning a few system parameters. Exploring the APIs and studying how they can fit together was much more interesting, and sometimes challenging.
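For anyone wanting to reproduce it, the tuning is roughly this kind of thing -- the exact knobs and values depend on your OS, and these numbers are illustrative, not taken from the demo:

```shell
# Raise the per-process file descriptor limit (one fd per connection)
ulimit -n 20000

# Widen the ephemeral port range for the benchmark client's outgoing sockets
sysctl -w net.ipv4.ip_local_port_range="10000 65000"

# Allow a longer queue of pending (not-yet-accepted) connections
sysctl -w net.core.somaxconn=1024
```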


Awesome stuff!


Well, regarding PgBouncer, I believe this says more about PostgreSQL than about Django and the Postgres client libraries.

(After all, it still opens and closes connections on the Django side; it just keeps them open to PostgreSQL.)

Now, serving 10k connections with the template libraries and middleware and ORM, out of the box? Serving dynamic content for each connection? Impossible =)

Not without some kind of caching (but you can do that with the help of a middleware)


It's not PostgreSQL's fault that Django explicitly closes the connection after each request (see close_connection() for what was done before 1.6):

[1] https://github.com/django/django/blob/master/django/db/__ini...


I understand that, and I'm not saying there aren't several issues in Django (some are getting better).

What I'm saying is that with PgBouncer you have Django <> PgBouncer <> PostgreSQL, right?

So what PgBouncer is doing (absorbing lots of connection opens and closes) could perhaps be done better inside PostgreSQL itself.


Why would you insist on creating and destroying a db connection on each request? We stopped doing that decades ago.


Django hasn't stopped insisting on that... v1.5 might have pooling, but it's taken them a long time to get connection pooling in.


Forgive my ignorance, but isn't this the point of something like pgpool? I didn't think connecting to a connection pool would be as expensive as connecting to the database itself (if it is, then my next question would be: What's the point of a connection pool? Because clearly I've missed something one way or the other).


Yes, but pgpool isn't always an option. If you use Heroku's Postgres service, you don't have sufficient privileges to set up software like pgpool in front of the database.

The only way we were able to get connection pooling working in django 1.3/1.4 was to use django-dbpool[0]. It works okay, but is still pretty sketchy compared to the connection pooling libraries available for the JVM like BoneCP or C3P0.

[0] https://github.com/gmcguire/django-db-pool
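The point of a pool, in toy form (illustrative stdlib code, not how pgpool or django-db-pool is actually implemented): connections are created once up front, so "connecting" later is just taking one off a queue instead of paying the TCP and auth handshake again.

```python
import queue

class SimplePool:
    """Toy connection pool; real pools add health checks, timeouts, etc."""

    def __init__(self, make_conn, size=5):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(make_conn())  # pay the connection cost once

    def acquire(self):
        # Cheap: just dequeue an existing connection
        # (blocks if all of them are checked out).
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)
```

Connecting through pgpool is the same trade: the client still does a cheap local connect to the pooler, while the expensive Postgres backend connections get reused.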


Understood, thank you for the clarification.


You can still use pgpool with Heroku.


Yes, connecting repeatedly to pgpool is less expensive than doing it directly to postgres. Not having to connect repeatedly to anything is even better.


Wow. I'd just have assumed that connection pooling was default in all these modern web frameworks. I guess this is one area where Enterprise means you get what you pay for? Example: ODBC connection pooling with ASP pages back in the late '90s.


Connection pooling in Postgres can actually be counterproductive with high concurrency: Postgres scales very badly to a high number of connections.




