
This is somewhat suspect. At my place of work, we operate a rather large Python API deployment (over an order of magnitude more QPS than the OP's post). However, our setup is... pretty simple. We only run nginx + gunicorn (gevent worker class), 1 master process + 1 worker per vCPU. In front of that we have an envoy load-balancing tier that does p2c backend selection to each node. I actually think the nginx is pointless now that we're using envoy, so that'll probably go away soon.
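A minimal sketch of what that gunicorn side could look like as a `gunicorn.conf.py` (the bind address and keep-alive timeout are illustrative assumptions, not the commenter's actual values):

```python
# gunicorn.conf.py -- sketch of the setup described above:
# gevent workers, one per vCPU, sitting behind nginx/envoy.
import multiprocessing

bind = "127.0.0.1:8000"               # nginx/envoy proxies to this (assumed)
worker_class = "gevent"               # cooperative I/O instead of sync workers
workers = multiprocessing.cpu_count() # 1 worker per vCPU
keepalive = 75                        # hold idle upstream connections open (assumed)
```

With gevent workers, each process multiplexes many in-flight requests on one event loop, which is what makes a high target CPU utilization like 80% workable.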

Works amazingly well! We run our python API tier at 80% target CPU utilization.



glad you are seeing such awesome performance with gevent+envoy! which part of our experience do you think is suspect?


So, in gunicorn's default mode (sync), the mode I'm assuming you're using, each process handles one request at a time. The "thundering herd" problem really only applies to connection acceptance. Which is to say, in the process of accepting a connection, it is possible to wake all idle processes that are waiting for a connection to come in (they wake, hit EAGAIN, and go back to waiting). Busy processes that are servicing requests (not waiting on the accept call) will not be woken, since they aren't waiting on a new request to come in. The "thundering herd" problem as I understand it can indeed waste CPU cycles, but only on processes that aren't doing much anyway. I do, however, believe that `accept()` calls have been synchronized between processes on Linux for a while now to prevent spurious wakeups. You should verify you're actually seeing spurious wakeups by using `strace` and checking whether you see a bunch of `accept()` calls returning EAGAIN.
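The EAGAIN behavior is easy to reproduce in isolation: a non-blocking listening socket with no pending connection fails `accept()` immediately with EAGAIN, which is exactly what a spuriously woken worker sees. A self-contained sketch:

```python
import errno
import socket

# A non-blocking listener with nothing queued: accept() raises
# BlockingIOError (errno EAGAIN), the same "wasted wakeup" a worker
# process experiences in a thundering-herd scenario.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))   # ephemeral port
srv.listen()
srv.setblocking(False)

try:
    srv.accept()
    woke_spuriously = False  # a real connection was waiting
except BlockingIOError as e:
    # On Linux, EAGAIN == EWOULDBLOCK; either way, nothing to accept.
    woke_spuriously = e.errno in (errno.EAGAIN, errno.EWOULDBLOCK)
finally:
    srv.close()
```

In production you'd observe this from the outside instead, e.g. `strace -f -e trace=accept,accept4 -p <worker-pid>` and count how many calls return EAGAIN.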

In gunicorn, `sync` mode does exhibit rather pathological connection churn, because it does not support keep-alive. Generally, most load-balancing layers will already do connection pooling to the upstream, meaning your gunicorn processes won't really be accepting many connections after they've "warmed up". Unfortunately that doesn't apply in sync mode :(. Connection churn can waste CPU.
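For an nginx front like the one described above, that upstream pooling is a few directives (the directive names are real nginx; the upstream name, address, and pool size here are illustrative):

```nginx
upstream app {
    server 127.0.0.1:8000;
    keepalive 32;                       # pool up to 32 idle upstream connections
}

server {
    location / {
        proxy_pass http://app;
        proxy_http_version 1.1;         # keep-alive requires HTTP/1.1 upstream
        proxy_set_header Connection ""; # strip the default "Connection: close"
    }
}
```

This only helps if the gunicorn worker class actually honors keep-alive, which `sync` does not.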

Another thing to also note is that if you have 150 worker processes, but your load balancer only allows 50 connections per upstream, chances are 100 of your processes will be sitting there idle.

Something just doesn't feel quite right here.

EDIT: I do see mention of a `gthread` worker - so you might already be able to support HTTP keep-alives. If this is the case, then you should really have no big thundering herd problem after the LB establishes connections to all the workers.
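For reference, a `gthread` setup that holds LB connections open could look roughly like this `gunicorn.conf.py` sketch (thread count and keep-alive timeout are assumed values, not taken from the OP):

```python
# gunicorn.conf.py -- sketch of a gthread setup with keep-alive enabled.
import multiprocessing

worker_class = "gthread"
workers = multiprocessing.cpu_count()
threads = 8    # each worker serves up to 8 concurrent requests (assumed)
keepalive = 75 # seconds an idle LB connection is held before closing (assumed)
```

With keep-alive honored, the LB accepts once per worker connection and then reuses it, so the `accept()` path (and any herd around it) mostly stops mattering.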


Could the discrepancy be explained by the type of responses?

Sounds like an app like clubhouse might have lots of small, fast responses (like direct messaging), where very little of the response time is spent in application code. Does your API happen to do a lot of CPU-intensive stuff in application code?


Our app is also a messaging app. So lots of small & fast responses.



