It's not clear what you're suggesting as an alternative. My understanding is that you're suggesting thread-per-request, which has many known flaws. There are three approaches to serving requests:
1. Thread-per-request. This is a simple model. You have a fixed-size thread pool of size N, and once you hit that limit, you can't serve any more requests. Thread-per-request has several sources of overhead, which is why people recommend against it: thread limits, per-thread stack memory usage, and context switching.
2. Coroutine style handling with cooperative scheduling at synchronization points (locks, I/O). This is how Go handles requests.
3. Asynchronous request handling. You still have a fixed-size thread pool handling requests, but you no longer limit the number of simultaneous requests with the size of that thread pool. There are several different styles of async request handling: callbacks, async/await, and futures.
#2 and #3 are more common these days because they don't suffer from the many drawbacks of the thread-per-request model, although both suffer from some understandability issues.
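A quick sketch of the difference between models 1 and 3, in Python (handler names are made up for illustration): with a blocking handler and a pool of two threads, three in-flight requests take two waves; with async handlers, a single thread holds all three waits at once.

```python
import asyncio
import concurrent.futures
import time

# Model 1: thread-per-request. A pool of 2 workers caps concurrency:
# the 3rd blocking "request" waits for a free worker.
def blocking_handler(i):
    time.sleep(0.1)  # stands in for blocking I/O
    return i

start = time.monotonic()
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(blocking_handler, range(3)))
threaded = time.monotonic() - start  # ~0.2s: two waves

# Model 3: async. Still a fixed number of threads (here: one), but the
# thread count no longer caps the number of simultaneous requests.
async def async_handler(i):
    await asyncio.sleep(0.1)  # non-blocking wait; the loop runs others
    return i

async def main():
    return await asyncio.gather(*(async_handler(i) for i in range(3)))

start = time.monotonic()
asyncio.run(main())
async_elapsed = time.monotonic() - start  # ~0.1s: all three overlap
```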
Those options aren't as distinct as you might imagine. Would calling it fiber-per-request make you happy?
(By the way: most of the time, a plain-old-boring thread-per-request is just fine, because most of the time, you're not writing high-scale software. If you have at most two dozen concurrent tasks, you're wasting your time worrying about the overhead of plain old pthread_t.)
I'm using a much more expansive definition of "thread" than you are. Sure, in the right situation, maybe M:N threading, or full green threads, or whatever is the right implementation strategy. There's no reason that green threading has to involve the use of explicit "async" and "await" keywords, and it's these keywords that I consider silly.
(I agree that thread-per-request works just fine in the majority of cases, but it's still worthwhile to write about the cases where it doesn't work.)
Responding to your original post: you argue that async/await intends to solve the problem of data races. That's not why people use it, nor does it tackle that problem at all (you still need locks around shared data).
It only tries to solve the problem of highly concurrent servers, where each request is bound by some resource that the request-handling thread has to wait on (typically I/O).
Coroutines/fibers are not an alternative to async servers, because they need primitives that are either baked into the language or the OS itself to work well.
Coroutines/fibers are completely orthogonal to async anything. The OP is arguing against poor man's coroutines, aka stackless coroutines, aka top-level-yield-only coroutines, which are significantly less expressive and composable than proper stackful coroutines (i.e. first-class one-shot continuations).
An alleged benefit of stackless coroutines is that yield points are explicit, so you know when your state can change. The OP is arguing that this is not really a benefit because it leads to fragile code. I happen to strongly agree.
Green threads / coroutines / fibers are isomorphic to an async keyword transparently implemented as a continuation-passing-style transform, which is how async callbacks usually work. What is an actual CPU-style stack in a green-thread scenario becomes nested closure activation records in an explicit continuation-passing-style scenario, and implicit closure activation records (which still look like stacks) under a compiler-implemented 'async' CPS transform.
Properly composed awaits (where each function entered is entered via an await) build a linked list of activation records in the continuations as they drill down. This linked list is the same as the stack (i.e. serves the same purpose and contains the same data in slightly different layout) in a green threads scenario.
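You can observe this directly in CPython, where each suspended coroutine object points at the thing it is awaiting via its `cr_await` attribute; walking that chain from the outermost coroutine recovers exactly the "stack" of a green thread. A minimal sketch (the function names and the `Suspend` awaitable are invented for illustration):

```python
# Each `await` in a chain of async calls links one coroutine frame to
# the next; CPython exposes the link as the coroutine's cr_await.
class Suspend:
    """Minimal awaitable that suspends exactly once."""
    def __await__(self):
        yield

async def inner():
    await Suspend()
    return 42

async def middle():
    return await inner()

async def outer():
    return await middle()

coro = outer()
coro.send(None)  # drive manually until the first suspension point

# Walk the cr_await links: this is the linked list of activation
# records, i.e. the "stack" a green thread would have.
frames, c = [], coro
while hasattr(c, "cr_await"):
    frames.append(c.cr_code.co_name)
    c = c.cr_await
print(frames)  # ['outer', 'middle', 'inner']

try:
    coro.send(None)  # resume; the chain unwinds like a returning stack
except StopIteration as e:
    result = e.value  # 42
```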
What makes all these things different is how much they expose the underlying mechanics, and the metaphors they use in that exposition. But they're not orthogonal.
(If you meant 'async' as in async IO explicitly, rather than the async / await keyword with CPS transform as implemented in C#, Python, Javascript, etc., then apologies.)
As you said, you can of course recover stackful behaviour by using yield/await/async/whatever at every level of the call stack, but in addition to being a performance pitfall (in practice you are heap-allocating each frame separately, and yield is now O(N): your interpreter/compiler/JIT will need to work hard to remove the abstraction overhead), it leads to the green/red function problem.
Please correct me if I'm wrong, but doesn't asyncio in the form of async/await (or any other way of explicitly denoting context switches) solve the problem of data races, in that data structures owned by a single thread can be operated on atomically by different coroutines? My understanding is that unless data structures are shared with another thread, you don't usually need locks for shared data.
async and threads are fundamentally different mechanisms: green threads (async) are scheduled by the runtime, while threads are scheduled by the OS.
In CPython, threads can (theoretically) be switched at every bytecode instruction. Since a call into an extension or the interpreter is a single bytecode instruction, many data structure updates (like dict[a] = b or list.append) appear atomic from Python.
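For example (CPython-specific behavior; relying on it in real code is fragile): `list.append` executes as one call into C under the GIL, so concurrent appends never tear, whereas a read-modify-write like `n = n + 1` spans several bytecodes and can lose updates.

```python
import threading

shared = []

def worker():
    for _ in range(100_000):
        shared.append(1)  # one C-level call under the GIL: atomic

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 400000 -- no appends are lost
# By contrast, `counter = counter + 1` on a shared int compiles to a
# load / add / store sequence and CAN lose increments under preemption.
```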
That being said it is rather rare to have multiple threads run an event loop and process requests in Python. If threads and async are combined in the same process in Python, then it's usually only one event loop thread, and a thread pool for background activity. Usually these will be synchronized through async (eg. tornado.concurrent.run_on_executor) -- but that has nothing to do with context switches.
Edit: Reread your post. I may have slightly missed the point :)
Yes. Often one will find/design that there is no shared state, or that shared state is modified completely between yield points, so no locks between coroutines needed.
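A sketch of that pattern (the names are invented): a read-modify-write with no await in the middle is atomic with respect to other coroutines on the same event loop, so no lock is needed; inserting a yield point in the middle reintroduces the race.

```python
import asyncio

account = {"balance": 100}

async def withdraw(amount):
    # No await between check and update: no other coroutine can run
    # in between, so no lock is needed.
    if account["balance"] >= amount:
        account["balance"] -= amount
        return True
    return False

async def racy_withdraw(amount):
    if account["balance"] >= amount:
        await asyncio.sleep(0)  # yield point: the check may go stale here
        account["balance"] -= amount
        return True
    return False

async def demo():
    account["balance"] = 100
    safe = await asyncio.gather(withdraw(100), withdraw(100))
    safe_balance = account["balance"]

    account["balance"] = 100
    racy = await asyncio.gather(racy_withdraw(100), racy_withdraw(100))
    return safe, safe_balance, racy, account["balance"]

safe, safe_balance, racy, racy_balance = asyncio.run(demo())
print(safe, safe_balance)   # [True, False] 0
print(racy, racy_balance)   # [True, True] -100  (double spend)
```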
In Java, NIO is often slower than threads. And I am saying this as somebody who has run >60k threads on 256-core machines, versus NIO on the same hardware, for a highly available transacted system.
Why are they silly though? Don't you need them to specify that an operation is in fact async and handle it accordingly? Or in your solution (something about locking?) is that not necessary because everything is safe? How does that work out though? How do I say "No computer, wait for this before we do that"?
Don't be focused on "requests". Requests (where most people mean HTTP requests) are one layer where you need concurrency, but in principle you need it on multiple layers.
E.g. at first you have a server that accepts multiple connections, and each must be handled -> thread per connection, or one thread for all connections? If you go for threads you might even need multiple per connection, e.g. a reader thread, a writer thread which processes a write queue, and a third one which maintains the state for the connection and coordinates reads and writes.
Then on a higher layer you might have multiple streams per connection (e.g. in HTTP/2), where you again have to decide how these should be represented.
Depending on the protocol and application there might be even more or other layers that need concurrency and synchronization.
But the general approaches that you mention still apply here: you can either use a thread for each concurrent entity and use blocking operations, or you can multiplex multiple concurrent entities on a single thread with async operations and callbacks. Coroutines are a mix: they provide an API like the first approach with an implementation that looks more like the second.
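A small sketch of that coroutine mix using Python's asyncio (the echo handler is illustrative): the per-connection handler reads like blocking thread-per-connection code, yet every connection is multiplexed on one thread by the event loop.

```python
import asyncio

async def handle(reader, writer):
    # Reads like blocking thread-per-connection code, but each `await`
    # suspends this connection and lets the loop serve others.
    data = await reader.readline()
    writer.write(data.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # A client coroutine, multiplexed on the same single thread.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
print(reply)  # b'HELLO\n'
```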
> You have a fixed-size thread pool of size N, and once you hit that limit, you can't serve anymore requests.
If your application acts as a stateless proxy between client machines and your persistence layer, can't you just spin up another instance and load balance them at any time? It's not the most efficient solution at scale, but lots of people use this strategy.