Tuesday, July 10, 2012

Async/await vs stack-switching

(In response to)

Before we can talk about stack switching being "better" or "worse" we have to have some idea of how the two approaches correspond to each other, and a sense of whether they are equivalent or if one is more "powerful". The four cases you've identified are (1) Fire-and-forget, (2) Await immediately, (3) WhenAll, and (4) the ArchiveDocuments function.

To be honest this it a bit difficult for me because I've never used async/await OR stack switching. But I'll give it a try.

Let us assume that the stack-switching system has some "cooperative scheduler" that is in charge of switching between stacks and deciding who runs next. Each thread would have its own stack scheduler. Maybe it shouldn't be called a scheduler at all because it is not preemptive. I will call it a scheduler anyway. Presumably we can invent mechanisms by which different threads can communicate as necessary, but for simplicity I'm going to imagine only a single thread plus a mechanism for receiving asyncronous events such as "file read complete", "received UDP packet" or "timeout expired".

So here are the equivalents in a stack-switching system:
  1. Fire-and-forget: You just want a task to run and don't care when it completes. With a stack-switching mechanism, the task would be created on a new stack and run (synchronously) until it hit an async operation. Then the scheduler mechanism (based e.g. on a Windows Message Queue) saves the stack, switches back to the calling stack, and later, when the async operation completes, enqueues a message that will invoke the new stack to handle the event. This process repeats until the task is done.

  2. Await immediately. In this case the code using await (let's call it B) must be marked async, and in a stack-switching world this means that B must be able to run in its own stack so that it can be paused, and it could be marked with an attribute to make this happen, or (in a language that doesn't support stack-switching directly) could be called with the help of some special wrapper function that creates the stack. B got the task from some other method (let's call it C) that ALSO has its own stack, and so it initially appears the stack-switching approach would suffer from a proliferation of stacks. However, since you are awaiting immediately there's really no need for two new stacks, so some way is needed when calling C to say "don't create a new stack, just use mine". That should be easy, it just means we call C directly. Then B, not C, gets its own stack. Now, "await C()" is implemented by yielding the current stack (asking the scheduler to switch to some other stack - I am not sure about the fine details yet) and typically the scheduler would return to A, the caller of B. At this point we must shift our focus to A to figure out what happens next.

  3. WhenAll. I doubt WhenAll is itself implemented with async/await, since you can await only one thing at a time and WhenAll wants to wait for several things. So WhenAll doesn't create a stack, but it subscribes to a "stack destroyed" event on each stack, and creates some kind of Task object that manages all this. In each "stack destroyed" event, the WhenAll task checks whether everything is done and if so, asks the scheduler to switch to B (who used "await WhenAll"). The "await WhenAll" command itself corresponds to signing up to be notified of the completion of the Task, and yielding the stack. I expect the user would initiate this with single command, so the code should still be easy to follow. Some of what I said in (2) applies to (3) because you are "awaiting immediately" on WhenAll.

  4. This beast (where have I seen this before?), is a good example the power of async+await:
    async void ArchiveDocuments(List<Url> urls)
      Task archive = null;
      for(int i = 0; i < urls.Count; ++i)
        var document = await FetchAsync(urls[i]);
        if (archive != null)
          await archive;
        archive = ArchiveAsync(document);
    So let's figure out the stack-based equivalent. Since ArchiveDocuments is async, it typically runs on its own stack. This method makes a direct (non-async) call to FetchAsync because it wants to block until FetchAsync is complete; however, it creates a new stack to call ArchiveAsync. It invokes ArchiveAsync immediately, but the scheduler returns to ArchiveDocuments as soon as ArchiveAsync pauses for the first time (e.g. when it writes something to disk asynchronously).

    Finally, "await archive" is implemented by asking the scheduler to block this stack until the archive task is done executing. The scheduler returns to the caller of ArchiveDocuments or to whatever it decides should run next.
Okay, that's it. As far as I can see, the stack-based approach is as powerful as aync/await, the way it operates is just different. Calling an "async" method without "await" generally requires the creation of a new stack, but if you want to immediately await the result, you can just make a direct call.

Now, we can start to think about which is better. In fact, neither is better in every case; each has benefits and drawbacks. First I'll mention some relevant performance facts:
  1. The async/await system is optimized for cases where the task is not actually asynchronous. If the task completes immediately, no extra memory is allocated to track the state of the task. I am not sure whether stack-switching could be optimized for that case. I suppose if the system has a few "scratch stacks" laying around, it would be cheap to switch to one and release it when done, without creating new heap objects for bookkeeping. However, the parameters to the "async" function would have to be chosen on the original stack (because their values may depend on data in the original stack) and then copied to the new one. So when you add up the bookkeeping, it appears this case will be slower than async/await. However, another approach would be to NOT create a new stack by default. The potentially-async method runs on the same stack as its caller, but if it needs to pause, the scheduler is invoked to move the method's stack frame to a new stack and then returns to the caller on the old stack. Thus it is optimized for the non-async case, with a small penalty if the method needs to pause.
  2. I've made some notes on the stack switcher before, but it bears repeating: new stacks on x86 require a minimum 8KB of virtual memory (but realistically more) and 4KB of physical memory. Most likely the stacks would be allocated on a special heap and would be subject to garbage collection or ref-counting. To save memory, the stack manager could copy small stacks that have not been used recently into heap blocks, and then copy them back to "scratch stacks" when they are eventually invoked. User code would have only handles to stacks, not pointers, so that the stack manager can freely move stacks around. When it is necessary to move larger stacks around, the stack manager would use system calls to rearrange virtual memory instead of physical memory (this is possible because stacks are aligned to page boundaries).
  3. The async/await system is implemented entirely within the compiler, so async methods become classes, and local variables become members of the class. Perhaps a kind of switch statement is used to jump to the middle of the method when it is resumed, or maybe the function is completely rearranged somehow.
At last, we can discuss pros and cons of stack-saving vs. async/await:
  1. Pro: IMO, stack saving is easier to understand. Everybody understands what a stack is and how it works; with stack switching you just have to know that there are a bunch of stacks and a scheduler automatically switches between them for you. It's a pretty straightforward extension of multithreading, except that stacks are much cheaper than threads, and you don't have to worry about synchronization if you set up your tasks to run on the same thread.
  2. Pro/Tie: methods that use async/await run slower because their local variables become class members, access to which cannot be optimized as well by the JIT. (Edit: then again, while the code in the stack may run faster, overhead for allocating and switching between stacks may be higher.)
  3. Pro: async/await must be implemented independently for every compiler, but stack-saving does not require any compiler support. It would automatically be available in any language that targets the CLR. Syntactic sugar might be helpful for some operations, like calling a method on a new stack, but it would not be required.
  4. Pro: suppose method A calls task B on a new stack. B calls a function C that doesn't know it is involved in an async task, and then C fires an event D, and D yields the stack (suspending B, C, and D). This scenario is not possible with async/await: C cannot be paused if it is not an async task. To put it another way, async/await is "infectious": if you want to run a stack of functions asynchronously, all of them must be marked async.
  5. Con: Instead of compiler support, stack-saving requires support within the CLR itself. But I think async/await was developed prior to the recent deployment of a new CLR, so MS had the chance to add that support if they had wanted to.
  6. Con/Tie: creating a new stack might be slower than creating a Task object that holds the method's state. But perhaps it could be optimized to be as efficient in the average case.
  7. Con/Tie: A naive stack manager would use a lot of memory, e.g. 8KB RAM and 16KB VM per stack. But perhaps memory use would be similar in the two systems if they are both optimized properly.
  8. Tie: it's not clear whether resuming a stack is faster or slower than resuming a .NET async Task.
  9. Pro: Stack switching can be brought to other non-.NET environments and languages. If it were implemented in native code, for example, C code could use it (actually, Win32 has "Fibers", but I wouldn't expect high efficiency from them.) Any skilled developer with very low-level knowledge could implement it (although the number of such people is few, hence it rarely happens.) In contrast, the async/await feature must be implemented independently for every language.
  10. Con: the implementation of stack-switching is different on each operating system and would need a bunch of scary assembly code.
  11. Con: stack-saving works great if you only need a few stacks. But if you need a large number of stacks that are all called frequently, they will use a lot of memory. An example would be a video game with thousands of individual "actors" that each get their own stack or async Task, where each actor is called 100 times per second. This would require more memory in a stack-switching system and would probably be harder on the cache, too.
  12. Con: we live in the real world. MS just ain't gonna support stack switching, and it seems overkill to have these two mechanisms that do the same thing. However, stack switching should still be considered outside .NET land.
Well, on the whole it looks like the pros and cons just about cancel each other out. And #12 is a doozie. Personally, I prefer stack saving due to points 1 thru 4. (1) I find stack switching to be a more intuitive concept, (2) optimized stack saving might edge-out async/await in performance (3) there are numerous compilers for .NET, some 3rd-party, at it would sure be nice if they could all get async support "automagically", and (4) maybe stack-saver is more powerful, after all.

P.S. I just learned that the Go languages uses a stack-switching system to implement Goroutines.


Post a Comment

<< Home