I was able to refactor the USB emulation to prevent re-entrant scheduling. Now, however, I am running into timer-related issues that I am struggling with because I don't understand how lx_emul/lx_kit handle Linux ticks and timing (nor am I an expert in how native Linux does this; it seems to be quite involved). The basic problem is that the driver calls "msleep" to wait between various register writes and reads to the USB hardware. This function is not emulated, so the native Linux implementation is used, which arms a timer.

Now, if I understand it right, arming the timer does not reprogram the "hardware" timer until the next schedule (tracing Genode's clocksource emulation shows that the msleep call does not reprogram it). In native Linux I assume this is handled somewhere in the scheduling path, but in the emulated schedule the timer code (tick_nohz_enter/exit) is only called if (task_flags & PF_WQ_WORKER) is true, which apparently it isn't when msleep is called. If I remove that check, i.e., force tick_nohz_enter/exit on every schedule, the clocksource is reprogrammed after scheduling and the driver can at least initialize the hardware. As is, however, the clocksource stays programmed with some default timeout that is effectively infinite (2000 seconds or so), so *every* msleep call waits 20+ minutes unless a timer event happened to be pending already.
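To make the behaviour concrete, here is a small self-contained C model of it. It is not the lx_emul code: SIM_PF_WQ_WORKER, the sim_* functions and the millisecond bookkeeping are hypothetical, and the model only encodes my reading that msleep() arms a software timer while the emulated timer is reprogrammed solely on the PF_WQ_WORKER-gated path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the behaviour described above; NOT lx_emul code.
 * The flag value and all sim_* names are illustrative assumptions. */
#define SIM_PF_WQ_WORKER 0x1u

/* Default expiry the emulated timer is left with (~2000 s, in ms). */
#define SIM_DEFAULT_EXPIRY_MS 2000000ull

struct sim_state {
	uint64_t now_ms;        /* current emulated time                        */
	uint64_t soft_timer_ms; /* earliest timer armed via msleep(), 0 = none  */
	uint64_t hw_expiry_ms;  /* deadline the emulated timer is programmed to */
};

static void sim_init(struct sim_state *s)
{
	assert(s != NULL);
	s->now_ms        = 0;
	s->soft_timer_ms = 0;
	s->hw_expiry_ms  = SIM_DEFAULT_EXPIRY_MS;
}

/* msleep() only arms a software timer; the emulated timer is untouched. */
static void sim_msleep_arm(struct sim_state *s, unsigned int ms)
{
	uint64_t deadline = s->now_ms + ms;

	if (s->soft_timer_ms == 0 || deadline < s->soft_timer_ms)
		s->soft_timer_ms = deadline;
}

/* Emulated schedule(): the tick_nohz-like reprogramming runs only if the
 * task has the worker flag (or the check is forced off). Returns how many
 * ms remain until the emulated timer fires, i.e. how long a sleeper waits. */
static uint64_t sim_schedule(struct sim_state *s, unsigned int task_flags,
                             bool force_nohz)
{
	if ((force_nohz || (task_flags & SIM_PF_WQ_WORKER)) &&
	    s->soft_timer_ms != 0 && s->soft_timer_ms < s->hw_expiry_ms)
		s->hw_expiry_ms = s->soft_timer_ms;

	return s->hw_expiry_ms - s->now_ms;
}
```

In this model, calling sim_schedule(&s, 0, false) after sim_msleep_arm(&s, 10) reports the ~2000 s default, whereas passing SIM_PF_WQ_WORKER or forcing the path reports 10 ms, which mirrors the difference I see when I disable the check.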

So, it would be nice to know how msleep emulation in lx_emul/lx_kit is "supposed" to work. Do I have to make sure the calling function always runs in PF_WQ_WORKER context? Have I disabled some critical code path that makes it work? Do I need to change the kernel's CONFIG_NO_HZ settings?

To Josef's points about the USB API:

> To be honest, this USB API ('usb/usb.h') should not be used anymore,
> especially as it contains such stumbling blocks under the hood. You took
> care to use it in a non-blocking fashion, but the preferred way is using the
> Usb session directly.
OK, I understand that Usb::Session may be better, but even when using it directly I am not sure it's possible to avoid maintaining a secondary packet queue. The problem (I think) is this line:
enum { TX_QUEUE_SIZE = 64 };
If there were a way to declare a session with a larger TX_QUEUE_SIZE, it might be possible, but 64 is simply too small for the ath9k USB driver, which keeps 64 inbound interrupt and 8 inbound bulk URBs outstanding at all times (72 in total, already more than 64) and additionally needs at least a few slots for outbound URBs. So for the moment, to keep things simple, I am still using the USB API.
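A secondary queue of the kind mentioned above could look roughly like the following C sketch. It assumes (as I suspect, but have not verified) that every outstanding URB, inbound ones included, occupies one of the TX_QUEUE_SIZE packet slots; BACKLOG_CAP, tx_submit and tx_complete are hypothetical names, and plain ints stand in for real URBs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TX_QUEUE_SIZE = 64 };  /* limit quoted above */

/* Hypothetical capacity of the secondary (overflow) queue. */
#define BACKLOG_CAP 32

/* FIFO of deferred URBs; ints stand in for URB handles. */
struct urb_backlog {
	int    ids[BACKLOG_CAP];
	size_t head;
	size_t count;
};

struct tx_model {
	unsigned           in_flight; /* TX slots currently occupied */
	struct urb_backlog backlog;
};

enum submit_result { SUBMITTED, DEFERRED, DROPPED };

static bool backlog_push(struct urb_backlog *b, int id)
{
	if (b->count == BACKLOG_CAP)
		return false;
	b->ids[(b->head + b->count) % BACKLOG_CAP] = id;
	b->count++;
	return true;
}

static bool backlog_pop(struct urb_backlog *b, int *id)
{
	if (b->count == 0)
		return false;
	*id     = b->ids[b->head];
	b->head = (b->head + 1) % BACKLOG_CAP;
	b->count--;
	return true;
}

/* Take a free TX slot if there is one, otherwise defer to the backlog. */
static enum submit_result tx_submit(struct tx_model *t, int id)
{
	if (t->in_flight < TX_QUEUE_SIZE) {
		t->in_flight++;
		return SUBMITTED;
	}
	return backlog_push(&t->backlog, id) ? DEFERRED : DROPPED;
}

/* A completion frees one slot and the oldest deferred URB takes it over.
 * Returns the id of the URB moved into the slot, or -1 if none waited. */
static int tx_complete(struct tx_model *t)
{
	int id;

	assert(t->in_flight > 0);
	t->in_flight--;
	if (backlog_pop(&t->backlog, &id)) {
		t->in_flight++;
		return id;
	}
	return -1;
}
```

With ath9k's 64 interrupt plus 8 bulk inbound URBs, the last 8 submissions already end up in the backlog before any outbound URB is queued. In this model a deferred URB is only posted once another one completes, which changes the timing of those endpoints compared to having all of them outstanding at once.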

> (I currently don't have the capacity to give you well-grounded help
> regarding changing the flow of your LX USB connection, which is why I
> omitted that part of your post.)
Sure. For the moment I think my approach is at least workable for a proof-of-concept driver, even if Usb::Session might ultimately be better. It works for ath9k, and I think it is close to working for rt2800 if I can figure out this timer issue.

Regards,
CP