Hi, Genode hackers!

I've been debugging network performance on Genode.
I am running Genode on x86 with the dde_ipxe NIC driver.

So far, I have figured out the following:
1. memcpy and malloc calls in the NIC driver take a negligible amount of time.
I measure tick counts with the x86 rdtsc instruction, and a memcpy usually takes 5-20 ticks.
2. Most of the time is spent in the rx_handler, specifically in the _alloc.submit() call (260 ticks).
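A minimal sketch of rdtsc-based timing, assuming GCC or Clang on x86 (the `__rdtsc()` intrinsic from `<x86intrin.h>`). The helpers `read_tsc`, `measure_ticks` and `time_frame_copy` are hypothetical illustrations, not Genode APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <x86intrin.h>  // __rdtsc() on GCC and Clang (x86 only)

// Read the CPU time-stamp counter. RDTSC is not a serializing instruction,
// so for very short regions (a few dozen ticks) the CPU may reorder work
// around it; small tick counts should be treated as approximate.
static inline std::uint64_t read_tsc() { return __rdtsc(); }

// Return the number of TSC ticks spent executing the callable `f`.
template <typename F>
std::uint64_t measure_ticks(F &&f)
{
    std::uint64_t const start = read_tsc();
    f();
    std::uint64_t const end = read_tsc();
    return end - start;
}

// Hypothetical example: time a memcpy of `len` bytes, e.g. one
// full-sized Ethernet frame (1514 bytes without FCS).
std::uint64_t time_frame_copy(unsigned char *dst, unsigned char const *src,
                              std::size_t len)
{
    return measure_ticks([&] { std::memcpy(dst, src, len); });
}
```

Single measurements of a few ticks are noisy; averaging over many iterations gives more reliable numbers.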

Inside the submit() routine (in os/include/nic/component.h), around 80 percent of
the time is spent in the "_rx.source()->submit_packet()" call.
The submit() routine first checks for packets acknowledged by the client and then calls
submit_packet().

Tx_thread::entry() fetches packets from the client, sends them to the NIC, and acknowledges them.

My questions are the following:
1. What ways of improving NIC performance would you suggest?
2. Why does the NIC rx path handle packet acknowledgements? Since packet ordering should be handled by TCP, could we split acknowledgement handling and submission of the "current" packet into separate threads?
3. What is the recommended way to debug IPC and profile applications?

Thank you for your attention. Have fun!

--
Regards, Alexander