Hi Johannes,
sorry for the late answer, I first had to fix the RPI's USB and networking after the last release; it seems to have been partially or even completely broken since the 15.05 release. Reinier kindly pointed me to this thread. My answers can be found below.
On 05/24/2016 10:11 PM, Johannes Schlatow wrote:
On Tue, 28 Jan 2014 22:48:25 +0100 Sebastian Sumpf <Sebastian.Sumpf@...1...> wrote:
On 01/28/2014 10:22 PM, Julian Stecklina wrote:
On 01/28/2014 01:07 PM, Sebastian Sumpf wrote:
Thanks for your tests! But I don't like the 65 MBit/s thing! What is going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a packet trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be our plot guy ;-)
Hi Sebastian,
I was wondering whether you actually looked into that, as we are experiencing some strange effects with netperf as well.
Let me briefly summarise our findings: We are running netperf_lwip on base-linux in order to evaluate how our software changes affect networking performance. For TCP_STREAM, I get results of approx. 350 Mbit/s, while TCP_MAERTS results in approx. 110 Mbit/s. Interestingly, this asymmetry is the reverse of the results that have been discussed here. However, what actually puzzles me most is that netperf_lwip_bridge draws quite a different picture. More precisely, TCP_STREAM drops to roughly 170 Mbit/s, which I guess is perfectly explainable by the additional context switch and copying of the nic_bridge. Yet TCP_MAERTS performs better, i.e. approx. 130 Mbit/s with the additional nic_bridge. All results are reproducible. I could also observe similar behaviour on hw_rpi.
AFAIK the netserver code for TCP_STREAM only uses recv(), whereas the code for TCP_MAERTS only uses send(). Hence, it is perfectly understandable to me that we see asymmetric throughput results depending on which path (RX or TX) performs better. However, I just don't get why the nic_bridge, which not only adds a context switch but also additional copying, increases the performance of TCP_MAERTS.
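To make the asymmetry concrete, here is a minimal sketch (plain POSIX sockets, not the actual netperf sources) of what the netserver's inner loops boil down to in the two cases, as far as I understand them:

  /* Simplified sketch, not the real netperf code: TCP_STREAM keeps the
   * netserver in a recv() loop (its RX path), TCP_MAERTS keeps it in a
   * send() loop (its TX path). */

  #include <sys/types.h>
  #include <sys/socket.h>

  enum { BUF_SIZE = 16 * 1024 };

  /* RX-bound inner loop as exercised by TCP_STREAM */
  static long stream_rx(int sock)
  {
      char buf[BUF_SIZE];
      long total = 0;
      ssize_t n;
      while ((n = recv(sock, buf, sizeof(buf), 0)) > 0)
          total += n;   /* data flows towards the netserver -> RX under test */
      return total;
  }

  /* TX-bound inner loop as exercised by TCP_MAERTS */
  static long maerts_tx(int sock, long bytes_to_send)
  {
      static char buf[BUF_SIZE];
      long total = 0;
      while (total < bytes_to_send) {
          ssize_t n = send(sock, buf, sizeof(buf), 0);
          if (n <= 0)
              break;
          total += n;   /* data flows away from the netserver -> TX under test */
      }
      return total;
  }

So the difference between the two results directly reflects how well the RX and TX paths of the stack and driver perform.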
I guess this might be caused by bulk processing of multiple packets, enabled by the asynchronous packet-stream interface. I think I could test this by assigning a high scheduling priority to the nic_bridge so that it always processes only a single packet at a time.
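To illustrate the effect I suspect, here is a purely illustrative sketch (Packet and Packet_queue are hypothetical stand-ins, not the real Genode packet-stream API) of the difference between an activation that drains a whole backlog and one that only ever sees a single packet:

  /* Hypothetical sketch only: the types below merely stand in for one
   * direction of a packet-stream session. The point is the difference
   * between draining a backlog per activation (batching) and handling
   * exactly one packet per activation, which a high nic_bridge priority
   * should force. */

  #include <queue>
  #include <cstdio>

  struct Packet { unsigned size; };   /* stand-in for a packet descriptor */

  struct Packet_queue                 /* stand-in for a packet-stream queue */
  {
      std::queue<Packet> _q;

      bool   pending() const  { return !_q.empty(); }
      Packet next()           { Packet p = _q.front(); _q.pop(); return p; }
      void   submit(Packet p) { _q.push(p); }
  };

  /* A consumer scheduled "late" finds several packets piled up and
   * processes all of them in one go - the suspected source of the speedup. */
  void handle_activation_batched(Packet_queue &q)
  {
      unsigned n = 0;
      while (q.pending()) { q.next(); ++n; }   /* drain the whole backlog */
      std::printf("batched activation handled %u packet(s)\n", n);
  }

  /* A consumer that preempts the producer after every packet only ever
   * sees one packet per activation - no batching effect. */
  void handle_activation_single(Packet_queue &q)
  {
      if (q.pending()) {
          q.next();
          std::printf("single-packet activation\n");
      }
  }

  int main()
  {
      Packet_queue q;

      for (unsigned i = 0; i < 4; i++) q.submit(Packet { 1500 });
      handle_activation_batched(q);                      /* one activation, four packets */

      for (unsigned i = 0; i < 4; i++) q.submit(Packet { 1500 });
      while (q.pending()) handle_activation_single(q);   /* four activations */

      return 0;
  }

If the high-priority nic_bridge run brings TCP_MAERTS back down towards the non-bridge numbers, that would support the batching explanation.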
Up to this point I have basically two questions:
1. Has anyone made any further investigations of Genode's networking performance?
2. Any other (possible) explanations for my observations?
1. Not really.
2. TCP uses receive and send window sizes. This means that an ACK has to be sent for each window, or segment as they call it, not for each TCP packet. Usually, the higher the throughput, the larger the window sizes. We have seen window sizes as large as 20 KB, but only when Linux is sending. The window size dynamically adapts to the rate of ACKs and heavily depends on the timing of both communication partners.

Also, when sending (MAERTS) we cannot batch packets as we do when receiving them directly from the hardware (on most cards, multiple packets can be available in one DMA transaction). This means each packet is sent to the card in a separate request (especially on Linux). Therefore, I would regard the sending numbers as the baseline for handling one packet at a time. Because of the nic_bridge, the timing changed so that the ACK rate somehow led to a slightly larger TCP window (you can check that with Wireshark). Because of batching, the receive numbers would in turn be the current (and not so great ;-)) upper limit. That would be my three cents.
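Regarding the Wireshark check: the window a side can advertise is bounded by its receive socket buffer, so it can also help to compare the on-wire window sizes against the configured buffer sizes on the Linux peer. A minimal sketch (plain POSIX, meant for the Linux side, not for Genode):

  /* Assumption: this runs on the Linux peer. It prints the socket buffer
   * sizes that bound the TCP window each side can advertise; the actually
   * advertised per-packet values are what Wireshark shows on the wire. */

  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <unistd.h>
  #include <cstdio>

  static void print_window_bounds(int sock)
  {
      int rcv = 0, snd = 0;
      socklen_t len = sizeof(rcv);

      /* upper bound for the window this side can advertise to its peer
       * (Linux reports the value including kernel bookkeeping overhead) */
      if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcv, &len) == 0)
          std::printf("SO_RCVBUF: %d bytes\n", rcv);

      len = sizeof(snd);
      if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &snd, &len) == 0)
          std::printf("SO_SNDBUF: %d bytes\n", snd);
  }

  int main()
  {
      int sock = socket(AF_INET, SOCK_STREAM, 0);
      if (sock < 0)
          return 1;
      print_window_bounds(sock);
      close(sock);
      return 0;
  }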
Programming a TCP/IP stack that actually works and performs well in the wild is complicated stuff, and I guess we could keep our whole company busy just doing that. I hope this helps to explain some parts of your observations,
Sebastian