Hello!
I'm running netperf TCP_STREAM and TCP_MAERTS tests for the LXIP stack and I obtained an extremely low result - 5x10^6 bits/s - while the maximum throughput is 1000 Mbit/s. I wonder what could be the reason for that.
Anna
Hi Anna,
On 01/28/2014 11:02 AM, Анна Будкина wrote:
Hello!
I'm running netperf TCP_STREAM and TCP_MAERTS tests for the LXIP stack and I obtained an extremely low result - 5x10^6 bits/s - while the maximum throughput is 1000 Mbit/s. I wonder what could be the reason for that.
That is only 5 Mbit/s, which is indeed pretty low. Right now the LXIP stack should do at least about 500 Mbit/s in either direction. Please let me know what hardware platform you are using, so I can try to reproduce the behavior.
Thanks,
Sebastian
I'm measuring throughput between two hosts. I use Genode running on Fiasco.OC on one machine and monolithic Linux on another machine. There's an 82579LM NIC on each host. I'm running the netperf_lxip.run script. As the ACPI driver doesn't work on my machine, I'm using the pci_drv driver. Another problem is that level-triggered interrupts are not received, so I'm polling for interrupts in an endless loop in /os/src/lib/dde_kit/interrupt.cc:

  while (1) {
    _lock.lock();
    if (_handle_irq) _handler(_priv);
    _lock.unlock();
  }

The original code is commented out:

  // while (1) {
  //   _irq.wait_for_irq();
  //   /* only call registered handler function, if IRQ is not disabled */
  //   _lock.lock();
  //   if (_handle_irq) _handler(_priv);
  //   _lock.unlock();
  // }

I've done the same thing while testing the L4Re network stack and it didn't affect performance then.
Hi again Anna,
for the future, please send your replies at the bottom, not the top, like I am doing now .-)
On 01/28/2014 11:58 AM, Анна Будкина wrote:
I'm measuring throughput between two hosts. I use Genode running on Fiasco.OC on one machine and monolithic Linux on another machine. There's an 82579LM NIC on each host. I'm running the netperf_lxip.run script. As the ACPI driver doesn't work on my machine, I'm using the pci_drv driver. Another problem is that level-triggered interrupts are not received, so I'm polling for interrupts in an endless loop in /os/src/lib/dde_kit/interrupt.cc:
[...]
It sounds as if you are using an x86 machine - right? Polling is a no-go, and I have tried to fix the interrupt-mode issue (edge vs. level vs. low and high) on Fiasco.OC on several occasions. Unfortunately, there seems to be no universal solution to that. Maybe you should ask about that one on the L4Hackers mailing list: l4-hackers@...28...
If you can afford it and if you're running on x86, please try the NOVA version of Genode and let me know about the outcome, performance-wise of course. I appreciate any hints on why the ACPI driver is not working (other than that it is not an x86 computer, of course).
For the book: I cannot recommend the changes above!
Sebastian
Hello.
If you can afford it and if you're running on x86, please try the NOVA version of Genode and let me know about the outcome, performance-wise of course. I appreciate any hints on why the ACPI driver is not working (other than that it is not an x86 computer, of course).
Just out of curiosity: have you compared LXIP performance with different kernels - NOVA vs. FOC? On hardware, of course.
Hello Anna,
welcome to the list ;-)
On Tue, Jan 28, 2014 at 02:58:35PM +0400, Анна Будкина wrote:
I'm measuring throughput between two hosts. I use Genode running on Fiasco.OC on one machine and monolithic Linux on another machine. There's an 82579LM NIC on each host. I'm running the netperf_lxip.run script. As the ACPI driver doesn't work on my machine, I'm using the pci_drv driver. Another problem is that level-triggered interrupts are not received, so I'm polling for interrupts in an endless loop in /os/src/lib/dde_kit/interrupt.cc:
[...]
I was slightly astonished by the bad benchmark results. So, I tried today's Genode master with the following scenario:
* Genode on Lenovo T61 (82566MM, PCIe 8086:1049)
* Linux on T410 (82577LM, PCIe 8086:10ea)
With your patch I got
! PERF: TCP_STREAM 2.02 MBit/s
! PERF: TCP_MAERTS 8.00 MBit/s
This substantiates my assumption that the "polling" you implemented degrades the performance significantly. The original code without polling produces
! PERF: TCP_STREAM 65.59 MBit/s
! PERF: TCP_MAERTS 543.35 MBit/s
which is not a top-notch result, but looks more promising. We have not investigated the performance drop on TCP_STREAM up to now, but suspect the NIC driver or its integration to be the cause.
Best regards
On 01/28/2014 12:59 PM, Christian Helmuth wrote:
[...]
Thanks for your tests! But I don't like the 65 MBit/s thing! What is going on? Is this RX or TX?
Sebastian
Sebastian,
On Tue, Jan 28, 2014 at 01:07:19PM +0100, Sebastian Sumpf wrote:
Thanks for your tests! But I don't like the 65 MBit/s thing! What is going on? Is this RX or TX?
Complete netperf output follows
---------------------------- TCP_STREAM -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_STREAM -c -C -- -m 1024
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1024    10.02        65.59   34.95    -1.00    174.598  -1.249

Alignment      Offset         Bytes    Bytes       Sends   Bytes    Recvs
Local  Remote  Local  Remote  Xfered   Per                 Per
Send   Recv    Send   Recv             Send (avg)          Recv (avg)
    8       8      0      0 82158592    1024.00     80233   8905.12   9226
Maximum Segment Size (bytes) 1448
calculation: overall bytes / size per packet / time = packets per second
82158592 Bytes / 1024 Bytes / 10.02 s = 8007 packets/s
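As a sanity check of the numbers above: 8007 packets/s * 1024 Bytes * 8 bit ≈ 65.59 * 10^6 bits/s, which is consistent with the reported throughput.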
! PERF: TCP_STREAM 65.59 MBit/s ok
---------------------------- TCP_MAERTS -----------------------
spawn netperf-2.6.0 -H 10.0.0.65 -P 1 -v 2 -t TCP_MAERTS -c -C -- -m 1024
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.65 () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Recv     Send     Recv    Send
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384   1024    10.00       543.35   39.06    -1.00    23.555   -0.151

Alignment      Offset         Bytes    Bytes       Recvs   Bytes    Sends
Local  Remote  Local  Remote  Xfered   Per                 Per
Recv   Send    Recv   Send             Recv (avg)          Send (avg)
    8       8      0      0 679280808   13875.90     48954  16384.00  41474
Maximum Segment Size (bytes) 1448
calculation: overall bytes / size per packet / time = packets per second
679280808 Bytes / 1024 Bytes / 10.00 s = 66336 packets/s
! PERF: TCP_MAERTS 543.35 MBit/s ok
The manual states
TCP_STREAM  It is quite simple, transferring some quantity of data from the system running netperf to the system running netserver.
TCP_MAERTS  A TCP_MAERTS (MAERTS is STREAM backwards) test is “just like” a TCP_STREAM test except the data flows from the netserver to the netperf.
So, the scenario is much slower if the Genode side is _receiving_.
Regards
On 01/28/2014 01:20 PM, Christian Helmuth wrote:
[...]
So, the scenario is much slower if the Genode side is _receiving_.
Without cursing: This is no good! I will look into that!
Sebastian
2014-01-28 Sebastian Sumpf <Sebastian.Sumpf@...1...>
[...]
Thank you very much for the reply! I will try to use Genode on Nova to perform these tests.
Hello,
just for completeness...
On Tue, Jan 28, 2014 at 04:49:49PM +0400, Анна Будкина wrote:
Thank you very much for the reply! I will try to use Genode on Nova to perform these tests.
My results on NOVA
! PERF: TCP_STREAM 230.55 MBit/s
! PERF: TCP_MAERTS 664.66 MBit/s
So, on NOVA it performs slightly better on TCP_MAERTS and also yields much improved performance on TCP_STREAM - unfortunately still only about 1/3 of TCP_MAERTS.
Regards
Hi Vasily,
On 01/28/2014 12:58 PM, Sartakov A. Vasily wrote:
Hello.
If you can afford it and if you're running on x86, please try the NOVA version of Genode and let me know about the outcome, performance-wise of course. I appreciate any hints on why the ACPI driver is not working (other than that it is not an x86 computer, of course).
Just out of curiosity: have you compared LXIP performance with different kernels - NOVA vs. FOC? On hardware, of course.
No, not really. This thing has been optimized for the Exynos 5250 platform on Fiasco.OC, using a Gigabit USB network adapter. Right now I am looking into the rump stuff (http://wiki.netbsd.org/rumpkernel), while hoping to get a decent performance out of it! Just talk to my colleagues at FOSDEM about it!
Sorry I cannot be there,
Sebastian
On 01/28/2014 01:49 PM, Анна Будкина wrote:
[...]
Thank you very much for the reply! I will try to use Genode on Nova to perform these tests.
So, it is x86, I guess.
Sebastian
On 01/28/2014 10:22 PM, Julian Stecklina wrote:
On 01/28/2014 01:07 PM, Sebastian Sumpf wrote:
Thanks for your tests! But I don't like the 65 MBit/s thing! What is going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a packet trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be our plot guy .-)
Sebastian
On Tue, 28 Jan 2014 22:48:25 +0100 Sebastian Sumpf <Sebastian.Sumpf@...1...> wrote:
On 01/28/2014 10:22 PM, Julian Stecklina wrote:
On 01/28/2014 01:07 PM, Sebastian Sumpf wrote:
Thanks for your tests! But I don't like the 65 MBit/s thing! What is going on? Is this RX or TX?
For the extremely bad case, it might be interesting to capture a packet trace and use tcptrace/xplot on it.
Thanks Julian, I will have a look at it, even though Alex seems to be our plot guy .-)
Hi Sebastian,
I was wondering whether you actually looked into that, as we are experiencing some strange effects with netperf as well.
Let me briefly summarise our findings: We are running netperf_lwip on base-linux in order to evaluate how our changes in the software affect the networking performance. For TCP_STREAM, I get results of approx. 350 Mbit/s, while TCP_MAERTS results in approx. 110 Mbit/s. Interestingly, this asymmetry is the reverse of the results that have been discussed here. However, what actually puzzles me most is the fact that netperf_lwip_bridge draws a quite different picture. More precisely, TCP_STREAM drops to roughly 170 Mbit/s, which I guess is perfectly explainable by the additional context switch and copying of the nic_bridge. Yet TCP_MAERTS performs better, i.e. 130 Mbit/s with the additional nic_bridge. All results are reproducible. I could also observe similar behaviour on hw_rpi.
AFAIK the netserver code for TCP_STREAM only uses recv() whereas the code for TCP_MAERTS only uses send(). Hence, it's totally comprehensible to me that we experience asymmetric throughput results depending on which path (RX or TX) performs better. However, I just don't get why the nic_bridge, which not only adds a context switch but also additional copying, increases the performance for TCP_MAERTS.
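To illustrate what I mean, here is a minimal sketch of the two server-side loops (not the actual netperf sources - the function names are made up, only the recv()/send() asymmetry matters):

  #include <sys/socket.h>  /* recv(), send() */
  #include <stddef.h>      /* size_t */

  /* TCP_STREAM: data flows netperf -> netserver, so the server only receives */
  static void stream_server_loop(int sock, char *buf, size_t len)
  {
      while (recv(sock, buf, len, 0) > 0) { }   /* exercises the Genode-side RX path */
  }

  /* TCP_MAERTS: data flows netserver -> netperf, so the server only sends */
  static void maerts_server_loop(int sock, char const *buf, size_t len)
  {
      while (send(sock, buf, len, 0) > 0) { }   /* exercises the Genode-side TX path */
  }

So each test stresses exactly one direction of our stack (plus the nic_bridge and driver below it).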
I guess this might be caused by bulk processing of multiple packets enabled by the asynchronous packet-stream interface. I think I could test this by assigning a high scheduling priority to the nic_bridge so that it always processes a single packet.
Up to this point I have basically two questions:
1. Has anyone made any further investigations of Genode's networking performance?
2. Any other (possible) explanations for my observations?
Cheers Johannes
Hi Johannes,
sorry for the late answer. I first had to fix the RPI's USB and networking after the last release; it seemed to be partially or completely broken since the 15.05 release. Reinier kindly pointed me to this. My answer can be found below.
On 05/24/2016 10:11 PM, Johannes Schlatow wrote:
[...]
Up to this point I have basically two questions:
1. Has anyone made any further investigations of Genode's networking performance?
2. Any other (possible) explanations for my observations?
1. Not really.
2. TCP uses receive and send window sizes. This means that an ACK has to be sent for each window or segment, as they call it, not for each TCP packet. Usually, the higher the throughput, the larger the window sizes. We have seen window sizes as large as 20 KB, but only when Linux is sending. The window size dynamically adapts to the rate of ACKs and heavily depends on the timing of both communication partners. Also, when sending (MAERTS) we cannot batch packets as we do when receiving them directly from the hardware (there can be multiple packets available in one DMA transaction - on most cards). This means each packet is sent to the card in a separate request (especially on Linux). Therefore, I would see the sending numbers as a baseline for sending or receiving one packet at a time. Because of the nic_bridge, the timing changed so that the ACK rate somehow caused a slightly larger TCP window (you can check that with Wireshark). Because of batching, the receive numbers would in turn be the current (and not so great ;) upper limit. That would be my three cents.
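As a rough back-of-the-envelope illustration (simplified, ignoring protocol overhead and assuming only one window is ever in flight): sustained throughput is bounded by roughly window size / round-trip time. With a 20 KB window and an effective round trip of 1 ms that is about 20 KB / 1 ms = 20 MB/s ≈ 160 Mbit/s; halve the window or double the ACK latency and the ceiling halves as well. That is why the ACK timing and the resulting window size matter so much here.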
Programming a TCP/IP stack that actually works and performs well in the wild is complicated stuff, and I guess we could keep our whole company busy just doing that. I hope this helps to explain some parts of your observations,
Sebastian