Hello everyone!
I have a question about the influence of application priorities on the IPC messaging mechanism. Let's say I have two applications (client and server) which use IPC to communicate with each other. In the run script I set the priority of the client to the default value (no priority specified) and the priority of the server to -1 (with prio_levels set to 2). The question is quite simple: is there any deterministic relationship between changing the priorities of applications and possible collisions of IPC calls? For example, what happens if both priorities are set to the default value, or both to -1? And if lost IPC messages are suspected, how can this hypothesis be checked?
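For illustration, the relevant part of my init configuration looks roughly like this (a sketch, component names are placeholders):

  <config prio_levels="2">
    <start name="client">
      <!-- no priority specified: default priority 0, the highest level -->
      ...
    </start>
    <start name="server" priority="-1">
      <!-- demoted to the lower of the two priority levels -->
      ...
    </start>
  </config>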
Thank you in advance.
Kind regards, Sergey.
Hi Sergey,
> I have a question about the influence of application priorities on the IPC messaging mechanism.
your concern about the interaction of priorities and IPC is spot-on. This is a topic where the different microkernels differ significantly. Generally, static priorities should be used with caution. You need to be aware that a higher-priority thread can starve any lower-priority activity in the system with a busy loop. The potential problems become apparent once there are more than two parties in the system. In such scenarios, priority inversion can occur.
> Let's say I have two applications (client and server) which use IPC to communicate with each other. In the run script I set the priority of the client to the default value (no priority specified) and the priority of the server to -1 (with prio_levels set to 2).
Assigning a lower priority to the server than to the client is a classical priority-inversion problem. The client has a high priority and wants the server to do some work for it. But because the server has a low priority, any other thread with a priority higher than the server's (but possibly lower than the client's) can delay the execution of the server indefinitely.
There are two ways to deal with such problems, namely priority inheritance and priority ceiling.
Priority inheritance means that the server will inherit the (higher) priority of the client while doing work for the client. So the priority is not attached to a thread but rather to a specific work topic. Anyone that contributes to the work gets the high priority regardless of which processes the execution flows through. As far as I know, NOVA is the only kernel of the Genode base platforms that properly implements priority inheritance.
Priority ceiling means that there exists a strict order of priorities that is consistent with the client-server relationships of processes: a server always needs a priority at least as high as those of its clients. Because of this invariant, a service request will make progress in any situation where the client itself would make progress (the server is scheduled at least as readily as the client, which is blocked during the request). The disadvantage of priority ceiling is that a client can artificially boost its priority (and cause system load) by using servers.
Priority ceiling can be implemented on L4/Fiasco, OKL4, Fiasco.OC, and L4ka::Pistachio.
> The question is quite simple: is there any deterministic relationship between changing the priorities of applications and possible collisions of IPC calls? For example, what happens if both priorities are set to the default value, or both to -1? And if lost IPC messages are suspected, how can this hypothesis be checked?
Normally, one would expect priority-inversion problems to be easy to detect because some part of the system just freezes. However, on the kernels that lack priority inheritance (actually all kernels but NOVA), the bad interaction between IPC and priorities is hard to detect because of an optimization called time-slice donation. On those kernels, a client that calls a server lends its remaining time slice to the server to speed up the processing of its IPC request. This way, the flow of control typically goes from the client through the server and back to the client while the client's time slice is active. Unfortunately, this flow of control ends as soon as the time slice is over. If a timer interrupt occurs while the request is at the server, and the kernel schedules an unrelated thread with a priority higher than the server's, the server will starve. As a consequence, this can end up in a deadlock situation that is extremely hard to reproduce.
To avoid these situations, the rule of thumb is to always assign an equal or higher priority to the server than to the client.
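Expressed as an init configuration, the rule of thumb corresponds to something like this sketch (placeholder names, assuming two priority levels):

  <config prio_levels="2">
    <start name="server">
      <!-- default priority 0: the server runs at the highest level -->
      ...
    </start>
    <start name="client" priority="-1">
      <!-- the client never outranks the server it depends on -->
      ...
    </start>
  </config>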
Could you elaborate a bit more on the problem you are applying priorities to? Is it possible to turn the scenario into a composition where the server is always prioritized higher than its clients?
Best regards Norman
On 12.11.2012 10:27, Norman Feske wrote:
> [...] regardless of which processes the execution flows through. As far as I know, NOVA is the only kernel of the Genode base platforms that properly implements priority inheritance.
Just a note for completeness: this is implemented solely for the IPC mechanism of the kernel; it is not supported for the semaphores provided by the kernel. That means that if semaphores are used as locks to protect some critical section in user mode, the thread running inside the critical section (and holding the lock) doesn't inherit the priority of other threads that are also attempting to enter the same critical section.
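To illustrate (a generic sketch using POSIX semaphores as a stand-in for the kernel semaphores, not actual NOVA code; the priorities mentioned only exist in the comments):

  #include <semaphore.h>
  #include <pthread.h>
  #include <cstdio>

  static sem_t lock;   /* binary semaphore used as a user-mode lock */

  static void *low_prio_holder(void *)
  {
      sem_wait(&lock);   /* enter critical section */
      /* While running here, this thread keeps its own (low) priority.
         A high-priority thread blocked in sem_wait() below does not
         boost us - unlike the IPC path, where the server would inherit
         the caller's priority. A medium-priority busy thread could
         therefore starve this thread and, indirectly, the waiter. */
      std::puts("low: inside critical section");
      sem_post(&lock);   /* leave critical section */
      return nullptr;
  }

  static void *high_prio_waiter(void *)
  {
      sem_wait(&lock);   /* may block behind the low-priority holder */
      std::puts("high: acquired the lock");
      sem_post(&lock);
      return nullptr;
  }

  int main()
  {
      sem_init(&lock, 0, 1);
      pthread_t lo, hi;
      pthread_create(&lo, nullptr, low_prio_holder, nullptr);
      pthread_create(&hi, nullptr, high_prio_waiter, nullptr);
      pthread_join(lo, nullptr);
      pthread_join(hi, nullptr);
      sem_destroy(&lock);
      return 0;
  }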
On Mon, 12 Nov 2012 12:24:04 +0100 Alexander Boettcher (AB) wrote:
AB> That means if semaphores are used as locks to protect some critical
AB> section in user mode, the thread running inside the critical section
AB> (and holding the lock) doesn't inherit the priority of other threads
AB> also attempting to get into the same critical section.
We don't do it for semaphores, because semaphores can be used from different cores and cross-core inheritance doesn't work. We could add some limited form of local inheritance to semaphores, e.g. threads on the same core could help each other.
But - having only core-local inheritance for semaphores could lead to odd behavior, such as programs working as expected on a single-core machine and breaking when running on a multi-core machine, because a developer may have built assumptions about priority inheritance into their code.
So... how useful would core-local inheritance for semaphores be?
Cheers, Udo
On 12.11.2012 14:28, Udo Steinberg wrote:
> So... how useful would core-local inheritance for semaphores be?
IMHO - it is not useful.
Since semaphores can be used across CPUs, priority inheritance either

* works for semaphores on uni- as well as multi-CPU systems, or
* doesn't work for semaphores at all.

Currently it does not, which is consistent.
Alex.
Hello, Mr. Norman Feske and Mr. Alexander Boettcher!
Thank you very much for the thorough answer. My use case is quite simple: I start several L4Linux instances and want them to have network access. The environment of the use case: Pandaboard, Fiasco.OC (revision 38).
To provide networking, I use the nic_bridge multiplexer, which shares the “Nic” service provided by the usb_drv driver. When I set the priority of usb_drv to "-1" and leave the priority of nic_bridge at the default value, everything works fine. But if I set both priorities to default values, most of the L4Linux instances do not obtain IP addresses via DHCP. I also noticed the following:

* Even when an IP address is obtained, after some time of intensive network usage the connection becomes unavailable, as does the “Nic” service provided by nic_bridge.
* If the network is not used intensively, “ping” detects (from time to time) that the network of an L4Linux instance (with an obtained IP address) is unreachable.

The working variant corresponds roughly to the config sketched below.
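This is a sketch rather than the full run script; unrelated parts are omitted:

  <config prio_levels="2">
    <start name="usb_drv" priority="-1">
      <!-- Nic service provider, demoted to the lower level -->
      <provides> <service name="Nic"/> </provides>
      ...
    </start>
    <start name="nic_bridge">
      <!-- default (highest) priority -->
      <provides> <service name="Nic"/> </provides>
      ...
    </start>
    ...
  </config>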
Based on the information above, my hypothesis is an issue with nic_bridge. Since this multiplexer plays both client and server roles, the problem occurs when usb_drv tries to get something from nic_bridge. The problem with priority="-1" for usb_drv is a strange delay when closing a network connection. Setting both priorities to the default value solves this problem, but creates a bigger one: the network becomes unavailable on most L4Linux instances.
Kind regards, Sergey.
Hello Sergey,
> My use case is quite simple: I start several L4Linux instances and want them to have network access. The environment of the use case: Pandaboard, Fiasco.OC (revision 38).
>
> When I set the priority of usb_drv to "-1" and leave the priority of nic_bridge at the default value, everything works fine. But if I set both priorities to default values, most of the L4Linux instances do not obtain IP addresses via DHCP. [...]
>
> Based on the information above, my hypothesis is an issue with nic_bridge. Since this multiplexer plays both client and server roles, the problem occurs when usb_drv tries to get something from nic_bridge. [...]
from this description, this looks like a bug lurking somewhere in the nic_bridge, the L4Linux network stub driver, or usb_drv. At best, the use of priorities seems to hide a symptom.
Recently, the components in question have been reworked quite significantly. Could you please give your scenario a spin using the current genodelabs/master version? This version uses Fiasco.OC rev 40 and L4Linux rev 25, so please make sure to do a "make prepare" in the 'base-foc' and 'ports-foc' repositories. I also suggest starting with a clean build directory.
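For completeness, the prepare steps are roughly as follows (assuming a checkout where 'base-foc' and 'ports-foc' are top-level directories of the Genode source tree):

  ( cd base-foc  && make prepare )
  ( cd ports-foc && make prepare )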
If the problem still persists with the current genodelabs/master branch, could you provide us with a run script that allows us to reproduce the scenario?
Best regards Norman