The problem with the Signal service implementation.


ivan.bludov

10 Feb 2012

10:39 a.m.

Hi, All.

We were investigating a problem with sending large amounts of data over the network in Genode on Fiasco.OC. We found a significant bug in the Genode Signal service implementation, and we can offer a simple, stable solution.

As you know, the Packet_stream_rx and Packet_stream_tx implementations (os/include/packet_stream_*) are based on the Signal service (signal.lib) and use the RPC submit(Signal_context_capability context) of Genode::Signal_session (base/include/signal_session/signal_session.h). This submit RPC is invoked continuously during RX/TX packet transfer (as the packet acknowledgement).

The problem is that submit is an RPC and, according to the RPC implementation, all parameters that are capabilities are passed in a special way known as capability marshalling/unmarshalling (base-foc/include/base/ipc.h). During capability unmarshalling, a new l4_cap_sel is allocated and the capability is mapped again, every single time. Eventually the capability table of the L4 task fills up completely, especially during intensive packet transfer. This damages Genode's core first and causes further trouble elsewhere.

As far as we can tell, using a capability to denote a signal context is redundant. Signal_session_component::submit uses the Signal_context_capability only for looking it up in the context entrypoint (base/src/core/signal_session_component.cc). The same effect could be achieved by looking up the signal context by its badge (local_name()), and then capability unmarshalling would not be necessary. The solution is quite simple: change the Signal_context_capability typedef from Capability<Signal_context> to a plain int or long (base/include/signal_session/signal_session.h) and then fix all the errors reported by the compiler.

We think this problem of the Signal service is significant and has to be fixed as quickly as possible. But of course we would be glad to hear any remarks about the decision to use a capability to denote a signal context.
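
The leak described above can be illustrated with a small stand-alone model. This is only a sketch under assumptions: Selector_table, submit_leaky and the table capacity are hypothetical names and values made up for illustration, not Genode or Fiasco.OC code. It shows that if every unmarshalled capability argument takes a fresh selector that is never released, the number of possible submit calls is bounded by the table size.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a task's capability-selector table.
// The capacity and all names are illustrative, not Genode's real types.
class Selector_table
{
    std::vector<bool> _used;

public:
    explicit Selector_table(std::size_t capacity) : _used(capacity, false) {}

    // Returns a free selector index, or nothing once the table is full.
    std::optional<std::size_t> alloc()
    {
        for (std::size_t i = 0; i < _used.size(); ++i)
            if (!_used[i]) { _used[i] = true; return i; }
        return std::nullopt;
    }

    // Number of selectors currently occupied.
    std::size_t used() const
    {
        std::size_t n = 0;
        for (bool u : _used) n += u ? 1 : 0;
        return n;
    }
};

// Models the receiver side of submit(): every unmarshalled capability
// argument consumes a fresh selector that is never released.
bool submit_leaky(Selector_table &table)
{
    return table.alloc().has_value();
}

// Counts how many submits succeed before the table is exhausted.
std::size_t submits_until_exhausted(Selector_table &table)
{
    std::size_t count = 0;
    while (submit_leaky(table)) ++count;
    return count;
}
```

With a table of N entries, exactly N submits succeed and every later one fails, which mirrors the behaviour reported for intensive packet transfer.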

Best, Ivan Bludov.


Norman Feske

10 Feb

11:10 a.m.

Hi Ivan,

...

> We were investigating a problem with sending large amounts of data over the network in Genode on Fiasco.OC. We found a significant bug in the Genode Signal service implementation, and we can offer a simple, stable solution. As you know, Packet_stream_rx and Packet_stream_tx

thank you for investigating this issue. This is indeed a problem on all kernels that use kernel-protected capabilities (i.e., Fiasco.OC and NOVA). Actually, there exists an issue-tracker entry for it:

https://github.com/genodelabs/genode/issues/32

Admittedly, I had not realized the significance of this issue for the packet-stream interface.

...

> context is redundant. Signal_session_component::submit uses the Signal_context_capability only for looking it up in the context entrypoint (base/src/core/signal_session_component.cc). The same effect could be achieved by looking up the signal context by its badge (local_name()), and then capability unmarshalling would not be necessary. The solution is quite simple: change the Signal_context_capability typedef from Capability<Signal_context> to a plain int or long (base/include/signal_session/signal_session.h) and then fix all the errors reported by the compiler. We think this problem of the Signal service is significant and has to be fixed as quickly as possible. But of course we would be glad to hear any remarks about the decision to use a capability to denote a signal context.

Unfortunately, a real solution for the problem is not as simple as that. You are right that the badge is used as a key for looking up the signal-context within core. But by passing the badge as plain data instead of a capability, the referred signal context could be forged by the client. This way, a malicious client would be able to submit signals to all signal receivers in the system. The use of capabilities prevents that.
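
The difference between the two variants can be made concrete with a toy model. This is a sketch under assumptions, not core's actual code: the Core struct, the client IDs and the 'delegated' set (which stands in for what the kernel tracks about capability delegation) are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>

using Badge = long;

// Hypothetical model of core's signal-context registry.
struct Core
{
    std::map<Badge, int> pending;              // badge -> number of submitted signals
    std::map<int, std::set<Badge>> delegated;  // client id -> contexts delegated to it

    void create_context(Badge badge, int owner)
    {
        pending[badge] = 0;
        delegated[owner].insert(badge);
    }

    // Variant A: the badge arrives as plain data, so core can only check that
    // the context exists, not that the caller was ever given access to it.
    bool submit_plain(int /* client */, Badge badge)
    {
        auto it = pending.find(badge);
        if (it == pending.end()) return false;
        ++it->second;
        return true;
    }

    // Variant B: models a kernel-protected capability. The 'delegated' set
    // stands in for kernel state that a client cannot modify on its own.
    bool submit_cap(int client, Badge badge)
    {
        auto held = delegated.find(client);
        if (held == delegated.end() || !held->second.count(badge)) return false;
        return submit_plain(client, badge);
    }
};
```

In this model a client that merely guesses a badge succeeds with submit_plain but is rejected by submit_cap, which is the forgery concern raised above.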

That said, I think that your fix is better as an interim solution than the current leak of capability selectors.

Thanks a lot for bringing up the issue and for the proposal for a fix. Your work is much appreciated!

Norman

--
Dr.-Ing. Norman Feske
Genode Labs

http://www.genode-labs.com · http://genode.org

Genode Labs GmbH · Amtsgericht Dresden · HRB 28424 · Sitz Dresden
Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth

Norman Feske

11:49 a.m.

Hello again,

...

> Unfortunately, a real solution for the problem is not as simple as that. You are right that the badge is used as a key for looking up the signal-context within core. But by passing the badge as plain data instead of a capability, the referred signal context could be forged by the client. This way, a malicious client would be able to submit signals to all signal receivers in the system. The use of capabilities prevents that.

replying to myself now... .-)

I just had the following idea for a fix: When unmarshalling a capability (in 'Ipc_istream::_unmarshal_capability') we need to distinguish the case of having got a new capability from the case of getting a reference to an already known capability. We receive the 'unique_id' as hint (it is just plain data - hence untrusted information) about which capability selector the argument refers to. Using this 'unique_id', we could look up the core-local capability selector at the cap-selector allocator. Currently, we do not store the unique ID at this allocator. So we would need to add a way to register the unique IDs that correspond to cap selectors and a way to perform a lookup from cap selector to unique ID.

If the lookup fails, we know that we received a new capability (for the signal service, this should never happen because all signal contexts are allocated at core). If the lookup succeeds, we obtained a capability selector that can now be tested via the kernel's 'l4_task_cap_equal' kernel function. If the just received capability selector refers to the same kernel capability as the looked up selector, we just keep using the one returned by the lookup and do not allocate a new selector.
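
The lookup-and-compare idea can be sketched as a small model. Everything here is an assumption made for illustration: Task_caps, by_id and cap_equal are hypothetical names, cap_equal merely stands in for the kernel's 'l4_task_cap_equal', and the handling of a hint that does not match the kernel object is a choice of this sketch, not something stated in the thread.

```cpp
#include <cassert>
#include <map>

using Selector   = int;   // index into the task's capability table
using Kernel_obj = int;   // stand-in for the kernel object a selector refers to
using Unique_id  = long;

struct Task_caps
{
    std::map<Selector, Kernel_obj> table;   // selectors currently in use
    std::map<Unique_id, Selector>  by_id;   // proposed registry: unique ID -> selector
    Selector next = 0;

    // Models the kernel mapping a received capability to a fresh selector.
    Selector map_received(Kernel_obj obj)
    {
        Selector sel = next++;
        table[sel] = obj;
        return sel;
    }

    // Stand-in for the kernel's capability comparison.
    bool cap_equal(Selector a, Selector b) const
    {
        return table.at(a) == table.at(b);
    }

    // Unmarshal one capability argument. 'hint' is untrusted plain data.
    Selector unmarshal(Unique_id hint, Kernel_obj obj)
    {
        Selector received = map_received(obj);
        auto known = by_id.find(hint);

        if (known == by_id.end()) {          // lookup failed: a new capability
            by_id[hint] = received;
            return received;
        }
        if (cap_equal(known->second, received)) {
            table.erase(received);           // release the duplicate selector
            return known->second;            // keep using the known one
        }
        // Assumption of this sketch: a hint that does not match the kernel
        // object is treated as a distinct capability and is not registered.
        return received;
    }
};
```

Repeatedly unmarshalling the same capability then leaves the table at a constant size, while a forged hint never replaces the registered selector.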

Of course there is still the other part of the problem: keeping a reference count for each capability selector (similarly to how shared pointers work). But the fix above should actually solve the capability-selector leak in the packet-stream case without introducing a security problem.
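
The reference-counting part could look roughly like the following. This is a minimal sketch with hypothetical names (Selector_refs, acquire, release), assuming a selector may be freed once its last user has released it, in the same spirit as a shared pointer's use count.

```cpp
#include <cassert>
#include <map>

using Selector = int;

// Hypothetical per-task bookkeeping of how many users hold each selector.
class Selector_refs
{
    std::map<Selector, unsigned> _count;

public:
    // Register one more user of the selector.
    void acquire(Selector sel) { ++_count[sel]; }

    // Drop one user. Returns true if the selector became unused and
    // can be handed back to the selector allocator.
    bool release(Selector sel)
    {
        auto it = _count.find(sel);
        if (it == _count.end()) return false;   // unknown selector
        if (--it->second > 0) return false;     // still referenced
        _count.erase(it);
        return true;
    }

    bool in_use(Selector sel) const { return _count.count(sel) != 0; }
};
```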

Cheers Norman

ivan.bludov

2:04 p.m.

Hi,

we have an idea that could expose a flaw in the Genode security system; correct us if we are wrong. It could also be used to attack the signal service. The idea of kernel-protected capabilities is interesting, but we think the implementation of Genode capabilities has some peculiarities. Note that we consider only Genode on Fiasco.OC.

A Genode::Native_capability can be regarded as two values: the badge (the local_name) and the dst ("the pointer" to the kernel object). Thus we can construct a simple structure Native_capability(valid_dst, any_number_as_badge) and use it as a Signal_context_capability (via static_cap_cast), where valid_dst could be env()->ram_session_cap().dst(). We can then pass this capability as the parameter to the Signal_session_client::submit() RPC and trigger arbitrary signal contexts.

Furthermore, as far as we know, env()->ram_session_cap().dst() points to the Genode main entrypoint, so we can create a dummy client from Native_capability(env()->ram_session_cap().dst(), any_number_as_badge) that calls any RPC. We think these activities would crash the whole Genode system.

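    /* reuse the destination of a capability we legitimately hold */
    addr_t genode_ep = env()->ram_session_cap().dst();

    /* guess badges one by one and invoke RPCs on each forged capability */
    for (long badge = 0; ; badge++) {
        Dummy_client client(Native_capability(genode_ep, badge));
        client.call_dummy_rpc_1();
        ...
        client.call_dummy_rpc_k(); // say k>10
    }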

Maybe we don't understand Genode and Fiasco.OC completely, and you can explain this behaviour.

Best, Ivan Bludov.



Norman Feske

2:43 p.m.

Hello Ivan,

...

> addr_t genode_ep = env()->ram_session_cap().dst();
> for (long badge = 0; ; badge++) {
>     Dummy_client client(Native_capability(genode_ep, badge));
>     client.call_dummy_rpc_1();
>     ...
>     client.call_dummy_rpc_k(); // say k>10
> }
>
> Maybe we don't understand Genode and Fiasco.OC completely, and you can explain this behaviour.

your observation is correct. Even though we use kernel capabilities for constraining access to individual entrypoints, there is currently no check in place to test the integrity of the supplied unique IDs at the server side. Once you gain access to an entrypoint (in your example to the entrypoint that provides the RAM service) you can invoke any object managed by this particular entrypoint by guessing the object's unique ID. This is particularly bad for core's entrypoints.

For Fiasco.OC, we could implement a check along the lines of the idea I outlined in my previous email. Since we use the same value for the badge and for the unique ID, a comparison (incoming badge == unique ID?) would prevent the attack.
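
Such a comparison might be sketched as follows. This is an illustration under assumptions, not the actual entrypoint code: the Entrypoint struct and dispatch function are hypothetical, and the model assumes the kernel delivers the badge together with each incoming message while the unique ID arrives in the untrusted message payload.

```cpp
#include <cassert>
#include <map>
#include <string>

using Badge = long;

// Hypothetical entrypoint: objects are keyed by unique ID, which by
// construction equals the badge attached to their capability.
struct Entrypoint
{
    std::map<long, std::string> objects;   // unique ID -> object name

    // 'kernel_badge' is delivered by the kernel and cannot be chosen by the
    // caller; 'claimed_id' comes from the message payload and is untrusted.
    const std::string *dispatch(Badge kernel_badge, long claimed_id) const
    {
        if (kernel_badge != claimed_id) return nullptr;   // forged unique ID
        auto it = objects.find(claimed_id);
        return it == objects.end() ? nullptr : &it->second;
    }
};
```

A caller holding the capability with badge 1 can then reach only the object with unique ID 1, no matter which ID it writes into the message.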

So you are completely right about the current implementation being incomplete. The good news is that it can be fixed without changing the API.

Cheers Norman

Norman Feske

2:57 p.m.

I just opened a new issue about the problem:

https://github.com/genodelabs/genode/issues/106

Norman

Norman Feske

29 Feb

11:13 a.m.

Hi Ivan,

I just want to let you know that we implemented an interim fix for the problem you reported. The fix is included in the Genode 12.02 release. The relevant commit is:

https://github.com/genodelabs/genode/commit/41eaff2cc656942e1b6a1665ade99f09...

The problem is also covered in the release notes:

http://genode.org/documentation/release-notes/12.02#Fiasco.OC_microkernel

Please note that this is not the actual solution for the underlying problem of proper lifetime management of capability selectors. Stefan is working on that. In the meantime, we hope that the current fix of the annoying capability-selector leak accommodates you well.

Best regards Norman
