Hello Genodians
We have a complex scenario in which we deploy 6 configs periodically. The configurations are applied in a loop always in the same order.
Eventually init throws 'Genode::Bit_array_base::Invalid_clear' and terminates (see the backtrace below).
Maybe it is noteworthy that packages with sub inits that have components that use resource saturation are started.
Does anybody have an idea on how to proceed with debugging the problem?
Best regards, Pirmin
``` #0 0x0000000050086294 in Genode::Bit_array_base::_set (this=0x50137868 <(anonymous namespace)::local_capability_space()::capability_space+229384>, index=<error reading variable: That operation is not available on integers of more than 8 bytes.>, width=1, free=true) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/bit_array.h:87 #1 Genode::Bit_array_base::_set (free=true, width=1, index=<error reading variable: That operation is not available on integers of more than 8 bytes.>, this=0x50137868 <(anonymous namespace)::local_capability_space()::capability_space+229384>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/bit_array.h:75 #2 Genode::Bit_array_base::clear (width=1, index=<error reading variable: That operation is not available on integers of more than 8 bytes.>, this=0x50137868 <(anonymous namespace)::local_capability_space()::capability_space+229384>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/bit_array.h:149 #3 Genode::Bit_allocator<4096u>::free (num_log2=0, bit_start=<error reading variable: That operation is not available on integers of more than 8 bytes.>, this=0x50137860 <(anonymous namespace)::local_capability_space()::capability_space+229376>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/bit_allocator.h:118 #4 Genode::Capability_space_tpl<4096u, Genode::Native_capability::Data>::dec_ref (data=..., this=0x500ff860 <(anonymous namespace)::local_capability_space()::capability_space>) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/include/base/internal/capability_space_tpl.h:196 #5 Genode::Capability_space::dec_ref (data=...) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/capability_space.cc:63 #6 0x000000005002756f in Genode::Native_capability::~Native_capability (this=0x401ff518, __in_chrg=<optimized out>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/native_capability.h:76 #7 Genode::CapabilityGenode::Pd_session::~Capability (this=0x401ff518, __in_chrg=<optimized out>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/capability.h:53 #8 Genode::Quota_transferGenode::Pd_session, Genode::Ram_quota::Remote_account::~Remote_account (this=0x401ff510, __in_chrg=<optimized out>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/quota_transfer.h:86 #9 Genode::Child::_revert_quota_and_destroy (this=<optimized out>, session=...) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/child.cc:392 #10 0x000000005002800a in Genode::Child::session_closed (this=0x7ffff7f9eed0, session=...) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/child.cc:508 #11 0x0000000050029e99 in Genode::Child::<lambda(Genode::Session_state&)>::operator()(Genode::Session_state &) (session=..., __closure=<optimized out>, __closure=<optimized out>) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/child.cc:532 #12 0x000000005002a3d7 in Genode::Id_spaceGenode::Parent::Server::apply<Genode::Session_state, Genode::Child::session_response(Genode::Parent::Server::Id, Genode::Parent::Session_response)::<lambda(Genode::Session_state&)> > (fn=..., id=..., this=0x7ffff7f96088) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/mutex.h:64 #13 Genode::Child::session_response (this=<optimized out>, id=..., response=<optimized out>) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/child.cc:518 #14 0x000000005003e41d in Genode::Meta::call_member<Genode::Meta::Empty, Genode::Parent, Genode::Meta::Pod_tuple<Genode::Id_spaceGenode::Parent::Server::Id, Genode::Meta::Pod_tupleGenode::Parent::Session_response, Genode::Meta::Empty > > (func=<optimized out>, args=<synthetic pointer>..., server=...) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/meta.h:490 #15 Genode::Parent::Rpc_session_response::serveGenode::Parent, Genode::Meta::Empty (args=<synthetic pointer>..., server=...) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/parent/parent.h:325 #16 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_serveGenode::Parent::Rpc_session_response (args=<synthetic pointer>..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:146 #17 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:180 #18 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #19 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #20 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_session_cap, Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #21 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_session, Genode::Parent::Rpc_session_cap, Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (this=this@entry=0x7ffff7f96318, opcode=..., opcode@entry=..., in=..., out=...) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #22 0x000000005003ef50 in Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_session_sigh, Genode::Parent::Rpc_session, Genode::Parent::Rpc_session_cap, Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/util/meta.h:626 #23 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_announce, Genode::Parent::Rpc_session_sigh, Genode::Parent::Rpc_session, Genode::Parent::Rpc_session_cap, Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #24 Genode::Rpc_dispatcherGenode::Parent, Genode::Parent::_do_dispatch<Genode::Meta::Type_list<Genode::Parent::Rpc_exit, Genode::Parent::Rpc_announce, Genode::Parent::Rpc_session_sigh, Genode::Parent::Rpc_session, Genode::Parent::Rpc_session_cap, Genode::Parent::Rpc_upgrade, Genode::Parent::Rpc_close, Genode::Parent::Rpc_session_response, Genode::Parent::Rpc_main_thread, Genode::Parent::Rpc_deliver_session_cap, Genode::Parent::Rpc_resource_avail_sigh, Genode::Parent::Rpc_resource_request, Genode::Parent::Rpc_yield_sigh, Genode::Parent::Rpc_yield_request, Genode::Parent::Rpc_yield_response, Genode::Parent::Rpc_heartbeat_sigh, Genode::Parent::Rpc_heartbeat_response> > (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:201 #25 Genode::Rpc_dispatcher<Genode::Parent, Genode::Parent>::dispatch (out=..., in=..., opcode=..., this=0x7ffff7f96318) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:235 #26 Genode::Rpc_object<Genode::Parent, Genode::Parent>::dispatch (this=0x7ffff7f96290, opcode=..., in=..., out=...) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/rpc_server.h:272 #27 0x000000005006b38a in Genode::Rpc_entrypoint::<lambda(Genode::Rpc_object_base*)>::operator() (obj=0x7ffff7f96290, __closure=<synthetic pointer>) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/mutex.h:64 #28 Genode::Object_poolGenode::Rpc_object_base::apply<Genode::Rpc_entrypoint::entry()::<lambda(Genode::Rpc_object_base*)>
(func=..., capid=<optimized out>, this=0x500fca50
Genode::bootstrap_component()::startup+528) at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/object_pool.h:147 #29 Genode::Rpc_entrypoint::entry (this=0x500fc9a8 Genode::bootstrap_component()::startup+360) at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/base/rpc_dispatch_loop.cc:68 #30 0x000000005009ec3d in Genode::Thread::_thread_start () at /data/genode/depot/trimpim/api/base/2021-02-23T1255/include/base/output.h:54 #31 0x00000000500b3e3b in thread_start () at /data/genode/depot/trimpim/src/base-linux/2021-02-23T1255/src/lib/syscall/spec/x86_64/lx_clone.S:62 ```
Hi Pirmin,
On 23.02.21 16:34, Pirmin Duss wrote:
Eventually init throws 'Genode::Bit_array_base::Invalid_clear' and terminates (see the backtrace below).
Maybe it is noteworthy that packages with sub inits that have components that use resource saturation are started.
Does anybody have an idea on how to proceed with debugging the problem?
I have never seen this exception before, which got me thinking.
The backtrace is already quite insightful. Thanks for posting it.
My interpretation: The problem occurs while closing a session. The server informs init that it has finished the closing of a session (by calling Parent::session_response), which prompts init to discard the local copy of the session capability. The reference count of this capability reaches zero (which is expected), triggering the deletion of the capability's local ID from init's capability space. For some reason, the corresponding slot is already marked as free, which is unexpected.
To investigate, it is important to understand the situation a bit better. E.g., which session type is the troublemaker? Who is the client, who is the server? Of course, plain instrumentation with debug messages would produce too much noise. I would hence try to add conditional messages that are only printed in "interesting" situations. E.g., the attached patch prints a message with session information, but only in the unexpected exception case. The output may help to form a better mental picture of what is happening. E.g., is the session an environment session or a higher-level session?
Regarding general debugging hints, it depends on how "sporadic" is this issue? Can you easily provoke it? If not, I would first try to work on the reproducibility, e.g., by increasing the frequency of init reconfigurations. I'd also try to strip down the complexity of the scenario step by step, e.g., by removing one component after another while validating that the problem persists.
Best wishes! I'm eager to learn what just happened here.
Norman
Hi Norman
Thanks for your input.
I have found a combination that produces the error most of the times in the first dozen or so config changes, this makes testing a lot easier.
I tried your patch, but unfortunately no output was produced, then I applied it to all other locations where `_revert_quota_and_destroy()` is called, but I still don't get any output (using raw instead also didn't help). For debug purposes I therefore added a log output in `Child::session_closed()` just before the call to `_revert_quota_and_destroy()` With this I always get the output that the last closed session, before the exception, is of type `File_system`. I assumed therefore, that most probably the error happens when a `File_system` is closed.
With this assumption and the label of the closing connection I replaced the component in question with a `chroot` component. With this the exception still occurs. All of the test up until here were done on 20.08 and 20.11.
Then I started to create a very stripped down version of our scenario that only uses a few different components [1].
This did not result in the exception being thrown, but there seems to be a capability leak somewhere along the way, if multiple `chroot` components in nested sub inits are chained.
In our original scenario there are no two `chroot` components chained. But as the error also occurred when I replaced the original component with a `chroot`, I tried to use `chroot` in the test.
Shall I create a issue for this finding?
[1] https://github.com/trimpim/genode/tree/init_bughunt_v2
Regards, Pirmin