Hi all,
I have a scenario with many inits on top of each other, and somewhere lives the wifi_drv. When I add another init called 'init_wifi' on top so I can try to kill/restart the wifi_drv, the system hangs and I get the following log:
[init -> init_system -> init_user -> init_wifi] child "wifi_drv"
[init -> init_system -> init_user -> init_wifi] RAM quota: 130816K
[init -> init_system -> init_user -> init_wifi] cap quota: 168
[init -> init_system -> init_user -> init_wifi] ELF binary: wifi_drv
[init -> init_system -> init_user -> init_wifi] priority: 0
[init -> init_system -> init_user -> init_wifi] provides service Nic
[init -> init_system -> init_user -> init_wifi] child "nic_dump" announces service "Nic"
Error: corrupted string
[init -> init_system -> init_user -> init_wifi -> wifi_drv] Reload wpa_supplicant configuration
[init] Warning: re-attempted PD session request 2 times (args: virt_space=0, phys_start=0x0, phys_size=0x100000000, diag=0, label="platform_drv -> init_system -> init_user -> init_wifi -> wifi_drv -> ", cap_quota=13)
[init] Warning: re-attempted PD session request 4 times (args: virt_space=0, phys_start=0x0, phys_size=0x100000000, diag=0, label="platform_drv -> init_system -> init_user -> init_wifi -> wifi_drv -> ", cap_quota=13)
[init] Warning: re-attempted PD session request 8 times (args: virt_space=0, phys_start=0x0, phys_size=0x100000000, diag=0, label="platform_drv -> init_system -> init_user -> init_wifi -> wifi_drv -> ", cap_quota=13)
[init] Warning: re-attempted PD session request 16 times (args: virt_space=0, phys_start=0x0, phys_size=0x100000000, diag=0, label="platform_drv -> init_system -> init_user -> init_wifi -> wifi_drv -> ", cap_quota=13)
and so on, with the number of retries doubling each time.
Now when this happens the whole system hangs, nothing else is working anymore.
As soon as the wifi_drv is started, this problem occurs.
When I did not have init_wifi and the wifi_drv was running on top of init_user, everything worked fine. Of course, I adjusted all the routes and policies (of the platform_drv) accordingly.
Could the label be too long for core to handle perhaps, or is something else happening? What do these error messages mean and why do they pop up now and not with one init less?
Hello Boris,
On 16.02.2018 12:50, Boris Mulder wrote:
> [init -> init_system -> init_user -> init_wifi] child "nic_dump" announces service "Nic"
> Error: corrupted string
> ...
> Now when this happens the whole system hangs, nothing else is working anymore.
> As soon as the wifi_drv is started, this problem occurs.
> When I did not have init_wifi and the wifi_drv was running on top of init_user, everything worked fine. Of course, I adjusted all the routes and policies (of the platform_drv) accordingly.
> Could the label be too long for core to handle perhaps, or is something else happening? What do these error messages mean and why do they pop up now and not with one init less?
that is very likely the case.
All session arguments including the label are passed as a single 'Rpc_in_buffer<160>' argument to the `Parent::session` RPC function. Whenever the session request is forwarded to a parent, the parent prefixes the label with the originating child's name. Eventually, when the nesting level becomes very deep, the resulting label might not fit into the argument anymore.
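To make the effect of the nesting depth concrete, here is a minimal sketch that mimics the prefixing and checks the result against the 160-byte limit. It is not Genode code: the helper names (`prefix_label`, `forwarded_label`, `session_args`, `fits_into_buffer`) are made up for illustration, the " -> " separator is taken from the labels in the log, and it assumes that the terminating zero counts against the buffer size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Capacity taken from the 'Rpc_in_buffer<160>' mentioned above.
constexpr std::size_t ARGS_CAPACITY = 160;

// Hypothetical helper: one forwarding step, in which the parent prefixes
// the label with the name of the child the request came from.
std::string prefix_label(std::string const &child, std::string const &label)
{
    assert(!child.empty());
    return child + " -> " + label;
}

// Hypothetical helper: label after the request has travelled through all
// parents. 'path' lists the child names from outermost to innermost.
std::string forwarded_label(std::vector<std::string> const &path)
{
    std::string label;
    for (auto it = path.rbegin(); it != path.rend(); ++it)
        label = prefix_label(*it, label);
    return label;
}

// The label shares the buffer with all other session arguments, so those
// count against the capacity as well.
std::string session_args(std::string const &label, std::string const &other_args)
{
    return "label=\"" + label + "\", " + other_args;
}

// True if the string plus its terminating zero fits into the buffer
// (assumption: the zero is stored inside the 160 bytes).
bool fits_into_buffer(std::string const &args)
{
    return args.size() + 1 <= ARGS_CAPACITY;
}
```

Each additional init level costs the length of its name plus four characters for the separator. As printed in the warnings above, the PD session arguments already come to roughly 150 characters, so under these assumptions one more level would be enough to exceed the limit.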
The message "Error: corrupted string" is not directly printed by the session-creation code path but by the log service [1], but it is plausible that it is caused by a truncated label.
I wonder, have you tried to synthetically reproduce the problem outside your actual scenario? This would greatly ease the diagnosis.
[1] src/core/include/log_session_component.h
Cheers
Norman
Oops, I accidentally hit reply instead of reply list. Here it is again.
A bare run script with a wifi_drv on top of many inits produces the re-attempted PD session warnings too, as well as rom_session denied errors for boot modules. Which errors appear seems to depend on the length of the label. This confirms it, since the session requests with the longer labels seem to be rejected.
In that case, I will try moving init_wifi down one or two levels to see if that works.
Thanks for your reply.
P.S. Perhaps it would be nice if the session arguments are actually checked for length and a warning is given if they do not fit into the rpc buffer? It took me some time to identify the problem in this case.
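To illustrate what I mean, here is a rough sketch of such a check. It is only an illustration and not how Genode's RPC buffers are implemented: `Checked_buffer` is a made-up type, and the warning text is invented. The point is that the copy notices when the arguments do not fit and reports it instead of truncating silently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <string>

// Hypothetical fixed-size argument buffer that reports truncation.
template <std::size_t CAPACITY>
struct Checked_buffer
{
    static_assert(CAPACITY > 0, "buffer needs room for the terminating zero");

    char data[CAPACITY] { };
    bool truncated = false;

    explicit Checked_buffer(std::string const &args)
    {
        // One byte is reserved for the terminating zero.
        std::size_t const max_len = CAPACITY - 1;

        if (args.size() > max_len) {
            truncated = true;
            std::cerr << "Warning: session arguments need " << args.size() + 1
                      << " bytes but the buffer holds " << CAPACITY
                      << ", arguments get truncated\n";
        }

        // Copy as much as fits and always terminate the string.
        std::size_t const len = std::min(args.size(), max_len);
        std::memcpy(data, args.data(), len);
        data[len] = 0;
    }
};
```

With something like `Checked_buffer<160> buf(args);` at the place where the arguments are marshalled, a request with an overly long label would at least leave a hint in the log.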