Hi, Genodians!
A few days ago, I ordered a ThinkPad T420, which is listed in the unofficial Genode HCL by cnuke. I set up Genode to boot remotely via PXE + Intel AMT. An x86_32 system image boots just fine on the real hardware. But if I try booting an x86_64 image, the machine reboots with most kernels, or traps with Fiasco.OC.
With the NOVA kernel, the last messages I see in the log are from "bender" [1]. With Fiasco.OC as the kernel ([2]), it gets as far as "bootstrap": the last messages in the log are from "bootstrap", up to the point where it attempts to start the kernel. At the same time, I see the following trap screen from bootstrap: [3]. All registers are zero, except for a few. So, with the "foc" kernel, I see a Trap #0 (division by zero). With the "NOVA" and "base-hw" kernels, it does not trap, but just reboots. I tried removing the "novga" parameter from the NOVA kernel, but I still see a blank screen.
The BIOS memory map is here: [4], [5].
Looking at the memory map, I don't see any memory holes in the region where the modules were loaded.
Any ideas?
[1] Boot log with NOVA kernel: sculpt64-nova.log (attached)
[2] Boot log with Fiasco.OC kernel: sculpt64-foc.log (attached)
[3] A trap screen from bootstrap, Fiasco.OC kernel: ftp://osfree.org/upload/img/20190307_004.jpg
[4] BIOS memory map (part 1): ftp://osfree.org/upload/img/20190307_001.jpg
[5] BIOS memory map (part 2): ftp://osfree.org/upload/img/20190307_002.jpg
WBR, valery
Hello,
On 07.03.19 23:10, Valery V. Sedletski via users wrote:
> [...] if I try booting an x86_64 image, it reboots on most kernels, or traps on Fiasco.OC. [...]
I could reproduce it after some configuration changes in the UEFI BIOS of my test T420. When I set the UEFI/BIOS option "Security -> Memory Protection -> Execution Prevention" to disabled, the machine reboots.
On my T420 test machine the option is set to enabled, and then it works; log attached.
Hope it helps,
Alex.
make: Entering directory '/home/alex/genode/sculpt_release/build/x86_64'
including /home/alex/genode/sculpt_release/tool/run/boot_dir/nova
including /home/alex/genode/sculpt_release/tool/run/load/tftp
including /home/alex/genode/sculpt_release/tool/run/log/serial
including /home/alex/genode/sculpt_release/repos/gems/run/sculpt_test.run
checking configuration syntax
CHECK init
using 'core-nova.o' as 'core.o'
spawn /bin/sh -c picocom -b 115200 /dev/ttyUSB0
picocom v2.2
port is : /dev/ttyUSB0
flowcontrol : none
baudrate is : 115200
parity is : none
databits are : 8
stopbits are : 1
escape is : C-a
local echo is : no
noinit is : no
noreset is : no
nolock is : no
send_cmd is : sz -vv
receive_cmd is : rz -vv -E
imap is :
omap is :
emap is : crcrlf,delbs,
Type [C-a] [C-h] to see available commands
Terminal ready
Bender: Hello World.
Need 039fb000 bytes to relocate modules.
Relocating to 7c605000:
Copying 60629744 bytes...
Copying 162240 bytes...
NOVA Microhypervisor v8-b4904a1 (x86_64): Feb 28 2019 14:26:07 [gcc 6.3.0] [MBI]
[ 0] TSC:2400000 kHz BUS:0 kHz DL
[ 0] CORE:0:0:0 6:2a:7:4 [29] Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
[ 2] CORE:0:1:0 6:2a:7:4 [29] Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
[ 1] CORE:0:0:1 6:2a:7:4 [29] Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
[ 3] CORE:0:1:1 6:2a:7:4 [29] Intel(R) Core(TM) i5-2430M CPU @ 2.40GHz
Hypervisor features VMX
Hypervisor reports 4x1 CPUs
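The relocation figures that bender prints in this log can be cross-checked with a little arithmetic. The sketch below assumes that each module is rounded up to a 4 KiB page before being placed. That rounding is a guess that happens to match the numbers, not something the log states. All names in the snippet are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page granularity of the relocation

def page_align_up(size: int) -> int:
    """Round a byte count up to the next multiple of PAGE_SIZE."""
    return (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

# Values taken from the log above.
needed = 0x039fb000          # "Need 039fb000 bytes to relocate modules."
target = 0x7c605000          # "Relocating to 7c605000:"
copied = [60629744, 162240]  # the two "Copying ... bytes..." lines

# Sum of the module sizes after rounding each one up to a full page.
padded_total = sum(page_align_up(n) for n in copied)

# First address past the relocated modules.
end = target + needed

print(hex(padded_total))  # 0x39fb000, equal to the "Need" figure
print(hex(end))           # 0x80000000, i.e. the modules end at the 2 GiB mark
```

Under that assumption the modules occupy the range 0x7c605000 to 0x7fffffff, which is the range to compare against a BIOS memory map when looking for holes.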
Alexander Boettcher wrote:
> When I set in the UEFI/BIOS the option "Security -> Memory Protection -> Execution Prevention" to disabled, the machine reboots.
> For my T420 test machine it is set to enabled, and then it works, log attached.
Yes, that helps. I had "Execution prevention" disabled too. I thought that it might cause problems if enabled, but it is the other way around. So, I enabled it again. With the NOVA and Fiasco.OC kernels, it now boots fine with "Execution prevention" enabled. With the "hw" kernel, though, it now stops after bender's messages instead of rebooting. I'm sorry that I didn't try with "Execution prevention" enabled before posting the message. I didn't want to disturb you for no reason.
P.S.: I wondered why 32-bit works fine; cnuke on the IRC channel suggested that it could be because of the NX bit, which on 32-bit is normally available only if PAE is enabled.
WBR, valery
On 08.03.19 11:25, Valery V. Sedletski via users wrote:
> [...] I'm sorry that I didn't try with "Execution prevention" enabled before posting the message. I didn't want to disturb you for no reason.
Nothing to apologize for. Now the issue is documented in the mailing list archive so that others can benefit from it. You're welcome.
> P.S.: I wondered why 32-bit works fine, but cnuke on the irc channel supposed that it could be because of NX bit, which is enabled on 32-bit normally only if PAE is enabled.
Some kernels don't use PAE in 32-bit mode, or it is not enabled in their kernel configuration. Without PAE, these 32-bit kernels simply never use the NX bit, and therefore they booted.
In 64-bit mode, the NX bit is supposed to be supported by all CPUs. I had never seen (until today) that you can disable such a security feature. It makes no sense for modern OSes at all.
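Whether NX is usable can be read from two documented places: CPUID leaf 0x80000001 (EDX bit 20) reports support, and the EFER MSR 0xC0000080 (bit 11, NXE) enables it. The sketch below only decodes register values that are passed in; it does not read real hardware. The function names and sample values are hypothetical, and the idea that the firmware option makes CPUID stop reporting NX is an assumption, not something confirmed in this thread.

```python
# Bit positions as documented in the Intel and AMD manuals.
CPUID_EXT_FEATURES = 0x80000001  # extended feature leaf
EDX_NX = 1 << 20                 # EDX bit 20: NX / XD supported
EDX_LONG_MODE = 1 << 29          # EDX bit 29: long mode (64-bit) supported
MSR_EFER = 0xC0000080            # extended feature enable register
EFER_NXE = 1 << 11               # EFER bit 11: no-execute enable

def nx_supported(cpuid_edx: int) -> bool:
    """True if the EDX value of the extended feature leaf reports NX."""
    return bool(cpuid_edx & EDX_NX)

def long_mode_supported(cpuid_edx: int) -> bool:
    """True if the same EDX value reports 64-bit long mode."""
    return bool(cpuid_edx & EDX_LONG_MODE)

def nx_enabled(efer: int) -> bool:
    """True if the EFER value has the NXE bit set."""
    return bool(efer & EFER_NXE)

def may_set_nx_in_page_tables(cpuid_edx: int, efer: int) -> bool:
    """NX may be used in page tables only if it is supported and enabled."""
    return nx_supported(cpuid_edx) and nx_enabled(efer)

# Hypothetical register values, for illustration only.
edx_nx_visible = EDX_LONG_MODE | EDX_NX  # e.g. "Execution Prevention" enabled
edx_nx_hidden = EDX_LONG_MODE            # assumed view with the option disabled

print(may_set_nx_in_page_tables(edx_nx_visible, EFER_NXE))
print(may_set_nx_in_page_tables(edx_nx_hidden, 0))
```

The second case shows the combination discussed here: a CPU that runs in 64-bit mode but on which NX must not be assumed.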
So, something new learned and now documented ;-)
Thanks,
Alex.
On Fri Mar 8 15:57:20 2019 Alexander Boettcher alexander.boettcher@genode-labs.com wrote:
> In 64bit mode the NX-bit is supposed to be supported by all CPUs, never have seen (until today) that you can disable such a security feature. It makes no sense for modern OSes at all.
IIRC, the bit prevents execution of code in data segments (?). It is meant as a measure against viruses. Usually, such measures break some programs that aren't viruses. So, I suspected that enabling the NX bit could break something, not the other way around, but it appears I was wrong. I am still wondering why disabling it can cause traps or reboots. Does the kernels' functionality depend on the NX bit somehow?
> So, something new learned and now documented ;-)
> Thanks,
> Alex.
Thank you very much for your help too!
WBR, valery
-- Alexander Boettcher Genode Labs
https://www.genode-labs.com - https://www.genode.org
Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
Genode users mailing list users@lists.genode.org https://lists.genode.org/listinfo/users
On 08.03.19 12:29, Valery V. Sedletski wrote:
> [...] I am still wondering why disabling it can cause traps or reboots. Do the kernels functionality depend on the NX bit somehow?
The kernel could/should check whether the NX feature is provided by the hardware, and use it only if it is. The kernels obviously don't do that.
When you look at how the NX bit is actually implemented, you will see that bit 63 of the page-table entry is used. With NX enabled, bit 63 decides whether code is executable (0) or not (1). If NX is disabled, bit 63 is part of your normal address lookup procedure.
If the kernel now sets bit 63, but the bit is actually part of the normal page-table walk (NX not supported), you will end up at wrong addresses. Additionally, you also get non-canonical addresses, which are not permitted and lead to hardware exceptions being raised (see the Intel/AMD CPU documentation).
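The check described above can be sketched as a guard around the code that builds page-table entries. This is an illustration only, not how NOVA, Fiasco.OC or base-hw actually construct their tables; the function and constant names are hypothetical. As a side note, the Intel manual describes bit 63 of a present entry as reserved while EFER.NXE is 0, so setting it there is expected to raise a fault during the page walk.

```python
# Flag bits of a 64-bit x86 page-table entry (only the ones used here).
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
PTE_NX = 1 << 63  # no-execute, valid only while NX is supported and enabled

def make_pte(phys_addr: int, writable: bool, executable: bool,
             nx_available: bool) -> int:
    """Build an entry for a 4 KiB page, setting NX only when it is available."""
    if phys_addr & 0xfff:
        raise ValueError("physical address must be 4 KiB aligned")
    pte = phys_addr | PTE_PRESENT
    if writable:
        pte |= PTE_WRITABLE
    # The guard: without NX support, bit 63 must stay clear.
    if not executable and nx_available:
        pte |= PTE_NX
    return pte

# A writable data page at a hypothetical physical address.
with_nx = make_pte(0x200000, writable=True, executable=False, nx_available=True)
without_nx = make_pte(0x200000, writable=True, executable=False, nx_available=False)

print(hex(with_nx))     # 0x8000000000200003
print(hex(without_nx))  # 0x200003
```

With the guard in place, the data page merely loses its execute protection when NX is unavailable, instead of carrying a bit the hardware does not accept.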
Cheers,
Alexander Boettcher wrote:
> If NX is disabled, the 63 bit is part of your normal address lookup procedure.
> If the kernel now set the 63. bit, but actually it is part of the normal address page table walk (NX not supported), you will end up in wrong addresses. [...]
Thanks for the explanation. So, probably, this bit was taken as part of the page address, which made the address invalid, and the trap was caused by that incorrect address.
WBR, valery