Hi Norman,
I would like to comment on a few things...
Norman Feske (NF) wrote:
NF> In contrast to monolithic kernels, a microkernel like base-hw, NOVA, or seL4 does not deal with any user-level content like cryptographic secrets, or the content of files. There is hardly any credential to leak to begin with. User content stays outside the microkernel.
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory. This includes peeking at all page tables, followed by reading the memory pages of the desired user process via their physical backing store.
NF> There are in-kernel data structures like page tables and thread control blocks. Hence, Meltdown may allow untrusted user code to fingerprint the kernel or gather other information about those in-kernel data structures. E.g., detect the presence of threads and protection domains. It is currently unclear to me, in which ways an attacker may exploit this meta data.
If an attacker can read the thread control block of other processes in the kernel, then he gets full access to the register state of preempted threads. The value of the instruction pointer and GPRs can be very valuable, for example when other processes are in the middle of crypto operations. The TCBs may actually be a lot more valuable than other kernel metadata.
NF> Considering that a microkernel has a fairly low cache footprint and only cached information can be leaked via Meltdown, it might be interesting to get hold of the actual scope of information.
Meltdown can leak any information that is marked as "present" in the page tables of the current process. What's not currently cached can result in speculative cache fills, as long as the page is marked cacheable. So it's not just cached information that can leak, it's more.
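To make the rule above concrete, here is a toy model in Python. It is not exploit code and not a real x86 page-table format: the `Mapping` record, its flag names, and the sample address space are all hypothetical. It only contrasts what the permission check lets user code read with what a transient access could reach if "present" (plus "cacheable") is all that matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Hypothetical, simplified page-table entry (not a real x86 PTE)."""
    name: str
    present: bool    # entry is valid in the current address space
    user: bool       # user mode may access the page architecturally
    cacheable: bool  # memory type permits (speculative) cache fills


def architecturally_readable(m: Mapping) -> bool:
    # What the regular permission check allows user code to read.
    return m.present and m.user


def transiently_reachable(m: Mapping) -> bool:
    # Rule as described above: "present" suffices, the user bit does not
    # protect; uncached data can still be pulled in if the page is cacheable.
    return m.present and m.cacheable


# Made-up address space of one user process (assumption for illustration).
address_space = [
    Mapping("user code/data", present=True, user=True, cacheable=True),
    Mapping("kernel TCBs", present=True, user=False, cacheable=True),
    Mapping("device registers", present=True, user=False, cacheable=False),
    Mapping("other process memory", present=False, user=False, cacheable=True),
]

# Everything reachable transiently but not readable architecturally.
exposed = [m.name for m in address_space
           if transiently_reachable(m) and not architecturally_readable(m)]
print(exposed)
```

In this made-up address space, only the kernel TCB mapping ends up in `exposed`: it is present and cacheable but not user-accessible. Memory that is simply not mapped into the current address space never shows up.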
NF> Even though the Spectre attack affects components on top of Genode that use a JIT-based VM, the microkernel cannot be easily targeted. In contrast to the Linux kernel, which contains a JIT-based VM in the form of the Berkeley Packet Filter, there is no way to deliberately inject certain code patterns into a microkernel. In the worst case, however, the kernel may already contain a code sequence to exploit as a gadget. So it might be sensible to analyze the kernel code in the light of the Spectre attack. Fortunately - in contrast to a monolithic kernel - a microkernel is not a rapidly moving target.
Indirect branch restricted speculation (IBRS) is a new x86 architecture extension to mitigate the effects of Spectre. It requires microcode patches that are rolling out right now. Retpolines are another (software-only) approach. Deciding which one to use when gets messy very quickly: https://docs.google.com/document/d/e/2PACX-1vSMrwkaoSUBAFc6Fjd19F18c1O9pudkf...
NF> It is far too early to draw definite conclusions. E.g., it is unclear to me if and how Intel's microcode updates [5] address parts of the attacks.
Intel ucode adds IBRS.
NF> Since implementing mitigation measures will require significant effort, and performance penalties are to be expected, we won't eagerly go forward on our own right now. The scope and time frame of possible mitigations come down to the priorities of Genode's commercial users.
IPC will definitely become slower as a result.
Since only Intel seems to be affected by Meltdown, the big question will be whether to use different page-table layouts for different vendors or to use the same layout for all. Would you be willing to degrade AMD IPC performance in favor of having one x86 kernel that works the same way everywhere rather than having multiple different implementations or different code paths that need independent testing/verification?
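The trade-off in this question can be written down as a tiny policy function. This is only a sketch in Python, not kernel code: the function name and the `uniform_layout` switch are hypothetical, and "only Intel is affected" is the working assumption of this thread, not an established fact. The vendor strings are the ones CPUID reports.

```python
def needs_hardened_layout(vendor: str, uniform_layout: bool) -> bool:
    """Return True if the slower, Meltdown-hardened page-table layout is used.

    vendor         -- CPUID vendor string, e.g. "GenuineIntel" or "AuthenticAMD"
    uniform_layout -- hypothetical policy switch: True means one layout for
                      all vendors, False means a vendor-specific code path
    """
    # Assumption taken from this thread: only Intel CPUs are affected.
    meltdown_affected = vendor == "GenuineIntel"
    return uniform_layout or meltdown_affected


for vendor in ("GenuineIntel", "AuthenticAMD"):
    for uniform in (True, False):
        print(vendor, uniform, needs_hardened_layout(vendor, uniform))
```

With `uniform_layout=True`, AMD pays the IPC cost without needing the protection; with `False`, there are two code paths to test and verify.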
Cheers, Udo
Hi Udo,
On 05.01.2018 17:36, Udo Steinberg wrote:
Norman Feske (NF) wrote:
NF> In contrast to monolithic kernels, a microkernel like base-hw, NOVA, or seL4 does not deal with any user-level content like cryptographic secrets, or the content of files. There is hardly any credential to leak to begin with. User content stays outside the microkernel.
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory.
Before I start digging through all our supported kernels (I'm not familiar with all the internals): can you please elaborate a bit on which microkernels, to your knowledge, have all physical memory mapped in the kernel?
The currently supported microkernels for Genode are Pistachio, OKL4, L4/Fiasco, Fiasco.OC, NOVA, seL4 and our own hw kernel.
Thanks,
Hi,
On Fri, Jan 05, 2018 at 08:24:19PM +0100, Alexander Boettcher wrote:
Hi Udo,
On 05.01.2018 17:36, Udo Steinberg wrote:
Norman Feske (NF) wrote:
NF> In contrast to monolithic kernels, a microkernel like base-hw, NOVA, or seL4 does not deal with any user-level content like cryptographic secrets, or the content of files. There is hardly any credential to leak to begin with. User content stays outside the microkernel.
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory.
Before I start digging through all our supported kernels (I'm not familiar with all the internals): can you please elaborate a bit on which microkernels, to your knowledge, have all physical memory mapped in the kernel?
The currently supported microkernels for Genode are Pistachio, OKL4, L4/Fiasco, Fiasco.OC, NOVA, seL4 and our own hw kernel.
A partial response related to the last mentioned kernel.
I can assure you that the hw kernel, which is actually Genode's core component combined with a few architecture-dependent data structures (e.g. page tables) and routines, does not contain mappings of physical memory used by user-level components. The only exceptions are the UTCBs already mentioned by Norman. Memory is mapped into the kernel/core only temporarily, to fill it with zeroes, before it is handed over to a user-level component. Afterwards it is detached again, before other components use it.
I would be surprised if OKL4 or seL4 had all physical memory mapped into the kernel area of each address space, because, as far as I know, they do not use L4-like memory propagation via synchronous IPC (aka the mapping database). Or do they?
Best regards Stefan
Thanks,
-- Alexander Boettcher Genode Labs
http://www.genode-labs.com - http://www.genode.org
Genode Labs GmbH - Amtsgericht Dresden - HRB 28424 - Sitz Dresden Geschäftsführer: Dr.-Ing. Norman Feske, Christian Helmuth
On Sat, 6 Jan 2018 04:20:25 +0100 Stefan Kalkowski (SK) wrote:
On Fri, Jan 05, 2018 at 08:24:19PM +0100, Alexander Boettcher wrote:
On 05.01.2018 17:36, Udo Steinberg wrote:
Norman Feske (NF) wrote:
NF> In contrast to monolithic kernels, a microkernel like base-hw, NOVA, or seL4 does not deal with any user-level content like cryptographic secrets, or the content of files. There is hardly any credential to leak to begin with. User content stays outside the microkernel.
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory.
Before I start digging through all our supported kernels (I'm not familiar with all the internals): can you please elaborate a bit on which microkernels, to your knowledge, have all physical memory mapped in the kernel?
The currently supported microkernels for Genode are Pistachio, OKL4, L4/Fiasco, Fiasco.OC, NOVA, seL4 and our own hw kernel.
I can assure you that the hw kernel, which is actually Genode's core component combined with a few architecture-dependent data structures (e.g. page tables) and routines, does not contain mappings of physical memory used by user-level components.
Hi,
I am not following the recent development of all those kernels, so I think it's best to directly consult the individual developers/teams for statements (like the one from Stefan above).
For my part, I can tell you that the NOVA microhypervisor (at least the official version) does not map physical RAM into the kernel virtual address space, other than the RAM in which the microhypervisor itself resides. NOVA maps certain devices (like APIC, IOMMU), but those can't be speculatively accessed anyway. I cannot comment on modified NOVA versions.
Some commercial kernels and L4/Fiasco certainly used to map as much physical memory as can fit into the kernel address space. Not sure if Fiasco.OC retains that behavior. Check for Physmem in class Mem_layout.
Also any kernel that performs certain things like long IPC via a lazily flushed IPC window may have transient mappings of memory belonging to other user processes.
Cheers, Udo
I am not following the recent development of all those kernels, so I think it's best to directly consult the individual developers/teams for statements (like the one from Stefan above).
For my part, I can tell you that the NOVA microhypervisor (at least the official version) does not map physical RAM into the kernel virtual address space, other than the RAM in which the microhypervisor itself resides. NOVA maps certain devices (like APIC, IOMMU), but those can't be speculatively accessed anyway. I cannot comment on modified NOVA versions.
For my part, I can confirm that the slightly, cough, modified NOVA version [1], as used by Genode, kept the original behavior of the official NOVA version [0] in that regard.
Some commercial kernels and L4/Fiasco certainly used to map as much physical memory as can fit into the kernel address space. Not sure if Fiasco.OC retains that behavior. Check for Physmem in class Mem_layout.
Also any kernel that performs certain things like long IPC via a lazily flushed IPC window may have transient mappings of memory belonging to other user processes.
Thanks for the insights,
Alex.
[0] https://github.com/udosteinberg/NOVA
[1] https://github.com/alex-ab/NOVA/tree/r9
On Fri, 5 Jan 2018 22:38:39 +0100 Alexander Boettcher (AB) wrote:
I am not following the recent development of all those kernels, so I think it's best to directly consult the individual developers/teams for statements (like the one from Stefan above).
For my part, I can tell you that the NOVA microhypervisor (at least the official version) does not map physical RAM into the kernel virtual address space, other than the RAM in which the microhypervisor itself resides. NOVA maps certain devices (like APIC, IOMMU), but those can't be speculatively accessed anyway. I cannot comment on modified NOVA versions.
For my part, I can confirm that the slightly, cough, modified NOVA version [1], as used by Genode, kept the original behavior of the official NOVA version [0] in that regard.
An addition after looking at the old code some more:
Note that Pd::kern, i.e. the kernel PD, actually has all physical memory mapped 1:1, simply to have an elegant (non-special-case) way to establish the root of the mapping hierarchy. However, no user thread ever runs in Pd::kern, so those mappings cannot be speculatively abused.
In PDs, where user threads do run, physical memory is not mapped in the page tables.
Cheers, Udo
Hi Udo,
thank you for joining the discussion. Nice to see a sign of life from you after such a long time! ;-)
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory. This includes peeking at all page tables, followed by reading the memory pages of the desired user process via their physical backing store.
As confirmed by Stefan, Alexander, and you, neither NOVA nor our base-hw kernel has this problem. For seL4, let us see how the kernel developers respond to the issue. The other microkernels (OKL4, L4/Fiasco, Fiasco.OC, Pistachio) are not of much interest as we support them for nostalgic reasons only.
NF> There are in-kernel data structures like page tables and thread control blocks. Hence, Meltdown may allow untrusted user code to fingerprint the kernel or gather other information about those in-kernel data structures. E.g., detect the presence of threads and protection domains. It is currently unclear to me, in which ways an attacker may exploit this meta data.
If an attacker can read the thread control block of other processes in the kernel, then he gets full access to the register state of preempted threads. The value of the instruction pointer and GPRs can be very valuable, for example when other processes are in the middle of crypto operations. The TCBs may actually be a lot more valuable than other kernel metadata.
True, it is bad. But let us acknowledge that the problem is of a different magnitude. With a monolithic kernel, an attacker can read secrets from the kernel at a rate of multiple KiB/sec. In contrast, the attack you sketched samples the CPU registers of a remote thread at a rate of 100 times/sec (when the crypto-computing thread is preempted at the default time-slice length of 10 ms, disregarding interrupts). Given that the secrets processed by the remote thread must be in CPU registers at the moment the thread is preempted, information about crypto keys can leak only while the keys are in use. The economics of the attack are vastly different - it becomes similar to the cache-based side-channel attacks we already know and accept to exist.
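The sampling rate quoted above follows directly from the time-slice length. A minimal back-of-the-envelope sketch in Python, assuming (as in the text) that the victim thread is preempted only at the end of each time slice and that interrupts are disregarded; the function names and the 60-second observation window are made up for illustration:

```python
def register_samples_per_second(time_slice_ms: float) -> float:
    """Preemptions per second if a thread is preempted once per time slice."""
    return 1000.0 / time_slice_ms


def register_samples(observation_s: float, time_slice_ms: float) -> float:
    """Number of register snapshots an attacker could take in a time window."""
    return observation_s * register_samples_per_second(time_slice_ms)


rate = register_samples_per_second(10)  # default time slice from the text
per_minute = register_samples(60, 10)   # hypothetical one-minute window
print(rate, per_minute)
```

Each snapshot yields only the register state at the moment of preemption, so the useful yield depends on whether key material happens to be in registers at that instant.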
Intuitively, I'd argue that if crypto material is valuable enough to justify the costs of such an attack, it should never be in reach of an Intel CPU that executes untrusted code to begin with. Instead it should stay on a dedicated smartcard, security token, TrustZone-like enclave, or another form of HSM.
Please don't take my stance as an attempt to downplay the issue. Of course, I want to see it mitigated as it complicates our argumentation for co-hosting trusted and untrusted components side by side. If left unmitigated, we'd ultimately need to assess the possible impact of leaked in-kernel thread state, which is not a discussion I want to enter.
NF> Considering that a microkernel has a fairly low cache footprint and only cached information can be leaked via Meltdown, it might be interesting to get hold of the actual scope of information.
Meltdown can leak any information that is marked as "present" in the page tables of the current process. What's not currently cached can result in speculative cache fills, as long as the page is marked cacheable. So it's not just cached information that can leak, it's more.
...
Intel ucode adds IBRS.
Thank you for the clarification and the valuable pointers!
Since only Intel seems to be affected by Meltdown, the big question will be whether to use different page-table layouts for different vendors or to use the same layout for all. Would you be willing to degrade AMD IPC performance in favor of having one x86 kernel that works the same way everywhere rather than having multiple different implementations or different code paths that need independent testing/verification?
Given the scarcity of NOVA experts and maintainers, I would prefer to avoid special cases, even if it means that AMD performance unjustly suffers for the time being. I have to acknowledge that most of Genode's funding comes from users of Intel platforms.
Do you have a feeling for how invasive such a change to NOVA would be?
Cheers Norman
Hi,
On 2018-01-05 at 20:24:19 +0100, Alexander Boettcher wrote:
Hi Udo,
On 05.01.2018 17:36, Udo Steinberg wrote:
Norman Feske (NF) wrote:
NF> In contrast to monolithic kernels, a microkernel like base-hw, NOVA, or seL4 does not deal with any user-level content like cryptographic secrets, or the content of files. There is hardly any credential to leak to begin with. User content stays outside the microkernel.
While it is true that a microkernel stores significantly fewer secrets than a monolithic kernel, like Linux, most microkernels actually have a full mapping of the entire physical memory in the kernel portion of each address space, which allows an attacker to peek anywhere into physical memory.
Before I start digging through all our supported kernels (I'm not familiar with all the internals): can you please elaborate a bit on which microkernels, to your knowledge, have all physical memory mapped in the kernel?
The currently supported microkernels for Genode are Pistachio, OKL4, L4/Fiasco, Fiasco.OC, NOVA, seL4 and our own hw kernel.
I would like to comment on Fiasco.OC. Fiasco.OC / L4Re is vulnerable to Meltdown-like attacks because the kernel is mapped into each task. However, the kernel does not map all physical memory but only the memory it requires for its own data structures + kernel-user memory required for e.g. UTCBs and vCPU state save areas. Depending on the amount of physical memory and the available page sizes, Fiasco.OC may map a little bit more than that to save TLB entries. That means there can be a slight overlap of user memory that is visible to the kernel. But it is not possible for a thread to read _all_ memory.
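The "little bit more" comes from rounding the kernel's mappings up to a larger page size. A small worked example in Python, with made-up numbers: the 5 MiB kernel region is hypothetical and not Fiasco.OC's actual layout, and the region is assumed to start on a page-size boundary.

```python
KiB = 1024
MiB = 1024 * KiB


def round_up(size: int, page_size: int) -> int:
    """Round size up to the next multiple of page_size."""
    return -(-size // page_size) * page_size


def overlap_bytes(kernel_bytes: int, page_size: int) -> int:
    """Memory mapped beyond what the kernel needs, due to page granularity."""
    return round_up(kernel_bytes, page_size) - kernel_bytes


kernel_bytes = 5 * MiB  # hypothetical size of kernel data structures

for page_size in (4 * KiB, 2 * MiB, 4 * MiB):
    extra = overlap_bytes(kernel_bytes, page_size)
    print(f"page size {page_size // KiB} KiB: {extra // KiB} KiB mapped in excess")
```

With 4 KiB pages nothing extra is mapped in this example; with 2 MiB pages the excess is 1 MiB and with 4 MiB pages it is 3 MiB. Larger pages save TLB entries at the price of exposing that much more memory to the kernel mapping.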
Because we think that no thread should be able to read information from other threads (page tables, capability arrays, UTCBs etc.), we will change Fiasco.OC to execute in its own address space on Intel CPUs.
Against Spectre we do not plan to implement anything right now. We think the attack surface of the kernel is very small (if any) and may be even further reduced with Intel's microcode updates and future compiler/tool mitigations. However, we will observe future discussions and developments and may reassess this in the future.
Thank you and regards, Matthias.
Hello Matthias,
thank you for the information!
Alex.
On 08.01.2018 10:47, Matthias Lange wrote:
...