Opened 15 years ago

Closed 12 years ago

#3 closed enhancement (fixed)

Memory management limitations

Reported by: Martin Decky Owned by: Martin Decky
Priority: major Milestone: 0.5.0
Component: helenos/kernel/generic Version: mainline
Keywords: Cc: jakub@…, zdenek.bouska@…
Blocker for: Depends on: #343
See also:

Description

Running the VESA framebuffer on a machine with more than 1 GB of physical memory shows some minor corruption of the framebuffer in the kernel console. The exact reason is not yet known (tested only in QEMU).

The kernel is unable to run (due to design limitations of the memory management subsystem) on a machine with more than 2 GB of physical memory on ia32. A solution has been proposed on the mailing list:

http://lists.modry.cz/cgi-bin/private/helenos-devel/2009-February/002397.html

Change History (19)

comment:1 by Martin Decky, 15 years ago

Milestone: 0.5.0

comment:2 by Jakub Jermář, 14 years ago

Cc: jakub@… added
Component: unspecified

I am copying the proposal here for easier reference.

1) The mechanism of physical memory zones has to be extended to know
   about such properties as various special zones (e.g. the lower 16 MB
   legacy DMA zone and the somewhat strange lower 1 MB zone on ia32),
   legacy devices, interrupt vectors, memory allocated during the boot
   process, occupied by static kernel structures and so on. We certainly
   do handle some of these things now, but sometimes in a very
   unsystematic way.

   - Currently the kernel is using some platform-dependent techniques
     to avoid colliding with its own memory during boot (ballocs
     on sparc64, etc.). This should be unified into one framework.

   - The buddy system of frames can perhaps be improved by using
     two-level management (buddy + bitmaps) or other clever ideas.

   - If the zones were the primary structure for describing all
     important properties of the physical memory address space, the
     separate pareas and their tweaking on some platforms would no
     longer be needed.

2) We should introduce another layer of abstraction between the physical
   and virtual memory management which might be called "memory
   segments". This should be a generic description of the kernel virtual
   address space -- i.e. where is the user space memory segment,
   where is the initial 1:1 identity mapped kernel segment (which should
   be as small as possible) and where is the kernel on-demand mapped
   "high memory", where the slab allocator lives and where also hardware
   devices are mapped.

   On most platforms you can design these segments in almost any way you
   like as it is only a convention. But for example on mips32 the
   configuration of the segments is enforced by the hardware and on
   amd64 the address space layout is even more confusing.

   The current macros KA2PA, PA2KA, KERNEL_ADDRESS_SPACE_START_ARCH,
   KERNEL_ADDRESS_SPACE_END_ARCH, etc. and the basic assumption that the
   kernel is using the 1:1 identity mapping for most of its purposes are
   very rudimentary. They are the cause of many limitations:

   - Impossibility to work with more than 512 MB of RAM on mips32.
   - Issues with more than 2 GB of RAM on ia32 and even amd64 (no space
     for mapping of the framebuffer and other issues).
   - Probably no way to implement PAE on ia32 properly.
   - Most of the problems with memory management of the ia32xen port
     (now dead) were caused by the issues with creating the kernel
     1:1 identity mapping for the whole available memory. This memory
     model simply does not scale.
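
For concreteness, a minimal sketch of how the two ideas above (annotated physical zones and kernel memory segments) might be expressed -- the names and fields are purely hypothetical, not existing HelenOS code:

   /* Hypothetical sketch only -- illustrative names, not the real kernel API. */
   #include <stddef.h>
   #include <stdint.h>

   /* (1) Physical memory zones annotated with special properties. */
   typedef enum {
       ZONE_FLAG_NONE       = 0,
       ZONE_FLAG_LOW_1M     = 1 << 0,  /* the peculiar lower 1 MB on ia32 */
       ZONE_FLAG_LEGACY_DMA = 1 << 1,  /* the lower 16 MB legacy DMA zone */
       ZONE_FLAG_BOOT       = 1 << 2,  /* memory allocated during boot */
       ZONE_FLAG_KERNEL     = 1 << 3   /* occupied by static kernel structures */
   } zone_flags_t;

   typedef struct {
       uintptr_t base;       /* first physical frame of the zone */
       size_t count;         /* number of frames in the zone */
       zone_flags_t flags;   /* special properties of the zone */
   } phys_zone_t;

   /* (2) Kernel virtual address space described by "memory segments". */
   typedef enum {
       SEG_USER,             /* user space segment */
       SEG_IDENTITY,         /* small 1:1 identity-mapped kernel segment */
       SEG_HIGHMEM           /* on-demand mapped segment (slab, devices) */
   } seg_type_t;

   typedef struct {
       uintptr_t start;      /* first virtual address of the segment */
       uintptr_t end;        /* last virtual address of the segment */
       seg_type_t type;
   } mem_segment_t;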

It may be a good idea to split these into multiple tickets so that the issues
can be tracked individually. I don't see any reason why e.g. the zones-related
suggestions and improvements should depend (by virtue of being in one ticket)
on the realization of other suggestions here, such as the implementation of
non-1:1 mappings.

Some suggestions such as the one about introducing bitmaps next to the buddy
system would also profit from moving to a separate ticket as it is not clear
why such a change would be needed (is the current implementation limiting the
physical memory management in any way?) and what would be the benefits of it.

Contrary to:

   where is the initial 1:1 identity mapped kernel segment (which should
   be as small as possible) and where is the kernel on-demand mapped
   "high memory"

shouldn't the 1:1 identity mapped kernel segment be as _large_ as possible?
Having it as _small_ as possible, for the benefit of the non-identity mapped
high memory segment, will most likely cause performance degradation because
the kernel will need to map/unmap pages from the high memory segment on demand.
The more frequently this happens, the bigger the performance hit. If we realize
that the kernel will have to be able to access the whole physical memory, which
can be much larger than the non-identity mapped segment, it cannot afford to map
the high memory into this area permanently. So the pattern will most likely look
like:

   va = map_high_memory(pa, cnt);
   /* do something with the data at va */
   unmap_high_memory(va, cnt);

The unmap operation can be especially expensive.

comment:3 by Martin Decky, 14 years ago

Ad splitting into multiple tickets:
OK, why not. There are clearly some subtasks which can be separated. On the other hand, other changes might depend on each other in a non-trivial way and it might even be complicated to see the dependencies before digging in. Thus I suggest keeping this ticket here and splitting it later as the coding progresses.

Ad as small/large as possible:
Why as large as possible? I am somewhat puzzled by your example showing frequent mapping and unmapping. The 1:1 identity mapping serves two important purposes:

(a) To make the bootstrap possible without making it extremely complicated. The kernel has to "live" somewhere and allocate/initialize data structures to make the "real" memory management feasible, and with respect to this it is definitely easier to have a static 1:1 memory mapping than anything else. From a different point of view, it is a chicken-and-egg problem.

(b) It also makes life easier for various exception handling routines.

But that's basically it. There is no need for the kernel heap to use 1:1 identically mapped memory. The kernel virtual memory area for the SLAB allocator can grow by on-demand mapping and the allocator by its design can avoid frequent unmapping of the memory (because it can keep freed structures in its caches for some time). The memory won't be mapped and unmapped with every access, not even with every allocation.
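
A minimal sketch of that idea, with purely hypothetical names (this is not the actual HelenOS slab allocator): freeing an object does not unmap its page; pages are unmapped only by an explicit reclaim pass under kernel address space pressure.

   /* Hypothetical sketch -- not the real slab allocator (locking omitted). */
   #include <stddef.h>
   #include <stdint.h>

   typedef struct slab_page {
       struct slab_page *next;
       void *kva;            /* on-demand mapped kernel virtual address */
       uintptr_t frame;      /* backing physical frame */
       size_t free_objs;     /* objects currently free in this page */
       size_t total_objs;
   } slab_page_t;

   typedef struct {
       slab_page_t *pages;   /* pages stay mapped while they sit in the cache */
   } slab_cache_t;

   /* Freeing an object keeps the page and its kernel mapping around. */
   void slab_free_obj(slab_page_t *page, void *obj)
   {
       (void) obj;           /* the object would be linked onto the page's free list */
       page->free_objs++;
   }

   /* Invoked only under kernel address space pressure. */
   void slab_reclaim(slab_cache_t *cache)
   {
       for (slab_page_t *p = cache->pages; p != NULL; p = p->next) {
           if (p->free_objs == p->total_objs) {
               /* Only here: unmap p->kva and return p->frame to the
                  frame allocator (helper functions omitted). */
           }
       }
   }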

Secondly, I don't see the reason why the kernel has to access all the physical memory. The physical memory will always be used primarily by the user space tasks (or it will be plain unused). The kernel will mostly map only those pieces of the physical address space which it requires for the heap and which it requires for accessing some hardware devices.

What is the use case you are describing (mapping a piece of physical memory, touching it briefly and unmapping it again)? Is it IPC? I believe that there are some tricks to avoid such usage patterns in IPC.

comment:4 by Jakub Jermář, 14 years ago

I believe common sense tells us that the "as large as possible" variant is the correct one, because the identity mapping is more natural and cheaper for the kernel - you don't need to make any additional provisions/mapping/translations to access the memory which is always mapped by the identity mapping. Making this area smaller implies making the on-demand mapped area bigger. Of course, in order to agree on this, we need to perceive the "as possible" in the same way. I think it means "maximum size but still allowing the on-demand mapping to function in a useful way, if the on-demand mapping is necessary at all". For example, for me this means that if the physical memory can entirely fit in the identity mapping, then do not artificially limit the identity mapped area to smaller size in favor of the on-demand area. But why would you then?

The on-demand mapped area will change in time as mappings come and go. If that were not the case, there would be no reason to do the non-identity mapping in the first place: because the kernel address space is of a fixed size, it would not matter whether you always map one 4G set of frames or another.

Now, let me think of the possible use cases that would require frequent mapping/unmapping inside of the on-demand mapped area:

  • Address space area backends: The backends typically allocate some physical memory for the task when it references a page in its address space area for the first time. For security and other reasons, this memory must be accessible to the kernel, because it must zero / initialize it before mapping it to userspace. After zeroing / initializing this memory, the kernel does not need the mapping any more.
  • Memory allocated using the frame allocator: This is slightly different from the slab allocator case, where we can assume that the memory will already be mapped by the allocator for us. There are various places in the code that allocate memory using the frame allocator directly. Even this memory needs to be accessible from the kernel address space. If we are talking about kernel thread stacks, this could become an issue in case of many short lived threads.
  • Ticket #12 when it's implemented: You basically map a frame from one address space to another and copy some data to it. Then you unmap it.
  • COW when it's implemented

comment:5 by Martin Decky, 14 years ago

As in many other discussions we can't easily agree on something based on "common sense" or "what is natural", as our understanding of common sense is usually very different :). Or, in other words, we simply put different emphasis on different aspects of the topic.

For me, common sense says that the kernel address space is just an address space and should not be treated in some special way (except in situations where a special treatment is really beneficial, see my previous comment).

Please don't always consider the absolute worst-case implications; I have no intention of breaking everything. You say: "If we limit the 1:1 mapping, all of a sudden every memory access will be extremely slow, because you will be forced to map and unmap the memory all the time." But this is an overstatement. Nothing forces you to always access the memory in the most stupid way and waste CPU cycles just because you use on-demand mapping. You are free to write clever code which will limit many of the negative impacts on performance.

Yes, there will still be some cases where the on-demand mapping hurts performance, but on the other hand on-demand mapping is more flexible than static 1:1 mapping. You simply can't get something for nothing.

To offer you some positive compromise, perhaps neither "as small as possible" nor "as large as possible" are good quantifiers. It should be "reasonably large". The MIPS architecture demonstrates that there is a size of the 1:1 mapping which can be described as reasonably large. The designers were even brave enough to hardwire it into the CPU.

for me this means that if the physical memory can entirely fit in the identity mapping, then do not artificially limit the identity mapped area to smaller size in favor of the on-demand area.

But accessing the whole physical memory is only one issue with 1:1 mapping. The other issue is the plain fact that the halved virtual address space is usually smaller than the physical address space (at least on many 32b platforms, most severely on ia32/PAE). If the physical memory is non-contiguous, it might be impossible to access the memory via 1:1 mapping even when the size of the physical memory can fit into the kernel address space.

More severely, if you use all the kernel virtual address space for the 1:1 mapping of the physical memory, you have no space left for mapping additional pieces of the physical address space (e.g. the framebuffer). Therefore, if we agree that for example 512 MB or 1 GB is enough for the 1:1 mapping (if all the physical memory can fit in there, then hurray!), we will have quite enough space for the on-demand mapping.

The on-demand mapped area will change in time as mappings come and go. If that were not the case, there would be no reason to do the non-identity mapping in the first place

Sure, the mapping will change from time to time. This is the reason VMM was invented, isn't it? :) But it won't change wildly with every memory access and every allocation. It might not even change wildly with every syscall operation (if you use anything better than the most naive approach). Therefore the overhead should be reasonably low.

The backends typically allocate some physical memory for the task when it references a page in its address space area for the first time. For security and other reasons, this memory must be accessible to the kernel, because it must zero / initialize it before mapping it to userspace. After zeroing / initializing this memory, the kernel does not need the mapping any more.

But you do this only once per physical memory allocation request. This is just a constant overhead and you don't have to unmap the memory immediately (for the case that the client requests releasing and acquiring the same memory in a wild manner).

The obvious question is: What would you do with 1:1 mapping when there is no place in the kernel address space to map the physical memory to? I don't know .. I prefer something that works a little bit slower every time than something that doesn't work at all sometimes.

If we are talking about kernel thread stacks, this could become an issue in case of many short lived threads.

Again, the keyword is: Cache. Keep the allocated frames mapped for some time, have a pool of these frames to avoid unnecessary short round-trips to the frame allocator, or even extend the SLAB allocator's API to better support kernel thread stacks (extending the SLAB allocator in such a way to avoid most direct calls to the frame allocator is perhaps the best solution).
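
A tiny sketch of the pooling idea for kernel thread stacks -- the names are hypothetical and locking is omitted; this is not a proposal of a concrete API:

   /* Hypothetical sketch: a pool of pre-mapped kernel thread stacks. */
   #include <stddef.h>

   #define STACK_POOL_MAX  16

   typedef struct stack_node {
       struct stack_node *next;
   } stack_node_t;

   static stack_node_t *stack_pool = NULL;
   static size_t stack_pool_cnt = 0;

   /* Assumed low-level helpers (hypothetical). */
   extern void *stack_map_new(void);      /* allocate frames and map them */
   extern void stack_unmap(void *stack);  /* unmap (TLB shootdown) and free frames */

   void *stack_alloc(void)
   {
       if (stack_pool != NULL) {
           /* Reuse a cached stack: no new mapping, no TLB traffic. */
           stack_node_t *node = stack_pool;
           stack_pool = node->next;
           stack_pool_cnt--;
           return (void *) node;
       }
       return stack_map_new();
   }

   void stack_free(void *stack)
   {
       if (stack_pool_cnt < STACK_POOL_MAX) {
           /* Keep the stack mapped for the next short lived thread. */
           stack_node_t *node = (stack_node_t *) stack;
           node->next = stack_pool;
           stack_pool = node;
           stack_pool_cnt++;
       } else {
           stack_unmap(stack);
       }
   }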

Sure, you will still use the frame allocator to allocate large quantities of linear frames (such as the framebuffer) where caching is not feasible. But, again, in these cases you don't allocate and release such frames wildly.

Ticket #12 when it's implemented

Again, the keyword is: Cache. Why should you tear down the mapping and release the kernel buffers immediately after doing the operation? (Except if you are under memory pressure.) If two tasks exchange some data only say once per second, the added overhead of mapping and unmapping won't kill anybody. If they exchange data frequently, keeping the buffers ready will not only mitigate this particular overhead, but also other overheads involved.

COW when it's implemented

You lost me here completely. What does this have to do with kernel 1:1 mapping?

comment:6 by Martin Decky, 14 years ago

Just a side note:

Except for the benefits of the 1:1 identity mapping which I have described earlier (for bootstrap, exception handling, etc.), the benefits you are describing can be perhaps summarized as: The memory mapping is implicitly cached (forever).

My proposal means going mostly from this implicit caching to explicit caching. This complicates many things on one hand, but on the other hand it allows you to be less limited (the difference between the physical and the kernel address space size is an objective issue which cannot be avoided) and tune the caching policy to your specific needs.

in reply to:  5 ; comment:7 by Jakub Jermář, 14 years ago

Replying to decky:

You say: "If we limit the 1:1 mapping, all of a sudden every memory access will be extremely slow, because you will be forced to map and unmap the memory all the time." But this is an overstatement.

It is also a false statement. I am just saying that in situations where we have a choice between identity and on-demand mapping, the identity mapping has smaller overhead and should be preferred.

Yes, there will still be some cases where the on-demand mapping hurts performance, but on the other hand on-demand mapping is more flexible than static 1:1 mapping. You simply can't get something for nothing.

I understand the benefits of both and I do think that we should also provide the on-demand mapping, if the system configuration demands it. If the system configuration can work without the on-demand mapping though (and there are no other motivators), I would not enable it.

To offer you some positive compromise, perhaps neither "as small as possible" nor "as large as possible" are good quantifiers. It should be "reasonably large". The MIPS architecture demonstrates that there is a size of the 1:1 mapping which can be described as reasonably large. The designers were even brave enough to hardwire it into the CPU.

Reasonably large sounds good. I wanted to avoid some artificial limitations like "hey, what if we did identity-mapping for only about 256M on ia32 and, for the beauty of it, demand-mapped the rest…"

But accessing the whole physical memory is only one issue with 1:1 mapping. The other issue is the plain fact that the halved virtual address space is usually smaller than the physical address space (at least on many 32b platforms, most severely on ia32/PAE).

You are now mixing two things:

  1. identity mapping, which can be done on an arbitrary portion of the address space (if not dictated by hw), not only on 50%
  2. kernel - user address space split ratio, e.g. 2g:2g, 1g:3g and similar

So it is the split ratio which determines how much physical memory can one address space use, not the mapping mechanism.

If the physical memory is non-contiguous, it might be impossible to access the memory via 1:1 mapping even when the size of the physical memory can fit into the kernel address space.

Yes, in this case when the whole cannot be identity mapped, I would do identity mapping only for the area which fits and demand-map the rest.

More severely, if you use all the kernel virtual address space for the 1:1 mapping of the physical memory, you have no space left for mapping additional pieces of the physical address space (e.g. the framebuffer). Therefore, if we agree that for example 512 MB or 1 GB is enough for the 1:1 mapping (if all the physical memory can fit in there, then hurray!), we will have quite enough space for the on-demand mapping.

We sort of demand-map devices even now, we are just relying on the fact that we have less physical memory than virtual memory. This needs to be fixed and I would also use demand mapping for this. Note that we don't need much of virtual address space for the devices in the kernel though - only about few MBs.

But you do this only once per physical memory allocation request. This is just a constant overhead and you don't have to unmap the memory immediately (for the case that the client requests releasing and acquiring the same memory in a wild manner).

It's a constant that you pay every time you access a new page. It is actually worse when you unmap the new page as you suffer TLB shootdowns and such. Who would you cache the mapping for? The kernel is not going to need it again. The only negligible chance is when the frame is unmapped, deallocated, allocated by the same task again and mapped to the same virtual address. So you would optimize for an entirely improbable case.

Again, the keyword is: Cache. Keep the allocated frames mapped for some time, have a pool of these frames to avoid unnecessary short round-trips to the frame allocator, or even extend the SLAB allocator's API to better support kernel thread stacks (extending the SLAB allocator in such a way to avoid most direct calls to the frame allocator is perhaps the best solution).

Actually, the slab allocator could be used to allocate the stacks even now, but what about the other use cases?

Ticket #12 when it's implemented

Again, the keyword is: Cache. Why should you tear down the mapping and release the kernel buffers immediately after doing the operation? (Except if you are under memory pressure.) If two tasks exchange some data only say once per second, the added overhead of mapping and unmapping won't kill anybody. If they exchange data frequently, keeping the buffers ready will not only mitigate this particular overhead, but also other overheads involved.

This would require some additional caching mechanism and it could indeed speed up the operation.

COW when it's implemented

You lost me here completely. What does this have to do with kernel 1:1 mapping?

I'll try to refresh your memory :-) We were talking about the various use cases that would require the kernel to do one-time map-use-unmap operations, which we considered harmful for on-demand mapping. Well, COW is an example of exactly that pattern. Imagine you have a page with say some initialized uspace data and you want to write to it. The page is being shared by multiple tasks. Initially, the COW implementation will map the page read-only. When you, as a uspace app, write it, the kernel will catch the attempt, allocate a new frame for you, copy the data and map the frame to the original virtual address. After that, the kernel mapping is not needed anymore as the new frame will remain your private copy.

in reply to:  6 comment:8 by Jakub Jermář, 14 years ago

Replying to decky:

Except for the benefits of the 1:1 identity mapping which I have described earlier (for bootstrap, exception handling, etc.), the benefits you are describing can be perhaps summarized as: The memory mapping is implicitly cached (forever).

I agree.

My proposal means going mostly from this implicit caching to explicit caching. This complicates many things on one hand, but on the other hand it allows you to be less limited (the difference between the physical and the kernel address space size is an objective issue which cannot be avoided) and tune the caching policy to your specific needs.

But why mostly use the on-demand mapping when a given piece of memory can be identity mapped? As I see it, there will need to be some formula which will tell us how much memory will be identity mapped and how much will be on-demand mapped, based on the amount of physical memory and possible hardware constraints. Whether identity mapping will be used mostly or not should therefore depend on this formula. I still fail to see why we should give up the advantages of this, as you say, implicit caching of memory mapping when it is available. For me, the on-demand mapping should come into play especially for memory which cannot be accessed using identity mapping, and for devices.

Maybe I lost you again, maybe a long time ago, but I have been considering a model, in which the identity mapped and on-demand mapped areas coexist, so you cannot say that the system would be limited in any way, even if it prefers identity mapping for some areas of memory. It's about using the best(tm) method for each part of the physical memory.

in reply to:  7 ; comment:9 by Martin Decky, 14 years ago

You are now mixing two things:

  1. identity mapping, which can be done on an arbitrary portion of the address space (if not dictated by hw), not only on 50%
  2. kernel - user address space split ratio, e.g. 2g:2g, 1g:3g and similar

So it is the split ratio which determines how much physical memory can one address space use, not the mapping mechanism.

I believe you didn't understand my point. You can certainly change the split ratio, but unless you give 4 GB to the kernel and 0 GB to the user space, you won't be able to cover all 4 GB of physical address space using identity mapping at the same time (and you certainly want the user address space to be as large as possible, not as small as possible). With PAE, there is no way at all to cover the whole physical address space at the same time.

You can identically map various large windows of the physical address space, but isn't it better to use a full-featured on-demand mapping?

We sort of demand-map devices even now, we are just relying on the fact that we have less physical memory than virtual memory. This needs to be fixed and I would also use demand mapping for this. Note that we don't need much of virtual address space for the devices in the kernel though - only about few MBs.

Yes, but if you just say that you limit the identity mapping to say 1 GB and don't do any other modifications, you solve the problem with address space to map devices to, but limit the kernel memory allocator to this 1 GB.

I know, this is very subtle, because I have been saying that most of the "upper" physical memory will then be used by user space and the kernel won't need to access it very frequently. And now I am worried about this very same memory. But the problem is that the kernel might possibly run out of the 1 GB identically mapped memory (because more physical memory implies more user space tasks to run at once and this implies more kernel resources needed at once), thus the kernel should be allowed to acquire more physical memory and on-demand map it.

This again calls for a generic framework for kernel on-demand mapping (where the fixed identically mapped area would be just a special hardwired mapping which would be preferred if available), so you don't impose any artificial limitations to usage of the memory.

It's a constant that you pay every time you access a new page. It is actually worse when you unmap the new page as you suffer TLB shootdowns and such. Who would you cache the mapping for? The kernel is not going to need it again. The only negligible chance is when the frame is unmapped, deallocated, allocated by the same task again and mapped to the same virtual address. So you would optimize for an entirely improbable case.

"Allocated by the same task" .. Huh? Why are we talking about tasks now? All the time we were talking about some physical memory which needs to be on-demand mapped by the kernel into the kernel address space (e.g. because some syscall is pushing some data to the kernel or via the kernel or because of kernel allocator request), then the kernel does something with the contents of the memory, and then unmaps the memory.

If I say that we should "cache the mapping for some time", I mean "defer the unmapping phase, keep the mapping around for some time (and unmap the cached mapping in case of kernel address space pressure)". Thus you don't suffer the TLB shootdown.
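
A minimal sketch of such deferred unmapping, with purely hypothetical names (locking omitted); only km_reclaim(), called under kernel address space pressure, pays the TLB shootdown:

   /* Hypothetical sketch only -- not actual HelenOS code (locking omitted). */
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   #define KM_CACHE_SIZE  64

   typedef struct {
       uintptr_t pa;     /* physical frame address, 0 = slot unused */
       void *va;         /* cached kernel virtual mapping */
       bool busy;        /* currently handed out to a caller */
   } km_entry_t;

   static km_entry_t km_cache[KM_CACHE_SIZE];

   /* Assumed low-level primitives (hypothetical). */
   extern void *km_map_raw(uintptr_t pa);   /* create a new kernel mapping */
   extern void km_unmap_raw(void *va);      /* tear a mapping down (TLB shootdown) */

   void *km_map(uintptr_t pa)
   {
       for (size_t i = 0; i < KM_CACHE_SIZE; i++) {
           if (km_cache[i].pa == pa) {
               /* Cache hit: the frame is still mapped, reuse the mapping. */
               km_cache[i].busy = true;
               return km_cache[i].va;
           }
       }

       /* Cache miss: map the frame and remember the mapping for later reuse. */
       void *va = km_map_raw(pa);
       for (size_t i = 0; i < KM_CACHE_SIZE; i++) {
           if (km_cache[i].pa == 0) {
               km_cache[i] = (km_entry_t) { .pa = pa, .va = va, .busy = true };
               break;
           }
       }
       return va;
   }

   void km_put(void *va)
   {
       for (size_t i = 0; i < KM_CACHE_SIZE; i++) {
           if (km_cache[i].va == va) {
               /* Deferred unmap: mark the entry idle, keep the mapping alive. */
               km_cache[i].busy = false;
               return;
           }
       }
       /* The mapping was never cached (cache full): unmap it immediately. */
       km_unmap_raw(va);
   }

   /* Invoked only under kernel address space pressure. */
   void km_reclaim(void)
   {
       for (size_t i = 0; i < KM_CACHE_SIZE; i++) {
           if ((km_cache[i].pa != 0) && (!km_cache[i].busy)) {
               km_unmap_raw(km_cache[i].va);   /* the only place paying a shootdown */
               km_cache[i].pa = 0;
           }
       }
   }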

There is a reasonable chance that the user space task will try to pass the same physical memory to the kernel again (because a reasonable task is obviously also not unmapping its memory after every single syscall and soon requiring to map a totally different piece of physical memory to perform a frequent operation), thus the kernel-side mapping will be still cached and in-place.

The kernel SLAB allocator naturally caches previously claimed pages (even if the allocation of the structures inside them fluctuates frequently), thus even here the cached (= "not prematurely unmapped") mapping will hit and save you from frequent TLB shootdowns.

I'll try to refresh your memory :-) We were talking about the various use cases that would require the kernel to do one-time map-use-unmap operations, which we considered harmful for on-demand mapping. Well, COW is an example of exactly that pattern. Imagine you have a page with say some initialized uspace data and you want to write to it. The page is being shared by multiple tasks. Initially, the COW implementation will map the page read-only. When you, as a uspace app, write it, the kernel will catch the attempt, allocate a new frame for you, copy the data and map the frame to the original virtual address. After that, the kernel mapping is not needed anymore as the new frame will remain your private copy.

OK, now I understand. The steps you are describing are following:

  1. Claim new physical frame for the diverging page.
  2. Unmap the original COW page from the diverging task.
  3. Map the diverging page to the diverging task.
  4. Create kernel mapping for the original COW page.
  5. Create kernel mapping for the diverging page.
  6. Copy data from the original COW page to the diverging page.
  7. Unmap the COW page from the kernel address space.
  8. Unmap the diverging page from the kernel address space.

I assume that the critical operations are (7) and (8). But again, the caching of the kernel mapping (deferring (7) and (8)) can help here tremendously.

If more tasks which share the original COW page are going to diverge, you have the kernel mapping of the original COW page ready. If the diverging page is going to be shared by some other tasks, you have the kernel mapping of it ready, too.

The only case where you need to do the expensive unmap operation is when the kernel virtual address space is growing full. But even in this case you do the unmapping just once.

The only severely degraded situation might happen when the user space tasks (and/or the kernel) wildly claim and release physical memory under physical memory pressure, where it might happen that the entities are going to get different physical frames every time. But these conditions are already bad and undesirable enough; the additional overhead of the forced kernel address space unmapping will just make it a little bit worse.

Maybe I lost you again, maybe a long time ago, but I have been considering a model, in which the identity mapped and on-demand mapped areas coexist, so you cannot say that the system would be limited in any way, even if it prefers identity mapping for some areas of memory. It's about using the best(tm) method for each part of the physical memory.

Yes, we can definitively agree on this.

in reply to:  9 ; comment:10 by Jakub Jermář, 14 years ago

Replying to decky:

I believe you didn't understand my point. You can certainly change the split ratio, but unless you give 4 GB to the kernel and 0 GB to the user space, you won't be able to cover all 4 GB of physical address space using identity mapping at the same time (and you certainly want the user address space to be as large as possible, not as small as possible). With PAE, there is no way at all to cover the whole physical address space at the same time.

Well, this is certainly true also for on-demand mapping: you won't be able to cover the entire physical address space using on-demand mapping at _one point in time_. To address more, you'll need to map and unmap pages. I admit that some of my examples can be solved by keeping the mapping around, but e.g. the example with clearing memory frames in the address space area backends is the case where the caching will not help. It would, in fact, worsen the situation. See below.

You can identically map various large windows of the physical address space, but isn't it better to use a full-featured on-demand mapping?

I'd rather use full-featured identity mapping where possible and full-featured on-demand mapping where needed. In other words, I'd be against the use of on-demand mapping if its justification (for any given area of physical memory) is only based on a subjective feeling such as that it is more generic, more pure, more engineered and similar.

This again calls for a generic framework for kernel on-demand mapping (where the fixed identically mapped area would be just a special hardwired mapping which would be preferred if available), so you don't impose any artificial limitations to usage of the memory.

Now we are touching the implementation part. I can agree with the idea that both mapping types will be represented by something like an address space area, which will describe itself as either identity-mapped or on-demand mapped.

It's a constant that you pay every time you access a new page. It is actually worse when you unmap the new page as you suffer TLB shootdowns and such. Who would you cache the mapping for? The kernel is not going to need it again. The only negligible chance is when the frame is unmapped, deallocated, allocated by the same task again and mapped to the same virtual address. So you would optimize for an entirely improbable case.

"Allocated by the same task" .. Huh? Why are we talking about tasks now? All the time we were talking about some physical memory which needs to be on-demand mapped by the kernel into the kernel address space (e.g. because some syscall is pushing some data to the kernel or via the kernel or because of kernel allocator request), then the kernel does something with the contents of the memory, and then unmaps the memory.

In this case, I was not talking about either a syscall pushing data to the kernel or the kernel allocator allocating some memory. Instead, I was talking about a task, more precisely one of its threads running in userspace, causing a page fault by accessing some virtual address for the first time. This is what happens in the kernel then:

  1. the kernel figures out that the page fault happened e.g. inside an anonymous address space area
  2. the anonymous address space area backend allocates a memory frame, let's suppose it is from the non-identity mapped area
  3. the backend maps the frame to an available page in the on-demand mapped area
  4. the backend clears the page
  5. the backend unmaps the page
  6. the kernel returns to userspace

Note that if you actually cache the mapping, you are not going to use it again, because the kernel simply does not need to touch the user data again. Such a mapping will only effectively decrease the size of the on-demand mapped area (because it is not useful anymore and is taking up virtual address space).
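
For illustration, the sequence above as a minimal sketch -- all helper names are assumptions, not real HelenOS functions:

   /* Hypothetical sketch of the anonymous backend fault path (helpers assumed). */
   #include <stdint.h>
   #include <string.h>

   #define PAGE_SIZE  4096  /* assume 4 KiB pages */

   extern uintptr_t frame_alloc_high(void);              /* frame from high memory */
   extern void *km_map_tmp(uintptr_t pa);                /* map into on-demand area */
   extern void km_unmap_tmp(void *va);                   /* unmap (TLB shootdown) */
   extern void page_map_user(uintptr_t upage, uintptr_t pa);

   void anon_page_fault(uintptr_t upage)
   {
       uintptr_t pa = frame_alloc_high();    /* (2) allocate a frame */
       void *va = km_map_tmp(pa);            /* (3) map it for the kernel */
       memset(va, 0, PAGE_SIZE);             /* (4) clear it for security */
       km_unmap_tmp(va);                     /* (5) the kernel is done with it */
       page_map_user(upage, pa);             /* map the frame for the faulting task */
   }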

If I say that we should "cache the mapping for some time", I mean "defer the unmapping phase, keep the mapping around for some time (and unmap the cached mapping in case of kernel address space pressure)". Thus you don't suffer the TLB shootdown.

So you would be just increasing the kernel address space pressure without any gain. You would still need to do more TLB shootdowns than with identity mapping, even though in a deferred manner.

There are some possible optimizations for my example, such as that you disable preemption, and map the frame only on the local CPU, then you can unmap it cheaply, and enable preemption again. But still, even if you don't suffer the TLB shootdowns and don't increase the kas pressure, the operation is going to have a way bigger overhead than with identity mapping. So again, I fail to see why you would deliberately want to accept the overhead, when the kernel could be configured to avoid this as much as possible by preferring identity mapping?

There is a reasonable chance that the user space task will try to pass the same physical memory to the kernel again (because a reasonable task is obviously also not unmapping its memory after every single syscall and soon requiring to map a totally different piece of physical memory to perform a frequent operation), thus the kernel-side mapping will be still cached and in-place.

This was not the case that I was discussing, see above.

  7. Unmap the COW page from the kernel address space.
  8. Unmap the diverging page from the kernel address space.

I assume that the critical operations are (7) and (8). But again, the caching of the kernel mapping (deferring (7) and (8)) can help here tremendously.

Again, the kernel will not need to access the diverging page for a very long time. So you can apply the previous discussion here.

If more tasks which share the original COW page are going to diverge, you have the kernel mapping of the original COW page ready. If the diverging page is going to be shared by some other tasks, you have the kernel mapping of it ready, too.

I can see some merit in caching the mapping of the original COW page, not so much for the diverging page.

The only case where you need to do the expensive unmap operation is when the kernel virtual address space is growing full. But even in this case you do the unmapping just once.

But frequent use of on-demand mappings will exhaust the kas more frequently, so in total, there will be more TLB shootdowns.

in reply to:  10 ; comment:11 by Martin Decky, 14 years ago

Well, this is certainly true also for on-demand mapping: you won't be able to cover the entire physical address space using on-demand mapping at _one point in time_.

Sure, true. But you certainly see that the most suitable use case for 1:1 identity mapping is the need to access large contiguous pieces of physical memory. If accessing large contiguous pieces of memory is not the most important issue (as in our case), but the most important requirement is to be able to map a maximum count of physical frames disregarding their mutual distance in the physical address space, then on-demand mapping wins, because it provides something which identity mapping simply can't provide.

In an analogy, if you favour identity mapping, you create a privileged set of tasks which are lucky that their physical memory falls into the identity mapping area (thus they don't suffer any overhead due to unmapping the frames from the kernel address space). If you favour on-demand mapping, you spread the said overhead to all tasks more-or-less evenly. If you cache the kernel on-demand mapping, then given a reasonable workload you have a good chance to hit just a very reasonable overall overhead.

And yes, if the physical memory claim/release pattern of the tasks is very wrong, you can suffer really bad performance because of this overhead. But the question is who is actually to blame then ..

If you have say 32 GB of physical memory on ia32/PAE, the more memory you reserve for identity mapping, the more "unfairness" you introduce. The more memory you reserve for on-demand mapping, the more "fairly" you distribute the overhead, and the caching of the on-demand mapping has more room to adapt to the current workload.

And finally, it won't hurt much if you use on-demand mapping even when the physical memory might fit into a reasonably large identity mapping area (e.g. 1 GB physical memory on ia32). In the worst case you end up with the whole physical memory mapped on-demand and the chance that something would evict this mapping is extremely low (there is no more physical memory to map and the remaining 1 GB of kernel virtual address space is usually quite enough for hardware devices).

Now we are touching the implementation part. I can agree with the idea that both mapping types will be represented by something like an address space area, which will describe itself as either identity-mapped or on-demand mapped.

Great, at least something we can agree on without conflicts :)

  1. the kernel figures out that the page fault happened e.g. inside an anonymous address space area
  2. the anonymous address space area backend allocates a memory frame, let's suppose it is from the non-identity mapped area
  3. the backend maps the frame to an available page in the on-demand mapped area
  4. the backend clears the page
  5. the backend unmaps the page
  6. the kernel returns to userspace

Note that if you actually cache the mapping, you are not going to use it again, because the kernel simply does not need to touch the user data again. Such a mapping will only effectively decrease the size of the on-demand mapped area (because it is not useful anymore and is taking up virtual address space).

(a) You don't know whether the task (thread) won't actually use the memory to push some data to/via the kernel. With every reuse you lower the relative impact of the possible kernel unmapping overhead.

(b) Even if (a) is not the case, the task might release the memory and some other task might claim the same physical frame later (but soon enough to hit the caching mechanism). Again, the cached kernel mapping will save you from the unmapping overhead (and even the repeated mapping).

(c) The cases (a) and (b) are both talking about probabilities and we won't probably agree on them. But your final argument about decreasing the available space in the kernel on-demand memory area works exactly the same for the identity mapped area. Even worse, the space in the identity mapped area is occupied also when there is no reason for it.

My conclusion from (a), (b), (c) is that the on-demand mapping has all the benefits of identity mapping (w.r.t. a running system, not a bootstrapping system — as we discussed earlier). Plus it does not waste the kernel address space on memory which the kernel does not care for. Plus a clever caching of the kernel mapping is more fair to reasonable workloads than the fixed "caching" of the fixed identity mapping.

The only disadvantage of on-demand mapping is the slight danger of unmapping overhead. But the important point is that the unmapping would actually happen the sooner the smaller the on-demand mapping area is!

In other words, if you favour identity mapping, you have a chance that the tasks would fit just into the identity mapped memory and no TLB shootdowns would happen (but in that case the tasks would also fit perfectly into the cached on-demand mapping and also no TLB shootdowns would happen). If the tasks, on the other hand, run beyond the identity mapped memory (and you are forced to use the on-demand mapping), the probability that you consume the whole kernel virtual address space (and are forced to unmap something) grows in direct proportion to the size of the identity mapped area.

This reasoning brings me again to making the identity mapped area reasonably small, not reasonably large.

So you would be just increasing the kernel address space pressure without any gain. You would still need to do more TLB shootdowns than with identity mapping, even though in a deferred manner.

No. If my reasoning above is correct, this is actually not true. You would actually decrease the kernel address space pressure (in the worst case the pressure would stay the same as with only using identity mapping) and you would actually decrease the probability of doing TLB shootdowns (although the shootdowns will be running across a larger portion of the kernel virtual address space).

But still, even if you don't suffer the TLB shootdowns and don't increase the kas pressure, the operation is going to have a way bigger overhead than with identity mapping. So again, I fail to see why you would deliberately want to accept the overhead, when the kernel could be configured to avoid this as much as possible by preferring identity mapping?

Because the overall overhead won't be as monstrous as you try to imagine. With usual workloads the overhead would be somewhere from none to very very slight. Only really pathological workloads would create a large overhead.

And the probability of the large overhead would increase if you dedicate more memory to the identity mapped area, not decrease!

in reply to:  11 comment:12 by Jakub Jermář, 14 years ago

Replying to decky:

Sure, true. But you certainly see that the most suitable use case for 1:1 identity mapping is the need to access large contiguous pieces of physical memory. If accessing large contiguous pieces of memory is not the most important issue (as in our case), but the most important requirement is to be able to map a maximum count of physical frames disregarding their mutual distance in the physical address space, then on-demand mapping wins, because it provides something which identity mapping simply can't provide.

It would have been the most important requirement, if we had to choose and deploy only one of the mechanisms, but since we plan to let the kernel auto-configure itself at boot time and use both mechanisms as appropriate, the most important requirement should be the low overhead.

In an analogy, if you favour identity mapping, you create a privileged set of tasks which are lucky that their physical memory falls into the identity mapping area (thus they don't suffer any overhead due to unmapping the frames from the kernel address space). If you favour on-demand mapping, you spread the said overhead to all tasks more-or-less evenly. If you cache the kernel on-demand mapping, then given a reasonable workload you have a good chance to hit just a very reasonable overall overhead.

For me it is hard to predict what frames, and from what physical addresses, a task will use. Thanks to fragmentation, it can be quite likely that one task will be using frames that the kernel accesses both via identity and on-demand mapping. Moreover, the analogy is a little bit evil, because it says that rather than accepting a chance that someone will be happy, we will make everyone unhappy. I plan to touch on this below, but depending on the size of the physical memory, it can be more probable that most tasks will be happy if the identity mapped region is rather larger than smaller.

If you have say 32 GB of physical memory on ia32/PAE, the more memory you reserve for identity mapping, the more "unfairness" you introduce. The more memory you reserve for on-demand mapping, the more "fairly" you distribute the overhead, and the caching of the on-demand mapping has more room to adapt to the current workload.

Ok, now we are getting to the parametrization part. It is clear that the more physical memory the system has, the more advantage should be given to the on-demand part (see? I am no identity mapping zealot :-). But as I mentioned in one of my previous comments, this should be governed by a formula with the physical memory size as a parameter.

Having said that, I can imagine some guidelines, which should be followed by such a formula:

  1. if the size of the physical memory fits within the identity mapped region, do identity mapping for the entire physical memory and do not configure the on-demand region at all (except for mappings needed by devices)
  2. in other cases, use the ratio between the size of identity-mappable physical memory (IS) and the total size of physical memory (TS) to split the kas between the identity-mapped region (I) and the on-demand mapped region (D), such that I : (I + D) = IS : TS
  3. the identity mapped region should never be smaller than some agreed limit, e.g. 512M

So as you can see, (1) is the case where identity is preferred and the identity mapping is as large as possible. For your example of ia32 with PAE, (2) would split the 2G kas into 128M of identity mapping and 1920M of on-demand mapping. However, applying (3) would give some advantage to the identity mapped area merely to guarantee some reasonably sane functioning of the kernel, by giving 512M to identity mapping and 1536M to on-demand mapping. In another example of a system with 3G memory, identity would have 1.3G and on-demand would have 0.7G. A 4GB system would be split equally 1G:1G.
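
A minimal sketch of that formula, consistent with the examples above; the constants are illustrative ia32 values and the function name is hypothetical:

   /* Hypothetical sketch of the proportional kas split (illustrative values). */
   #include <stdint.h>

   #define KAS_SIZE      (2ULL << 30)    /* kernel address space: I + D */
   #define IDENTITY_MIN  (512ULL << 20)  /* guideline (3): 512M floor */

   /* is = identity-mappable physical memory (IS), ts = total physical memory (TS) */
   uint64_t identity_area_size(uint64_t is, uint64_t ts)
   {
       /* Guideline (1): everything fits, so identity-map all of it. */
       if (ts <= is)
           return ts;

       /* Guideline (2): I : (I + D) = IS : TS. */
       uint64_t identity = KAS_SIZE * is / ts;

       /* Guideline (3): never go below the agreed minimum. */
       if (identity < IDENTITY_MIN)
           identity = IDENTITY_MIN;

       return identity;   /* the on-demand area gets KAS_SIZE - identity */
   }

   /* With IS = 2G: TS = 32G gives 128M (raised to 512M by (3)),
      TS = 3G gives ~1.3G, TS = 4G gives 1G. */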

And finally, it won't hurt much if you use on-demand mapping even when the physical memory might fit into a reasonably large identity mapping area (e.g. 1 GB physical memory on ia32). In the worst case you end up with the whole physical memory mapped on-demand and the chance that something would evict this mapping is extremely low (there is no more physical memory to map and the remaining 1 GB of kernel virtual address space is usually quite enough for hardware devices).

But why bother, why suffer even the overhead of only mapping, caching and looking up when it is not necessary at all?

My conclusion from (a), (b), (c) is that the on-demand mapping has all the benefits of identity mapping (w.r.t. a running system, not a bootstrapping system — as we discussed earlier). Plus it does not waste the kernel address space on memory which the kernel does not care for. Plus a clever caching of the kernel mapping is more fair to reasonable workloads than the fixed "caching" of the fixed identity mapping.

I would argue that even the running system has bigger overhead, because it constantly looks up translations in the cache. And because processes come and go, there will hardly be any balanced period, when everything is cached. Moreover, the caching layer you are talking about will bring another level of complexity.

In other words, if you favour identity mapping, you have a chance that the tasks would fit just into the identity mapped memory and no TLB shootdowns would happen (but in that case the tasks would also fit perfectly into the cached on-demand mapping and also no TLB shootdowns would happen). If the tasks, on the other hand, run beyond the identity mapped memory (and you are forced to use the on-demand mapping), the probability that you consume the whole kernel virtual address space (and are forced to unmap something) grows in direct proportion to the size of the identity mapped area.

This should be handled by the proportional split suggested above.

This reasoning brings me again to making the identity mapped area reasonably small, not reasonably large.

You need to consider the size of the physical memory. On some smaller configurations, you can reach as large as possible and it will surely be the best solution, while on larger configurations it makes sense to let the on-demand prevail. However, only ia32 with PAE seems to be able to make use of the bigger on-demand area; on all other 32-bit platforms, including standard ia32, the formula above will favor identity mapping or make the two areas equally sized.

comment:13 by Jakub Jermář, 14 years ago

Component: unspecified → kernel/generic

comment:15 by Jakub Jermář, 13 years ago

Type: task → enhancement

comment:16 by Martin Decky, 13 years ago

Depends on: #343

comment:17 by Zdenek Bouska, 13 years ago

Cc: zdenek.bouska@… added

comment:18 by Jakub Jermář, 12 years ago

In mainline,1352, I merged Phase I of functionality needed by ticket #3. Starting with this revision, it is already possible for the kernel to access an arbitrary physical address, including device registers and high memory.

comment:19 by Jakub Jermář, 12 years ago

Since mainline,1365, uspace memory is allocated preferably from high memory and the kernel uses kernel non-identity mapping to initialize it.

comment:20 by Jakub Jermář, 12 years ago

Resolution: fixed
Status: new → closed