path: root/vm/vm_map.c
Commit message / Author / Age / Files / Lines
* Fix bogus format (Samuel Thibault, 2024-07-09, 1 file, -1/+1)
* vm: Mark entries as in-transition while wiring down (Sergey Bugaev, 2024-04-05, 1 file, -1/+26)
  When operating on the kernel map, vm_map_pageable_scan() does what the
  code itself describes as "HACK HACK HACK HACK": it unlocks the map, and
  calls vm_fault_wire() with the map unlocked. This hack is required to
  avoid a deadlock in case vm_fault or one of its callees (perhaps, a
  pager) needs to allocate memory in the kernel map. The hack relies on
  other kernel code being "well-behaved", in particular that nothing will
  make any serious changes to this region of memory while the map is
  unlocked, since this region of memory is "owned" by the caller.

  Even if the kernel code is "well-behaved" and doesn't alter VM regions
  that it doesn't "own", it can still access adjacent regions. While this
  doesn't affect the region being wired down as such, it can still end up
  causing trouble due to extension & coalescence (merging) of VM entries.

  VM entry coalescence is an optimization where two adjacent VM entries
  with identical properties are merged into a single one that spans the
  combined region of the two original entries. VM entry extension is a
  similar optimization where an existing VM entry is extended to cover an
  adjacent region, instead of a new VM entry being created to describe
  the region.

  These optimizations are a private implementation detail of vm_map, and
  (while they can be observed through e.g. vm_region) they are not
  supposed to cause any visible effects to how the described regions of
  memory behave; coalescence/extension and clipping happen automatically
  as needed when adding or removing mappings, or changing their
  properties. This is why it's fine for "well-behaved" kernel code to
  unknowingly cause extension or coalescence of VM entries describing a
  region by operating on adjacent VM regions.

  The "HACK HACK HACK HACK" code path relies on the VM entries in the
  region staying intact while it keeps the map unlocked, as it passes
  direct pointers to the entries into vm_fault_wire(), and also walks the
  list of entries in the region by following the vme_next pointers in the
  entries. Yet, this assumption is violated by the entries getting
  concurrently modified by other kernel code operating on adjacent VM
  regions, as described above. This is not only undefined behavior in the
  sense of the C language standard, but can also cause very real issues.

  Specifically, we've been seeing the VM subsystem deadlock when building
  Mach with SMP support and running a test program that calls
  mach_port_names() concurrently and repeatedly. The mach_port_names()
  implementation allocates and wires down memory, and when called from
  multiple threads, it was likely to allocate, and wire, several adjacent
  regions of memory, which would then cause entry coalescence/extension
  and clipping to kick in. The specific sequence of events that led to a
  deadlock appears to have been:

  1. Multiple threads execute mach_port_names() concurrently.
  2. One of the threads is wiring down a memory region, another is
     unwiring an adjacent memory region.
  3. The wiring thread has unlocked the ipc_kernel_map, and called into
     vm_fault_wire().
  4. Due to entry coalescence/extension, the entry the wiring thread was
     going to wire down now describes a broader region of memory, namely
     it includes an adjacent region of memory that has previously been
     wired down by the other thread that is about to unwire it.
  5. The wiring thread sets the busy bit on a wired-down page that the
     unwiring thread is about to unwire, and is waiting to take the map
     lock for reading in vm_map_verify().
  6. The unwiring thread holds the map lock for writing, and is waiting
     for the page to lose its busy bit.
  7. Deadlock!

  To prevent this from happening, we have to ensure that the VM entries,
  at least as passed into vm_fault_wire() and as used for walking the
  list of such entries, stay intact while we have the map unlocked. One
  simple way to achieve that that I have proposed previously is to make a
  temporary copy of the VM entries in the region, and pass the copies
  into vm_fault_wire(). The entry copies would not be affected by
  coalescence/extension, even if the original entries in the map are.
  This is however only straightforward to do when there's just a single
  entry describing the whole region, and there are further concerns with
  e.g. whether the underlying memory objects could, too, get coalesced.
  Arguably, making copies of the memory entries is making the hack even
  bigger.

  This patch instead implements a relatively clean solution that,
  arguably, makes the whole thing less of a hack: namely, making use of
  the in-transition bit on VM entries to prevent coalescence and any
  other unwanted effects. The entry in-transition bit was introduced for
  a very similar use case: the VM map copyout logic has to temporarily
  unlock the map to run its continuation, so it marks the VM entries it
  copied out into the map up to that point as being "in transition",
  asking other code to hold off making any serious changes to those
  entries. There's a companion "needs wakeup" bit that other code can set
  to block on the VM entry exiting this in-transition state; the code
  that puts an entry into the in-transition state is expected to, when
  clearing the in-transition bit, check for needs_wakeup being set, and
  wake any waiters up in that case, so they can retry whatever operation
  they wanted to do. There is no need to check for needs_wakeup in case
  of vm_map_pageable_scan(), however, exactly because we expect kernel
  code to be "well-behaved" and not make any attempts to modify the VM
  region.

  This relies on the in-transition bit inhibiting coalescence/extension,
  as implemented in the previous commit. Also, fix a tiny sad misaligned
  comment line.

  Reported-by: Damien Zammit <damien@zamaudio.com>
  Helped-by: Damien Zammit <damien@zamaudio.com>
  Message-ID: <20240405151850.41633-3-bugaevc@gmail.com>
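  A minimal sketch of the approach described above, using simplified
  stand-in types rather than the actual GNU Mach structures
  (vm_map_lock/vm_map_unlock/vm_fault_wire stand for the real primitives;
  everything else is illustrative):

      struct vm_map;
      struct vm_map_entry {
          struct vm_map_entry *vme_next;
          unsigned long vme_start, vme_end;
          int in_transition;           /* entry must be left alone */
          int needs_wakeup;            /* someone is waiting on this entry */
      };

      void vm_map_lock(struct vm_map *map);
      void vm_map_unlock(struct vm_map *map);
      void vm_fault_wire(struct vm_map *map, struct vm_map_entry *entry);

      /* Called with the map locked; the region up to end_addr is owned by us. */
      static void
      wire_region(struct vm_map *map, struct vm_map_entry *first,
                  unsigned long end_addr)
      {
          struct vm_map_entry *entry;

          /* Mark every entry in the region so that concurrent operations
             on adjacent regions cannot coalesce, extend, or clip them. */
          for (entry = first; entry && entry->vme_start < end_addr;
               entry = entry->vme_next)
              entry->in_transition = 1;

          vm_map_unlock(map);              /* the "HACK HACK HACK HACK" part */
          for (entry = first; entry && entry->vme_start < end_addr;
               entry = entry->vme_next)
              vm_fault_wire(map, entry);   /* entries are now stable */
          vm_map_lock(map);

          /* Leave the in-transition state.  vm_map_pageable_scan() can skip
             the needs_wakeup check because well-behaved kernel code never
             waits on these entries, but the general pattern is: */
          for (entry = first; entry && entry->vme_start < end_addr;
               entry = entry->vme_next) {
              entry->in_transition = 0;
              if (entry->needs_wakeup) {
                  entry->needs_wakeup = 0;
                  /* thread_wakeup(...) in the real code */
              }
          }
      }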
* vm: Don't attempt to extend in-transition entries (Sergey Bugaev, 2024-04-05, 1 file, -0/+4)
  The in-transition mechanism exists to make it possible to unlock a map
  while still making sure some VM entries won't disappear from under you.
  This is currently used by the VM copyin mechanics.

  Entries in this state are better left alone, and extending/coalescing
  is only an optimization, so it makes sense to skip it if the entry to
  be extended is in transition. vm_map_coalesce_entry() already checks
  for this; check for it in other similar places too.

  This is in preparation for using the in-transition mechanism for
  wiring, where it's much more important that the entries are not
  extended while in transition.

  Message-ID: <20240405151850.41633-2-bugaevc@gmail.com>
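  A minimal illustration of the kind of guard this adds (stand-in types;
  the commit does not name the exact call sites, so the helper below is
  hypothetical):

      struct vm_map_entry {
          unsigned long vme_start, vme_end;
          int in_transition;
      };

      /* Extending 'prev' to cover a new adjacent range is only an
         optimization, so it is always safe to refuse when the entry is
         in transition. */
      static int
      can_extend_entry(const struct vm_map_entry *prev, unsigned long start)
      {
          return prev != 0
              && prev->vme_end == start
              && !prev->in_transition;
      }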
* vm: Fix use-after-free in vm_map_pageable_scan() (Sergey Bugaev, 2024-04-05, 1 file, -10/+16)
  When operating on the kernel map, vm_map_pageable_scan() does what the
  code itself describes as "HACK HACK HACK HACK": it unlocks the map, and
  calls vm_fault_wire() with the map unlocked. This hack is required to
  avoid a deadlock in case vm_fault or one of its callees (perhaps, a
  pager) needs to allocate memory in the kernel map. The hack relies on
  other kernel code being "well-behaved", in particular that nothing will
  make any serious changes to this region of memory while the map is
  unlocked, since this region of memory is "owned" by the caller.

  This reasoning doesn't apply to the validity of the 'end' entry (the
  first entry after the region to be wired), since it's not a part of the
  region, and is "owned" by someone else. Once the map is unlocked, the
  'end' entry could get deallocated. Alternatively, a different entry
  could get inserted after the VM region in front of 'end', which would
  break the 'for (entry = start; entry != end; entry = entry->vme_next)'
  loop condition.

  This was not an issue in the original Mach 3 kernel, since it used an
  address range check for the loop condition, but got broken in commit
  023401c5b97023670a44059a60eb2a3a11c8a929 "VM: rework map entry wiring".
  Fix this by switching the iteration back to use an address check.

  This partly fixes a deadlock with concurrent mach_port_names() calls on
  SMP.

  Reported-by: Damien Zammit <damien@zamaudio.com>
  Message-ID: <20240405151850.41633-1-bugaevc@gmail.com>
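  A simplified sketch of the difference between the two loop conditions
  (stand-in types; wire_one() stands for the per-entry wiring work):

      struct vm_map_entry {
          struct vm_map_entry *vme_next;
          unsigned long vme_start, vme_end;
      };

      void wire_one(struct vm_map_entry *entry);

      /* Fragile: 'end' is not part of the wired region, so it may be freed
         or displaced by a newly inserted entry while the map is unlocked,
         and 'entry != end' may then never become true. */
      static void
      wire_region_by_entry(struct vm_map_entry *start, struct vm_map_entry *end)
      {
          struct vm_map_entry *entry;
          for (entry = start; entry != end; entry = entry->vme_next)
              wire_one(entry);
      }

      /* Robust: iterate by address, as the original Mach 3 code did; the
         loop no longer depends on the identity of the entry after the
         region. */
      static void
      wire_region_by_address(struct vm_map_entry *start, unsigned long end_addr)
      {
          struct vm_map_entry *entry;
          for (entry = start; entry != 0 && entry->vme_start < end_addr;
               entry = entry->vme_next)
              wire_one(entry);
      }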
* vm_map: Add comment and assert for vm_map_delete (Damien Zammit, 2024-02-23, 1 file, -0/+7)

  This will prevent calling vm_map_delete without the map locked unless
  ref_count is zero.

  Message-ID: <20240223081505.458240-1-damien@zamaudio.com>
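  A plausible shape for such an assertion (the helper and field names
  below are hypothetical; the actual check in the commit may differ):

      #include <assert.h>

      struct vm_map { int ref_count; /* ... */ };

      int vm_map_lock_held(const struct vm_map *map);   /* hypothetical */

      /* Deleting entries requires the map lock, unless the map is already
         dead (ref_count == 0) and nothing else can reach it. */
      static void
      check_vm_map_delete_precondition(const struct vm_map *map)
      {
          assert(map->ref_count == 0 || vm_map_lock_held(map));
      }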
* vm_map_lookup: Add parameter for keeping map locked (Damien Zammit, 2024-02-22, 1 file, -3/+6)

  This adds a parameter called keep_map_locked to vm_map_lookup() that
  allows the function to return with the map locked.

  This is to prepare for fixing a bug with gsync where the map is locked
  twice by mistake.

  Co-Authored-By: Sergey Bugaev <bugaevc@gmail.com>
  Message-ID: <20240222082410.422869-3-damien@zamaudio.com>
* adjust range when changing memory pageability (Luca Dariz, 2024-01-13, 1 file, -5/+26)

  * vm/vm_map.c: use actual limits instead of min/max boundaries to
    change pageability of the currently mapped memory. The previous
    behaviour caused the initial vm_wire_all(host, task, VM_WIRE_ALL) in
    glibc startup to fail with KERN_NO_SPACE.

  Message-ID: <20240111210907.419689-5-luca@orpolo.org>
* vm: Coalesce map entries (Sergey Bugaev, 2023-11-27, 1 file, -2/+27)

  When
  - extending an existing entry,
  - changing protection or inheritance of a range of entries,
  we can get several entries that could be coalesced. Attempt to do that.

  Message-ID: <20230705141639.85792-4-bugaevc@gmail.com>
* vm: Add vm_map_coalesce_entry (Sergey Bugaev, 2023-11-27, 1 file, -2/+76)

  This function attempts to coalesce a VM map entry with its preceding
  entry. It wraps vm_object_coalesce.

  Message-ID: <20230705141639.85792-3-bugaevc@gmail.com>
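  A rough sketch of what such a helper has to do, with simplified stand-in
  types (the real function also deals with gaps, wiring state, and asks
  vm_object_coalesce() whether the backing object ranges can be merged):

      struct vm_map_entry {
          struct vm_map_entry *vme_prev, *vme_next;
          unsigned long vme_start, vme_end;
          int protection, inheritance, wired_count;
          void *object;                /* backing VM object */
          unsigned long offset;        /* offset into that object */
      };

      /* Merge 'entry' into its predecessor if they are adjacent and
         describe compatible mappings, then unlink 'entry'. */
      static int
      coalesce_with_prev(struct vm_map_entry *entry)
      {
          struct vm_map_entry *prev = entry->vme_prev;

          if (prev == 0 || prev->vme_end != entry->vme_start)
              return 0;
          if (prev->protection != entry->protection
              || prev->inheritance != entry->inheritance
              || prev->wired_count != entry->wired_count)
              return 0;
          /* The real code would call vm_object_coalesce() here before
             committing to the merge. */
          prev->vme_end = entry->vme_end;
          prev->vme_next = entry->vme_next;
          if (entry->vme_next)
              entry->vme_next->vme_prev = prev;
          return 1;
      }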
* vm: Also check for virtual addresses in vm_map_delete (Samuel Thibault, 2023-08-14, 1 file, -0/+3)
* vm: Make vm_object_coalesce return new object and offset (Sergey Bugaev, 2023-07-05, 1 file, -5/+6)

  vm_object_coalesce() callers used to rely on the fact that it always
  merged the next_object into prev_object, potentially destroying
  next_object and leaving prev_object the result of the whole operation.

  After ee65849bec5da261be90f565bee096abb4117bdd "vm: Allow coalescing
  null object with an internal object", this is no longer true, since in
  case of prev_object == VM_OBJECT_NULL and next_object != VM_OBJECT_NULL,
  the overall result is next_object, not prev_object. The current callers
  are prepared to deal with this since they handle this case separately
  anyway, but the following commit will introduce another caller that
  handles both cases in the same code path.

  So, declare the way vm_object_coalesce() coalesces the two objects its
  implementation detail, and make it return the resulting object and the
  offset into it explicitly. This simplifies the callers, too.

  Message-Id: <20230705141639.85792-2-bugaevc@gmail.com>
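  A sketch of the kind of interface this describes, with simplified types
  (the parameter list is inferred from the description, not copied from
  the source):

      typedef int kern_return_t;
      struct vm_object;

      kern_return_t vm_object_coalesce_sketch(
          struct vm_object *prev_object,
          struct vm_object *next_object,
          unsigned long prev_offset,
          unsigned long next_offset,
          unsigned long prev_size,
          unsigned long next_size,
          struct vm_object **new_object,   /* out: whichever object survives */
          unsigned long *new_offset);      /* out: offset of the range in it */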
* vm: Eagerly release deallocated pages (Sergey Bugaev, 2023-07-03, 1 file, -5/+21)

  If a deallocated VM map entry refers to an object that only has a
  single reference and doesn't have a pager port, we can eagerly release
  any physical pages that were contained in the deallocated range.

  This is not a 100% solution: it is still possible to "leak" physical
  pages that can never appear in virtual memory again by creating several
  references to a memory object (perhaps by forking a VM map with
  VM_INHERIT_SHARE) and deallocating the pages from all the maps
  referring to the object. That being said, it should help to release the
  pages in the common case sooner.

  Message-Id: <20230626112656.435622-6-bugaevc@gmail.com>
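  A minimal sketch of the condition being described (stand-in types;
  field names are illustrative):

      struct vm_object {
          int ref_count;
          void *pager;   /* non-null when a memory object (pager) is attached */
      };

      /* Pages backing a deallocated range can be freed immediately when no
         other mapping or pager can ever bring them back. */
      static int
      can_release_pages_eagerly(const struct vm_object *object)
      {
          return object != 0
              && object->ref_count == 1
              && object->pager == 0;
      }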
* vm: Allow coalescing entries forward (Sergey Bugaev, 2023-07-03, 1 file, -4/+35)

  When entering an object into a map, try to extend the next entry
  backward, in addition to the previously existing attempt to extend the
  previous entry forward.

  Message-Id: <20230626112656.435622-5-bugaevc@gmail.com>
* vm: Allow coalescing a VM object with itself (Sergey Bugaev, 2023-07-03, 1 file, -6/+6)

  If a mapping of an object is made right next to another mapping of the
  same object and has the same properties (protection, inheritance,
  etc.), Mach will now expand the previous VM map entry to cover the new
  address range instead of creating a new entry.

  Message-Id: <20230626112656.435622-3-bugaevc@gmail.com>
* Define rpc_vm_size_array_t and rpc_vm_offset_array_t (Flavio Cruz, 2023-01-31, 1 file, -6/+7)

  When generating stubs, MiG will take the vm_size_array_t and define the
  input request struct using rpc_vm_size_t since the size is variable.
  This in turn will cause a mismatch between types (vm_size_t* vs
  rpc_vm_size_t*). We could also ask MiG to produce a prototype using
  rpc_vm_size_t*, however we would need to change the implementation of
  the RPC to use rpc_* types anyway since we want to avoid another
  allocation of the array.

  Message-Id: <Y9iwScHpmsgY3V0N@jupiter.tail36e24.ts.net>
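  An illustration of the mismatch (the widths below are only an example of
  a configuration where the kernel-native and wire-format types differ;
  they are not taken from the source):

      typedef unsigned long vm_size_t;            /* kernel-native width    */
      typedef unsigned int  rpc_vm_size_t;        /* width used on the wire */
      typedef rpc_vm_size_t *rpc_vm_size_array_t; /* what the stubs carry   */

      /* A server routine written against vm_size_t* cannot simply be handed
         the rpc_vm_size_t* array the generated stub unpacks when the two
         types have different widths. */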
* Remove existing old style definitions and use -Wold-style-definition. (Flavio Cruz, 2023-01-19, 1 file, -2/+1)

  Message-Id: <Y8mYd/pt/og4Tj5I@mercury.tail36e24.ts.net>
* Include mig generated headers to avoid warnings with -Wmissing-prototypes. (Flavio Cruz, 2023-01-19, 1 file, -0/+1)

  This also reverts 566c227636481b246d928772ebeaacbc7c37145b and
  963b1794d7117064cee8ab5638b329db51dad854.

  Message-Id: <Y8d75KSqNL4FFInm@mercury.tail36e24.ts.net>
* Fix some warnings with -Wmissing-prototypes. (Flavio Cruz, 2022-12-27, 1 file, -85/+7)

  Marked some functions as static (private) as needed and added missing
  includes. This also revealed some dead code which was removed. Note
  that -Wmissing-prototypes is not enabled here since there is a bunch
  more warnings.

  Message-Id: <Y6j72lWRL9rsYy4j@mars>
* Use -Wstrict-prototypes and fix warnings (Flavio Cruz, 2022-12-21, 1 file, -8/+6)

  Most of the changes include defining and using proper function type
  declarations (with argument types declared) and avoiding using the K&R
  style of function declarations.

  Message-Id: <Y6Jazsuis1QA0lXI@mars>
* Use __builtin_ffs instead of libc provided ffs in vm_map.c (Flavio Cruz, 2022-12-15, 1 file, -2/+2)

  We already use this built-in in other places and this will move us
  closer to being able to build the kernel without libc.

  Message-Id: <Y5l80/VUFvJYZTjy@jupiter.tail36e24.ts.net>
* Define vm_size_t and vm_offset_t as __mach_uintptr_t. (Flavio Cruz, 2022-12-06, 1 file, -4/+4)

  This allows *printf to use %zd/%zu/%zx to print vm_size_t and
  vm_offset_t. Warnings using the incorrect specifiers were fixed. Note
  that MACH_PORT_NULL became just 0 because GCC thinks that we were
  comparing a pointer to a character (due to it being an unsigned int) so
  I removed the explicit cast.

  Message-Id: <Y47UNdcUF35Ag4Vw@reue>
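  A small user-space illustration of the point about format specifiers
  (not kernel code; the casts to size_t are only there to keep this
  standalone example strictly portable):

      #include <stdio.h>
      #include <stdint.h>

      /* When vm_size_t/vm_offset_t are pointer-width integers, %zu/%zx
         match them on both 32-bit and 64-bit builds, so no per-arch
         format strings are needed. */
      typedef uintptr_t vm_offset_t;
      typedef uintptr_t vm_size_t;

      int main(void)
      {
          vm_offset_t addr = 0x1000;
          vm_size_t size = 4096;
          printf("mapping at 0x%zx, size %zu bytes\n",
                 (size_t) addr, (size_t) size);
          return 0;
      }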
* vm_wire_all: Fix vm_map_protect case (Samuel Thibault, 2022-11-27, 1 file, -1/+2)

  If a "wire_required" process calls vm_map_protect(0), the memory gets
  unwired as expected. But if the process then calls
  vm_map_protect(VM_PROT_READ) again, we need to wire that memory. (This
  happens to be exactly what glibc does for its heap.)

  This fixes Hurd hangs on lack of memory, during which mach was swapping
  pieces of mach-defpager out.
* fix warnings for 32 bit builds (Luca Dariz, 2022-08-27, 1 file, -1/+1)

  Signed-off-by: Luca Dariz <luca@orpolo.org>
  Message-Id: <20220628101054.446126-13-luca@orpolo.org>
* vm_region_get_proxy: rename to create_proxy (Sergey Bugaev, 2021-11-07, 1 file, -4/+4)

  For coherency with memory_object_create_proxy.
* Memory proxies: Add support for anonymous mappings (Sergey Bugaev, 2021-11-07, 1 file, -4/+17)

  * vm/vm_map.c (vm_region_get_proxy):
    - Return KERN_INVALID_ARGUMENT when the entry is a submap.
    - Create a pager for the vm_object when the entry doesn't have any
      yet, since it's an anonymous mapping.

  Message-Id: <20211106081333.10366-3-jlledom@mailfence.com>
* vm: vm_region_get_proxy (Joan Lledó, 2021-11-07, 1 file, -0/+56)

  To get a proxy to the region a given address belongs to, with
  protection and range limited to the region ones.

  * include/mach/mach4.defs: vm_region_get_proxy RPC declaration
  * vm/vm_map.c: vm_region_get_proxy implementation

  Message-Id: <20211106081333.10366-2-jlledom@mailfence.com>
* vm_page_grab: allow allocating in high memory (Samuel Thibault, 2021-08-27, 1 file, -1/+1)

  vm_page_grab was systematically using the VM_PAGE_SEL_DIRECTMAP
  selector to play safe with existing code. This adds a flags parameter
  to let callers of vm_page_grab specify their constraints.

  Linux drivers need 32bit dmas, Xen drivers use kvtophys to clear some
  data. Callers of kmem_pagealloc_physmem and vm_page_grab_phys_addr also
  use kvtophys. Otherwise allocations can go to highmem.

  This fixes the allocation jam in the directmap segment.

  * vm/vm_page.h (VM_PAGE_DMA, VM_PAGE_DMA32, VM_PAGE_DIRECTMAP,
    VM_PAGE_HIGHMEM): New macros.
    (vm_page_grab): Add flags parameter.
  * vm/vm_resident.c (vm_page_grab): Choose allocation selector according
    to flags parameter.
    (vm_page_convert, vm_page_alloc): Pass VM_PAGE_HIGHMEM to vm_page_grab.
    (vm_page_grab_phys_addr): Pass VM_PAGE_DIRECTMAP to vm_page_grab.
  * vm/vm_fault.c (vm_fault_page): Pass VM_PAGE_HIGHMEM to vm_page_grab.
  * vm/vm_map.c (vm_map_copy_steal_pages): Pass VM_PAGE_HIGHMEM to
    vm_page_grab.
  * kern/slab.c (kmem_pagealloc_physmem): Pass VM_PAGE_DIRECTMAP to
    vm_page_grab.
  * i386/intel/pmap.c (pmap_page_table_page_alloc): Pass VM_PAGE_DIRECTMAP
    to vm_page_grab.
  * xen/block.c (device_read): Pass VM_PAGE_DIRECTMAP to vm_page_grab.
  * linux/dev/glue/block.c (alloc_buffer): Pass VM_PAGE_DMA32 to
    vm_page_grab.
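  A sketch of the idea from a caller's point of view (the flag values and
  the physical-address notes in the comments are illustrative, not taken
  from vm/vm_page.h):

      struct vm_page;

      #define VM_PAGE_DMA        0x01   /* e.g. must be ISA-DMA reachable    */
      #define VM_PAGE_DMA32      0x02   /* e.g. must be 32-bit-DMA reachable */
      #define VM_PAGE_DIRECTMAP  0x04   /* must be kernel-direct-mappable    */
      #define VM_PAGE_HIGHMEM    0x08   /* no constraint, highmem is fine    */

      struct vm_page *vm_page_grab(unsigned int flags);

      /* A caller with no physical-address constraint states that
         explicitly, so it no longer crowds the direct-mapped segment: */
      static struct vm_page *grab_any_page(void)
      {
          return vm_page_grab(VM_PAGE_HIGHMEM);
      }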
* vm_map: Avoid linking gaps for vm_copy_t (Samuel Thibault, 2021-01-04, 1 file, -16/+20)

  This does not make sense, and produces incorrect results (since vme_end
  is 0, etc.)

  * vm/vm_map.h (_vm_map_clip_start, _vm_map_clip_end): Add link_gap
    parameter.
  * vm/vm_map.c (_vm_map_entry_link): Add link_gap parameter, do not call
    vm_map_gap_insert if it is 0.
    (vm_map_entry_link): Set link_gap to 1 in _vm_map_entry_link call.
    (_vm_map_clip_start): Add link_gap parameter, pass it to
    _vm_map_entry_link call.
    (vm_map_clip_start): Set link_gap to 1 in _vm_map_clip_start call.
    (vm_map_copy_entry_link): Set link_gap to 0 in _vm_map_entry_link call.
    (vm_map_copy_clip_start): Set link_gap to 0 in _vm_map_clip_start call.
    (_vm_map_entry_unlink): Add unlink_gap parameter, do not call
    vm_map_gap_remove if it is 0.
    (vm_map_entry_unlink): Set unlink_gap to 1 in _vm_map_entry_unlink call.
    (_vm_map_clip_end): Add link_gap parameter, pass it to
    _vm_map_entry_link call.
    (vm_map_clip_end): Set link_gap to 1 in _vm_map_clip_end call.
    (vm_map_copy_entry_unlink): Set unlink_gap to 0 in _vm_map_entry_unlink
    call.
    (vm_map_copy_clip_end): Set link_gap to 0 in _vm_map_clip_end call.
  * vm/vm_kern.c (projected_buffer_deallocate): Set link_gap to 1 in
    _vm_map_clip_start and _vm_map_clip_end calls.
* vm_map: print warning when max_size gets smaller than size (Samuel Thibault, 2021-01-04, 1 file, -0/+2)

  * vm/vm_map.c (vm_map_find_entry_anywhere): Print warning when max_size
    gets smaller than size.
* vm_map: Fix taking into account high bits in mask (Samuel Thibault, 2020-12-20, 1 file, -2/+29)

  glibc's sysdeps/mach/hurd/dl-sysdep.c has been wanting to use this for
  decades.

  * include/string.h (ffs): New declaration.
  * vm/vm_map.c: Include <string.h>.
    (vm_map_find_entry_anywhere): Separate out high bits from mask, to
    compute the maximum offset instead of map->max_offset.
  * doc/mach.texi (vm_map): Update documentation accordingly.
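  A standalone illustration of splitting such a mask into an alignment
  part and a maximum-address part (this shows the idea with example
  values, not the exact computation in vm_map_find_entry_anywhere; a
  64-bit build is assumed for the constant):

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
          /* Low bits request 4 KiB alignment; high bits request that the
             chosen address fit in 32 bits. */
          uintptr_t mask = 0xffffffff00000fffUL;

          /* The lowest clear bit of the mask gives the alignment. */
          uintptr_t align = (uintptr_t) 1 << (__builtin_ffsl((long) ~mask) - 1);
          /* The remaining (high) bits must be zero in the chosen address,
             so they bound the maximum usable offset. */
          uintptr_t high_bits = mask & ~(align - 1);
          uintptr_t max_addr  = ~high_bits;

          printf("alignment: %zu bytes, maximum address: 0x%zx\n",
                 (size_t) align, (size_t) max_addr);
          return 0;
      }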
* satisfy '-Werror=parentheses'. (guy fleury iteriteka, 2019-08-31, 1 file, -1/+1)

  * vm/vm_map.c (vm_map_msync): explicitly group the first condition.
* Fix the pointer comparison of different types. (guy fleury iteriteka, 2019-08-31, 1 file, -1/+1)

  * vm/vm_map.c (vm_map_fork): use VM_MAP_NULL instead of PMAP_NULL when
    comparing with new_map.
* fix returning KERN_INVALID_ARGUMENT when the map is NULL. (guy fleury iteriteka, 2019-08-30, 1 file, -2/+2)

  * vm/vm_map.c (vm_map_msync): Add missing return keyword.
* Fix allocation test (Samuel Thibault, 2019-08-11, 1 file, -1/+1)

  * vm/vm_map.c (vm_map_fork): Check for `new_map` being non-NULL, and
    not for `new_pmap` a second time.
* Add vm_object_sync support (Samuel Thibault, 2018-11-03, 1 file, -0/+31)

  * include/mach/vm_sync.h: New file.
  * include/mach/mach_types.h: Include <mach/vm_sync.h>
  * Makefrag.am (include_mach_HEADERS): Add include/mach/vm_sync.h.
  * include/mach/mach_types.defs (vm_sync_t): Add type.
  * include/mach/gnumach.defs (vm_object_sync, vm_msync): Add RPCs.
  * vm/vm_map.h: Include <mach/vm_sync.h>.
    (vm_map_msync): New declaration.
  * vm/vm_map.c (vm_map_msync): New function.
  * vm/vm_user.c: Include <mach/vm_sync.h> and <kern/mach.server.h>.
    (vm_object_sync, vm_msync): New functions.
* vm_map: Fix bugs with huge mask parameters (Samuel Thibault, 2018-04-22, 1 file, -2/+4)

  * vm/vm_map.c (vm_map_find_entry_anywhere): Also check that
    (min + mask) & ~mask remains bigger than min.
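  A standalone illustration of the overflow this guards against (example
  values only; the real check sits in vm_map_find_entry_anywhere):

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
          uintptr_t min  = 0x100000;
          uintptr_t mask = ~(uintptr_t) 0 << 4;     /* absurdly huge mask */

          /* Rounding 'min' up according to 'mask' can wrap around... */
          uintptr_t aligned = (min + mask) & ~mask;

          /* ...so the result has to be checked against the original value. */
          if (aligned < min)
              printf("rejected: alignment overflowed (0x%zx < 0x%zx)\n",
                     (size_t) aligned, (size_t) min);
          else
              printf("aligned start: 0x%zx\n", (size_t) aligned);
          return 0;
      }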
* Fix warning (Samuel Thibault, 2018-01-28, 1 file, -1/+1)

  * vm/vm_map.c (vm_map_copyout): Fix panic format.
* vm: Improve error handling. (Justus Winter, 2017-08-14, 1 file, -1/+5)

  * vm/vm_map.c (vm_map_create): Gracefully handle resource exhaustion.
    (vm_map_fork): Likewise at the callsite.
* Fix typo. (Justus Winter, 2017-08-14, 1 file, -1/+1)
* VM: add the vm_wire_all call (Richard Braun, 2016-12-24, 1 file, -5/+86)

  This call maps the POSIX mlockall and munlockall calls.

  * Makefrag.am (include_mach_HEADERS): Add include/mach/vm_wire.h.
  * include/mach/gnumach.defs (vm_wire_t): New type.
    (vm_wire_all): New routine.
  * include/mach/mach_types.h: Include mach/vm_wire.h.
  * vm/vm_map.c: Likewise.
    (vm_map_enter): Automatically wire new entries if requested.
    (vm_map_copyout): Likewise.
    (vm_map_pageable_all): New function.
  * vm/vm_map.h: Include mach/vm_wire.h.
    (struct vm_map): Update description of member `wiring_required'.
    (vm_map_pageable_all): New function.
  * vm/vm_user.c (vm_wire_all): New function.
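  A hypothetical wrapper illustrating the POSIX correspondence
  (VM_WIRE_ALL appears elsewhere in this log; VM_WIRE_NONE and the
  numeric values below are assumptions):

      typedef int kern_return_t;
      typedef unsigned int vm_wire_t;
      typedef struct host *host_t;
      typedef struct vm_map *vm_map_t;

      #define VM_WIRE_NONE ((vm_wire_t) 0)   /* assumed flag value */
      #define VM_WIRE_ALL  ((vm_wire_t) 3)   /* assumed flag value */

      kern_return_t vm_wire_all(host_t host, vm_map_t map, vm_wire_t flags);

      /* Roughly mlockall(MCL_CURRENT | MCL_FUTURE): wire everything. */
      static kern_return_t lock_all_memory(host_t host, vm_map_t task_map)
      {
          return vm_wire_all(host, task_map, VM_WIRE_ALL);
      }

      /* Roughly munlockall(): drop all wiring requests. */
      static kern_return_t unlock_all_memory(host_t host, vm_map_t task_map)
      {
          return vm_wire_all(host, task_map, VM_WIRE_NONE);
      }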
* VM: rework map entry wiring (Richard Braun, 2016-12-24, 1 file, -271/+290)

  First, user wiring is removed, simply because it has never been used.
  Second, make the VM system track wiring requests to better handle
  protection. This change makes it possible to wire entries with
  VM_PROT_NONE protection without actually reserving any page for them
  until protection changes, and even make those pages pageable if
  protection is downgraded to VM_PROT_NONE.

  * ddb/db_ext_symtab.c: Update call to vm_map_pageable.
  * i386/i386/user_ldt.c: Likewise.
  * ipc/mach_port.c: Likewise.
  * vm/vm_debug.c (mach_vm_region_info): Update values returned as
    appropriate.
  * vm/vm_map.c (vm_map_entry_copy): Update operation as appropriate.
    (vm_map_setup): Update member names as appropriate.
    (vm_map_find_entry): Update to account for map member variable
    changes.
    (vm_map_enter): Likewise.
    (vm_map_entry_inc_wired): New function.
    (vm_map_entry_reset_wired): Likewise.
    (vm_map_pageable_scan): Likewise.
    (vm_map_protect): Update wired access, call vm_map_pageable_scan.
    (vm_map_pageable_common): Rename to ...
    (vm_map_pageable): ... and rewrite to use vm_map_pageable_scan.
    (vm_map_entry_delete): Fix unwiring.
    (vm_map_copy_overwrite): Replace inline code with a call to
    vm_map_entry_reset_wired.
    (vm_map_copyin_page_list): Likewise.
    (vm_map_print): Likewise. Also print map size and wired size.
    (vm_map_copyout_page_list): Update to account for map member variable
    changes.
  * vm/vm_map.h (struct vm_map_entry): Remove `user_wired_count' member,
    add `wired_access' member.
    (struct vm_map): Rename `user_wired' member to `size_wired'.
    (vm_map_pageable_common): Remove function.
    (vm_map_pageable_user): Remove macro.
    (vm_map_pageable): Replace macro with function declaration.
  * vm/vm_user.c (vm_wire): Update call to vm_map_pageable.
* VM: make vm_wire more POSIX-friendly (Richard Braun, 2016-12-11, 1 file, -10/+15)

  * doc/mach.texi: Update return codes.
  * vm/vm_map.c (vm_map_pageable_common): Return KERN_NO_SPACE instead of
    KERN_FAILURE if some of the specified address range does not
    correspond to mapped pages. Skip unwired entries instead of failing
    when unwiring.
* vm: Print names of maps in the debugger. (Justus Winter, 2016-11-04, 1 file, -2/+2)

  * vm/vm_map.c (vm_map_print): Print name of the map.
* Gracefully handle pmap allocation failures. (Justus Winter, 2016-10-21, 1 file, -0/+3)

  * kern/task.c (task_create): Gracefully handle pmap allocation
    failures.
  * vm/vm_map.c (vm_map_fork): Likewise.
* Redefine what an external page is (Richard Braun, 2016-09-21, 1 file, -1/+1)

  Instead of a "page considered external", which apparently takes into
  account whether a page is dirty or not, redefine this property to
  reliably mean "is in an external object". This commit mostly deals with
  the impact of this change on the page allocation interface.

  * i386/intel/pmap.c (pmap_page_table_page_alloc): Update call to
    vm_page_grab.
  * kern/slab.c (kmem_pagealloc_physmem): Use vm_page_grab instead of
    vm_page_grab_contig.
    (kmem_pagefree_physmem): Use vm_page_release instead of
    vm_page_free_contig.
  * linux/dev/glue/block.c (alloc_buffer, device_read): Update call to
    vm_page_grab.
  * vm/vm_fault.c (vm_fault_page): Update calls to vm_page_grab and
    vm_page_convert.
  * vm/vm_map.c (vm_map_copy_steal_pages): Update call to vm_page_grab.
  * vm/vm_page.h (struct vm_page): Remove `extcounted' member.
    (vm_page_external_limit, vm_page_external_count): Remove extern
    declarations.
    (vm_page_convert, vm_page_grab): Update declarations.
    (vm_page_release, vm_page_grab_phys_addr): New function declarations.
  * vm/vm_pageout.c (VM_PAGE_EXTERNAL_LIMIT): Remove macro.
    (VM_PAGE_EXTERNAL_TARGET): Likewise.
    (vm_page_external_target): Remove variable.
    (vm_pageout_scan): Remove specific handling of external pages.
    (vm_pageout): Don't set vm_page_external_limit and
    vm_page_external_target.
  * vm/vm_resident.c (vm_page_external_limit): Remove variable.
    (vm_page_insert, vm_page_replace, vm_page_remove): Update external
    page tracking.
    (vm_page_convert): Remove `external' parameter.
    (vm_page_grab): Likewise. Remove specific handling of external pages.
    (vm_page_grab_phys_addr): Update call to vm_page_grab.
    (vm_page_release): Remove `external' parameter and remove specific
    handling of external pages.
    (vm_page_wait): Remove specific handling of external pages.
    (vm_page_alloc): Update call to vm_page_grab.
    (vm_page_free): Update call to vm_page_release.
  * xen/block.c (device_read): Update call to vm_page_grab.
  * xen/net.c (device_write): Likewise.
* VM: improve pageout deadlock workaround (Richard Braun, 2016-09-16, 1 file, -23/+40)

  Commit 5dd4f67522ad0d49a2cecdb9b109251f546d4dd1 makes VM map entry
  allocation done with VM privilege, so that a VM map isn't held locked
  while physical allocations are paused, which may block the default
  pager during page eviction, causing a system-wide deadlock.

  First, it turns out that map entries aren't the only buffers allocated,
  and second, their number can't be easily determined, which makes a
  preallocation strategy very hard to implement. This change generalizes
  the strategy of VM privilege increase when a VM map is locked.

  * device/ds_routines.c (io_done_thread): Use integer values instead of
    booleans when setting VM privilege.
  * kern/thread.c (thread_init, thread_wire): Likewise.
  * vm/vm_pageout.c (vm_pageout): Likewise.
  * kern/thread.h (struct thread): Turn member `vm_privilege' into an
    unsigned integer.
  * vm/vm_map.c (vm_map_lock): New function, where VM privilege is
    temporarily increased.
    (vm_map_unlock): New function, where VM privilege is decreased.
    (_vm_map_entry_create): Remove VM privilege workaround from this
    function.
  * vm/vm_map.h (vm_map_lock, vm_map_unlock): Turn into functions.
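  A simplified sketch of the generalized strategy (all names are stand-ins
  for the real kernel primitives; the null-thread check mirrors the Xen
  boot fix further down this log):

      struct thread { unsigned int vm_privilege; };
      struct vm_map;

      struct thread *current_thread(void);
      void lock_write(struct vm_map *map);
      void unlock_write(struct vm_map *map);

      /* While a map is locked, raise the locking thread's VM privilege so
         that physical allocations made under the lock cannot be paused
         and end up blocking the default pager. */
      static void sketch_vm_map_lock(struct vm_map *map)
      {
          struct thread *thread = current_thread();

          if (thread != 0)
              thread->vm_privilege++;    /* a counter, not a boolean */
          lock_write(map);
      }

      static void sketch_vm_map_unlock(struct vm_map *map)
      {
          struct thread *thread = current_thread();

          unlock_write(map);
          if (thread != 0)
              thread->vm_privilege--;
      }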
* Remove map entry pageability property. (Richard Braun, 2016-09-07, 1 file, -56/+6)

  Since the replacement of the zone allocator, kernel objects have been
  wired in memory. Besides, as of 5e9f6f (Stack the slab allocator
  directly on top of the physical allocator), there is a single cache
  used to allocate map entries. Those changes make the pageability
  attribute of VM maps irrelevant.

  * device/ds_routines.c (mach_device_init): Update call to kmem_submap.
  * ipc/ipc_init.c (ipc_init): Likewise.
  * kern/task.c (task_create): Update call to vm_map_create.
  * vm/vm_kern.c (kmem_submap): Remove `pageable' argument. Update call
    to vm_map_setup.
    (kmem_init): Update call to vm_map_setup.
  * vm/vm_kern.h (kmem_submap): Update declaration.
  * vm/vm_map.c (vm_map_setup): Remove `pageable' argument. Don't set
    `entries_pageable' member.
    (vm_map_create): Likewise.
    (vm_map_copyout): Don't bother creating copies of page entries with
    the right pageability.
    (vm_map_copyin): Don't set `entries_pageable' member.
    (vm_map_fork): Update call to vm_map_create.
  * vm/vm_map.h (struct vm_map_header): Remove `entries_pageable' member.
    (vm_map_setup, vm_map_create): Remove `pageable' argument.
* vm: fix boot on xen (Richard Braun, 2016-08-29, 1 file, -3/+10)

  * vm/vm_map.c (_vm_map_entry_create): Make sure there is a thread
    before accessing VM privilege.
* VM: fix pageout-related deadlock (Richard Braun, 2016-08-07, 1 file, -0/+18)

  * vm/vm_map.c (_vm_map_entry_create): Temporarily set the current
    thread as VM privileged.
* Augment VM maps with task names (Richard Braun, 2016-08-06, 1 file, -1/+2)

  This change improves the clarity of "no more room for ..." VM map
  allocation errors.

  * kern/task.c (task_init): Call vm_map_set_name for the kernel map.
    (task_create): Call vm_map_set_name where appropriate.
  * vm/vm_map.c (vm_map_setup): Set map name to NULL.
    (vm_map_find_entry_anywhere): Update error message to include map
    name.
  * vm/vm_map.h (struct vm_map): New `name' member.
    (vm_map_set_name): New inline function.
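  A minimal sketch of what such a name member buys (stand-in types; the
  exact error message text is illustrative):

      #include <stdio.h>

      struct vm_map { const char *name; /* ... */ };

      static void vm_map_set_name(struct vm_map *map, const char *name)
      {
          map->name = name;
      }

      /* An allocation failure can now say which task's map ran out of
         room. */
      static void report_no_room(const struct vm_map *map, unsigned long size)
      {
          printf("no more room for %lu bytes in map %s\n",
                 size, map->name != 0 ? map->name : "(unnamed)");
      }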