| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
This is basically a no-op, but it ensures we are doing SMP
bringup correctly.
Message-ID: <20241210072926.911061-5-damien@zamaudio.com>
|
|
|
|
|
| |
The number is actually a mask with one bit per CPU.
Message-ID: <20241210072926.911061-2-damien@zamaudio.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
On SMP builds with 2 CPU cores, we've seen whole-system lock-ups caused
by irqdev.tot_num_intr getting set to -1, even though it's supposed to
always stay non-negative. Indeed, it was modified without the
appropriate synchronization. Fix this by protecting it, as well as
various other internals of device/intr with a simple_lock_irq.
Reported-by: Damien Zammit <damien@zamaudio.com>
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20241210115705.710555-3-bugaevc@gmail.com>
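A minimal sketch of the idea, assuming gnumach's simple lock and spl primitives
(the exact lock declaration style and the simple_lock_irq wrapper added by this
commit may differ):
#include <kern/lock.h>    /* simple_lock_data_t, simple_lock()  -- assumed header */
#include <machine/spl.h>  /* spl_t, splhigh(), splx()           -- assumed header */
/* Shared interrupt bookkeeping, now only ever touched under the lock. */
static simple_lock_data_t intr_lock;
static int tot_num_intr;
static void
intr_account(int delta)
{
    spl_t s = splhigh();        /* block local interrupt handlers */
    simple_lock(&intr_lock);    /* exclude the other CPUs */
    tot_num_intr += delta;      /* can no longer be torn or driven to -1 */
    simple_unlock(&intr_lock);
    splx(s);
}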
|
|
|
|
|
|
|
|
|
|
|
| |
'true' and 'false' are keywords in C23. This will equally be an issue on
older standards if something pulls in stdbool.h, which makes these into
macros. In either of these two cases, just typedef 'boolean' from
'bool' (avoiding the enum), and rely on 'true' and 'false' being valid
values of the type.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20241210115705.710555-2-bugaevc@gmail.com>
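For illustration, the shape of the change is roughly as follows (a sketch; the
Mach spelling of the type is boolean_t):
#include <stdbool.h>    /* a no-op under C23, supplies bool on older standards */
/* Previously an enum whose members collide with the C23 keywords (or the
 * stdbool.h macros); now derived from bool, relying on the language-provided
 * 'true' and 'false' as its valid values. */
typedef bool boolean_t;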
|
|
|
|
|
|
|
| |
Fixes -Wincompatible-pointer-types errors on GCC 15.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20241210115705.710555-1-bugaevc@gmail.com>
|
|
|
|
|
|
|
|
| |
The condition was intended to disable the timer on non-BSP
processors, but checking apic_id != 0 means it could affect the BSP
if its APIC ID is non-zero. Fix this bug.
Message-ID: <20241209121706.879984-7-damien@zamaudio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since modern x86 CPUs only support 4 bits of destination field
in the ICR, we could only address up to 16 processors, assuming
their physical APIC ID was < 0x10. Some processors, e.g. AMD fam15h,
have physical APIC IDs starting at 0x10 but still only support 4 bits,
so those LAPICs are unaddressable using physical destination mode.
Therefore, we switch to using logical destinations for IPIs, which
gives us an 8-bit unique mask for addressing up to 8 groups of processors.
INIT and STARTUP are not changed here.
Message-ID: <20241209121706.879984-6-damien@zamaudio.com>
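A hedged sketch of what logical-destination IPIs look like at the LAPIC level,
with register offsets per the Intel SDM; the lapic_write() helper and the
surrounding gnumach code are assumptions:
#define LAPIC_DFR        0x0E0        /* destination format register */
#define LAPIC_LDR        0x0D0        /* logical destination register */
#define LAPIC_ICR_LO     0x300        /* interrupt command register, low half */
#define LAPIC_ICR_HI     0x310        /* interrupt command register, high half */
#define ICR_DEST_LOGICAL (1 << 11)    /* 0 = physical, 1 = logical destination */
extern void lapic_write(unsigned reg, unsigned value);    /* hypothetical MMIO helper */
/* Run once per CPU: flat model, one unique mask bit per processor. */
static void
lapic_setup_logical_dest(unsigned cpu)
{
    lapic_write(LAPIC_DFR, 0xFFFFFFFF);           /* flat model */
    lapic_write(LAPIC_LDR, (1u << cpu) << 24);    /* 8-bit mask => at most 8 CPUs/groups */
}
/* Send a fixed-delivery IPI addressed by the CPU's mask bit rather than
 * by its (possibly >= 0x10) physical APIC ID. */
static void
lapic_send_ipi(unsigned cpu, unsigned char vector)
{
    lapic_write(LAPIC_ICR_HI, (1u << cpu) << 24);
    lapic_write(LAPIC_ICR_LO, vector | ICR_DEST_LOGICAL);
}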
|
|
|
|
|
|
|
| |
Prepare for SMP parallel init, where we want to call these
two functions on different CPUs at different times.
Message-ID: <20241209121706.879984-5-damien@zamaudio.com>
|
|
|
|
|
|
| |
Since we just set up the gs segment, we can use
CPU_NUMBER instead of CPU_NUMBER_NO_STACK.
Message-ID: <20241209121706.879984-3-damien@zamaudio.com>
|
| |
|
|
|
|
|
|
|
| |
The current segmentation already adds -KERNELBASE, but only
when accessing memory.
Message-ID: <20241209121706.879984-2-damien@zamaudio.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Non-master processors cannot have cpu_number() == 0.
The synchronisation fails hard if the CPU number is wrong,
so assert on this condition if it occurs.
(On AMD fam15h, this assert currently fails, but I haven't
been able to boot it with SMP yet either.)
Message-ID: <20241207101222.800350-1-damien@zamaudio.com>
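As a sketch, the added check amounts to something like the following (the
function name is illustrative; cpu_number() is the existing gnumach helper):
#include <kern/assert.h>    /* assumed header for assert() */
/* On an application processor the computed CPU number must never be the
 * BSP's slot 0; catching this early beats a silent hang later in the
 * SMP synchronisation code. */
void
ap_sanity_check(void)
{
    assert(cpu_number() != 0);
}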
|
|
|
|
|
|
| |
msgh_size is a mach_msg_size_t, which represents an unsigned int, so %u
must be used there instead of %d.
Message-ID: <20241206134419.6609-1-etienne.brateau@gmail.com>
|
|
|
|
|
|
|
|
| |
The fallthrough case was incorrectly using fxsave() instead of
xsave() or xsaveopt().
TESTED: on AMD fam15h: no longer throws "No coprocessor" exception.
Message-ID: <20241205074929.704111-1-damien@zamaudio.com>
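A hedged sketch of the corrected dispatch; the fp_save_kind values echo
gnumach's FPU code, but the declarations and surrounding details here are
assumptions:
enum fp_save_kind { FP_FXSAVE, FP_XSAVE, FP_XSAVEOPT };    /* assumed subset */
extern enum fp_save_kind fp_save_kind;
static void
fpu_save_state(void *area)    /* 64-byte-aligned FXSAVE/XSAVE area */
{
    /* All-ones instruction mask: XSAVE ANDs it with XCR0, so every
     * enabled component gets saved. */
    const unsigned lo = 0xFFFFFFFF, hi = 0xFFFFFFFF;
    switch (fp_save_kind) {
    case FP_XSAVEOPT:    /* like XSAVE, but skips unmodified state */
        asm volatile("xsaveopt %0" : "=m" (*(char *) area) : "a" (lo), "d" (hi) : "memory");
        break;
    case FP_XSAVE:
        asm volatile("xsave %0" : "=m" (*(char *) area) : "a" (lo), "d" (hi) : "memory");
        break;
    default:             /* legacy fallthrough should use FXSAVE only */
        asm volatile("fxsave %0" : "=m" (*(char *) area) : : "memory");
        break;
    }
}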
|
| |
|
|
|
|
|
|
|
|
|
| |
The call vm_page_seg_pull_cache_page() returns a vm_page (src) with its
object locked. Since we don't unlock it before doing the vm_page_insert,
it is still locked there, so trying to relock it causes a deadlock.
Replace this lock with an assert.
This case was not seen before because, on non-SMP builds, locking is a no-op.
Message-ID: <20241202182721.27920-2-etienne.brateau@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When an irq handler dies, we are decrementing the n_unacked count
and calling __enable_irq() the right number of times, but we also need
to decrement the total interrupt count by the number of interrupts that
were lost, and clear that number.
This fixes a hang when a shared irq handler quits and leaves some
unacked interrupts.
Message-ID: <20241123222020.245519-1-damien@zamaudio.com>
|
|
|
|
| |
Message-ID: <20241119191048.43597-1-etienne.brateau@gmail.com>
|
| |
|
|
|
|
| |
Message-ID: <20241027092828.3162279-1-damien@zamaudio.com>
|
|
|
|
|
|
|
|
|
|
|
| |
This change forces the multiboot loader to provide video mode info,
and also sets the preferred video mode to EGA text to ensure the
existing console behaviour still works.
When support for graphical consoles is added, we can change
the preferred mode to a linear framebuffer.
Message-ID: <20241024001047.3033826-2-damien@zamaudio.com>
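For context, the multiboot (0.6.96) header fields involved look roughly like
this; an illustrative sketch, not gnumach's actual boot header source:
#define MULTIBOOT_VIDEO_MODE 0x00000004    /* flags bit 2: loader must report video info */
struct multiboot_video_request {    /* illustrative name for the header's video fields */
    unsigned mode_type;    /* 1 = EGA text (preferred here), 0 = linear framebuffer */
    unsigned width;        /* columns in text mode, e.g. 80 */
    unsigned height;       /* rows in text mode, e.g. 25 */
    unsigned depth;        /* 0 in text mode */
};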
|
|
|
|
| |
Message-ID: <20241024001047.3033826-1-damien@zamaudio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* i386/i386at/acpi_parse_apic.c (acpi_print_info): %lx -> %llx
i386/i386at/acpi_parse_apic.c: In function 'acpi_print_info':
i386/i386at/acpi_parse_apic.c:51:25: warning: format '%lx' expects argument of type 'long unsigned int', but argument 2 has type 'phys_addr_t' {aka 'long long unsigned int'} [-Wformat=]
51 | printf(" rsdp = 0x%lx\n", rsdp);
| ~~^ ~~~~
| | |
| | phys_addr_t {aka long long unsigned int}
| long unsigned int
| %llx
Message-ID: <20241022173641.2774-2-jbranso@dismail.de>
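The corrected line, for reference (the warning shows phys_addr_t is
'long long unsigned int' here):
printf(" rsdp = 0x%llx\n", rsdp);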
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I compiled with ./configure --enable-xen --enable-acpi.
* i386/intel/pmap.c (pmap_bootstrap_xen, pmap_bootstrap,
pmap_set_page_readwrite, pmap_clear_bootstrap_pagetable, pmap_map_mfn,
pmap_expand_level, pmap_collect): Lots of tiny changes. I've copied
in some of the error messages.
Cast many variables to (long unsigned int), change (vm_offset_t) to
(unsigned long), and use %llx for (uint64_t) variables.
In file included from i386/intel/pmap.c:63:
i386/intel/pmap.c: In function 'pmap_bootstrap_xen':
i386/intel/pmap.c:703:39: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=]
703 | panic("couldn't pin page %p(%lx)", l1_map[n_l1map], (vm_offset_t) kv_to_ma (l1_map[n_l1map]));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
i386/intel/pmap.c: In function 'pmap_set_page_readwrite':
i386/intel/pmap.c:897:23: warning: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'vm_offset_t' {aka 'unsigned int'} [-Wformat=]
897 | panic("couldn't set hiMMU readwrite for addr %lx(%lx)\n", vaddr, (vm_offset_t) pa_to_ma (paddr));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~
| |
| vm_offset_t {aka unsigned int}
./kern/debug.h:67:50: note: in definition of macro 'panic'
67 | Panic (__FILE__, __LINE__, __FUNCTION__, s, ##__VA_ARGS__)
| ^
i386/intel/pmap.c:897:64: note: format string is defined here
897 | panic("couldn't set hiMMU readwrite for addr %lx(%lx)\n", vaddr, (vm_offset_t) pa_to_ma (paddr));
| ~~^
| |
| long unsigned int
| %x
i386/intel/pmap.c:897:23: warning: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'unsigned int' [-Wformat=]
897 | panic("couldn't set hiMMU readwrite for addr %lx(%lx)\n", vaddr, (vm_offset_t) pa_to_ma (paddr));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./kern/debug.h:67:50: note: in definition of macro 'panic'
67 | Panic (__FILE__, __LINE__, __FUNCTION__, s, ##__VA_ARGS__)
| ^
i386/intel/pmap.c:897:68: note: format string is defined here
897 | panic("couldn't set hiMMU readwrite for addr %lx(%lx)\n", vaddr, (vm_offset_t) pa_to_ma (paddr));
| ~~^
| |
| long unsigned int
| %x
Message-ID: <20241022173641.2774-1-jbranso@dismail.de>
|
|
|
|
|
|
|
| |
This prevents a spurious intnull(9) from occurring on real hardware
during ACPI startup when compiled with --enable-apic.
Message-ID: <20241021032217.2915842-1-damien@zamaudio.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* i386/intel/pmap.c (pmap_page_table_page_dealloc): define it only on
the Xen platform. Best not to delete page_alloc, so we know how to do
so if need be.
i386/intel/pmap.c:1265:1: warning: 'pmap_page_table_page_dealloc' defined but not used [-Wunused-function]
1265 | pmap_page_table_page_dealloc(vm_offset_t pa)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
i386/intel/pmap.c:1171:1: warning: 'pmap_page_table_page_alloc' defined but not used [-Wunused-function]
1171 | pmap_page_table_page_alloc(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Message-ID: <20241020190744.2522-3-jbranso@dismail.de>
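A sketch of the shape of the change (MACH_XEN is gnumach's Xen configuration
macro; the body is elided here):
#ifdef MACH_XEN    /* only the Xen pmap code uses this helper */
static void
pmap_page_table_page_dealloc(vm_offset_t pa)
{
    /* existing body unchanged */
}
#endif /* MACH_XEN */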
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* kern/slab.c(kalloc_init): %lu -> %zu
kern/slab.c: In function 'kalloc_init':
kern/slab.c:1349:33: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
1349 | sprintf(name, "kalloc_%lu", size);
| ~~^ ~~~~
| | |
| | size_t {aka unsigned int}
| long unsigned int
| %u
Message-ID: <20241020190744.2522-2-jbranso@dismail.de>
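The corrected call, for reference:
sprintf(name, "kalloc_%zu", size);    /* %zu matches size_t on both 32-bit and 64-bit builds */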
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* vm/vm_page.c(vm_page_setup): %lu -> %zu
vm/vm_page.c: In function 'vm_page_setup':
vm/vm_page.c:1425:41: warning: format '%lu' expects argument of type 'long unsigned int', but argument 2 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
1425 | printf("vm_page: page table size: %lu entries (%luk)\n", nr_pages,
| ~~^ ~~~~~~~~
| | |
| long unsigned int size_t {aka unsigned int}
| %u
vm/vm_page.c:1425:54: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Wformat=]
1425 | printf("vm_page: page table size: %lu entries (%luk)\n", nr_pages,
| ~~^
| |
| long unsigned int
| %u
1426 | table_size >> 10);
| ~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
Message-ID: <20241020190744.2522-1-jbranso@dismail.de>
|
|
|
|
| |
Now that gnumach does not define it any more.
|
| |
|
|
|
|
| |
Message-ID: <20240904201806.510082-2-luca@orpolo.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* i386/i386/fpu.c: extend current getter and setter to support the
extended state; move the struct casting here to reuse the locking
and allocation logic for the thread state; make sure the new state
is set as valid, otherwise it won't be applied; add
i386_get_xstate_size() to dynamically retrieve the FPU state size.
* i386/i386/fpu.h: update prototypes to accept generic thread state
* i386/i386/pcb.c: forward raw thread state to getter and setter, only
checking for minimum size and use the new i386_get_xstate_size()
helper.
* i386/include/mach/i386/mach_i386.defs: expose the new helper
i386_get_xstate_size().
* i386/include/mach/i386/thread_status.h: add interface definition for
I386_XFLOAT_STATE and the corresponding data structure.
Message-ID: <20240904201806.510082-1-luca@orpolo.org>
|
|
|
|
|
|
| |
* x86_64/locore.S: adjust to the changes in the thread state
structure (segment registers), and add the missing opcode.
Message-ID: <20240904201806.510082-3-luca@orpolo.org>
|
|
|
|
|
|
|
|
| |
* tests/test-machmsg.c: add two use cases used by glibc during signal
handling
* tests/include/testlib.h
* tests/testlib.c: add new wait_thread_terminated() helper
Message-ID: <20240821163616.189307-3-luca@orpolo.org>
|
|
|
|
|
| |
struct i386_xfp_xstate_header header is at offset 440 of struct
i386_xfp_save, so not a multiple of 64 anyway.
|
|
|
|
| |
Message-ID: <6qm4fdtthi5nrmmleum7z2xemxz77adohed454eaeuzlmvfx4d@l3pyff4tqwry>
|
|
|
|
|
| |
Remove unnecessary definitions from sys/types.h.
Message-ID: <oitneneybjishhqq7bgedkasrqqd6nq7vselruaacw27sbe47e@6rt3xbi7fnie>
|
|
|
|
| |
Message-ID: <4cea36qrjeo7tkklmqcwgkrxstxiqykdofha65zxmpni2o6lp3@2offokab6fvn>
|
| |
|
|
|
|
|
| |
with -Werror=incompatible-pointer-types and
-Werror=implicit-function-declaration.
|
|
|
|
| |
Message-ID: <376mwj4qtzxqgg2p4teqefxep7qz2kxll25synb3sulgof24j5@wxhqtaf7ei32>
|
|
|
|
|
|
|
|
| |
* tests/test-machmsg.c: add more combinations to existing cases:
- make tx and rx ports independent in the send/receive tests
- add two more variants for send/receive tests, using two separate
system calls, using different code paths in mach_msg().
Message-ID: <20240612062755.116308-2-luca@orpolo.org>
|
|
|
|
|
|
|
|
|
|
|
| |
* ipc/copy_user.c: recent MIG stubs should always fill the size
correctly in the msg header, but we shouldn't rely on that. Instead,
we use the size that was correctly copied-in, overwriting the value
in the header. This is already done by the 32-bit copyinmsg(), and
was missing in the 64-bit version.
Furthermore, the assertion about user/kernel size makes sense with
and without USER32, so take it out of the #ifdef.
Message-ID: <20240612062755.116308-1-luca@orpolo.org>
|
|
|
|
|
|
|
| |
This tests generating and handling exceptions, thread_get_state(),
thread_set_state(), and newly added thread_set_self_state(). It does
many of the same things that glibc does when handling a signal.
Message-ID: <20240416071013.85596-1-bugaevc@gmail.com>
|
| |
This is a new Mach trap that sets the calling thread's state to the
passed value, as if with a call to the thread_set_state() RPC. If the
flavor of state being set is the one that contains the register used for
syscall return value (i386_THREAD_STATE or i386_REGS_SEGS_STATE on x86,
AARCH64_THREAD_STATE on AArch64), the set register value is *not*
overwritten with KERN_SUCCESS when the state gets set successfully, yet
errors do get reported if the syscall fails.
Although the trap is intended to enable userland to implement sigreturn
functionality in the AArch64 port (more on which below), the trap itself
is architecture-independent, and fully implemented in terms of the
existing kernel routines (thread_setstatus & thread_set_syscall_return).
This trap's functionality is similar to sigreturn() on Unix or
NtContinue() on NT. The use case for these all is restoring the local
state of an interrupted thread in the following set-up:
1. A thread is running some arbitrary code.
2. An event happens that deserves the thread's immediate attention,
analogous to a hardware interrupt request. This might be caused by
the thread itself (e.g. running into a Mach exception that was
arranged to be handled by the same thread), or by external events
(e.g. receiving a Unix SIGCHLD).
3. Another thread (or perhaps the kernel, although this is not the case
on Mach) suspends the thread, saves its state at the point of
interruption, alters its state to execute some sort of handler for
the event, and resumes the thread again, now running the handler.
4. Once the thread is done running the handler, it wants to return to
what it was doing at the time it was interrupted. To do this, it
needs to restore the state as saved at the moment of interruption.
Unlike with setjmp()/longjmp(), we cannot rely on the interrupted logic
collaborating in any way, as it's not aware that it's being interrupted.
This means that we have to fully restore the state, including values of
all the general-purpose registers, as well as the stack pointer, program
counter, and any state flags.
Depending on the instruction set, this may or may not be possible to do
fully in userland, simply by loading all the registers with their saved
values. It should be more or less easy to load the saved values into
general-purpose registers, but state flags and the program counter can
be more of a challenge. Loading the program counter value (in other
words, performing an indirect jump to the interrupted instruction) has
to be the very last thing we do, since we don't control the program flow
after that. The only real place the program counter can be loaded from
is the stack, since all general-purpose registers would already
contain their restored values by that point, and using global storage is
incompatible with another interruption of the same kind happening at the
time we were about to return. For the same reason, the saved program
counter cannot be really stored outside of the "active" stack area (such
as below the stack pointer), since otherwise it can get clobbered by
another interruption.
This means that to support fully-userland returns, the instruction set
must provide a single instruction that loads an address from the stack,
adjusts the stack pointer, and performs an indirect jump to the loaded
address. The instruction must also either preserve (previously
restored) state flags, or load the state flags from the stack along
with the jump address.
On x86, 'ret' is such an instruction: it pops an address from the stack,
adjusting the stack pointer without modifying flags, and performs an
indirect jump to the address. On x86_64, where the ABI mandates a red
zone, one can use the 'ret imm16' variant to additionally adjust the
stack pointer by the size of the red zone, atomically restoring the
value of the stack pointer at the time of the interruption while loading
the return address from outside the red zone. This is how sigreturn is
implemented in glibc for the Hurd on x86.
On ARM AArch32, 'pop {pc}' (alternatively written 'ldr pc, [sp], #4') is
such an instruction: since SP and PC are just general-purpose, directly
accessible registers (r13 and r15), it is possible to perform a load
from the address pointed to by SP into PC, with a post-increment of SP.
It is, in fact, possible to restore all the other general-purpose
registers too in a single instruction this way: 'pop {r0-r12, r14, r15}'
will do that; here r13, the stack pointer, gets incremented after all
the other registers get loaded from the stack. This also preserves the
CPSR flags, which would need to be restored just prior to the 'pop'.
On ARM AArch64 however, PC is no longer a directly accessible general-
purpose register (and SP is only accessible that way by some of the
instructions); so it is no longer possible to load PC from memory in a
single instruction. The only way to perform an indirect jump is by
using one of the dedicated branching instructions ('br', 'blr', or
'ret'). All of them accept the address to branch to in a general-
purpose register, which is incompatible with our use case.
Moreover, with the BTI extension, there is a BTYPE field in PSTATE that
tracks which type (if any) of an indirect branch was the last executed
instruction; this is then used to raise an exception if the instruction
the indirect branch lands on was not intended to be a target of an
indirect branch (of a matching type). It is important to restore the
BTYPE (among the other state) when returning to an interrupted context;
failing to do that will either cause an unexpected BTI failure exception
(if the last executed instruction before the interruption was not an
indirect branch, but the last instruction of the restoration logic is),
or open up a window for exploitation (if the last executed instruction
before the interruption was an indirect branch, but the last instruction
of the restoration logic is not -- note that 'ret' is not considered an
indirect branch for the purposes of BTI).
So, it is not possible to fully restore the state of an interrupted
context in userland on AArch64. The kernel can do that however (and is
in fact doing just that every time it handles a fault or an IRQ): the
'eret' instruction for returning from an exception is accessible to EL1
(the kernel), but not EL0 (the user). 'eret' atomically restores PC
from the ELR_EL1 system register, and PSTATE from the SPSR_EL1 system
register (and does other things); both of these system registers are
inaccessible from userland, and so couldn't have been used by the
interrupted context for any purpose, meaning their values don't need
to be restored. (They can be used by the kernel code, which presents an
additional complication when it's the kernel context that gets
interrupted and has to be returned to. To make this work, the kernel
masks interrupt requests and avoids doing anything that could cause a
fault when using those registers.)
The above justifies the need for a kernel API to atomically restore
saved userland state on AArch64 (and possibly other platforms that
aren't x86). Mach already has an API to set state of a thread, namely
the thread_set_state() RPC; however, a thread calling thread_set_state()
on itself is explicitly disallowed. We have previously relaxed this
restriction to allow setting i386_DEBUG_STATE and i386_FSGS_BASE_STATE
on the current thread, so one way to address the need for such an API on
AArch64 would be to also allow setting AARCH64_THREAD_STATE on the
current thread. That is what I had originally proposed and
implemented. Like the thread_set_self_state() trap implemented by this
patch, the implementation of setting AARCH64_THREAD_STATE on the current
thread needs to ensure that the set value of the x0 register does not
get immediately overwritten with the return value of the mach_msg()
trap.
However, it's not only the return value of the mach_msg() trap that is
important, but also the RPC reply message. The thread_set_state() RPC
should not generate a reply message when used for returning to an
interrupted context, since there'd be nobody expecting the message.
This could be achieved by special-casing that in the kernel as well, or
(simpler) by userland not passing a valid reply port in the first place.
Note that the implementation of sigreturn in glibc already uses the
strategy of passing an invalid reply port for the last RPC it does
before returning to the interrupted context (which is deallocating the
reply port used by the signal handler).
Not passing a valid reply port and consequently not blocking on awaiting
the reply message works, since the way Mach is implemented, kernel RPCs
are always executed synchronously when userland sends the request
message (unless the routine implementation includes explicit asynchrony,
as device RPCs do, and gsync_wait() should do, but currently doesn't),
meaning the RPC caller never has to *wait* for the reply message, as one
is produced immediately. In other words, the mere act of invoking a
kernel RPC (that does not involve explicit asynchrony) is enough to
ensure it completes when mach_msg() returns, even if a reply message is
not received (whether because an invalid reply port has been specified,
or because MACH_RCV_MSG wasn't passed to mach_msg(), or because a
message other than the kernel RPC's reply was received by the call).
However, the same is not true when interposing is involved, and the
thread's self port does not in fact point directly to the kernel, but to
a userspace proxy of some sort. The two primary examples of this are
Hurd's rpctrace tool, which interposes all the task's ports and proxies
all RPCs after tracing them, and Mach's old netmsg/netname server, which
proxies ports and messages over network. In this case, the actual
implementation only runs once the request reaches the actual kernel, and
not once the request message has been sent by the original caller, so it
*is* necessary for the caller to await the reply message if it wants to
make sure that the requested action has been completed. This does not
cause many issues for deallocation of a reply port on the sigreturn code
path in glibc, since that only delays when the port is deallocated, but
does not otherwise change the program behavior. With
thread_set_state(mach_thread_self()), however, this would be quite
catastrophic, since the message-send would return back to the caller
without changing its state, and the actual change of state would only
happen at some later point.
This issue is avoided nicely by turning the functionality into an
explicit Mach trap rather than an RPC. As it's not an RPC, it doesn't
involve messaging, and doesn't need a reply port or a reply message. It
is always a direct call to the kernel (and not to any interposer), and
it's always guaranteed to have completed synchronously once the trap
returns. That also means that the thread_set_self_state() call won't be
visible to rpctrace or forwarded over network for netmsg, but this is
fine, since all it does is set thread state (i.e. register values); the
thread could do the same on its own by issuing relevant machine
instruction without involving any Mach abstractions (traps or RPCs) at
all if it weren't for the need of atomicity.
Finally, this new trap is unfortunately somewhat of a security concern
(as any sigreturn-like functionality is in general), since it would
potentially allow an attacker who already has a way to invoke a function
with 3 controlled argument values to set the values of all registers to
any desired values (sigreturn-oriented programming). There is currently
no mitigation for this other than the generic ones such as PAC and stack
check guards.
The limit of 150 used in the implementation has been chosen to be large
enough to fit the largest thread state flavor so far, namely
AARCH64_FLOAT_STATE, but small enough to not overflow the 4K stack. If
a new thread state flavor is added that is too big to fit on the stack,
the implementation should be switched to use kalloc instead of on-stack
storage.
Message-ID: <20240415090149.38358-9-bugaevc@gmail.com>
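A hedged usage sketch of the new trap in a sigreturn-style return to an
interrupted context; the AArch64 flavor names follow the related commits, and
the exact trap prototype and headers are assumptions:
#include <stdlib.h>
#include <mach.h>    /* assumed userland Mach header for the basic types */
/* Prototype per the description above (assumed). */
extern kern_return_t thread_set_self_state(int flavor,
                                           thread_state_t new_state,
                                           mach_msg_type_number_t new_state_count);
/* Restore an interrupted context in one atomic step.  On success the
 * kernel reloads every register, including pc and the PSTATE/BTYPE bits,
 * so control never returns here, and the saved x0 is not clobbered with
 * KERN_SUCCESS.  On failure the error is reported normally. */
static void
restore_interrupted_context(const struct aarch64_thread_state *saved)
{
    kern_return_t kr = thread_set_self_state(AARCH64_THREAD_STATE,
                                             (thread_state_t) saved,
                                             AARCH64_THREAD_STATE_COUNT);
    /* Only reached if the trap failed (bad flavor, count, or values). */
    (void) kr;
    abort();
}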
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Notes:
* TPIDR_EL0, the TLS pointer, is included in the generic state directly.
* TPIDR2_EL0, part of the SME extension, is not included in the generic
state. If we add SME support, it will be a part of something like
aarch64_sme_state.
* CPSR is not a real register in AArch64 (unlike in AArch32), but a
collection of individually accessible bits and pieces from PSTATE.
Due to how the kernel accesses user mode's PSTATE (via SPSR), it's
convenient to represent PSTATE as a pseudo-register in the same
format as SPSR. This is also what QEMU and XNU do.
* There is no hardware-enforced 'natural' order to place the registers
in, since no registers get pushed onto the stack on exception entry.
Saving and restoring registers from an instance of struct
aarch64_thread_state is implemented entirely in software, and the
format is essentially arbitrary.
* aarch64_float_state includes registers of a 128-bit type; this may
create issues for compilers other than GCC.
* fp_reserved is not a register, but a placeholder. If and when Arm
adds another floating-point meta-register, this will be changed to
represent it, and that would not be considered a compatibility break,
so don't access fp_reserved by name, or its value, from userland.
Instead, memset the whole structure to 0 if starting from scratch, or
memcpy an existing structure.
More thread state types could be added in the future, such as
aarch64_debug_state, aarch64_virt_state (for hardware-accelerated
virtualization), potentially ones for PAC, SVE/SME, etc.
Message-ID: <20240415090149.38358-8-bugaevc@gmail.com>
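A hedged sketch of what the structure described above might look like (the
authoritative definition is in the AArch64 thread_status.h added by this
commit; exact field names and widths here are assumptions, apart from the
points covered in the notes):
#include <stdint.h>
struct aarch64_thread_state {
    uint64_t x[31];        /* x0-x30; x30 doubles as the link register */
    uint64_t sp;           /* stack pointer */
    uint64_t pc;           /* program counter */
    uint64_t tpidr_el0;    /* TLS pointer, part of the generic state */
    uint64_t cpsr;         /* PSTATE pseudo-register, encoded like SPSR */
};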
|
|
|
|
|
|
|
| |
A few yet-unimplemented codes are also sketched out; these are included
so you know roughly what to expect once the missing functionality gets
implemented, but are not in any way stable or usable.
Message-ID: <20240415090149.38358-7-bugaevc@gmail.com>
|