Programming
linux x86 x86-64 system-calls ptrace
Updated Sun, 21 Aug 2022 17:09:28 GMT

Can ptrace tell if an x86 system call used the 64-bit or 32-bit ABI?


I'm trying to use ptrace to trace all syscalls made by a separate process, be it 32-bit (IA-32) or 64-bit (x86-64). My tracer would run on a 64-bit x86 installation with IA-32 emulation enabled, but ideally would be able to trace both 64-bit and 32-bit applications, including if a 64-bit application forks and execs a 32-bit process.

The issue is that, since 32-bit and 64-bit syscall numbers differ, I need to know whether a process is 32-bit or 64-bit to determine which syscall it used, even if I have the syscall number. There seem to be imperfect methods, like checking /proc/<pid>/exe or (as strace does) the size of the registers struct, but nothing reliable.

Complicating this is the fact that 64-bit processes can switch out of long mode to execute 32-bit code directly. They can also make 32-bit int $0x80 syscalls, which, of course, use the 32-bit syscall numbers. I don't "trust" the processes I trace to not use these tricks, so I want to detect them correctly. And I've independently verified that in at least the latter case, ptrace sees the 32-bit syscall numbers and argument register assignments, not the 64-bit ones.

I poked around in the kernel source and came across the TS_COMPAT flag in arch/x86/include/asm/thread_info.h, which appears to be set whenever a 32-bit syscall is made by a 64-bit process. The only problem is that I have no idea how to access this flag from userland, or if it is even possible.

I also thought about reading %cs and comparing it to $0x23 or $0x33, inspired by this method for switching bitness in a running process. But this only detects 32-bit code, not necessarily 32-bit syscalls (those made with int $0x80) from a 64-bit process. It's also fragile since it relies on undocumented kernel behavior.

Finally, I noticed that the x86 architecture has a bit for long mode in the Extended Feature Enable Register MSR. But ptrace has no way of reading the MSR from a tracee, and I feel like reading it from within my tracer will be inadequate because my tracer is always running in long mode.

I'm at a loss. Perhaps I could try one of those hacks (at this point I'm leaning towards %cs or the /proc/<pid>/exe method), but I want something durable that will actually distinguish between 32-bit and 64-bit syscalls. How can a process using ptrace under x86-64, which has detected that its tracee made a syscall, reliably determine whether that syscall was made with the 32-bit (int $0x80) or 64-bit (syscall) ABI? Is there some other way for a user process to gain this information about another process that it is authorized to ptrace?




Solution

Interesting, I hadn't realized that there wasn't an obvious smarter way that strace could use to correctly decode int 0x80 from 64-bit processes. (This is being worked on, see this answer for links to a proposed kernel patch to add PTRACE_GET_SYSCALL_INFO to the ptrace API. strace 4.26 already supports it on patched kernels.)

Update: strace now supports per-syscall detection. I don't know which mainline kernel version added the feature; I tested on Arch Linux with kernel version 5.5 and strace version 5.5.

e.g. this NASM source assembled into a static executable:

mov eax, 4
int 0x80
mov eax, 60
syscall

gives this trace (from nasm -felf64 foo.asm && ld foo.o && strace ./a.out):

execve("./foo", ["./foo"], 0x7ffcdc233180 /* 51 vars */) = 0
strace: [ Process PID=1262249 runs in 32 bit mode. ]
write(0, NULL, 0)                       = 0
strace: [ Process PID=1262249 runs in 64 bit mode. ]
exit(0)                                 = ?
+++ exited with 0 +++

strace prints a message every time a system call uses a different ABI bitness than the previous one. Note that the "runs in 32 bit mode" message is misleading here: the process is still in 64-bit mode and is merely using the 32-bit ABI. "Mode" has a specific technical meaning for x86-64, and this is not it.
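If your own tracer can rely on a kernel that has PTRACE_GET_SYSCALL_INFO, a query along the following lines should report the ABI per syscall. This is an untested sketch, not what strace literally does: abi_from_audit_arch and syscall_abi_at_entry are names I made up, and it assumes kernel headers new enough to define struct ptrace_syscall_info, plus a tracee that is currently in a syscall-entry stop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <linux/ptrace.h>   /* struct ptrace_syscall_info, PTRACE_GET_SYSCALL_INFO */
#include <linux/audit.h>    /* AUDIT_ARCH_X86_64, AUDIT_ARCH_I386 */

enum syscall_abi { ABI_X86_64, ABI_I386, ABI_UNKNOWN };

/* Hypothetical helper: map the audit architecture value reported by the
   kernel to the ABI whose syscall table the number belongs to. */
enum syscall_abi abi_from_audit_arch(uint32_t arch)
{
    switch (arch) {
    case AUDIT_ARCH_X86_64: return ABI_X86_64;
    case AUDIT_ARCH_I386:   return ABI_I386;
    default:                return ABI_UNKNOWN;
    }
}

/* Hypothetical helper: ask the kernel about the syscall a stopped tracee
   is entering. Returns 0 and fills *abi and *nr on a syscall-entry stop,
   -1 on error or if the stop is not a syscall entry. */
int syscall_abi_at_entry(pid_t pid, enum syscall_abi *abi, uint64_t *nr)
{
    struct ptrace_syscall_info info;
    long n;

    assert(abi != NULL && nr != NULL);
    /* addr carries the buffer size, data points at the buffer. */
    n = ptrace(PTRACE_GET_SYSCALL_INFO, pid, (void *)sizeof info, &info);
    if (n < 0 || info.op != PTRACE_SYSCALL_INFO_ENTRY)
        return -1;
    *abi = abi_from_audit_arch(info.arch);
    *nr = info.entry.nr;
    return 0;
}
```

For the example program above, I would expect the int 0x80 call to be reported as ABI_I386 with number 4 and the syscall instruction as ABI_X86_64 with number 60, matching the write and exit lines in the trace.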


With older kernels

As a workaround, I think you could look at the code just before RIP (the saved RIP points past the instruction that entered the kernel) and check whether it was the syscall instruction (0F 05) or int 0x80 (CD 80), because ptrace does let you read the target process's memory.

But for a security use-case like disallowing some system calls, this would be vulnerable to a race condition: another thread in the traced process could rewrite the syscall bytes to int 0x80 after they execute, but before you can peek at them with ptrace.


You only need to do that if the process is running in 64-bit mode; otherwise only the 32-bit ABI is available and there is nothing to check. (The vdso page can potentially use 32-bit mode syscall on AMD CPUs that support it but not sysenter. Not checking in the first place for 32-bit processes avoids this corner case.) I think you're saying you have a reliable way to detect that at least.
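The opcode check described above could be sketched in C roughly as follows. I haven't run this against a real tracee, so treat it as an illustration under stated assumptions: classify_entry_insn and entry_insn_of_stopped_tracee are hypothetical names, the tracee is assumed to be 64-bit code stopped at a syscall entry, and the saved RIP is assumed to point just past a two-byte instruction. It is also subject to the race condition mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/user.h>

enum entry_insn { INSN_SYSCALL, INSN_INT80, INSN_OTHER };

/* Hypothetical helper: classify the two bytes that precede the saved RIP. */
enum entry_insn classify_entry_insn(unsigned char b0, unsigned char b1)
{
    if (b0 == 0x0f && b1 == 0x05)
        return INSN_SYSCALL;       /* syscall: 64-bit ABI */
    if (b0 == 0xcd && b1 == 0x80)
        return INSN_INT80;         /* int 0x80: 32-bit ABI */
    return INSN_OTHER;
}

#if defined(__x86_64__)
/* Hypothetical helper: peek at the instruction a stopped tracee used to
   enter the kernel. Returns 0 on success, -1 if a ptrace request fails. */
int entry_insn_of_stopped_tracee(pid_t pid, enum entry_insn *out)
{
    struct user_regs_struct regs;
    long word;

    assert(out != NULL);
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;

    /* PEEKTEXT returns the data itself, so errno must distinguish errors. */
    errno = 0;
    word = ptrace(PTRACE_PEEKTEXT, pid, (void *)(regs.rip - 2), NULL);
    if (word == -1 && errno != 0)
        return -1;

    /* x86 is little-endian: the lowest byte is the one at rip - 2. */
    *out = classify_entry_insn((unsigned char)(word & 0xff),
                               (unsigned char)((word >> 8) & 0xff));
    return 0;
}
#endif
```

INSN_OTHER is deliberately a separate result, so a tracer can treat anything unexpected as suspicious instead of guessing an ABI.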

(I haven't used the ptrace API directly, just the tools like strace that use it. So I hope this answer makes sense.)





Comments (5)

  • +0 – Checking the opcode at %rip/%eip hadn't occurred to me, but it makes perfect sense! Thanks for that insight. And I appreciate the mention of that corner case. I was hoping to be able to use opcode scanning as my primary method, but looks like I'll need to check the bitness of the process first before delegating to opcode scanning for 64-bit processes. In any event, thanks for the help! — Nov 25, 2018 at 07:25
  • +0 – So, to be clear, you're saying that in a 32-bit process's vDSO page (but not that of a 64-bit process), the syscall instruction uses the 32-bit ABI? That's an interesting inconsistency. — Nov 25, 2018 at 07:30
  • +1 – @ameed: AMD's 32-bit mode syscall is basically a different instruction, even though it has the same mnemonic and the same opcode as 64-bit syscall. It obviously can't use R11d in legacy mode on a pure 32-bit CPU, because that register doesn't exist. The kernel side is different in compat mode than legacy mode, though, and IIRC Linux doesn't even use it in legacy mode because it's too badly designed to be usable. But it will in compat mode if sysenter isn't available. Syscall or sysenter on 32 bits Linux? — Nov 25, 2018 at 18:18  
  • +1 – @ameed: see also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for links to Linux's entry_64_compat.S (32-bit ABI entry points into a 64-bit kernel) and entry_32.S (32-bit ABI entry points into a 32-bit kernel). Specifically github.com/torvalds/linux/blob/… for syscall from a 32-bit process into a 64-bit kernel, where it explains that 32-bit Linux kernels disable 32-bit syscall because it's too poorly designed. — Nov 25, 2018 at 18:20  
  • +0 – I'll give these a look. Thanks! — Nov 25, 2018 at 19:58