Assembly clone syscall thread function not called

1

Im trying to create a thread using 'clone' syscall ... i searched toooooooo much ! for example,
link1
link2

and now this is my source code in assembly for linux x64:

FORMAT  ELF64 EXECUTABLE
ENTRY   thread_linux_x64

THREAD_MEM_SIZE = 1024

define PROT_READ        0x1
define PROT_WRITE       0x2
define PROT_EXEC        0x4

define MAP_PRIVATE      0x02
define MAP_ANONYMOUS    0x20

define CLONE_VM         0x00000100
define CLONE_FS         0x00000200
define CLONE_FILES      0x00000400
define CLONE_SIGHAND    0x00000800
define CLONE_PARENT     0x00008000
define CLONE_THREAD     0x00010000
define CLONE_IO         0x80000000

define SIGCHLD          20

CLONE_FLAGS = CLONE_VM OR CLONE_FS OR CLONE_FILES OR CLONE_SIGHAND OR CLONE_PARENT OR CLONE_THREAD OR CLONE_IO

MMAP_FLAG       = MAP_PRIVATE OR MAP_ANONYMOUS
MMAP_PERMISSION = PROT_READ   OR PROT_WRITE OR PROT_EXEC

SEGMENT READABLE EXECUTABLE
thread_linux_x64:

        ; Memory allocation using 'mmap' syscall
        mov     eax,  9                 ; sys_mmap
        xor     edi,  edi               ; addr = null (0)
        mov     esi,  THREAD_MEM_SIZE   ; Memory size
        mov     edx,  MMAP_PERMISSION   ; Permission
        mov     r10d, MMAP_FLAG         ; Flag
        mov     r8d,  -1                ; Fd = -1 (invalid fd)
        xor     r9d,  r9d               ; Offset = 0
        syscall

        cmp     rax, 0                  ; error ?
        jl      .error_mmap

        mov     r13, rax                ; r13 = memory address

        ; create a new child process (thread) using 'clone' syscall
        mov     eax,  56                                ; sys_clone
        mov     edi,  CLONE_FLAGS                       ; flags
        lea     rsi,  [r13 + THREAD_MEM_SIZE - 8]       ; stack address - 8 (8-BYTE to store the function address)
        mov     QWORD [rsi], thread_func                ; set function address
        xor     edx,  edx                               ; parent_tid = NULL (0)
        xor     r10d, r10d                              ; child_tid  = NULL (0)
        xor     r8d,  r8d                               ; tid = 0
        syscall

        cmp     rax, 0          ; error ?
        jle     .error_clone

        ; wait for the created thread to exit using 'wait4' syscall
        mov     rdi, rax        ; created-thread pid
        mov     eax, 61         ; sys_wait4
        xor     esi, esi        ; stat_addr = null (0)
        xor     edx, edx        ; options = 0
        xor     r10d, r10d      ; rusage = 0
        syscall

        ; free the allocated memory (r13) using 'munmap' syscall
        mov     eax, 11                 ; sys_munmap
        mov     rdi, r13                ; memory address
        mov     esi, THREAD_MEM_SIZE    ; memory size
        syscall

        ; exit (return 0 (success))
        mov     eax, 60         ; sys_exit
        xor     edi, edi        ; return 0
        syscall

.error_mmap:
        ; set error message to print
        mov     rsi, .mmap_failed_msg           ; error message
        mov     edx, .mmap_failed_msg_len       ; error message length
        jmp     short .error

.error_clone:
        ; free the allocated memory (r13) using 'munmap' syscall
        mov     eax, 11                 ; sys_munmap
        mov     rdi, r13                ; memory address
        mov     esi, THREAD_MEM_SIZE    ; memory size
        syscall

.error:
        ; print error message
        mov     eax, 1          ; sys_write
        xor     edi, edi        ; stdout (0)
        syscall

        ; exit (return 1 (error))
        mov     eax, 60         ; sys_exit
        mov     edi, 1          ; return 1
        syscall

.mmap_failed_msg db 'Memory allocation failed', 0x0a, 0x00
.mmap_failed_msg_len = $ - .mmap_failed_msg

.clone_failed_msg db 'Unable to create a new child process', 0x0a, 0x00
.clone_failed_msg_len = $ - .clone_failed_msg

thread_func:

        ; print message
        mov     eax, 1                  ; sys_write
        xor     edi, edi                ; stdout (0)
        mov     rsi, .message           ; message address
        mov     edx, .message_len       ; message length
        syscall

        ; exit (return 0 (success))
        mov     eax, 60         ; sys_exit
        xor     edi, edi        ; return 0
        syscall

        .message db 'Child process is called', 0x0a, 0x00
        .message_len = $ - .message   

everything is looks normal !!!! but when i run this program, i get nothing !!!! NO 'Child process is called' message print ! in fact, i think my thread function is not running ... i also got strace test and this is the result !!!

trace -f ./thread_linux_x64

execve("./thread_linux_x64", ["./thread_linux_x64"], 0x7fffd4db1b58 /* 53 vars */) = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32ba3e4000
clone(child_stack=0x7f32ba3e43f8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 32064 attached
) = 32064
[pid 32064] munmap(0x7f32ba3e4000, 1024 <unfinished ...>
[pid 32063] wait4(32064, NULL, 0, NULL) = -1 ECHILD (No child processes)
[pid 32064] <... munmap resumed>)       = 0
[pid 32063] munmap(0x7f32ba3e4000, 1024 <unfinished ...>
[pid 32064] write(0, "", 0 <unfinished ...>
[pid 32063] <... munmap resumed>)       = 0
[pid 32063] exit(0 <unfinished ...>
[pid 32064] <... write resumed>)        = 0
[pid 32063] <... exit resumed>)         = ?
[pid 32064] exit(1)                     = ?
[pid 32064] +++ exited with 1 +++
+++ exited with 0 +++

This problem is driving me crazy! because there is no error ... and everything looks just fine !!!!

Update:

here i change my source code to create the thread without calling thread_create or ... function (in the main function) and now my problem fixed ... in fact, 'thread_func' now called but i have a new problem ! i get Segment failure !!!! i think it's about my CLONE_FLAGS !!!!

FORMAT  ELF64 EXECUTABLE
ENTRY   thread_linux_x64

THREAD_MEM_SIZE = 1024

define PROT_READ        0x1
define PROT_WRITE       0x2
define PROT_EXEC        0x4

define MAP_PRIVATE      0x02
define MAP_ANONYMOUS    0x20

define CLONE_VM         0x00000100
define CLONE_FS         0x00000200
define CLONE_FILES      0x00000400
define CLONE_SIGHAND    0x00000800
define CLONE_PARENT     0x00008000
define CLONE_THREAD     0x00010000
define CLONE_IO         0x80000000

CLONE_FLAGS = CLONE_VM OR CLONE_FS OR CLONE_FILES OR CLONE_SIGHAND OR CLONE_PARENT OR CLONE_THREAD OR CLONE_IO

MMAP_FLAG       = MAP_PRIVATE OR MAP_ANONYMOUS
MMAP_PERMISSION = PROT_READ OR PROT_WRITE OR PROT_EXEC

SEGMENT READABLE EXECUTABLE
thread_linux_x64:

        ; Memory allocation using 'mmap' syscall (sys_mmap (9))
        mov     eax, 9                  ; sys_mmap
        xor     edi, edi                ; addr = 0 (NULL)
        mov     esi, THREAD_MEM_SIZE    ; Memory allocation size
        mov     edx, MMAP_PERMISSION    ; Permission (PROT_READ, ...)
        mov     r10d, MMAP_FLAG         ; Flag (MAP_PRIVATE, ...)
        mov     r8d, -1                 ; File descriptor (Fd) = -1 (invalid File descriptor)
        xor     r9d, r9d                ; Offset = 0
        syscall

        test    rax, rax                ; ERROR ?
        jl      .error_mmap

        mov     r13, rax                ; R13 = Memory address (RAX)

        ; Create a new child process (thread) using 'clone' syscall (sys_clone (56))
        mov     eax, 56                                 ; sys_clone
        mov     edi, CLONE_FLAGS                        ; Flag (CLONE_VM, ...)
        lea     rsi, [r13 + THREAD_MEM_SIZE - 16]       ; End of the stack - 16 (8-BYTE to store the function address and 8-BYTE to store the data (parameter) address)
        mov     qword [rsi], thread_func                ; Set thread function
        mov     qword [rsi+8], 0                        ; No data (parameter = NULL)
        xor     edx, edx                                ; * parent_tid = NULL (0)
        xor     r10d, r10d                              ; * child_tid  = NULL (0)
        xor     r8d, r8d                                ; tid = 0
        syscall

        test    rax, rax                ; pid == 0 ? | pid < 0 ?
        jg      short .parent_continue  ; parent !
        jl      .error_clone            ; ERROR !

        ; *** CHILD PROCESS ***
        ret                             ; by using the 'ret' instruction, we called the requested function (thread)
                                        ; because we moved the function address into the stack of child process and
                                        ; by using the 'ret' instruction, we jump to the thread function (thread_func)

.parent_continue:

        ; Wait for the created thread to exit using 'wait4' syscall (sys_wait4 (61))
        mov     rdi, rax        ; TID (Thread id)
        mov     eax, 61         ; sys_wait4
        xor     esi, esi
        xor     edx, edx
        xor     r10d, r10d
        syscall

        ; Free the memory (R13) using 'munmap' syscall (sys_munmap (11))
        mov     eax, 11                         ; sys_munmap
        mov     rdi, r13                        ; Memory address (R13)
        mov     esi, THREAD_MEM_SIZE            ; Memory size
        syscall

        ; Write 'done' message
        mov     eax, 1                  ; sys_write
        xor     edi, edi                ; STDOUT (0)
        mov     rsi, .message           ; Message address
        mov     edx, .message_len       ; Message length
        syscall

        ; exit (return 0)
        mov     eax, 60                 ; sys_exit
        xor     edi, edi                ; return 0
        syscall

.error_mmap:
        ; Set error message to write it to STDOUT
        mov     rsi, .mmap_failed_msg           ; Error message
        mov     edx, .mmap_failed_msg_len       ; Error message length
        jmp     short .error

.error_clone:
        ; Free the memory (R13) using 'munmap' syscall (sys_munmap (11))
        mov     eax, 11                         ; sys_munmap
        mov     rdi, r13                        ; Memory address (R13)
        mov     esi, THREAD_MEM_SIZE            ; Memory size
        syscall

        ; Set error message to write it to STDOUT
        mov     rsi, .clone_failed_msg          ; Error message
        mov     edx, .clone_failed_msg_len      ; Error message length

.error:
        ; Write error message to STDOUT
        mov     eax, 1          ; sys_write
        xor     edi, edi        ; STDOUT (0)
        syscall

        ; exit (return 1 (error))
        mov     eax, 60         ; sys_exit
        mov     edi, 1          ; return 1
        syscall

.message db 'Child process is terminated', 0x0a, 0x00
.message_len = $ - .message

.mmap_failed_msg db 'Memory allocation failed', 0x0a, 0x00
.mmap_failed_msg_len = $ - .mmap_failed_msg

.clone_failed_msg db 'Unable to create a new child process', 0x0a, 0x00
.clone_failed_msg_len = $ - .clone_failed_msg

thread_func:

        ; Write message from child process
        mov     eax, 1                  ; sys_write
        xor     edi, edi                ; STDOUT (0)
        mov     rsi, .message           ; Message address
        mov     edx, .message_len       ; Message length
        syscall

        ; exit (return 0)
        mov     eax, 60                 ; sys_exit
        xor     edi, edi                ; return 0
        syscall

.message db 'Child process is called', 0x0a, 0x00
.message_len = $ - .message

here, everything looks good ! but this is the function result ->
Child process is terminated
Segmentation fault (core dumped)

but sometimes i get this too !!!!!!!!
Child process is called
Child process is terminated

also sometimes i get this tooo !!!!!!!!!!!!!!!!!
Child process is terminated
Child process is called

but 100% there is a problem because "Segmentation fault" !!!! what is the problem?

strace

execve("./thread_linux_x64", ["./thread_linux_x64"], 0x7fff7cc37508 /* 53 vars */) = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1f8b97b000
clone(child_stack=0x7f1f8b97b3f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 3131 attached
) = 3131
[pid  3131] write(0, "Child process is called\n\0", 25 <unfinished ...>
Child process is called
[pid  3130] wait4(3131,  <unfinished ...>
[pid  3131] <... write resumed>)        = 25
[pid  3130] <... wait4 resumed>NULL, 0, NULL) = -1 ECHILD (No child processes)
[pid  3131] exit(0 <unfinished ...>
[pid  3130] munmap(0x7f1f8b97b000, 1024 <unfinished ...>
[pid  3131] <... exit resumed>)         = ?
[pid  3130] <... munmap resumed>)       = 0
[pid  3131] +++ exited with 0 +++
write(0, "Child process is terminated\n\0", 29Child process is terminated
) = 29
exit(0)                                 = ?
+++ exited with 0 +++

EXAMPLE WITH C-PTHREAD

this is C source code with pthread:

#include <stdio.h>
#include <pthread.h>
#include <bits/signum.h>

void * thread_func(void * arg) {
    const char msg[] = "Child-> HELLO\n";
    asm volatile ("syscall"
    :: "a" (1), "D" (0), "S" (msg), "d" (sizeof(msg) - 1)
    : "rcx", "r11", "memory");
    return 0;
}

int
main() {
    pthread_t pthread;
    const char msg1[] = "Parent-> HELLO\n";
    const char msg2[] = "Parent-> BYE\n";

    asm volatile ("syscall"
    :: "a" (1), "D" (0), "S" (msg1), "d" (sizeof(msg1) - 1)
    : "rcx", "r11", "memory");

    pthread_create(& pthread, NULL, thread_func, NULL);
    pthread_join(pthread, NULL);

    asm volatile ("syscall"
    :: "a" (1), "D" (0), "S" (msg2), "d" (sizeof(msg2) - 1)
    : "rcx", "r11", "memory");

    return 0;
}

and the strace for this is:

mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8c296d3000
arch_prctl(ARCH_SET_FS, 0x7f8c296d3740) = 0
mprotect(0x7f8c29895000, 12288, PROT_READ) = 0
mprotect(0x7f8c298bb000, 4096, PROT_READ) = 0
mprotect(0x403000, 4096, PROT_READ)     = 0
mprotect(0x7f8c29906000, 4096, PROT_READ) = 0
munmap(0x7f8c298c3000, 98201)           = 0
set_tid_address(0x7f8c296d3a10)         = 10122
set_robust_list(0x7f8c296d3a20, 24)     = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f8c298a6c50, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f8c298b3b20}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f8c298a6cf0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f8c298b3b20}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
write(0, "Parent-> HELLO\n", 15Parent-> HELLO
)        = 15
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8c28ed2000
mprotect(0x7f8c28ed3000, 8388608, PROT_READ|PROT_WRITE) = 0
brk(NULL)                               = 0x13d2000
brk(0x13f3000)                          = 0x13f3000
brk(NULL)                               = 0x13f3000
clone(child_stack=0x7f8c296d1fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[10123], tls=0x7f8c296d2700, child_tidptr=0x7f8c296d29d0) = 10123
futex(0x7f8c296d29d0, FUTEX_WAIT, 10123, NULLstrace: Process 10123 attached
 <unfinished ...>
[pid 10123] set_robust_list(0x7f8c296d29e0, 24) = 0
[pid 10123] write(0, "Child-> HELLO\n", 14Child-> HELLO
) = 14
[pid 10123] madvise(0x7f8c28ed2000, 8368128, MADV_DONTNEED) = 0
[pid 10123] exit(0)                     = ?
[pid 10122] <... futex resumed>)        = 0
[pid 10123] +++ exited with 0 +++
write(0, "Parent-> BYE\n", 13Parent-> BYE
)          = 13
exit_group(0)                           = ?
+++ exited with 0 +++

if we use clone and wait functions in C, we going to have 'wait4' syscall ... and even in my 'wait' syscall, the child id is correct !!!!!!!!!! so it shouldn't be any problem !

C Clone EXAMPLE

#define _GNU_SOURCE
#include <sched.h>

#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <wait.h>

#define MEM_SIZE        1024

#define CLONE_FLAGS     (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_PARENT | CLONE_THREAD | CLONE_IO)

int
thread_func(void * data) {
    static const char msg[] = "Hello from Child process\n";

    write(0, msg, sizeof(msg)-1);
    exit(0);
}

int
main() {
    static const char msg[] = "Child process is terminated\n";
    void * memory;

    if((memory = mmap(NULL, MEM_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0)) == MAP_FAILED) {
        printf("memory allocation failed\n");
        return 1;
    }

    int pid = clone(thread_func, (memory + MEM_SIZE), CLONE_FLAGS, NULL);
    if(pid < 0) {
        munmap(memory, MEM_SIZE);
        printf("clone() failed\n");
        return 1;
    }

    waitpid(pid, NULL, 0);

    write(0, msg, sizeof(msg)-1);

    munmap(memory, MEM_SIZE);
    exit(0);
}

Something wierd !!!! same error (segment ...) !!!! even in C example, i get Same error !!!!

this is strace :

mprotect(0x7fd8b4492000, 12288, PROT_READ) = 0
mprotect(0x403000, 4096, PROT_READ)     = 0
mprotect(0x7fd8b44e1000, 4096, PROT_READ) = 0
munmap(0x7fd8b449e000, 98201)           = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fd8b44e0000
clone(child_stack=0x7fd8b44e03f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 19911 attached
) = 19911
[pid 19911] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fd8b44df9c0} ---
[pid 19910] wait4(19911,  <unfinished ...>) = ?
[pid 19911] +++ killed by SIGSEGV (core dumped) +++
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
linux
assembly
x86-64
system-calls
fasm
asked on Stack Overflow Mar 12, 2020 by ELHASKSERVERS • edited Mar 13, 2020 by ELHASKSERVERS

1 Answer

3

We already answered this in comments last time you asked. The raw clone system call doesn't read a function pointer from memory for you.

You have to do that yourself with code that runs in the child thread / process. Instead, you're having both threads continue on to run wait4, munmap, and exit.

The clone(2) man page explains this. The main part of the page documents the glibc wrapper that takes a function-pointer to call in the child thread. But it clearly says that's not the raw system call, and to see the NOTES section. There you'll find the raw asm system call's prototype and documentation:

long raw_clone(unsigned long flags, void *stack,
                        int *parent_tid, int *child_tid,
                        unsigned long tls);

The raw clone() system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. As such, the fn and arg arguments of the clone() wrapper function are omitted.

You can use the new stack as a convenient place to stash a function pointer where your user-space code for the new thread can find it. (The new thread won't have easy access to the main thread's stack because RSP will be pointing at its new stack; I'm not sure if registers other than RAX are zeroed before entering the new thread or not. If not you can easily just keep the pointer in a register other than RAX, RCX, or R11. And of course static storage is available, but you shouldn't need to use that.)

You'll want to branch on the return value being 0 which tells you you're in the child process. (Like fork, clone returns twice when it succeeds: once in the parent with the TID, once in the child with 0. I think that's true; the man page doesn't clearly document this part, but that's how fork works)


As discussed in comments, link2 is storing the function address on the child thread stack. When the parent returns from the wrapper function it returns normally. When the child returns, it will pop that address from what is now its stack.

You chose to implement this with a ret that only runs in the child; that's fine. You could have just used jmp with the pointer in a register or memory.


re: updated question:

Your wait4 system call is returning -1 ECHILD without actually waiting.

Therefore your ret races with the munmap that would unmap the thread stack, leading to a segfault if munmap happens first. This also explains your output happening in different orders when it doesn't crash.

I don't know exactly what the right solution is, but it's obviously not this. Have a look at what pthread_join uses to wait for a child thread to exit. Perhaps the clone return value isn't actually the right thing to use with wait4, or wait4 isn't the right system call.

(The int *child_tid output pointer presumably exists for a reason, although maybe just so both parent and child can get it without a gettid system call or VDSO call.)

Or maybe it's because you didn't pass __WCLONE or __WALL to get wait4 to wait for clone children.

Read the man pages for system calls you use, especially when strace shows they didn't do what you expected. This is step 2 in debugging / problem solving technique, after identifying that a system call returned an error in the first place (with strace).

answered on Stack Overflow Mar 12, 2020 by Peter Cordes • edited Mar 13, 2020 by Peter Cordes

User contributions licensed under CC BY-SA 3.0