I'm trying to create a thread using the 'clone' syscall. I have searched a lot,
for example,
link1
link2
and now this is my source code in assembly for linux x64:
FORMAT ELF64 EXECUTABLE
ENTRY thread_linux_x64
THREAD_MEM_SIZE = 1024
define PROT_READ 0x1
define PROT_WRITE 0x2
define PROT_EXEC 0x4
define MAP_PRIVATE 0x02
define MAP_ANONYMOUS 0x20
define CLONE_VM 0x00000100
define CLONE_FS 0x00000200
define CLONE_FILES 0x00000400
define CLONE_SIGHAND 0x00000800
define CLONE_PARENT 0x00008000
define CLONE_THREAD 0x00010000
define CLONE_IO 0x80000000
define SIGCHLD 20
CLONE_FLAGS = CLONE_VM OR CLONE_FS OR CLONE_FILES OR CLONE_SIGHAND OR CLONE_PARENT OR CLONE_THREAD OR CLONE_IO
MMAP_FLAG = MAP_PRIVATE OR MAP_ANONYMOUS
MMAP_PERMISSION = PROT_READ OR PROT_WRITE OR PROT_EXEC
SEGMENT READABLE EXECUTABLE
thread_linux_x64:
; Memory allocation using 'mmap' syscall
mov eax, 9 ; sys_mmap
xor edi, edi ; addr = null (0)
mov esi, THREAD_MEM_SIZE ; Memory size
mov edx, MMAP_PERMISSION ; Permission
mov r10d, MMAP_FLAG ; Flag
mov r8d, -1 ; Fd = -1 (invalid fd)
xor r9d, r9d ; Offset = 0
syscall
cmp rax, 0 ; error ?
jl .error_mmap
mov r13, rax ; r13 = memory address
; create a new child process (thread) using 'clone' syscall
mov eax, 56 ; sys_clone
mov edi, CLONE_FLAGS ; flags
lea rsi, [r13 + THREAD_MEM_SIZE - 8] ; stack address - 8 (8-BYTE to store the function address)
mov QWORD [rsi], thread_func ; set function address
xor edx, edx ; parent_tid = NULL (0)
xor r10d, r10d ; child_tid = NULL (0)
xor r8d, r8d ; tid = 0
syscall
cmp rax, 0 ; error ?
jle .error_clone
; wait for the created thread to exit using 'wait4' syscall
mov rdi, rax ; created-thread pid
mov eax, 61 ; sys_wait4
xor esi, esi ; stat_addr = null (0)
xor edx, edx ; options = 0
xor r10d, r10d ; rusage = 0
syscall
; free the allocated memory (r13) using 'munmap' syscall
mov eax, 11 ; sys_munmap
mov rdi, r13 ; memory address
mov esi, THREAD_MEM_SIZE ; memory size
syscall
; exit (return 0 (success))
mov eax, 60 ; sys_exit
xor edi, edi ; return 0
syscall
.error_mmap:
; set error message to print
mov rsi, .mmap_failed_msg ; error message
mov edx, .mmap_failed_msg_len ; error message length
jmp short .error
.error_clone:
; free the allocated memory (r13) using 'munmap' syscall
mov eax, 11 ; sys_munmap
mov rdi, r13 ; memory address
mov esi, THREAD_MEM_SIZE ; memory size
syscall
.error:
; print error message
mov eax, 1 ; sys_write
xor edi, edi ; stdout (0)
syscall
; exit (return 1 (error))
mov eax, 60 ; sys_exit
mov edi, 1 ; return 1
syscall
.mmap_failed_msg db 'Memory allocation failed', 0x0a, 0x00
.mmap_failed_msg_len = $ - .mmap_failed_msg
.clone_failed_msg db 'Unable to create a new child process', 0x0a, 0x00
.clone_failed_msg_len = $ - .clone_failed_msg
thread_func:
; print message
mov eax, 1 ; sys_write
xor edi, edi ; stdout (0)
mov rsi, .message ; message address
mov edx, .message_len ; message length
syscall
; exit (return 0 (success))
mov eax, 60 ; sys_exit
xor edi, edi ; return 0
syscall
.message db 'Child process is called', 0x0a, 0x00
.message_len = $ - .message
Everything looks normal, but when I run this program I get nothing: the 'Child process is called' message is never printed. In fact, I think my thread function is not running at all.
I also ran strace, and this is the result:
strace -f ./thread_linux_x64
execve("./thread_linux_x64", ["./thread_linux_x64"], 0x7fffd4db1b58 /* 53 vars */) = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f32ba3e4000
clone(child_stack=0x7f32ba3e43f8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 32064 attached
) = 32064
[pid 32064] munmap(0x7f32ba3e4000, 1024 <unfinished ...>
[pid 32063] wait4(32064, NULL, 0, NULL) = -1 ECHILD (No child processes)
[pid 32064] <... munmap resumed>) = 0
[pid 32063] munmap(0x7f32ba3e4000, 1024 <unfinished ...>
[pid 32064] write(0, "", 0 <unfinished ...>
[pid 32063] <... munmap resumed>) = 0
[pid 32063] exit(0 <unfinished ...>
[pid 32064] <... write resumed>) = 0
[pid 32063] <... exit resumed>) = ?
[pid 32064] exit(1) = ?
[pid 32064] +++ exited with 1 +++
+++ exited with 0 +++
This problem is driving me crazy, because there is no error and everything looks fine!
Update:
Here I changed my source code to create the thread inline in the main function, without a separate thread_create-style function, and that problem is fixed: 'thread_func' is now called. But I have a new problem: I get a segmentation fault! I think it's caused by my CLONE_FLAGS.
FORMAT ELF64 EXECUTABLE
ENTRY thread_linux_x64
THREAD_MEM_SIZE = 1024
define PROT_READ 0x1
define PROT_WRITE 0x2
define PROT_EXEC 0x4
define MAP_PRIVATE 0x02
define MAP_ANONYMOUS 0x20
define CLONE_VM 0x00000100
define CLONE_FS 0x00000200
define CLONE_FILES 0x00000400
define CLONE_SIGHAND 0x00000800
define CLONE_PARENT 0x00008000
define CLONE_THREAD 0x00010000
define CLONE_IO 0x80000000
CLONE_FLAGS = CLONE_VM OR CLONE_FS OR CLONE_FILES OR CLONE_SIGHAND OR CLONE_PARENT OR CLONE_THREAD OR CLONE_IO
MMAP_FLAG = MAP_PRIVATE OR MAP_ANONYMOUS
MMAP_PERMISSION = PROT_READ OR PROT_WRITE OR PROT_EXEC
SEGMENT READABLE EXECUTABLE
thread_linux_x64:
; Memory allocation using 'mmap' syscall (sys_mmap (9))
mov eax, 9 ; sys_mmap
xor edi, edi ; addr = 0 (NULL)
mov esi, THREAD_MEM_SIZE ; Memory allocation size
mov edx, MMAP_PERMISSION ; Permission (PROT_READ, ...)
mov r10d, MMAP_FLAG ; Flag (MAP_PRIVATE, ...)
mov r8d, -1 ; File descriptor (Fd) = -1 (invalid File descriptor)
xor r9d, r9d ; Offset = 0
syscall
test rax, rax ; ERROR ?
jl .error_mmap
mov r13, rax ; R13 = Memory address (RAX)
; Create a new child process (thread) using 'clone' syscall (sys_clone (56))
mov eax, 56 ; sys_clone
mov edi, CLONE_FLAGS ; Flag (CLONE_VM, ...)
lea rsi, [r13 + THREAD_MEM_SIZE - 16] ; End of the stack - 16 (8-BYTE to store the function address and 8-BYTE to store the data (parameter) address)
mov qword [rsi], thread_func ; Set thread function
mov qword [rsi+8], 0 ; No data (parameter = NULL)
xor edx, edx ; * parent_tid = NULL (0)
xor r10d, r10d ; * child_tid = NULL (0)
xor r8d, r8d ; tid = 0
syscall
test rax, rax ; pid == 0 ? | pid < 0 ?
jg short .parent_continue ; parent !
jl .error_clone ; ERROR !
; *** CHILD PROCESS ***
ret ; by using the 'ret' instruction, we called the requested function (thread)
; because we moved the function address into the stack of child process and
; by using the 'ret' instruction, we jump to the thread function (thread_func)
.parent_continue:
; Wait for the created thread to exit using 'wait4' syscall (sys_wait4 (61))
mov rdi, rax ; TID (Thread id)
mov eax, 61 ; sys_wait4
xor esi, esi
xor edx, edx
xor r10d, r10d
syscall
; Free the memory (R13) using 'munmap' syscall (sys_munmap (11))
mov eax, 11 ; sys_munmap
mov rdi, r13 ; Memory address (R13)
mov esi, THREAD_MEM_SIZE ; Memory size
syscall
; Write 'done' message
mov eax, 1 ; sys_write
xor edi, edi ; STDOUT (0)
mov rsi, .message ; Message address
mov edx, .message_len ; Message length
syscall
; exit (return 0)
mov eax, 60 ; sys_exit
xor edi, edi ; return 0
syscall
.error_mmap:
; Set error message to write it to STDOUT
mov rsi, .mmap_failed_msg ; Error message
mov edx, .mmap_failed_msg_len ; Error message length
jmp short .error
.error_clone:
; Free the memory (R13) using 'munmap' syscall (sys_munmap (11))
mov eax, 11 ; sys_munmap
mov rdi, r13 ; Memory address (R13)
mov esi, THREAD_MEM_SIZE ; Memory size
syscall
; Set error message to write it to STDOUT
mov rsi, .clone_failed_msg ; Error message
mov edx, .clone_failed_msg_len ; Error message length
.error:
; Write error message to STDOUT
mov eax, 1 ; sys_write
xor edi, edi ; STDOUT (0)
syscall
; exit (return 1 (error))
mov eax, 60 ; sys_exit
mov edi, 1 ; return 1
syscall
.message db 'Child process is terminated', 0x0a, 0x00
.message_len = $ - .message
.mmap_failed_msg db 'Memory allocation failed', 0x0a, 0x00
.mmap_failed_msg_len = $ - .mmap_failed_msg
.clone_failed_msg db 'Unable to create a new child process', 0x0a, 0x00
.clone_failed_msg_len = $ - .clone_failed_msg
thread_func:
; Write message from child process
mov eax, 1 ; sys_write
xor edi, edi ; STDOUT (0)
mov rsi, .message ; Message address
mov edx, .message_len ; Message length
syscall
; exit (return 0)
mov eax, 60 ; sys_exit
xor edi, edi ; return 0
syscall
.message db 'Child process is called', 0x0a, 0x00
.message_len = $ - .message
Here, everything looks good, but this is the program's output:
Child process is terminated
Segmentation fault (core dumped)
But sometimes I get this instead:
Child process is called
Child process is terminated
And sometimes I get this:
Child process is terminated
Child process is called
But there is definitely a problem, because of the "Segmentation fault". What is the problem?
strace
execve("./thread_linux_x64", ["./thread_linux_x64"], 0x7fff7cc37508 /* 53 vars */) = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1f8b97b000
clone(child_stack=0x7f1f8b97b3f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 3131 attached
) = 3131
[pid 3131] write(0, "Child process is called\n\0", 25 <unfinished ...>
Child process is called
[pid 3130] wait4(3131, <unfinished ...>
[pid 3131] <... write resumed>) = 25
[pid 3130] <... wait4 resumed>NULL, 0, NULL) = -1 ECHILD (No child processes)
[pid 3131] exit(0 <unfinished ...>
[pid 3130] munmap(0x7f1f8b97b000, 1024 <unfinished ...>
[pid 3131] <... exit resumed>) = ?
[pid 3130] <... munmap resumed>) = 0
[pid 3131] +++ exited with 0 +++
write(0, "Child process is terminated\n\0", 29Child process is terminated
) = 29
exit(0) = ?
+++ exited with 0 +++
EXAMPLE WITH C-PTHREAD
this is C source code with pthread:
#include <stdio.h>
#include <pthread.h>
#include <signal.h>
void * thread_func(void * arg) {
const char msg[] = "Child-> HELLO\n";
asm volatile ("syscall"
:: "a" (1), "D" (0), "S" (msg), "d" (sizeof(msg) - 1)
: "rcx", "r11", "memory");
return 0;
}
int
main() {
pthread_t pthread;
const char msg1[] = "Parent-> HELLO\n";
const char msg2[] = "Parent-> BYE\n";
asm volatile ("syscall"
:: "a" (1), "D" (0), "S" (msg1), "d" (sizeof(msg1) - 1)
: "rcx", "r11", "memory");
pthread_create(& pthread, NULL, thread_func, NULL);
pthread_join(pthread, NULL);
asm volatile ("syscall"
:: "a" (1), "D" (0), "S" (msg2), "d" (sizeof(msg2) - 1)
: "rcx", "r11", "memory");
return 0;
}
and the strace for this is:
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f8c296d3000
arch_prctl(ARCH_SET_FS, 0x7f8c296d3740) = 0
mprotect(0x7f8c29895000, 12288, PROT_READ) = 0
mprotect(0x7f8c298bb000, 4096, PROT_READ) = 0
mprotect(0x403000, 4096, PROT_READ) = 0
mprotect(0x7f8c29906000, 4096, PROT_READ) = 0
munmap(0x7f8c298c3000, 98201) = 0
set_tid_address(0x7f8c296d3a10) = 10122
set_robust_list(0x7f8c296d3a20, 24) = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x7f8c298a6c50, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f8c298b3b20}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x7f8c298a6cf0, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x7f8c298b3b20}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
prlimit64(0, RLIMIT_STACK, NULL, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
write(0, "Parent-> HELLO\n", 15Parent-> HELLO
) = 15
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f8c28ed2000
mprotect(0x7f8c28ed3000, 8388608, PROT_READ|PROT_WRITE) = 0
brk(NULL) = 0x13d2000
brk(0x13f3000) = 0x13f3000
brk(NULL) = 0x13f3000
clone(child_stack=0x7f8c296d1fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[10123], tls=0x7f8c296d2700, child_tidptr=0x7f8c296d29d0) = 10123
futex(0x7f8c296d29d0, FUTEX_WAIT, 10123, NULLstrace: Process 10123 attached
<unfinished ...>
[pid 10123] set_robust_list(0x7f8c296d29e0, 24) = 0
[pid 10123] write(0, "Child-> HELLO\n", 14Child-> HELLO
) = 14
[pid 10123] madvise(0x7f8c28ed2000, 8368128, MADV_DONTNEED) = 0
[pid 10123] exit(0) = ?
[pid 10122] <... futex resumed>) = 0
[pid 10123] +++ exited with 0 +++
write(0, "Parent-> BYE\n", 13Parent-> BYE
) = 13
exit_group(0) = ?
+++ exited with 0 +++
If we use the clone and wait functions in C, we get a 'wait4' syscall, and even in my 'wait4' call the child id is correct, so there shouldn't be any problem!
C Clone EXAMPLE
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <wait.h>
#define MEM_SIZE 1024
#define CLONE_FLAGS (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_PARENT | CLONE_THREAD | CLONE_IO)
int
thread_func(void * data) {
static const char msg[] = "Hello from Child process\n";
write(0, msg, sizeof(msg)-1);
exit(0);
}
int
main() {
static const char msg[] = "Child process is terminated\n";
void * memory;
if((memory = mmap(NULL, MEM_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0)) == MAP_FAILED) {
printf("memory allocation failed\n");
return 1;
}
int pid = clone(thread_func, (memory + MEM_SIZE), CLONE_FLAGS, NULL);
if(pid < 0) {
munmap(memory, MEM_SIZE);
printf("clone() failed\n");
return 1;
}
waitpid(pid, NULL, 0);
write(0, msg, sizeof(msg)-1);
munmap(memory, MEM_SIZE);
exit(0);
}
Something weird: even in the C example I get the same error (segmentation fault)!
This is the strace output:
mprotect(0x7fd8b4492000, 12288, PROT_READ) = 0
mprotect(0x403000, 4096, PROT_READ) = 0
mprotect(0x7fd8b44e1000, 4096, PROT_READ) = 0
munmap(0x7fd8b449e000, 98201) = 0
mmap(NULL, 1024, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0) = 0x7fd8b44e0000
clone(child_stack=0x7fd8b44e03f0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_PARENT|CLONE_THREAD|CLONE_IOstrace: Process 19911 attached
) = 19911
[pid 19911] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7fd8b44df9c0} ---
[pid 19910] wait4(19911, <unfinished ...>) = ?
[pid 19911] +++ killed by SIGSEGV (core dumped) +++
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
We already answered this in comments last time you asked. The raw clone system call doesn't read a function pointer from memory for you.
You have to do that yourself with code that runs in the child thread / process. Instead, you're having both threads continue on to run wait4, munmap, and exit.
The clone(2) man page explains this. The main part of the page documents the glibc wrapper that takes a function pointer to call in the child thread. But it clearly says that's not the raw system call, and to see the NOTES section. There you'll find the raw asm system call's prototype and documentation:
long raw_clone(unsigned long flags, void *stack, int *parent_tid, int *child_tid, unsigned long tls);
The raw clone() system call corresponds more closely to fork(2) in that execution in the child continues from the point of the call. As such, the fn and arg arguments of the clone() wrapper function are omitted.
You can use the new stack as a convenient place to stash a function pointer where your user-space code for the new thread can find it. (The new thread won't have easy access to the main thread's stack because RSP will be pointing at its new stack; I'm not sure if registers other than RAX are zeroed before entering the new thread or not. If not you can easily just keep the pointer in a register other than RAX, RCX, or R11. And of course static storage is available, but you shouldn't need to use that.)
You'll want to branch on the return value being 0, which tells you you're in the child process. (Like fork, clone returns twice when it succeeds: once in the parent with the TID, once in the child with 0. I think that's true; the man page doesn't clearly document this part, but that's how fork works.)
As discussed in comments, link2 is storing the function address on the child thread stack. When the parent returns from the wrapper function it returns normally. When the child returns, it will pop that address from what is now its stack.
You chose to implement this with a ret that only runs in the child; that's fine. You could have just used jmp with the pointer in a register or memory.
re: updated question:
Your wait4 system call is returning -1 ECHILD without actually waiting.
Therefore your ret races with the munmap that would unmap the thread stack, leading to a segfault if munmap happens first. This also explains your output happening in different orders when it doesn't crash.
I don't know exactly what the right solution is, but it's obviously not this. Have a look at what pthread_join uses to wait for a child thread to exit. Perhaps the clone return value isn't actually the right thing to use with wait4, or wait4 isn't the right system call.
(The int *child_tid output pointer presumably exists for a reason, although maybe just so both parent and child can get it without a gettid system call or VDSO call.)
Or maybe it's because you didn't pass __WCLONE or __WALL to get wait4 to wait for clone children.
Read the man pages for system calls you use, especially when strace shows they didn't do what you expected. This is step 2 in debugging / problem solving technique, after identifying that a system call returned an error in the first place (with strace).