Diagnosing SIGBUS in third party jars used with JNI from c++

0

I'm using CLion with a c++ project (cmake), which starts a jvm. The java part is built with gradle. The project works, but I'm having a problem with debugging.

When I start the JVM, I immediately get a SIGSEGV. I understand that it's normal and there's no workaround except ignoring SIGSEGV. A bit annoying but not too bad as it only happens once per session.

BUT, after that, I continue debugging, and I get constant SIGBUS signals.

<unknown> 0x000000011f108385
<unknown> 0x000000011761dca7
<unknown> 0x000000011761dca7
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x0000000117614849
JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*) 0x000000010bf3a582
StackWalk::fetchFirstBatch(BaseFrameStream&, Handle, long, int, int, int, objArrayHandle, Thread*) 0x000000010c227cac
StackWalk::walk(Handle, long, int, int, int, objArrayHandle, Thread*) 0x000000010c2278fc
JVM_CallStackWalk 0x000000010bfb14a2
<unknown> 0x0000000117623950
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x0000000117614849
JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*) 0x000000010bf3a582
InstanceKlass::call_class_initializer(Thread*) 0x000000010bf22af7
InstanceKlass::initialize_impl(Thread*) 0x000000010bf2244f
Reflection::invoke_constructor(oopDesc*, objArrayHandle, Thread*) 0x000000010c1ebdbb
JVM_NewInstanceFromConstructor 0x000000010bfc14f6
<unknown> 0x0000000117623950
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761dae2
<unknown> 0x000000011761dcec
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761dae2
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x000000011761da00
<unknown> 0x0000000117614849
JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*) 0x000000010bf3a582
jni_invoke_static(JNIEnv_*, JavaValue*, _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*) 0x000000010bf7e2af
jni_CallStaticVoidMethodV 0x000000010bf81c69
JNIEnv_::CallStaticVoidMethod(_jclass*, _jmethodID*, ...) jni.h:1521
main main.cpp:80
start 0x00007fff6f6563d5
start 0x00007fff6f6563d5

It doesn't stop in my code. I don't understand why this is happening, or if it's possible to avoid them, aside from ignoring all SIGBUSs.

I minimized my code and created the most simple example which reproduces the issue. Basically I created a cpp project which starts a jni with org/junit/platform/console/ConsoleLauncher as main (junit5), which makes one simple test. And the SIGBUS happens. It happens before my test even run.

I suspect something within JUnit, but not sure. Any way to get to the root cause?

Sample project for reproduction is here: https://github.com/tallavi/sigbus-reproduction

If I run it, you can see that the code stops running after the call to the java part, no "after call", no "CppMainEnd":

CppMainStart
current_path: /Users/tal/Development/v2x/qa-automation/sigbus-reproduction/out
Loading JAR: jars/junit-platform-console-standalone-1.5.2.jar
Loading JAR: jars/.DS_Store
Loading JAR: jars/junit-platform-console-standalone-1.6.0-M1.jar
Loading JAR: jars/sigbus-reproduction.jar
CreateVM:       JVM loaded successfully!
Before call
test START
test END

Thanks for using JUnit! Support its development at https://junit.org/sponsoring

.
+-- JUnit Jupiter [OK]
| '-- FirstTest [OK]
|   '-- myTest() [OK]
'-- JUnit Vintage [OK]

Test run finished after 154 ms
[         3 containers found      ]
[         0 containers skipped    ]
[         3 containers started    ]
[         0 containers aborted    ]
[         3 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]


Process finished with exit code 0

If I just change the main from JUnit5 to my main and run the same code, everything works:

CppMainStart
current_path: /Users/tal/Development/v2x/qa-automation/sigbus-reproduction/out
Loading JAR: jars/junit-platform-console-standalone-1.5.2.jar
Loading JAR: jars/.DS_Store
Loading JAR: jars/junit-platform-console-standalone-1.6.0-M1.jar
Loading JAR: jars/sigbus-reproduction.jar
CreateVM:       JVM loaded successfully!
Before call
main START
main END
After call
CppMainEnd

Process finished with exit code 0

I managed handling signals by @Oo.oO's advice, but it doesn't fix the issue of course. The java code finishes, but if I try to access that JVM, for example, destroying it, it hangs! : The stack trace of the hang

But if I let it run (not trying to debug it), it crashes with a different error:

main(31549,0x1177515c0) malloc: *** error for object 0x7ffee6360628: pointer being freed was not allocated
main(31549,0x1177515c0) malloc: *** set a breakpoint in malloc_error_break to debug

With this trace:

When destroying jvm after handling the signal

Note that the SIGBUS doesn't always happen, but the code after that JVM call stops running 100% of the time.

Hope this makes sense to anyone..

UPDATE: this is how it looks in lldb:

MyComputer:out tal$ lldb main
(lldb) target create "main"
Current executable set to 'main' (x86_64).
(lldb) r
Process 57274 launched: '/Users/tal/Development/v2x/qa-automation/sigbus-reproduction/out/main' (x86_64)
CppMainStart
Process 57274 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSEGV
    frame #0: 0x000000010b33f51b
->  0x10b33f51b: movl   (%rsi), %eax
    0x10b33f51d: leaq   0x30(%rbp), %rsi
    0x10b33f521: movl   $0x10000, %eax            ; imm = 0x10000
    0x10b33f526: andl   0x4(%rsi), %eax
Target 0: (main) stopped.
(lldb) c
Process 57274 resuming
CreateVM:       JVM loaded successfully!
Before call
Process 57274 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGBUS
    frame #0: 0x0000000112e263ff
->  0x112e263ff: testl  %eax, (%r10)
    0x112e26402: retq
    0x112e26403: nop
    0x112e26404: nop
Target 0: (main) stopped.
(lldb) c
Process 57274 resuming
test START
test END

Thanks for using JUnit! Support its development at https://junit.org/sponsoring

╷
├─ JUnit Jupiter ✔
│  └─ FirstTest ✔
│     └─ myTest() ✔
└─ JUnit Vintage ✔

Test run finished after 2740 ms
[         3 containers found      ]
[         0 containers skipped    ]
[         3 containers started    ]
[         0 containers aborted    ]
[         3 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

After call
before destroying
after destroying
CppMainEnd
Process 57274 exited with status = 0 (0x00000000)
java
c++
java-native-interface
clion
sigbus
asked on Stack Overflow Dec 16, 2019 by TalL • edited Dec 22, 2019 by TalL

2 Answers

0

It might be hard to find without knowing exactly what env you have. There are multiple factors here:

  • boost version
  • Java version
  • compiler version
  • etc.

If I take your sample, strip it to bare minimum (like this)

# Linux

> g++ -o obj/main \
  -I${JAVA_HOME}/include -I${JAVA_HOME}/include/linux/ \
  -L${JAVA_HOME}/jre/lib/amd64/server -ljvm \
  -L${BOOST_LIB} -lboost_system -lboost_filesystem \
  -I$BOOST_INC src/main/cpp/main.cpp

> javac -cp jars/junit-platform-console-standalone.jar \
  -d target src/main/java/FirstTest.java

> jar cf jars/sigbus-reproduction.jar -C target .

> ./obj/main

or, slightly modified on macOS

# macOS

> g++ -std=c++11 -o obj/main \
  -I${JAVA_HOME}/include -I${JAVA_HOME}/include/darwin/ \
  -L${JAVA_HOME}/lib/server -rpath ${JAVA_HOME}/lib/server -ljvm \
  -L${BOOST_LIB} -rpath ${BOOST_LIB} -lboost_system -lboost_filesystem \
  -I$BOOST_INC src/main/cpp/main.cpp

it simply works as expected. Also, there are neither SIGSEGV nor SIGBUS inside gdb, lldb

> ./obj/main
CppMainStart
current_path: /Users/michalo/tmp/sigbus-reproduction
Loading JAR: jars/junit-platform-console-standalone.jar
Loading JAR: jars/sigbus-reproduction.jar
CreateVM:       JVM loaded successfully!
test START
test END

Thanks for using JUnit! Support its development at https://junit.org/sponsoring

╷
├─ JUnit Jupiter ✔
│  └─ FirstTest ✔
│     └─ myTest() ✔
└─ JUnit Vintage ✔

Test run finished after 5061 ms
[         3 containers found      ]
[         0 containers skipped    ]
[         3 containers started    ]
[         0 containers aborted    ]
[         3 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

I guess, it may take time and effort to find somebody who can reproduce your issue.

Calling JUnit as method

#include <iostream>
...
...
...

int main(int argc, char **argv) {

  // make sure to store oryginal stdout
  // JVM (JUnit) will mess with it
  int old_stdout = dup(1);

  std::cout << "CppMainStart" << std::endl;

...
...
...

  env->SetObjectArrayElement(argsArray, 0, env->NewStringUTF("--class-path"));
  env->SetObjectArrayElement(argsArray, 1, env->NewStringUTF(V2X_FILE_NAME.c_str()));
  env->SetObjectArrayElement(argsArray, 2, env->NewStringUTF((std::string("--scan-classpath")).c_str()));

// instead of calling main, you can call execute

  jclass system_class     = env->FindClass( "java/lang/System");
  jfieldID field_id_out   = env->GetStaticFieldID(system_class, "out", "Ljava/io/PrintStream;");
  jobject field_id_out_v  = env->GetStaticObjectField(system_class, field_id_out);

  jfieldID field_id_err   = env->GetStaticFieldID(system_class, "err", "Ljava/io/PrintStream;");
  jobject field_id_err_v  = env->GetStaticObjectField(system_class, field_id_err);

  jmethodID execMethod = env->GetStaticMethodID(mainClass,
    "execute",
    "(Ljava/io/PrintStream;Ljava/io/PrintStream;[Ljava/lang/String;)Lorg/junit/platform/console/ConsoleLauncherExecutionResult;");

  jobject result = env->CallStaticObjectMethod(mainClass, execMethod, field_id_out_v, field_id_err_v, argsArray);

  jvm->DestroyJavaVM();

  // restore oryginal stdout
  FILE *fp2 = fdopen(old_stdout, "w");
  *stdout = *fp2;

  std::cout  << "CppMainEnd" << std::endl << std::flush;

  return 0;
}

and here you go. There is CppMainEnd at the end.

> ./obj/main
CppMainStart
current_path: /Users/michalo/tmp/sigbus-reproduction
Loading JAR: jars/junit-platform-console-standalone.jar
Loading JAR: jars/sigbus-reproduction.jar
CreateVM:       JVM loaded successfully!
test START
test END

Thanks for using JUnit! Support its development at https://junit.org/sponsoring

╷
├─ JUnit Jupiter ✔
│  └─ FirstTest ✔
│     └─ myTest() ✔
└─ JUnit Vintage ✔

Test run finished after 5060 ms
[         3 containers found      ]
[         0 containers skipped    ]
[         3 containers started    ]
[         0 containers aborted    ]
[         3 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

CppMainEnd

I'd suggest to minimise the content of your code. Make is as essential as possible. Otherwise, it will be hard for you to find the source of the issue.

If I run this kind of code (which is really close to the essence of JNI calls).

#include <iostream>
#include <jni.h>
#include <unistd.h>

int main(int argc, char **argv) {

  int old_stdout = dup(1);

  std::cout << "Cpp Start" << std::endl;

  JavaVM *jvm;
  JNIEnv *env;
  JavaVMInitArgs vm_args;
  JavaVMOption* options = new JavaVMOption[1];

  options[0].optionString = const_cast<char *>("-Djava.class.path=jars/junit-platform-console-standalone.jar:jars/sigbus-reproduction.jar");
  vm_args.version = JNI_VERSION_1_6;
  vm_args.nOptions = 1;
  vm_args.options = options;
  vm_args.ignoreUnrecognized = false;

  long status = JNI_CreateJavaVM(&jvm, (void**)&env, &vm_args);

  jclass mainClass = env->FindClass("org/junit/platform/console/ConsoleLauncher");

  jclass stringClass = env->FindClass("java/lang/String");

  jobject emptyStringObject = env->NewStringUTF("");

  jobjectArray argsArray = env->NewObjectArray(3, stringClass, emptyStringObject);

  env->SetObjectArrayElement(argsArray, 0, env->NewStringUTF("--class-path"));
  env->SetObjectArrayElement(argsArray, 1, env->NewStringUTF("jars/sigbus-reproduction.jar"));
  env->SetObjectArrayElement(argsArray, 2, env->NewStringUTF("--scan-classpath"));

  jclass system_class     = env->FindClass( "java/lang/System");
  jfieldID field_id_out   = env->GetStaticFieldID(system_class, "out", "Ljava/io/PrintStream;");
  jobject field_id_out_v  = env->GetStaticObjectField(system_class, field_id_out);

  jfieldID field_id_err   = env->GetStaticFieldID(system_class, "err", "Ljava/io/PrintStream;");
  jobject field_id_err_v  = env->GetStaticObjectField(system_class, field_id_err);

  jmethodID execMethod = env->GetStaticMethodID(mainClass,
    "execute",
    "(Ljava/io/PrintStream;Ljava/io/PrintStream;[Ljava/lang/String;)Lorg/junit/platform/console/ConsoleLauncherExecutionResult;");

  jobject result = env->CallStaticObjectMethod(mainClass, execMethod, field_id_out_v, field_id_err_v, argsArray);

  jvm->DestroyJavaVM();

  // restore oryginal stdout
  FILE *fp2 = fdopen(old_stdout, "w");
  *stdout = *fp2;

  std::cout  << "CppMainEnd" << std::endl << std::flush;

  delete[] options;

  return 0;
}

there is nothing strange in the lldb

lldb obj/main
(lldb) target create "obj/main"
Current executable set to 'obj/main' (x86_64).
(lldb) run
Process 921 launched: '.../main' (x86_64)
Cpp Start
Process 921 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSEGV
    frame #0: 0x000000010b33f51b
->  0x10b33f51b: movl   (%rsi), %eax
    0x10b33f51d: leaq   0x30(%rbp), %rsi
    0x10b33f521: movl   $0x10000, %eax            ; imm = 0x10000
    0x10b33f526: andl   0x4(%rsi), %eax
Target 0: (main) stopped.
(lldb) cont
Process 921 resuming
test START
test END

Thanks for using JUnit! Support its development at https://junit.org/sponsoring

╷
├─ JUnit Jupiter ✔
│  └─ FirstTest ✔
│     └─ myTest() ✔
└─ JUnit Vintage ✔

Test run finished after 5060 ms
[         3 containers found      ]
[         0 containers skipped    ]
[         3 containers started    ]
[         0 containers aborted    ]
[         3 containers successful ]
[         0 containers failed     ]
[         1 tests found           ]
[         0 tests skipped         ]
[         1 tests started         ]
[         0 tests aborted         ]
[         1 tests successful      ]
[         0 tests failed          ]

CppMainEnd
Process 921 exited with status = 0 (0x00000000)

Running multiple times

No matter how many times I run the code, there is no SIGBUS :(

You can easily run the code (thousands of times) like this:

--- 8< --- CUT HERE --- lldb_run --- 8< --- CUT HERE ---

target create main
break set -n main -C "process handle --pass true --stop false SIGSEGV" -C "continue"
run
script import os; os._exit(0)

--- 8< --- CUT HERE --- lldb_run --- 8< --- CUT HERE ---

and then, running it in the loop: for i in {1..100}; do lldb --source ./lldb_run; done

answered on Stack Overflow Dec 18, 2019 by Oo.oO • edited Dec 23, 2019 by Oo.oO
0

You are incorrectly assuming that signals such as SIGSEGV or SIGBUS indicate a problem in Java. You are also likely breaking things such as null pointer detection.

Why am I seeing SIGSEGV when I strace a Java application on Linux?!

Main Article

Most people that have used Unix for any amount of time are familiar with occasionally seeing "Segmentation Fault (core dumped)" from poorly written programs. If that's all you knew about Unix and you looked at the output of strace on a Java process you'd think something was seriously wrong ("Wow, look at all these segfaults. Those guys at Sun/Oracle must be terrible programmers and they don't know what the hell they're doing!").

The real story is quite different - SIGSEGV in a Java process is almost always perfectly normal and completely safe.

...

The JVM is a multi-threaded process and so under the covers it's using signals to do OS level threading. ...

...

Signal Description

  • SIGSEGV, SIGBUS, SIGFPE, SIGPIPE, SIGILL Used in the implementation for implicit null check, and so forth.
  • SIGQUIT Thread dump support: To dump Java stack traces at the standard error stream. (Optional.)

...

Table stolen wholesale from http://download.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/signals.html

Per that link:

6.1 Signal Handling on Solaris OS and Linux

The HotSpot Virtual Machine installs signal handlers to implement various features and to handle fatal error conditions. For example, in an optimization to avoid explicit null checks in cases where java.lang.NullPointerException will be thrown rarely, the SIGSEGV signal is caught and handled, and the NullPointerException is thrown.

In general there are two categories of situations where signal/traps arise.

  • Situations in which signals are expected and handled. Examples include the implicit null handling cited above. Another example is the safepoint polling mechanism, which protects a page in memory when a safepoint is required. Any thread that accesses that page causes a SIGSEGV, which results in the execution of a stub that brings the thread to a safepoint.

  • Unexpected signals. This includes a SIGSEGV when executing in VM code, JNI code, or native code. In these cases the signal is unexpected, so fatal error handling is invoked to create the error log and terminate the process.

If you need to handle fatal signals, see Signal Handling on Linux when using Java/JNI.

answered on Stack Overflow Dec 21, 2019 by Andrew Henle

User contributions licensed under CC BY-SA 3.0