This is a Blue Screen of Death stop code. More information is available in the Knowledge Base article Bug Check 0x108: THIRD_PARTY_FILE_SYSTEM_FAILURE.
This code indicates success, rather than an error. This may not be the correct interpretation of this code, or possibly the program is handling errors incorrectly.
|Description||The default facility code.|
|Error Code||264 (0x0108)|
TL;DR: The observed behavior is very likely caused by a bug in the
ptxas component of the CUDA 7.5 toolchain, specifically the loop unroller. It is possible that the bug is already fixed in CUDA 8.0 RC, which is publicly available.
I was able to reproduce the behavior reported in the question on a 64-bit Windows 7 platform with a Quadro K2200 GPU, which is an
sm_50 device. The primary difference in the generated machine code (SASS) with
ENABLE_BUG defined is that the loop is unrolled by a factor of four. This is a direct consequence of the loop increment being changed from a variabe, namely
threadIdx.x, to a compile time constant,
32, which allows the compiler to compute trip count at compile time.
It is interesting to note that at the intermediate PTX level, the loop is rolled even with increment of
BB7_4: ld.global.u32 %r12, [%rd10]; add.s32 %r16, %r12, %r16; add.s64 %rd10, %rd10, 128; add.s32 %r15, %r15, 32; setp.lt.s32 %p3, %r15, 50000; @%p3 bra BB7_4;
As the loop is unrolled in machine code, it must be the
ptxas unroller applying that transformation.
If I lower the
ptxas optimization level to
-O1, by specifying
-Xptxas -O1 on the
nvcc command line, the code works as expected. If I build the code for
sm_30 (causing JIT compilation when running on an
sm_50 device) the code works as expected when run with the latest driver, Windows 369.26. This strongly suggests that there is a bug in the unroller of the
ptxas component of CUDA 7.5, which however has already been fixed, since the
ptxas component inside the CUDA driver is much more recent than the
ptxas component of the CUDA 7.5 toolchain.
#pragma unroll 4 directly in front of the loop also fixes the problem, since in this case the unrolling is performed by the
nvvm component of the compiler, meaning the unrolled loop is already present at the PTX level:
#if ENABLE_BUG #pragma unroll 4 for (int i = idx; i < MAX_INDEX; i += 32) thread_sum += data[i]; #else
BB7_5: .pragma "nounroll"; ld.global.u32 %r34, [%rd14]; add.s32 %r35, %r34, %r45; ld.global.u32 %r36, [%rd14+128]; add.s32 %r37, %r36, %r35; ld.global.u32 %r38, [%rd14+256]; add.s32 %r39, %r38, %r37; ld.global.u32 %r40, [%rd14+384]; add.s32 %r45, %r40, %r39; add.s64 %rd14, %rd14, 512; add.s32 %r44, %r44, 128; setp.lt.s32 %p5, %r44, %r3; @%p5 bra BB7_5;
See Technical Note TN2151:Understanding and Analyzing iPhone OS Application Crash Reports. Symbolication would normally help you track down the source of a crash but since there is no backtrace it may not help in this instance.
Don't bother testing on the simulator. The simulator build and the device build are wholly separate compiles for two different pieces of hardware. Just because it runs on simulator tells you nothing about a failure on device.
Remember that Apple will stress test the app by doing things like launching it on iOS4 with other apps eating up most of the memory. You will need to do that as well on your test device.
You will most likely have to wipe your test device back to defaults to replicate the test Apple does. Then open every possible app before launching your own.
You can make some information from the ARM thread state. The PC register is the only one containing the invalid address that the crash report is complaining about. That means your app tried to execute code at that address.
SIGSEGV means that the address in question is invalid. The system has setup no memory pages with this address.
I don't think the iOS will allow you to simply execute code from any address, but it is possible that the stack frame was corrupted and the return address was invalid when a function returned. That supports the "backtrace not available" problem.
Fouling the stack may be a result of a buffer overrun. If you use memcpy or a loop of sets on a local variable array and overrun the end of the array, you can destroy the stack.
Unlike the M-profile architectures, with their very different exception model which does permit tail-chaining exceptions, the classic/A-profile architectures do things in a completely straightforward manner.
Interrupts are checked for at instruction boundaries, when the respective CPSR.F/CPSR.I bits is clear. Thus, assuming the FIQ handler is straightforward, once the instruction at 0x108 completes, the FIQ is taken (as it has priority over the IRQ) from whatever mode the CPU was in, the FIQ handler runs with FIQs and IRQs masked, then performs an exception return to 0x110. The fact that there happened to be an IRQ pending throughout makes no difference whatsoever.
The point of note is the boundary between the return instruction at the end of the FIQ handler and the one being returned to. The FIQ return will restore the previous SPSR, which (presumably) has IRQs unmasked. Thus, after executing that return instruction but before executing the one at 0x110, the CPU is back in the initial mode, with IRQs unmasked, and an IRQ pending. So it takes it; the IRQ handler runs with IRQs masked, then performs an exception return to 0x110, whereupon execution eventually continues having served both interrupts.
For ARM7TDMI, that's really all there is to it. In newer architecture versions (ARMv7 onwards), there are some rules tightening up precisely when asynchronous exceptions are expected to be taken, since once CPU designs start becoming superscalar and/or out-of-order the notion of "instruction boundary" gets a bit blurry. This particular situation, though, would be no different on modern CPUs, as the exception return from FIQ constitutes a context-synchronising event after which any pending asynchronous exception (i.e. the IRQ) must be immediately taken.
A segfault is unlikely to be a build error. To reproduce this problem, try clearing out any saved information on the iPhone simulator before running the project; it is possible that you are assuming the existence of certain entries in NSUserDefaults that are present on your own iPhone, but which would not be available on a default installation. If that doesn't reproduce the problem, then you should create unit tests for each of your components, ruling out each component at a time as the cause of failure. Eventually, you will have ruled out every cause of failure except for the true cause of failure.
User contributions licensed under CC BY-SA 3.0