Recently, I added OpenMP to our group's project code. main runs two nested for loops; the outer controls the 'run', while the inner controls the 'generation'. Generations are completely independent of those in other runs, though dependent on other generations in the same run.
The idea is to parallelize the outer loop, the 'run' loop, while letting each thread handle the evolution of generations for whatever run number it was assigned.
When setting OMP_THREADS = 1, i.e. letting the program run with only one thread, it runs without a hitch. If this number is any higher, I get the following error:
Unhandled exception at 0x00F5C4C3 in projectc.exe: 0xC0000005: Access violation writing location 0x00000072.
with the following appearing in the "Autos" section of Visual Studio:
(Note: t, t->active_cells, and t->cellx are "error red" while the rest are white when I get this error)
If I change default(none) to default(shared) in the #pragma right above the outer loop, and remove t, s, and bn from threadprivate (these are structures initialized in external files), then the program runs normally for a generation on each thread before freezing (though CPU activity shows that both threads are still running with the same intensity as before).
I cannot figure out what is going wrong. Trying a simple #pragma omp parallel for outside of the outer loop of course doesn't work, but I have also tried declaring all of main as #pragma omp parallel and the outer loop as #pragma omp for. A few other subtle variations like this were tried as well, which leads me to the conclusion that it must have something to do with the way the variables are shared between threads. Because all runs, and so all threads, are independent, really all of the variables could be set as private, though there is some overlap that you see reflected in shared(..).
The code is attached below.
/* General Includes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <omp.h>
/* Project Includes */
#include "main.h"
#include "randgen.h"
#include "board7.h"
#include "tissue.h"
#include "io.h"
#define BitFlp(arg,posn) ((arg) ^ (1L << (posn)))
#define BitClr(arg,posn) ((arg) & ~(1L << (posn)))
#define display_dbg 1 //Controls whether print statements in main.c are displayed.
#define display_time 1 //Controls whether timing print statements are executed.
#define BILLION 1000000000L
#define num_runs 10 //Controls number of runs per simulation
#define num_gens 4000//Controls number of generations per run
#define OMP_THREADS 1 // Max number of threads used if OpenMP is enabled
int n, i, r, j, z, x, sxa, y, flagb, m;
int j1, j2;
char a;
int max_fit_gen, collect_data, lb_run, w, rn, sx;
float f, max_fitness;
tissuen *fx;
input_vec dx;
calookup ra;
#pragma omp threadprivate(n, r, j, x, z, sxa, y, flagb, m, \
j1, j2, a, max_fit_gen, collect_data, lb_run, w, \
rn, sx, f, max_fitness, fx, dx, ra, run_data, t, s, bn)
int main(int argc, char *argv[])
{
int* p = 0x00000000; // pointer to NULL
char sa[256];
char ss[10];
long randn;
boardtable ba;
srand((unsigned)time(NULL));
init_mm();
randn = number_range(1, 100);
#ifdef OS_WINDOWS
// Timing parameters
LARGE_INTEGER clk_freq;
LARGE_INTEGER t1, t2, t3;
#endif
#ifdef OS_UNIX
struct timespec clk_freq, t1, t2, t3;
#endif
double avg_gen_time, avg_run_time, run_time, sim_time, est_run_time, est_sim_time;
// File System and IO Parameters
char cwd[FILENAME_MAX];
getcwd(cwd, sizeof(cwd));
char curState[FILENAME_MAX];
char recState[FILENAME_MAX];
char recMode[FILENAME_MAX];
char curGen[FILENAME_MAX];
char curRun[FILENAME_MAX];
char genTmp[FILENAME_MAX];
strcpy(curState, cwd);
strcpy(recState, cwd);
strcpy(recMode, cwd);
strcpy(curGen, cwd);
strcpy(curRun, cwd);
strcpy(genTmp, cwd);
#ifdef OS_WINDOWS
strcat(curState, "\\current.txt");
strcat(recState, "\\recover.txt");
strcat(recMode, "\\recovermode.txt");
strcat(curGen, "\\gen.txt");
strcat(curRun, "\\run");
strcat(genTmp, "\\tmp\\gentmp");
#endif
#ifdef OS_UNIX
strcat(curState, "/current.txt");
strcat(recState, "/recover.txt");
strcat(recMode, "/recovermode.txt");
strcat(curGen, "/gen.txt");
strcat(curRun, "/run");
strcat(genTmp, "/tmp/gentmp");
#endif
//Read current EA run variables (i.e. current run number, generation, recover mode status)
z = readorcreate(curState);
x = readorcreate(recState);
sxa = readorcreate(recMode);
y = readorcreate(curGen);
//Initialize simulation parameters
s.count = 0;
s.x[0] = 0;
s.y[0] = 0;
s.addvec[0] = 0;
s.bestnum = 0;
s.countb = 0;
s.count = 0;
initialize_sim_param(&s, 0, 200);
collect_data = 0;
//Build a collection of experiment initial conditions
buildboardcollection7(&bn);
//Determine clock frequency.
#ifdef OS_WINDOWS
if (display_time) get_frequency(&clk_freq);
#endif
#ifdef OS_UNIX
if (display_time) get_frequency(CLOCK_REALTIME, &clk_freq);
#endif
//Start simulation timer
#ifdef OS_WINDOWS
if (display_time) read_clock(&t1);
#endif
#ifdef OS_UNIX
if (display_time) read_clock(CLOCK_REALTIME, &t1);
#endif
#pragma omp parallel for schedule(static) default(none) num_threads(OMP_THREADS) \
private(sa, ss, randn, ba, t2, t3, avg_gen_time, avg_run_time, sim_time, \
run_time, est_run_time, est_sim_time) \
shared(i, cwd, recMode, curRun, curGen, curState, genTmp, clk_freq, t1)
for (i = z; i < num_runs; i++)
{
// randomly initialize content of tissue population
initialize_tissue_pop_s2(&(t.tgen[0]), &s);
initialize_tissue_pop_s2(&(t.tgen[1]), &s);
max_fit_gen = 0;
max_fitness = 0.0;
flagb = 0;
if ((i == z) && (x == 1))
{
w = y;
}
else
{
w = 0;
}
rn = 200;
j1 = 0;
s.run_num = i;
s.maxfitness = 0.0;
//Start run timer
#ifdef OS_WINDOWS
if (display_time) read_clock(&t2);
#endif
#ifdef OS_UNIX
if (display_time) read_clock(CLOCK_REALTIME, &t2);
#endif
#if defined(_OPENMP)
printf("\n ======================================= \n");
printf(" OpenMP Status Message \n");
printf("\n --------------------------------------- \n");
printf("| RUN %d : \n", i);
printf("| New Thread Process (Thread %d) \n", omp_get_thread_num());
printf("| Available Threads: %d of %d \n", omp_get_num_threads(), omp_get_max_threads());
printf(" ======================================= \n\n");
#endif
for (j = w; j < num_gens; j++)
{
// Flips on lightboard data collection. See board7.h.
if (enable_collection == 1) {
if ((i >= run_collect) && (j >= gen_collect)) { collect_data = 1; }
}
sx = readcurrent(recMode);
// Pseudo loop code. Uses bit flipping to cycle through boards.
j2 = ~(j1)& 1;
if (display_dbg) printf("start evaluation...\n");
// evaluate tissue
// Most of the problems in the code happen here.
evaluatepopulation_tissueb(&(t.tgen[j1]), &ra, &bn, &s, j, i);
if (display_dbg) printf("\n");
// display fitness stats to screen
printmaxfitness(&(t.tgen[j1]), i, j, j1, &cwd);
if (display_dbg) printf("start tournament...\n");
// Perform tournament selection and have children ready for evaluation
// Rarely have to touch. Figure out best parents. Crossover operator.
// Create a subgroup. Randomly pick individuals from the population.
// Pick fittest individuals out of the random group.
// 2 parents and 2 children. Children replace parents.
tournamentsel_tissueb(&(t.tgen[j1]), &(t.tgen[j2]), &s);
printf("Tournament selection complete.\n");
// keep track of best fitness during run
if (t.tgen[j1].fit_max > max_fitness)
{
max_fitness = t.tgen[j1].fit_max;
max_fit_gen = j;
}
if ((t.tgen[j1].fit_max > 99.0) && (flagb == 0))
{
flagb = 1;
run_data.fit90[i] = t.tgen[j1].fit_max;
run_data.gen90[i] = j;
}
sa[0] = 0;
strcat(sa, curRun);
sprintf(ss, "%d", i);
strcat(sa, ss);
strcat(sa, ".txt");
printf("Write fitness epc...\n");
// write fitness stats to file
writefitnessepc(sa, &(t), j1, j);
printf("Write fitness complete.\n");
// trunk for saving population to disk
if (sx != 0)
{
sa[0] = 0;
strcat(sa, genTmp);
sprintf(ss, "%d", 1);
strcat(sa, ss);
strcat(sa, ".txt");
if (display_dbg) printf("Saving Current Run\n");
}
//update current generation to file
writecurrent(curGen, j + 1);
if (display_time && j > 0 && (j % 10 == 0 || j % (num_gens - 1) == 0))
{
#ifdef OS_WINDOWS
read_clock(&t3);
sim_time = (t3.QuadPart - t1.QuadPart) / clk_freq.QuadPart;
run_time = (t3.QuadPart - t2.QuadPart) / clk_freq.QuadPart;
#endif
#ifdef OS_UNIX
read_clock(CLOCK_REALTIME, &t3);
sim_time = (double)(t3.tv_sec - t1.tv_sec);
run_time = (double)(t3.tv_sec - t2.tv_sec);
#endif
avg_gen_time = run_time / (j + 1);
est_run_time = avg_gen_time * (num_gens - j);
avg_run_time = est_run_time + run_time;
est_sim_time = (est_run_time * (num_runs - i)) / (i + 1);
printf("\n============= Timing Data =============\n");
printf("Time in Simulation: %.2fs\n", sim_time);
printf("Time in Run: %.2fs\n", run_time);
printf("Est. Time to Complete Run: %.2fs\n", est_run_time);
printf("Est. Time to Complete Simulation: %.2fs\n\n", est_sim_time);
printf("Average Time Per Generation: %.2fs/gen\n", avg_gen_time);
printf("Average Time Per Run: %.2fs/run\n", avg_run_time);
printf("=======================================\n\n");
if (j % (num_gens - 1) == 0) {
}
}
//Display Position Board
//displayboardl(&bn.board[0]);
j1 = j2;
}
}
}
typedef struct boardcollectionn
{
boardtable board[boardnumb];
} boardcollection;
boardcollection bn;
typedef struct tissue_gent
{
tissue_population tgen[2];
} tissue_genx;
typedef struct sim_paramt //struct for storing simulation parameters
{
int penalty;
int addnum[cell_numz];
int x[9];
int y[9];
uint8_t addvec[9];
uint8_t parenta[50];
uint8_t parentb[50];
int errorstatus;
int ones[outputnum][5000];
int zeros[outputnum][5000];
int probcount;
int num;
int numb;
int numc;
int numd;
int nume;
int numf;
int bestnum;
int count;
int col_flag;
int behaviour[outputnum];
int memm[4];
int sel;
int seldecnum;
int seldec[200];
int selx[200];
int sely[200];
int selz[200];
int countb;
float maxfitness;
float oldmaxfitness;
int run_num;
int collision;
} sim_param;
tissue_genx t;
sim_param s;
The code is too big for proper testing, and the use of global variables really doesn't help in figuring out the data dependencies. However, I can make a few remarks:
i is declared shared even though it is the index of the parallelised loop. This is wrong! If there is one variable that you really want to be private in an omp for loop, it is the loop index. I didn't find anything clear about that in the OpenMP standard for C and C++, whereas for Fortran, the loop index (and the ones of all enclosed loops) is implicitly privatised. Nonetheless, the Intel compiler gives an error when attempting to explicitly declare such an index shared:
sharedi.cc(11): warning #2555: static control variable for parallel loop
for ( i=0; i<10; i++ ) {
^
sharedi.cc(10): error: index variable "i" of for statement following an OpenMP for pragma must be private
#pragma omp parallel for shared(i) schedule(static)
^
compilation aborted for sharedi.cc (code 2)
Meanwhile, gcc version 5.1.0 doesn't emit any warning or error for the same code, and acts as if the variable had been declared private... I tend to find the Intel compiler's behaviour more reasonable, but I'm not 100% sure which one is correct. What I do know, however, is that declaring i shared is definitely a very, very bad idea (and even a bug AFAIC). So I feel like this is a grey area where your compiler may or may not do a sensible job, which could by itself explain most of your problems.
You seem to output your data into files whose names might conflict across threads. Be careful with that, as you might end up with a big mess...
Your printing is very likely to be all messed up. I don't know how much importance you attach to that, but it won't be pretty the way it is written now.
In summary, your code is just too tangled for me to get a clear view of what's happening. Try to address at least the first two points I mentioned; that might be sufficient to get it to "work". However, I can't encourage you enough to clean the code up and get rid of your global variables. Likewise, try to declare your variables as late in the source as possible, since this reduces the need to declare them private for OpenMP, and it greatly improves readability.
Good luck with your debugging.