Moving from Multithreaded CPU program to GPU in C++


I created a program that needs to call a function many times (thousands of calls!) with different input parameters. To speed things up, I multithreaded it like this:

#include <boost/thread.hpp> // for boost::thread and boost::thread_group
#include <vector>

std::vector< MTDPDS* > mtdpds_list;
boost::thread_group thread_gp;
for (size_t feat_index = 0; feat_index < feat_parser.getNumberOfFeat(); ++feat_index)
{
    Feat* feat = feat_parser.getFeat(static_cast<unsigned int>(feat_index));

    // != 0 has been added to avoid a warning message during compilation
    bool rotatedFeat = (feat->flag & 0x00000020) != 0;
    if (!rotatedFeat)
    {
        Desc* desc = new Desc(total_sb, ob.size());

        MTDPDS* processing_data = new MTDPDS();
        processing_data->feat = feat;
        processing_data->desc = desc;
        processing_data->img_info = image_info;
        processing_data->data_op = &data_operations;
        processing_data->vecs_bb = vecs_bb;

        mtdpds_list.push_back(processing_data);

        thread_gp.add_thread(new boost::thread(compute_desc, processing_data));
    }
}

// Wait for all threads to complete
thread_gp.join_all();

This code is a piece of a much larger codebase, so don't worry too much about variable names, etc. The important thing is that I create an object (MTDPDS) for each thread containing its input and output parameters, spawn a thread that calls my processing function compute_desc, and wait for all threads to complete before continuing.

However, my for loop has 2000+ iterations, which means I start 2000+ threads. I run my code on a cluster, so it's fairly fast, but it still takes too long in my opinion.

I would like to move this part to the GPU (since it has many more cores), but I'm new to GPU programming.

  1. Is there a way (since I already have a separate computing function) to move this to the GPU easily, without changing the whole code? For example, a function that could start threads on the GPU in a similar way to boost (i.e. replacing a boost thread with a GPU thread)?
  2. Also, my computing function accesses some data loaded in memory (RAM here). Does the GPU require this data to be loaded into GPU memory, or can it access RAM directly (and in that case, which one is faster)?
  3. And one last question (though I'm pretty sure I know the answer): is it possible to make it hardware-independent, so my code could run on Nvidia, ATI/AMD, etc.?

Thank you.

c++
multithreading
cuda
gpu
gpu-programming
asked on Stack Overflow Jul 21, 2017 by whiteShadow • edited Jul 21, 2017 by whiteShadow

1 Answer

  • 1) The simplest solution is to use #pragma directives (OpenACC), which are already supported in GCC 7; see the sketch after this list.

  • 2) Your data should be GPU-friendly; look up the Structure of Arrays (SoA) layout (a small AoS/SoA example follows further below).

  • 3) Your compute_desc "kernel" must be GPU-compliant; roughly speaking, it should be vectorizable by the compiler.
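
As a minimal sketch of point 1 (the function name, the arrays and the loop body are placeholders, not the asker's actual code), an OpenACC parallel loop looks like this. The data clauses also relate to question 2: the GPU works on its own copy of the data, so the arrays are copied from RAM to GPU memory before the loop and copied back afterwards.

// Minimal OpenACC sketch, assuming a compiler with OpenACC support
// (e.g. g++ -fopenacc sketch.cpp). All names here are illustrative only.
#include <cstddef>

void compute_all(const float* in, float* out, std::size_t n)
{
    // copyin: RAM -> GPU memory before the region; copyout: GPU -> RAM after it.
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n])
    for (std::size_t i = 0; i < n; ++i)
    {
        // Each iteration is independent, like one boost::thread in the question,
        // but the compiler maps the iterations to many lightweight GPU threads.
        out[i] = in[i] * in[i] + 1.0f;
    }
}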

I hope this helps a bit. I think a short OpenACC tutorial is the best starting point for you; CUDA/OpenCL can come later. My 2 cents.
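
To make point 2 concrete, here is a hypothetical before/after with made-up field names: with a Structure of Arrays, neighbouring GPU threads read neighbouring memory addresses (coalesced access), which the GPU memory system handles much better than the scattered reads of an Array of Structures.

#include <vector>

// Array of Structures (AoS): thread i reads x, y and score interleaved in memory.
struct FeatAoS { float x, y, score; };
std::vector<FeatAoS> feats_aos;

// Structure of Arrays (SoA): thread i reads x[i] while thread i+1 reads x[i+1],
// so consecutive threads touch consecutive addresses.
struct FeatSoA
{
    std::vector<float> x;
    std::vector<float> y;
    std::vector<float> score;
};
FeatSoA feats_soa;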

answered on Stack Overflow Jul 21, 2017 by Timocafé • edited Jul 21, 2017 by Timocafé

User contributions licensed under CC BY-SA 3.0