How do I initialize a Vec from multiple threads?

1

I have code that generates a CSR sparse matrix by reading multiple parquet files in parallel, preprocessing the data, then finally acquiring a mutex to sequentially write into the sparse array data structure. This is roughly the pseudocode for each thread:

read parquet data
get list of nonzero values
acquire mutex
append nonzero values to sparse array (basically memcpy)
release mutex
repeat

The speed of the code doesn't increase with more CPUs, so I suspect the mutex contention is a bottleneck. So I would like to try replacing the mutex with an atomic "offset" variable into the sparse array, so that I can do something like this:

fn push(&self, rhs: &Self, offset: &AtomicU64) -> MyResult<()> {
    let old = offset.fetch_add(rhs.data.0.len() + (1 << 32), Ordering::SeqCst);
    let (nnz, m) = (old & 0xffffffff, old >> 32);
    let indptr = nnz.try_into()? + rhs.data.0.len().try_into()?;
    let range = nnz..nnz + rhs.data.0.len();

    self.data[range].copy_from_slice(&rhs.data.0);
    self.indices[range].copy_from_slice(&rhs.indices.0);
    self.indptr[m] = indptr;
    self.y[m] = rhs.y.0[0];

    Ok(())
}

I.e. I want each thread to first atomically update the pointer into the sparse array, then write into the preceding data. Problem is, the data is an immutable reference & (in order to pass it to each thread), and I need a mutable reference &mut (in order to update it). Reading similar questions and Rust documentation it seems you can't do this simple task like you would in C, but no alternative is provided, which is quite irritating.

multithreading
rust
parallel-processing
atomic
asked on Stack Overflow Jan 2, 2021 by grasevski • edited Jan 3, 2021 by vallentin

0 Answers

Nobody has answered this question yet.


User contributions licensed under CC BY-SA 3.0