My understanding is that PW codes rely essentially on parallel FFT. But that is usually done with MPI (because you usually overflow a single node's memory at some point, even ignoring cores per node and speed) by explicitly distributing one FFT dimension across nodes, so it is not hidden in a library like an MPI-parallel FFTW.

On top of that, a Hamiltonian matrix must of course be diagonalized, so usually you start thinking about parallel block Davidson, LOBPCG and similar (on top of fast BLAS/LAPACK), eventually based on BLACS/ScaLAPACK (which is not the right toolbox) or homegrown.

Has anyone experience with MPI.jl? Is it ready/stable/performant? What I see in the documentation is a basically thread-parallel approach or the spawning of independent workers. Parallelization appears to be generally more of an afterthought at the moment (ok, that is true for most other languages, too), but I have not heard of any (large-scale) codes using Julia and serious MPI for either linear algebra or FFTs, where you spend much time on a high-performance network (e.g. Aries, InfiniBand, Omni-Path) and its software stack underneath.

So again my question: what would be a technically viable parallelization strategy with Julia for typical electronic structure theory, given the language's current approach to parallelization and the current ecosystem? Does anyone know? Because I do not see it.

> My understanding is that PW codes rely essentially on parallel FFT.

FFT primarily for the Poisson solver (or for GW if you want to do GW), that is certainly true. MPI FFTW parallelism is not hidden completely in a library, precisely because in a distributed-memory setting the user has to be aware of the data distribution. If you look at the MPI section of the FFTW manual, you'll see that the caller is required to explicitly create distributed arrays. DistributedArrays.jl makes this somewhat easier, but it doesn't have built-in FFT support yet.

Parallelizing these algorithms (block Davidson, LOBPCG and the like) is much easier than parallelizing the FFT/Poisson solver, because the only communication required is typically a reduction (e.g. for parallel dot products), which is trivial with MPI.jl or similar. Indeed, existing iterative solvers in Julia may well work as-is with DistributedArrays, because the latter already supports norm, dot, and similar linear-algebra operations.
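To make that last point concrete, here is a minimal sketch, assuming MPI.jl is available and using purely illustrative local vector sizes, of how a globally distributed dot product and norm reduce to local work plus a single `MPI.Allreduce`:

```julia
using MPI
using LinearAlgebra

MPI.Init()
comm = MPI.COMM_WORLD

# Each rank owns one local block of the globally distributed vectors.
# (Hypothetical local size; a real solver would use its own distribution.)
n_local = 10_000
x_local = rand(n_local)
y_local = rand(n_local)

# Global dot product: a local dot followed by one sum-reduction over all ranks.
global_dot = MPI.Allreduce(dot(x_local, y_local), +, comm)

# The global 2-norm follows the same pattern.
global_norm = sqrt(MPI.Allreduce(sum(abs2, x_local), +, comm))

if MPI.Comm_rank(comm) == 0
    @show global_dot global_norm
end

MPI.Finalize()
```

Run it under something like `mpiexec -n 4 julia distdot.jl` (the filename is hypothetical); every rank ends up with the same scalar, which is essentially all the communication an iterative eigensolver needs from its inner products.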
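For contrast, here is a rough sketch of the slab decomposition mentioned above for a 3D FFT. The grid size and the one-dimension-per-rank split are assumptions for illustration, and the global transpose, which is the communication-heavy step a library cannot hide from the caller, is only indicated in a comment rather than implemented:

```julia
using MPI
using FFTW

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

# Hypothetical global grid; the z dimension is split ("slab-decomposed")
# across ranks, so no rank ever holds the full array.
nx, ny, nz = 64, 64, 64
@assert nz % nproc == 0
nz_local = nz ÷ nproc

# Local slab of the global complex field: (nx, ny, nz_local) on every rank.
slab = rand(ComplexF64, nx, ny, nz_local)

# Step 1: transform the two dimensions that are entirely local to this rank.
fft!(slab, (1, 2))

# Step 2 (not implemented here): a global transpose, e.g. via MPI.Alltoall!,
# so that the z dimension becomes local, followed by 1-D FFTs along z.
# That explicit redistribution is exactly why MPI FFT parallelism cannot be
# hidden completely inside a library.

MPI.Finalize()
```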