Please use this identifier to cite or link to this item:
http://hdl.handle.net/20.500.12386/32451
Title: The Gaia AVU-GSR parallel solver: preliminary porting with OpenACC parallelization language of a LSQR-based application in perspective of exascale systems
Authors: Cesare, Valentina; Becciani, Ugo; Vecchiato, Alberto; Pitari, Fabio; Raciti, Mario; Tudisco, Giuseppe; Aldinucci, Marco
Issue Date: 2022
Series: INAF Technical Reports - Rapporti Tecnici INAF
Report: 163
Abstract: The Gaia Astrometric Verification Unit-Global Sphere Reconstruction (AVU-GSR) Parallel Solver aims to find the positions and proper motions of ~10^8 stars in our galaxy, along with the attitude and instrumental settings of the Gaia satellite and the global parameter 𝛾 of the post-Newtonian formalism. To find these parameters, the code solves a system of linear equations, 𝐀 × 𝒙 = 𝒃, where the coefficient matrix 𝐀 is large, containing ~10^11 × 10^8 elements, and sparse. The system is solved with a customized implementation of the iterative preconditioned LSQR (PC-LSQR) algorithm and is parallelized on the CPU with MPI+OpenMP: the computation for different horizontal portions of the coefficient matrix is assigned to different MPI processes and further parallelized over OpenMP threads. To improve the code's performance, we explored the feasibility of porting this application to a GPU environment by replacing the OpenMP directives with the corresponding OpenACC ones. In this preliminary port, ~95% of the data is copied from the host (CPU) to the device (GPU) before the entire cycle of iterations, making the code compute bound rather than data-transfer bound. The OpenACC code achieves a speedup of ~1.5 over the OpenMP code. The OpenACC application runs on multiple GPUs and was tested on the CINECA supercomputer Marconi100, which has 4 V100 GPUs per node with 16 GB of memory each. A subsequent port, in which OpenACC is replaced with CUDA, was then performed, optimizing the preliminary OpenACC port. The CUDA code has just been put into production on Marconi100, and we plan to run it on Leonardo, the future pre-exascale platform of CINECA, with 4 next-generation A100 GPUs per node.
URI: http://hdl.handle.net/20.500.12386/32451
https://doi.org/10.20371/INAF/TechRep/163
Fulltext: open
Appears in Collections: 4.01 Rapporti tecnici INAF
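
The solver structure summarized in the abstract (LSQR on a sparse system, with horizontal portions of the coefficient matrix handled by different processes) can be illustrated with a minimal pure-Python sketch. This is not the AVU-GSR code: the names `split_rows`, `matvec`, `rmatvec`, and `lsqr` are hypothetical, the loop over row blocks only stands in for MPI processes, and column-norm scaling is assumed as the preconditioner because the abstract does not say which one the report uses. The iteration itself follows the standard LSQR recurrences of Paige and Saunders.

```python
import math


def split_rows(rows, n_blocks):
    """Cut the row list into contiguous horizontal blocks, one per simulated process."""
    size = max(1, math.ceil(len(rows) / n_blocks))
    return [(start, rows[start:start + size]) for start in range(0, len(rows), size)]


def matvec(blocks, x, m):
    """y = A x: each block fills only the entries of y for the rows it owns."""
    y = [0.0] * m
    for start, block in blocks:
        for i, row in enumerate(block):
            y[start + i] = sum(val * x[col] for col, val in row)
    return y


def rmatvec(blocks, u, n):
    """z = A^T u: each block builds a partial z; summing them mimics a reduction."""
    z = [0.0] * n
    for start, block in blocks:
        partial = [0.0] * n
        for i, row in enumerate(block):
            for col, val in row:
                partial[col] += val * u[start + i]
        z = [a + p for a, p in zip(z, partial)]
    return z


def norm(v):
    return math.sqrt(sum(a * a for a in v))


def lsqr(rows, b, n, n_blocks=2, tol=1e-12, max_iter=100):
    """Least-squares solve of A x = b; rows[i] is a list of (column, value) pairs."""
    m = len(rows)
    # Assumed preconditioner: scale every column of A to unit Euclidean norm.
    col_sq = [0.0] * n
    for row in rows:
        for col, val in row:
            col_sq[col] += val * val
    d = [1.0 / math.sqrt(s) if s > 0 else 1.0 for s in col_sq]
    scaled = [[(col, val * d[col]) for col, val in row] for row in rows]
    blocks = split_rows(scaled, n_blocks)

    y = [0.0] * n  # solution of the scaled system (A D) y = b
    beta = norm(b)
    if beta == 0.0:
        return y
    u = [bi / beta for bi in b]
    v = rmatvec(blocks, u, n)
    alpha = norm(v)
    if alpha == 0.0:
        return y
    v = [vi / alpha for vi in v]
    w = v[:]
    phibar, rhobar = beta, alpha

    for _ in range(max_iter):
        # Golub-Kahan bidiagonalization step.
        u = [a - alpha * ui for a, ui in zip(matvec(blocks, v, m), u)]
        beta = norm(u)
        if beta > 0.0:
            u = [ui / beta for ui in u]
        v = [a - beta * vi for a, vi in zip(rmatvec(blocks, u, n), v)]
        alpha = norm(v)
        if alpha > 0.0:
            v = [vi / alpha for vi in v]
        # Givens rotation that updates the QR factorization of the bidiagonal matrix.
        rho = math.hypot(rhobar, beta)
        c, s = rhobar / rho, beta / rho
        theta, rhobar = s * alpha, -c * alpha
        phi, phibar = c * phibar, s * phibar
        y = [yi + (phi / rho) * wi for yi, wi in zip(y, w)]
        w = [vi - (theta / rho) * wi for vi, wi in zip(v, w)]
        # phibar * alpha * |c| estimates the norm of A^T r for the scaled system.
        if phibar * alpha * abs(c) <= tol:
            break
    return [di * yi for di, yi in zip(d, y)]  # undo the column scaling


# Tiny made-up 5 x 3 sparse system with a known solution.
rows = [
    [(0, 2.0), (1, 1.0)],
    [(1, 3.0), (2, -1.0)],
    [(0, 1.0), (2, 4.0)],
    [(0, -1.0), (1, 2.0), (2, 1.0)],
    [(2, 5.0)],
]
x_true = [1.0, -2.0, 0.5]
b = matvec(split_rows(rows, 1), x_true, len(rows))
x = lsqr(rows, b, n=3, n_blocks=2)
print(x)
```

In this sketch the product with 𝐀 needs no communication between blocks, because each block writes only its own rows of the result, while the product with the transpose requires summing the partial vectors from all blocks. In a distributed setting that sum would typically be a collective reduction; whether the report's code does exactly this is not stated in the abstract.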
Files in This Item:
File | Description | Size | Format
---|---|---|---
Technical_report_Gaia_MPI_OpenACC_Valentina_Cesare_et_al.pdf | | 1.82 MB | Adobe PDF
Items in DSpace are published in Open Access, unless otherwise indicated.