Implementation of parallel tridiagonal solvers for a heterogeneous computing environment

Hamish J Macintosh; David J Warne; Neil A Kelson; Jasmine E Banks; Troy W Farrell

doi:10.21914/anziamj.v56i0.9371

Authors

Hamish J Macintosh
David J Warne
Neil A Kelson
Jasmine E Banks
Troy W Farrell

DOI:

https://doi.org/10.21914/anziamj.v56i0.9371

Abstract

Tridiagonal diagonally dominant linear systems arise in many scientific and engineering applications. The standard Thomas algorithm for solving such systems is inherently serial, forming a bottleneck in computation. Algorithms such as cyclic reduction and SPIKE reduce a single large tridiagonal system into multiple small independent systems which are solved in parallel. We develop portable cyclic reduction and the SPIKE algorithm for Open Computing Language implementations on a range of co-processors in a heterogeneous computing environment, including field programmable gate arrays, graphics processing units and other multi-core processors. We evaluate these designs in the context of solver performance, resource efficiency and numerical accuracy. References

Altera. Implementing FPGA design with the OpenCL standard. Technical report, Altera Inc., November 2013. http://www.altera.com/literature/wp/wp-01173-opencl.pdf
Altera. Altera SDK for OpenCL: Best practices guide. Technical report, Altera Inc., May 2015. https://www.altera.com/content/dam/altera-www/global/en_US/pdfs/literature/hb/opencl-sdk/aocl_optimization_guide.pdf
P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland. A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems. Parallel and Distributed Computing Practices, 2:385–400, 1999. http://www.scpe.org/index.php/scpe/article/view/152
P. Arbenz, A. Cleary, J. Dongarra, and M. Hegland. A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems II. In Euro-Par'99 Parallel Processing, Berlin, vol. 1685 of Lecture Notes in Computer Science pp. 1078–1087. Springer, 1999. doi:10.1007/3-540-48311-X_151
L.-W. Chang, J. A. Stratton, H.-S. Kim, and W. W. Hwu. A scalable, numerically stable, high-performance tridiagonal solver using GPUs. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1–11. IEEE Computer Society Press, 2012. doi:10.1109/SC.2012.12
Y. Dou, Y. Lei, G. Wu, S. Guo, J. Zhuo, and L. Shen. FPGA accelerating double/quad-double high precision floating-point applications for exascale computing. In Proceedings of the 24th ACM International Conference on Supercomputing, pp. 325–336, 2010. doi:10.1145/1810085.1810129
C. R. Dun, M. Hegland, and M. R. Osborne. Parallel stable solution methods for tridiagonal linear systems of equations. In Computational Techniques and Applications: CTAC95 Proceedings of the Seventh Biennial Conference, pp. 267–274. World Scientific, 1996. doi:10.1142/9789814530651
W. Gander and G. H. Golub. Cyclic reduction–-history and applications. In Scientific Computing, Proceedings of the Workshop, 1997, pp. 73–111. Springer, 1998. http://www.springer.com/fr/book/9789813083608
Khronos OpenCL Working Group. The OpenCL specification. Technical report, Khronos Group, October 2009. http://www.khronos.org/registry/cl/specs/opencl-1.0.pdf
M. Hegland. On the parallel solution of tridiagonal systems by wrap-around partitioning and incomplete LU factorization. Numer. Math., 59(1):453–472, 1991. doi:10.1007/BF01385791
D. Jensen and A. F. Rodrigues. Embedded systems and exascale computing. Comput. Sci. Eng., 12(6):20–29, 2010. doi:10.1109/MCSE.2010.95
S. Palmer. Accelerating implicit finite difference schemes using a hardware optimised implementation of the thomas algorithm for FPGAs. Technical Report, 2014. http://arxiv.org/abs/1402.5094
E. Polizzi and A. H. Sameh. A parallel hybrid banded system solver: the spike algorithm. Parallel Comput., 32:177–194, 2006. doi:10.1016/j.parco.2005.07.005
W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in FORTRAN: The art of Scientific Computation. Cambridge University Press, Cambridge, England, 2nd edition, 1992. http://www.cambridge.org/au/academic/subjects/mathematics/numerical-recipes/numerical-recipes-fortran-77-art-scientific-computing-volume-1-2nd-edition
J. Shalf, S. Dosanjh and J. Morrison. Exascale computing technology challenges. In High Performance Computing for Computational Science–-VECPAR 2010, vol. 6449 of Lecture Notes in Computer Science, pp 1–25. Springer, 2011. doi:10.1007/978-3-642-19328-6_1
S. Skalicky, S. Lopez, M. Lukowiak, J. Letendre and M. Ryan. Performance modeling of pipelined linear algebra architectures on FPGAs. In Reconfigurable Computing: Architectures, Tools and Applications, vol. 7806, Lecture Notes in Computer Science pp. 146-153. Springer, 2013. doi:10.1007/978-3-642-36812-7_14
D. J. Warne, R. F. Hayward, N. A. Kelson, J. E. Banks, and L. Mejias. Pulse-coupled neural network performance for real-time identification of vegetation during forced landing. In Engineering Mathematics and Applications Conference (EMAC2013), vol. 55 of ANZIAM J., pp. C1–C16, 2014. http://journal.austms.org.au/ojs/index.php/ANZIAMJ/article/view/7851
D. J. Warne, N. A. Kelson, and R. F. Hayward. Solving tri-diagonal linear systems using field programmable gate arrays. In Proceedings of the 4th International Conference on Computational Methods (ICCM2012), 2012. http://eprints.qut.edu.au/54894/
D. J. Warne, N. A. Kelson, and R. F. Hayward. Comparison of high level FPGA hardware design for solving tri-diagonal linear systems. Proc. Comput. Sci., 29:95–101, 2014. doi:10.1016/j.procs.2014.05.009
L. Wirbel. Xilinx SDAccel: A unified development environment for tomorrow's data center. Technical report, The Linley Group Inc., November 2014. http://www.xilinx.com/publications/prod_mktg/sdnet/sdaccel-wp.pdf
G. Wu, Y. Dou, J. Sun and G. D. Peterson. A high performance and memory efficient LU decomposer on FPGAs. IEEE T. Comput., 61:366–378, 2012. doi:10.1109/TC.2010.278

Implementation of parallel tridiagonal solvers for a heterogeneous computing environment

Authors

DOI:

Abstract

Published

Issue

Section