Accelerated methods for performing the LDLT decomposition

Peter E.  Strazdins

doi:10.21914/anziamj.v42i0.648

Authors

Peter E. Strazdins

DOI:

https://doi.org/10.21914/anziamj.v42i0.648

Abstract

This paper describes the design, implementation and performance of parallel direct dense symmetric-indefinite matrix factorisation algorithms. These algorithms use the Bunch-Kaufman diagonal pivoting method. The starting point is numerically identical to LAPACK _sytrf() algorithm, but out-performs zsytrf() by approximately 15% for large matrices on the UltraSPARC family of processors. The first variant reduces symmetric interchanges, particularly important for parallel implementation, by taking into account the growth attained by any preceding columns that did not require any interchanges. However, it achieves the same growth bound. The second variant uses a lookahead technique with heuristic methods to predict whether interchanges are required over the next block column; if so, the block column can be eliminated using modified Cholesky methods, which can yield both computational and communication advantages. These algorithms yield best performance gains on `weakly indefinite' matrices (i.e. those which have generally large diagonal elements), which often arise from electro-magnetic field analysis applications. On UltraSPARC processors, the first variant generally achieves a 1--2% performance gain; the second is faster still for large matrices by 2% for complex double precision and 6% for double precision. However, larger performance gains are observed on distributed memory machines, where symmetric interchanges are relatively more expensive. On a 16 node 300MHz UltraSPARC-based Fujitsu AP3000, the first variant achieved a 10-15% improvement for small-moderate sized matrices, decreasing to 7% for large matrices. For N=10000 , it achieved a sustained speed of 5.6GFLOPs and a parallel speedup of 12.8.

Accelerated methods for performing the LDLT decomposition

Authors

DOI:

Abstract

Published

Issue

Section