A robust combination technique

Brendan Harding, Markus Hegland


One of the challenges for efficiently and effectively using petascale and exascale computers is the handling of run-time errors. Without such robustness, applications developed for these machines will have little chance of completing successfully. The sparse grid combination technique approximates the solution to a given problem by taking the linear combination of its solution on multiple grids. It is successful in many high performance computing applications due to its ability to tackle the curse of dimensionality. We present several approaches to fault tolerance using the combination technique. The first of these is implemented within the MapReduce model in order to utilise the existing fault tolerance of this framework. In addition, we present a method which utilises the redundancy shared by solutions on different grids. Finally, we describe a novel approach in which the solution is computed on additional grids which are used for alternative combinations if other grids experience failure. We include some results based on the solution of the 2D scalar advection PDE.

  • S. Balay, J. Brown, K. Buschelman, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang. PETSc Web page (2012). http://www.mcs.anl.gov/petsc.
  • G. Bosilca, R. Delmas, J. Dongarra, and J. Langou. Algorithm-based fault tolerance applied to high performance computing. Journal of Parallel and Distributed Computing, 69(4):410--416 (2009). doi:10.1016/j.jpdc.2008.12.002.
  • H. J. Bungartz and M. Griebel. Sparse grids. Acta Numerica, 13:147--269 (2004). doi:10.1017/S0962492904000182.
  • F. Cappello. Fault Tolerance in Petascale/Exascale Systems: Current Knowledge, Challenges and Research Opportunities. International Journal of High Performance Computing Applications, 23(3):212--226, (2009). doi:10.1177/1094342009106189.
  • J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113 (2008). doi:10.1145/1327452.1327492.
  • J. Garcke. Sparse grids in a nutshell. In J. Garcke and M. Griebel, editors, Sparse grids and applications, volume 88 of Lecture Notes in Computational Science and Engineering, pages 57--80. Springer (2013). doi:10.1007/978-3-642-31703-3_3.
  • M. Griebel, M. Schneider, and C. Zenger. A combination technique for the solution of sparse grid problems. In P. de Groen and R. Beauwens, editors, Iterative Methods in Linear Algebra, pages 263--281. IMACS, Elsevier, North Holland (1992). Zbl 0785.65101.
  • M. Hegland. Adaptive sparse grids. ANZIAM Journal, 44:C335--C353 (2003). http://journal.austms.org.au/ojs/index.php/ANZIAMJ/article/view/685.
  • K.-H. Huang and J. A. Abraham. Algorithm-based fault tolerance for matrix operations. Computers, IEEE Transactions on, C-33(6):518--528 (1984). doi:10.1109/TC.1984.1676475.
  • C. Zenger. Sparse Grids. In W.Hackbusch, editor, Parrallel Algorithms for Partial Differential Equations, Proceedings of the Sixth GAMM-Seminar, Kiel, 1990, volume 31 of Notes on Num. Fluid Mech., Vieweg--Verlag, 31:241--251 (1991). Zbl 0763.65091.

Full Text:


DOI: http://dx.doi.org/10.21914/anziamj.v54i0.6321

Remember, for most actions you have to record/upload into this online system
and then inform the editor/author via clicking on an email icon or Completion button.
ANZIAM Journal, ISSN 1446-8735, copyright Australian Mathematical Society.