Parallelization of a finite element surface fitting algorithm for data mining

Authors

  • Peter Christen
  • Irfan Altas
  • Markus Hegland
  • Stephen Roberts
  • Kevin Burrage
  • Roger Sidje

DOI

https://doi.org/10.21914/anziamj.v42i0.604

Abstract

A major task in data mining is to develop automatic techniques to process very large data sets and to detect patterns in them. An important data mining technique is multivariate regression, and an essential subtask is the estimation of interaction surfaces, i.e., the estimation of functions of two variables. Thin plate splines provide a very good method to determine an approximating surface. Obtaining standard thin plate splines requires the solution of a dense linear system of equations of order n, where n is the number of observations. Standard thin plate splines may therefore not be practical, because the number of observations in data mining applications is often in the millions. We have developed a finite element approximation of a spline that can handle data sets with millions of records. The resolution of the finite element method can be chosen independently of the number of observations. The observation data is read from secondary storage once and does not need to be stored in memory. In this paper, we present a first parallel implementation of this method in an MPI environment.
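
The two properties stated in the abstract (a system whose size depends on the grid resolution rather than on n, and a single pass over the data) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generic least-squares assembly with bilinear elements on the unit square, omits the smoothing term that a spline fit needs, and uses hypothetical names (new_system, accumulate, merge) chosen only for illustration. Because each record adds independently to the matrix and right-hand side, partial systems built from disjoint chunks of the data can simply be summed, which is the kind of sum a reduction such as MPI_Allreduce would perform across processes.

```python
# Sketch only: generic least-squares assembly with bilinear elements on the
# unit square. The smoothing term and the paper's actual formulation are omitted.

def new_system(m):
    """Empty (A, b) for an m x m grid of nodes: size depends on m, not on n."""
    size = m * m
    return [[0.0] * size for _ in range(size)], [0.0] * size


def accumulate(system, records, m):
    """Add each (x1, x2, y) record once; x1 and x2 are assumed to lie in [0, 1]."""
    A, b = system
    h = 1.0 / (m - 1)                      # element width
    for x1, x2, y in records:
        i = min(int(x1 / h), m - 2)        # element column index
        j = min(int(x2 / h), m - 2)        # element row index
        s = x1 / h - i                     # local coordinates in [0, 1]
        t = x2 / h - j
        # (global node index, basis value) at the element's four corner nodes
        local = [
            (j * m + i, (1 - s) * (1 - t)),
            (j * m + i + 1, s * (1 - t)),
            ((j + 1) * m + i, (1 - s) * t),
            ((j + 1) * m + i + 1, s * t),
        ]
        for p, wp in local:
            b[p] += wp * y
            for q, wq in local:
                A[p][q] += wp * wq
    return system


def merge(sys_a, sys_b):
    """Sum two partial systems, the role a reduction would play across processes."""
    A1, b1 = sys_a
    A2, b2 = sys_b
    A = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    b = [u + v for u, v in zip(b1, b2)]
    return A, b
```

In this sketch, records can be any iterator, for example a generator that reads a file line by line, so the observations never need to be held in memory at once. A dense list-of-lists matrix is used only for brevity; each record touches just four nodes, so a sparse structure would be the natural choice at realistic resolutions.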

Published

2000-12-25

Section

Proceedings Computational Techniques and Applications Conference