Computational challenges in data mining

Markus Hegland

doi:10.21914/anziamj.v42i0.588


Abstract

Data mining is applied in business to find new market opportunities in the data held in operational databases, which are used for day-to-day management. The tools it uses combine ideas from statistics, machine learning, database technology and high-performance computing to find nuggets of knowledge. Data mining is also applied in science, for example to find taxonomies of variable stars, and in national administration for the management of health care. The major computational challenges originate in the size of the data and in its complexity. The analysis of complex or high-dimensional data suffers from the curse of dimensionality, which is made worse when very large data sets have to be processed. Many current techniques deal very well with high-dimensional data sets of moderate size, or with very large data sets of moderate complexity, but hardly any are able to analyse very large data sets of high complexity. These challenges are explored further, and computational techniques are examined with respect to their capability to handle them. In particular, finite element methods are found to be well suited to very large data sets but to suffer from the curse of dimensionality, whereas radial basis functions can deal with very high dimensions but not with very large data sets. Additive functions lead to models that can be used to analyse both high-dimensional and very complex data sets, in particular when parallel computers are used to identify them. Examples include multivariate adaptive regression splines (MARS).
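The abstract's central point, that additive models avoid the curse of dimensionality while keeping the cost linear in the number of data points, can be illustrated with a small sketch. It is not the paper's algorithm and not MARS (which selects its basis functions adaptively). It assumes inputs scaled to [0, 1]^d, represents each one-dimensional component f_j as a piecewise-linear finite element function on m equally spaced nodes, and fits all components together by ordinary least squares with NumPy. The function names `hat_basis`, `fit_additive` and `predict_additive` and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of an additive finite element fit,
# not the paper's method and not MARS.


def hat_basis(x, m):
    """Piecewise-linear (hat) basis functions on m equally spaced nodes in [0, 1].

    Returns an (n, m) matrix whose row i holds the m basis values at x[i].
    """
    nodes = np.linspace(0.0, 1.0, m)
    h = 1.0 / (m - 1)  # node spacing
    return np.clip(1.0 - np.abs(x[:, None] - nodes[None, :]) / h, 0.0, None)


def fit_additive(X, y, m):
    """Least-squares fit of f(x) = f_1(x_1) + ... + f_d(x_d).

    Each f_j is piecewise linear on m nodes, so there are d*m coefficients
    rather than the m**d of a tensor-product grid.
    """
    n, d = X.shape
    # Design matrix: one block of m hat-function columns per variable.
    # Building it costs O(n * d * m), i.e. linear in the number of points n.
    B = np.hstack([hat_basis(X[:, j], m) for j in range(d)])
    # Each block sums to 1 at every point, so B is rank deficient (a constant
    # can be shifted between components); lstsq returns the minimum-norm
    # coefficients, which give the same fitted values.
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return coef.reshape(d, m)


def predict_additive(coef, X):
    """Evaluate the fitted additive model at the rows of X."""
    d, m = coef.shape
    return sum(hat_basis(X[:, j], m) @ coef[j] for j in range(d))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 20000, 10, 17
    X = rng.random((n, d))  # synthetic inputs in [0, 1)^d
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2  # an additive target
    coef = fit_additive(X, y, m)
    rmse = np.sqrt(np.mean((predict_additive(coef, X) - y) ** 2))
    # A tensor-product grid would need 17**10 (about 2.0e12) unknowns here.
    print(f"unknowns: tensor grid {m**d}, additive {d * m}")
    print(f"RMS training error: {rmse:.4f}")
```

With d = 10 variables and m = 17 nodes per variable, the additive model has 170 coefficients, while a full tensor-product finite element grid would have 17^10 of them. Because the work grows only linearly in n, the same construction remains usable for large data sets; for very large n one would typically accumulate the normal equations in a single pass over the data, a step that also parallelises naturally.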
