Computational challenges in data mining

Markus Hegland

Abstract


Data mining is applied in business to find new market opportunities in data stored in operational databases, which are used for day-to-day management. The tools applied combine ideas from statistics, machine learning, database technology and high performance computing to find nuggets of knowledge. Data mining is also applied in science, for example to find taxonomies of variable stars, and in national administration, for example in the management of health care. Major computational challenges originate in the size of the data and its complexity. The analysis of complex or high-dimensional data suffers from the curse of dimensionality, which is made worse if very large data sets have to be processed. Many current techniques are very good at dealing with high-dimensional data sets of moderate size, or with very large data sets of moderate complexity, but hardly any techniques are able to analyse very large data sets of high complexity. These challenges are explored further, and computational techniques are examined with respect to their capability to handle them. In particular, finite element methods are very good at dealing with very large data sets but suffer from the curse of dimensionality, whereas radial basis functions can deal with very high dimensions but not with very large data sets. Additive functions lead to models which can be used to analyse both high-dimensional and very complex data sets, in particular when parallel computers are used for their identification. Examples include multivariate adaptive regression splines.
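
The trade-off described above can be illustrated with a rough count of unknowns. The following is a minimal sketch that is not taken from the paper and rests on simplifying assumptions: a finite element model on a tensor-product grid with `nodes_per_dim` nodes in each of `dim` variables, a radial basis function model with one basis function per data point and a dense system matrix, and an additive model with one univariate component per variable. All function names are hypothetical.

```python
# Back-of-the-envelope problem sizes (illustrative assumptions, not the paper's method).

def grid_unknowns(nodes_per_dim: int, dim: int) -> int:
    """Unknowns on a tensor-product grid: grows exponentially with the
    dimension but does not depend on the number of data points."""
    return nodes_per_dim ** dim


def rbf_matrix_entries(n_points: int) -> int:
    """Entries of a dense radial basis function system with one basis
    function per data point: grows quadratically with the data size
    but does not depend on the dimension."""
    return n_points ** 2


def additive_unknowns(nodes_per_dim: int, dim: int) -> int:
    """Unknowns of an additive model with one univariate component per
    variable: grows only linearly with the dimension."""
    return nodes_per_dim * dim


if __name__ == "__main__":
    nodes = 10  # assumed grid resolution per variable
    for dim in (2, 5, 10):
        print(f"dim={dim}: grid={grid_unknowns(nodes, dim)}, "
              f"additive={additive_unknowns(nodes, dim)}")
    for n_points in (1_000, 1_000_000):
        print(f"n={n_points}: dense RBF entries={rbf_matrix_entries(n_points)}")
```

Under these assumptions the grid model becomes infeasible as the dimension grows, the dense radial basis function system becomes infeasible as the number of data points grows, and the additive model stays moderate in both respects.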



DOI: http://dx.doi.org/10.21914/anziamj.v42i0.588



ANZIAM Journal, ISSN 1446-8735, copyright Australian Mathematical Society.