MR-Tree - A Scalable MapReduce Algorithm for Building Decision Trees
Vasile PURDILĂ; Stefan-Gheorghe PENTIUC
Current Eye Research, 2014, Vol. 8, pp. 16-19
purdil2014journalmr-tree
Abstract
Learning decision trees from very large datasets is impractical
on single-node computers because of the large amount of
computation the process requires. Apache Hadoop is a large-scale
distributed computing platform that runs on clusters of commodity
hardware and can be used successfully for data mining tasks on
very large datasets. This work presents a parallel decision tree
learning algorithm, expressed in the MapReduce programming model,
that runs on the Apache Hadoop platform and scales very well
with dataset size.
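
The abstract does not describe the internals of MR-Tree, so the following is only a generic, minimal sketch of how one step of decision tree learning (choosing a split attribute) can be phrased as a map step and a reduce step. It runs in memory in plain Python rather than on Hadoop, and every name in it (`map_record`, `reduce_counts`, `best_split`, the toy records) is a hypothetical choice made for illustration, not something taken from the paper.

```python
from collections import defaultdict
from math import log2

def map_record(record, label_key="label"):
    """Map step: emit ((attribute, value, label), 1) for each attribute."""
    label = record[label_key]
    for attr, value in record.items():
        if attr != label_key:
            yield (attr, value, label), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts that share the same key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def entropy(label_counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(label_counts)
    return -sum(c / total * log2(c / total) for c in label_counts if c)

def best_split(counts):
    """Pick the attribute with the lowest weighted entropy after splitting."""
    by_attr = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for (attr, value, label), n in counts.items():
        by_attr[attr][value][label] += n
    scores = {}
    for attr, values in by_attr.items():
        total = sum(sum(labels.values()) for labels in values.values())
        scores[attr] = sum(
            sum(labels.values()) / total * entropy(list(labels.values()))
            for labels in values.values()
        )
    return min(scores, key=scores.get), scores

# Toy dataset (assumed for illustration only).
records = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "play"},
    {"outlook": "rainy", "windy": "no", "label": "stay"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
]

pairs = [kv for r in records for kv in map_record(r)]  # map phase
counts = reduce_counts(pairs)                          # reduce phase
attribute, scores = best_split(counts)                 # driver-side choice
```

In this sketch only the small table of counts has to reach the driver, which is why this style of computation can in principle be distributed over many mappers; whether MR-Tree organizes its jobs this way is not stated in the abstract.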