Problems Installing Hadoop 0.20 and Dumbo 0.21 on Ubuntu
Oct 18, 2009 · 2 minute read

The Hadoop wiki has a great introduction to installing this piece of software, which I wanted to do so I could have a play with Dumbo. The Dumbo docs also have a good getting started section, which includes a few patches that need to be applied.
Dumbo can be considered to be a convenient Python API for writing MapReduce programs.
Unfortunately it’s not quite that simple, at least on Ubuntu Jaunty. Hadoop now uses Java 6, but if you just follow the instructions on the wikis you’ll hit a problem when you run “ant package”: a third-party application that the build depends on (Apache Forrest) requires Java 1.5. Once you fix that by installing a Java 5 JDK, the build script will complain again, this time that you need to install Forrest itself. Here’s what I did to get everything working:
pre. sudo apt-get install ant sun-java5-jdk
pre. su - hadoop
wget http://mirrors.dedipower.com/ftp.apache.org/forrest/apache-forrest-0.8.tar.gz
tar xzf apache-forrest-0.8.tar.gz
cd /usr/local/hadoop
patch -p0 < /path/to/HADOOP-1722.patch
patch -p0 < /path/to/HADOOP-5450.patch
patch -p0 < /path/to/MAPREDUCE-764.patch
ant package -Djava5.home=/usr/lib/jvm/java-1.5.0-sun -Dforrest.home=/home/hadoop/apache-forrest-0.8/

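Since both failures come down to a missing Java 5 JDK or a missing Forrest install, it can save a build cycle to check for the two directories before calling ant. The following is only a rough sketch: the function name `check_build_prereqs` and its messages are my own invention, not part of Hadoop or Ant, and it assumes the usual layouts where a JDK has `bin/javac` and the Forrest tarball unpacks with a `bin/forrest` launcher.

```shell
#!/bin/sh
# Pre-flight check before running "ant package" on a patched Hadoop tree.
# check_build_prereqs is a hypothetical helper, not a Hadoop or Ant command.

check_build_prereqs() {
    java5_home=$1
    forrest_home=$2
    status=0

    # The Forrest docs build needs a Java 5 JDK, so look for its compiler.
    if [ ! -x "$java5_home/bin/javac" ]; then
        echo "No Java 5 compiler found at $java5_home/bin/javac" >&2
        status=1
    fi

    # Assumes Forrest's launcher script lives at bin/forrest.
    if [ ! -x "$forrest_home/bin/forrest" ]; then
        echo "No Forrest launcher found at $forrest_home/bin/forrest" >&2
        status=1
    fi

    # 0 if both were found, 1 if either is missing.
    return $status
}

# Example use, with the paths from the commands above:
# check_build_prereqs /usr/lib/jvm/java-1.5.0-sun /home/hadoop/apache-forrest-0.8 \
#     && ant package -Djava5.home=/usr/lib/jvm/java-1.5.0-sun \
#                    -Dforrest.home=/home/hadoop/apache-forrest-0.8/
```

The function reports both problems in one pass rather than stopping at the first, which mirrors the two separate complaints the build script makes.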
With all that out of the way you should be able to run the simple examples found on the rather excellent dumbotics blog. If you’re using the Cloudera distribution, or once Hadoop 0.21 gets a release, these problems will disappear, but in the meantime hopefully this saves someone else a bit of head scratching.