Pig & mongo-hadoop on a local Ubuntu cluster

I had a surprisingly hard time getting Pig and mongo-hadoop to work on my local Ubuntu machine. In this post I'll go through the steps of installing pig-0.12.0, MongoDB 2.2.3, and mongo-hadoop locally. The code I used to make sure everything is running correctly can be found at https://github.com/phildeutsch.

I will install everything in $HOME/hadoop.

Installing Pig

mkdir -p ~/hadoop
cd ~/hadoop
wget http://mirror2.klaus-uwe.me/apache/pig/stable/pig-0.12.0.tar.gz
tar xzf pig-0.12.0.tar.gz
cd ~/hadoop/pig-0.12.0
ant clean jar-all -Dhadoopversion=23

You also need to put Pig's bin directory on your PATH. Add this line to ~/.bashrc:

export PATH=$PATH:[path to pig-0.12.0]/bin

and type

source ~/.bashrc
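
To confirm that the new PATH entry took effect, a quick check along these lines can help. This is only a minimal sketch: the helper name check_on_path is made up for illustration, and it relies on nothing more than the shell builtin command -v.

```shell
# Hypothetical helper: report whether a command can be found on the current PATH.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# If pig is not found, re-check the export line in ~/.bashrc and source it again.
check_on_path pig || echo "pig is missing: check the PATH line in ~/.bashrc"
```

Once the check passes, running pig -version should print the installed Pig version.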

Installing MongoDB

For this I followed the guide at http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/:

sudo apt-get install mongodb-10gen=2.2.3
echo "mongodb-10gen hold" | sudo dpkg --set-selections

Make sure that everything is running OK by typing "mongo", which should get you to the MongoDB shell.
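
For a non-interactive version of the same check, something like the following sketch could be used. The helper name mongo_ping is hypothetical, and it assumes the server is listening on the default local address that the mongo shell connects to; with a running server it should print a document containing "ok" : 1.

```shell
# Hypothetical helper: ask the local MongoDB server for a ping via the mongo shell.
mongo_ping() {
    if ! command -v mongo >/dev/null 2>&1; then
        echo "mongo shell not found on PATH"
        return 1
    fi
    # --quiet suppresses the startup banner; --eval runs one JavaScript expression.
    mongo --quiet --eval 'printjson(db.runCommand({ ping: 1 }))'
}

mongo_ping || echo "MongoDB does not seem to be reachable"
```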

Downloading the MongoDB Java driver

cd ~/hadoop
wget https://github.com/downloads/mongodb/mongo-java-driver/mongo-2.10.1.jar

Installing mongo-hadoop

cd ~/hadoop
git clone https://github.com/mongodb/mongo-hadoop
cd mongo-hadoop
./sbt package

To push data from Pig to MongoDB, you need to register three jars (the MongoDB Java driver and the two mongo-hadoop jars) by adding the following three lines at the beginning of your .pig script:

REGISTER [.../]hadoop/mongo-2.10.1.jar
REGISTER [.../]hadoop/mongo-hadoop/core/target/mongo-hadoop-core_2.2.0-1.2.0.jar
REGISTER [.../]hadoop/mongo-hadoop/pig/target/mongo-hadoop-pig_2.2.0-1.2.0.jar
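
Because the jar file names embed version numbers, it can be worth checking that the three files actually exist before running a script. The sketch below does that; the helper name check_jars is hypothetical, and the final call assumes everything was installed under $HOME/hadoop as described in this post, so adjust the argument if your layout differs.

```shell
# Hypothetical helper: verify that the three jars named in the REGISTER lines
# exist under a given base directory (the directory that contains hadoop/).
check_jars() {
    base="$1"
    missing=0
    for jar in \
        "hadoop/mongo-2.10.1.jar" \
        "hadoop/mongo-hadoop/core/target/mongo-hadoop-core_2.2.0-1.2.0.jar" \
        "hadoop/mongo-hadoop/pig/target/mongo-hadoop-pig_2.2.0-1.2.0.jar"
    do
        if [ -f "$base/$jar" ]; then
            echo "ok:      $base/$jar"
        else
            echo "missing: $base/$jar"
            missing=1
        fi
    done
    # Non-zero exit status when at least one jar is absent.
    return "$missing"
}

# Assumption: everything was installed under $HOME/hadoop, as in this post.
check_jars "$HOME" || echo "build or download the missing jars before running Pig"
```

Any path reported as missing usually means the download or the ./sbt package step did not produce the file name used in the REGISTER line.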
