I had a surprisingly hard time getting Pig and mongo-hadoop to work on my local Ubuntu machine. In this post I'll go through the steps of installing pig-0.12.0 and MongoDB 2.2.3 locally. The code I used to check that everything runs correctly can be found at https://github.com/phildeutsch.
I will install everything in $HOME/hadoop.
wget http://mirror2.klaus-uwe.me/apache/pig/stable/pig-0.12.0.tar.gz
tar xzf pig-0.12.0.tar.gz
cd pig-0.12.0
ant
cd contrib/piggybank/java
ant
cd ~/hadoop/pig-0.12.0
ant clean jar-all -Dhadoopversion=23
You also need to add the bin directory of pig-0.12.0 (where the pig launcher script lives) to your PATH. Add this line to ~/.bashrc, then source the file:
export PATH=$PATH:[path to pig-0.12.0]/bin
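A quick way to confirm that the PATH change took effect is to ask the shell where it finds pig. This is only a minimal sketch; on_path is a helper name I made up for illustration and is not part of Pig.

```shell
# Report whether a command can be found on the current PATH.
# on_path is a hypothetical helper name, not something Pig provides.
on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# Run this in a new shell or after "source ~/.bashrc".
on_path pig || echo "check the export line in ~/.bashrc"
```

If pig is reported as not found, the usual cause is that .bashrc was edited but not sourced in the current terminal.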
To install MongoDB, I followed the guide at http://docs.mongodb.org/manual/tutorial/install-mongodb-on-ubuntu/:
sudo apt-get install mongodb-10gen=2.2.3
echo "mongodb-10gen hold" | sudo dpkg --set-selections
Make sure that everything is running OK by typing "mongo", which should get you to the MongoDB shell.
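The same check can be done without opening the interactive shell, which is handy in a setup script. The sketch below assumes the server is listening on the default localhost port; check_mongo is a hypothetical helper name.

```shell
# Print the MongoDB server version without starting an interactive session.
# check_mongo is a hypothetical helper name; it assumes a server on the
# default localhost port.
check_mongo() {
    mongo --quiet --eval 'print(db.version())'
}

if version=$(check_mongo 2>/dev/null); then
    echo "MongoDB $version is running"
else
    echo "could not reach MongoDB; is the service started?"
fi
```

If the pinned package was installed as described above, the reported version should match it.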
Downloading the MongoDB Java driver
cd ~/hadoop
wget https://github.com/downloads/mongodb/mongo-java-driver/mongo-2.10.1.jar
Building mongo-hadoop
cd ~/hadoop
git clone https://github.com/mongodb/mongo-hadoop
cd mongo-hadoop
./sbt package
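Before moving on to Pig, it can save some head-scratching to confirm that the download and the build produced the files you expect. The sketch below assumes everything lives under $HOME/hadoop and uses the jar names that appear later in this post; missing_jars is a hypothetical helper, and the exact file names may differ if you build a different mongo-hadoop version.

```shell
# List any of the three expected jars that are missing under a base directory.
# missing_jars is a hypothetical helper; the jar names are assumptions taken
# from the versions used in this post.
missing_jars() {
    base="$1"
    status=0
    for jar in \
        mongo-2.10.1.jar \
        mongo-hadoop/core/target/mongo-hadoop-core_2.2.0-1.2.0.jar \
        mongo-hadoop/pig/target/mongo-hadoop-pig_2.2.0-1.2.0.jar
    do
        if [ ! -f "$base/$jar" ]; then
            echo "missing: $base/$jar"
            status=1
        fi
    done
    return $status
}

missing_jars "$HOME/hadoop" || echo "download or build the files listed above"
```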
To push data from Pig to MongoDB, you need to register the three .jars by adding the following three lines at the beginning of your .pig script:
REGISTER [.../]hadoop/mongo-2.10.1.jar
REGISTER [.../]hadoop/mongo-hadoop/core/target/mongo-hadoop-core_2.2.0-1.2.0.jar
REGISTER [.../]hadoop/mongo-hadoop/pig/target/mongo-hadoop-pig_2.2.0-1.2.0.jar
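To see the whole thing end to end, here is a sketch of a shell helper that writes a small test script with the three REGISTER lines followed by a STORE into MongoDB through the MongoStorage class from mongo-hadoop. Everything apart from the jar names is an assumption for illustration: the base directory ($HOME/hadoop), the input file input.csv, its schema, and the test.people database and collection. write_pig_script is a hypothetical helper name.

```shell
# Write a small Pig script that registers the three jars and stores a
# relation in MongoDB. write_pig_script is a hypothetical helper.
write_pig_script() {
    base="$1"   # directory holding the jars, e.g. $HOME/hadoop
    out="$2"    # path of the .pig file to create
    cat > "$out" <<EOF
REGISTER $base/mongo-2.10.1.jar;
REGISTER $base/mongo-hadoop/core/target/mongo-hadoop-core_2.2.0-1.2.0.jar;
REGISTER $base/mongo-hadoop/pig/target/mongo-hadoop-pig_2.2.0-1.2.0.jar;

-- input.csv, the schema, and test.people are placeholder names
people = LOAD 'input.csv' USING PigStorage(',') AS (name:chararray, age:int);
STORE people INTO 'mongodb://localhost:27017/test.people'
    USING com.mongodb.hadoop.pig.MongoStorage();
EOF
}

script="${TMPDIR:-/tmp}/mongo_test.pig"
write_pig_script "$HOME/hadoop" "$script"
echo "wrote $script"
# Run it in local mode with: pig -x local "$script"
```

After a successful run, the stored documents should be visible from the mongo shell in the database and collection named in the STORE line.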