
H2O Quick Start on Hadoop 


H2O 3.0 is for enterprise and open-source use, with easy installations for Spark, Python, R, YARN, Hadoop 1, Amazon EC2, Maven, laptops, and standalone clusters. Users can now use H2O Flow, a notebook-style graphical user interface with command-line computing, to access fast, scalable H2O algorithms.








This tutorial walks you through the installation of H2O on your Hadoop server and shows you how to launch a multi-node H2O cluster using MapReduce or YARN. The prerequisite for this walkthrough is an installation of Java version 1.6 or newer. To start, navigate to our website in your web browser. Click on the download button, which will take you to the downloads page. Scroll down to the latest H2O dev release. Click the fourth tab on the top menu for instructions on installing on Hadoop. Depending on your distribution of Hadoop, choose the right installation link. If you are unsure which version of Hadoop you are using, go to your Hadoop server and run the hadoop version command. In my case, I am running HDP 2.1, so I would choose the HDP 2.1 zip file. Going back to the box where you run your Hadoop commands, download the release with wget. This release comes with an H2O driver that allows you to launch on Hadoop. Unzip the installation file and cd into the folder. Going back to our website with the instructions, copy and paste the hadoop jar command that launches an H2O instance.
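The steps above can be sketched roughly as follows. The download URL and folder name here are placeholders; copy the exact wget and hadoop jar commands from the H2O downloads page for your Hadoop distribution.

```shell
# Download the H2O release built for your Hadoop distribution
# (placeholder URL -- use the one from the downloads page).
wget https://example.com/h2o-for-hdp2.1.zip
unzip h2o-for-hdp2.1.zip
cd h2o-for-hdp2.1    # placeholder folder name; depends on the release

# Launch H2O as a MapReduce/YARN job using the bundled driver jar.
# -nodes: number of H2O nodes; -mapperXmx: memory per node;
# -output: a fresh HDFS directory (must change on every launch).
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 1g -output hdfsOutputDir1
```

The driver blocks while the cluster is up and prints the IP addresses and ports of the launched nodes, which you use to reach Flow or connect from a client.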


From the command line, you can vary the number of nodes you launch as well as the size of each node. In our particular case, we are going to launch a cluster with one node and 1 GB of memory on that node. It is important to point out that you will need to change the output HDFS directory each time you launch H2O. Once the cluster is up, choose any one of the nodes you launched and navigate to it in your web browser to access Flow. To gain access to the cluster from R or Python, simply specify the IP address and port of any one of the instances in the cluster in the h2o.init function in R or Python. You have just finished launching H2O on top of Hadoop.
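Connecting from Python looks like the following minimal sketch. The IP address below is a placeholder; use the address of any node printed by the driver when the cluster came up.

```python
import h2o

# Point h2o.init at any node in the running cluster.
# 10.0.0.1 is a placeholder IP; 54321 is H2O's default port.
h2o.init(ip="10.0.0.1", port=54321)
```

The equivalent call in R is h2o.init(ip = "10.0.0.1", port = 54321) after loading the h2o package.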