
H2O-3 Demo: Customer Churn

H2O data scientist Nick Martin explains how to use H2O in the Flow UI to predict customer churn.






Nick Martin, Data Scientist, H2O






Nick Martin:


Hi there. My name is Nick Martin and I'm a data scientist with H2O. H2O is an open source, fast, scalable machine learning platform. During today's brief demo, I'm going to show you how you can quickly and easily download H2O, install it on your local laptop, and have it up and running in just a few minutes. After that, we'll take a brief look at predicting customer churn using the 2009 KDD Cup challenge data set. Of course, predicting customer churn is of great interest to a lot of organizations, and it's a use case we hear about quite often. It's often said that retaining a customer is infinitely more cost effective than acquiring a new one. And I'm going to show you how H2O can play a key role in your data science pipeline by enabling you to rapidly create predictive models, the output of which can then be used in your existing business processes or applications. In this case, hopefully to mitigate customer churn and increase customer retention. So let's get started.


Getting Started: H2O Download and Install


I'm on the H2O homepage, and I'm going to click the download button. The download page has several key sections that we'll take a quick look at. At the top, you can enter your email address to sign up for H2O updates, which of course I strongly recommend you do. You can download our latest stable release or our exciting new integration with Spark, called Sparkling Water, and at the bottom are some of our older builds under the classic section. For today's demo we're going to use our latest stable release, which is called Simons. I'll click on that. This takes me to a download page where I can download the H2O zip file, which I've already done. And then in three easy steps we'll have H2O up and ready to go. I can also install it in R or Python, install it on Hadoop, or use it from Maven. Since I've already downloaded the zip file, we can jump right into this set of instructions here. You'll see the H2O zip file is already in my Downloads folder, so all I have to do is unzip it. Once that's done, I'll change into the H2O directory and we'll start H2O.
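The three steps described here (unzip, change into the directory, start H2O) can be sketched in Python roughly as follows. This is a minimal sketch, not the official install procedure: the function names are hypothetical, and it assumes the unzipped folder contains `h2o.jar` and that Java is available on the PATH.

```python
import subprocess
import zipfile
from pathlib import Path
from typing import List


def unzip_h2o(archive: Path, dest: Path) -> Path:
    """Extract the downloaded archive and return the folder that holds h2o.jar."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    # Search the extracted tree for the jar rather than guessing the folder name.
    jars = sorted(dest.rglob("h2o.jar"))
    if not jars:
        raise FileNotFoundError(f"h2o.jar not found under {dest}")
    return jars[0].parent


def launch_command(jar_name: str = "h2o.jar") -> List[str]:
    """Build the command line that starts a local H2O node."""
    return ["java", "-jar", jar_name]


def start_h2o(h2o_dir: Path) -> subprocess.Popen:
    """Start H2O from inside its directory (assumes Java is on the PATH)."""
    return subprocess.Popen(launch_command(), cwd=h2o_dir)
```

A call such as `start_h2o(unzip_h2o(Path("h2o.zip"), Path("h2o")))` would then mirror the terminal session in the demo, where `h2o.zip` stands in for whatever the downloaded file is actually called.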


Once H2O is up and running, all I need to do is point my browser to localhost:54321. And you'll see our very nice user interface called Flow. Flow is designed to help data scientists rapidly and easily create models, import files, split data frames, and do all the things that would normally require quite a bit of typing in other environments. From here, on the right-hand side, we have a help section, which, if you're new to H2O, I'd encourage you to spend some time getting familiar with, including quick start videos, example Flows, and then a general help section down at the bottom.
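If the browser shows nothing, a quick way to confirm that something is listening on port 54321 is a small check like the one below. It is only a sketch using the Python standard library; the helper names `flow_url` and `flow_is_up` are hypothetical, and the host and port are the defaults mentioned in the demo.

```python
import urllib.error
import urllib.request


def flow_url(host: str = "localhost", port: int = 54321) -> str:
    """Address of the Flow UI for a locally started H2O node."""
    return f"http://{host}:{port}"


def flow_is_up(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP request to the given URL succeeds with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: treat as "not up".
        return False
```

Calling `flow_is_up(flow_url())` a few seconds after starting H2O gives a simple yes or no before opening the browser.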


Predicting Customer Churn: 2009 KDD Cup Challenge data set 


For today, we're going to view the example Flows and select the KDD Cup 2009 churn Flow. I'll load that notebook and you'll see some information at the top about the challenge and two links to data sets, one of which is the training data set and the other the validation data set. I've already downloaded those to my Downloads folder, so I can go right into one of the cells in Flow and change the data directory to point to my Downloads folder. Once I've done that, I can run the cell by either clicking the play button or hitting Ctrl+Enter.


After that's done, I'm ready to parse the files, the first of which is the training data set. H2O's parser is really fast: as you can see, it is already done parsing the training data, so it is available for us to view. But first I'm going to go ahead and parse the validation data set as well. Now that I'm done parsing both the training and the validation data sets, we can take a look at a couple of the other user interface aspects of Flow that help data scientists do their jobs more easily. To view the validation data set, I can just click on the key, and I can see not only the data size from a volume standpoint (there are 10,000 rows and 231 columns, and H2O has compressed it down to four megs), but also important information about my data, including the labels, the data types, the mins, maxes, and means, and the cardinality. And if I need to, I can convert data types from here. Once I've got a good feel for what my data contains, I'm ready to build a model.
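The same import-and-inspect step can be done from H2O's Python client instead of Flow. The sketch below assumes the `h2o` package is installed and a local node is running; the file names `churn_train.csv` and `churn_valid.csv` and the helper names are hypothetical placeholders, not the names used in the example Flow.

```python
from pathlib import Path


def data_paths(data_dir, train_name="churn_train.csv", valid_name="churn_valid.csv"):
    """Build paths to the two downloaded files (file names are placeholders)."""
    base = Path(data_dir)
    return base / train_name, base / valid_name


def parse_frames(train_path, valid_path):
    """Connect to H2O and parse both files into H2O frames."""
    import h2o  # imported lazily so the helpers above work without h2o installed

    h2o.init()  # connects to a running local node, or starts one
    train = h2o.import_file(str(train_path))
    valid = h2o.import_file(str(valid_path))
    return train, valid


def summarize(frame):
    """Collect the size and column types that Flow shows when a key is clicked."""
    return {"rows": frame.nrow, "cols": frame.ncol, "types": frame.types}
```

For per-column minimums, maximums, means, and cardinality, calling `describe()` on the returned frame prints a summary comparable to the one shown in Flow.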


Building a Model: H2O Data Science Pipeline


Within Flow, I can type in the model parameters that I'd like to use, such as what's been provided here. This is a gradient boosted machine that's using the training data set as the training frame and the validation set as the validation frame, and several key parameters have been set here. However, I could also use the user interface portion of Flow to build a model, in which case I would just select the algorithm I want to use. In this case it's GBM, but H2O ships most of the core ML algorithms that you use in your day-to-day work, including distributed random forests, K-means, GLMs, et cetera. And of course, I encourage you to check out what we've done in the deep learning space. You can Google H2O deep learning and find quite a few YouTube videos and other meetup presentations that we've given. And if I click on one of these algorithms, such as GBM, a nice user interface comes up that lets me select my training frame and my validation frame, and I can set all the other model parameters, which would normally require me to do quite a bit of typing, right here in one convenient place. And then I can build a model.
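For readers who prefer code to the Flow form, a comparable GBM can be trained from the Python client. This is a sketch under assumptions: the parameter values are illustrative rather than the ones set in the example Flow, the response column name is supplied by the caller, and the helper names are hypothetical. `H2OGradientBoostingEstimator` and its `train` method are part of the `h2o` package.

```python
from typing import Dict, List


def predictor_columns(columns: List[str], response: str) -> List[str]:
    """Treat every column except the response as a predictor."""
    return [c for c in columns if c != response]


def gbm_params() -> Dict[str, object]:
    """Illustrative settings only; not the values used in the example Flow."""
    return {"ntrees": 50, "max_depth": 5, "learn_rate": 0.1, "seed": 1234}


def train_gbm(train, valid, response: str):
    """Train a GBM classifier on H2O frames, with a separate validation frame."""
    from h2o.estimators.gbm import H2OGradientBoostingEstimator  # lazy import

    # Convert the response to a categorical so GBM does classification.
    train[response] = train[response].asfactor()
    valid[response] = valid[response].asfactor()

    model = H2OGradientBoostingEstimator(**gbm_params())
    model.train(
        x=predictor_columns(train.columns, response),
        y=response,
        training_frame=train,
        validation_frame=valid,
    )
    return model
```

Keeping the parameters in one small function makes it easy to swap in the values from the Flow cell once they are known.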


However, for demo purposes, I'll use this text block that's provided in the example Flow so that you can follow along if you're doing this yourself. I simply hit run and H2O will begin to build the model. As that's happening, I can monitor the progress right here from the UI, and you'll see that H2O is nearly done building this GBM, and this should finish in about 25 seconds or so. And there we go. The model has been built, and so I can view my model. As a data scientist, I'm presented with quite a few graphics that let me easily judge how well my model has done. And then I also get quite a few outputs down here at the bottom that I can use to dig into various other output metrics. And again, these are things that I'd normally have to code up using a different platform, but thankfully Flow does all of that work for me.


You can see that I get variable importances, and I can look at the area under the curve for my model, all in one nice little UI. I've got a good feel for how well my model performed, and it's ready to be integrated with, perhaps, an application. If I have an internal application that we use for monitoring customer churn and hopefully preventing it, I can quickly and easily export a plain old Java object (POJO) and hand that off to my developers to integrate into one of their existing applications, so that my H2O predictive model can become part of my business process and business application.
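The metrics and the export step mentioned here are also reachable from the Python client. The sketch below assumes a trained H2O model object such as a GBM; `top_variables` and `report_and_export` are hypothetical helper names, while `auc`, `varimp`, and `download_pojo` are methods on H2O model objects.

```python
from typing import List, Sequence, Tuple


def top_variables(varimp_rows: Sequence[Tuple], n: int = 10) -> List[str]:
    """Return the n variable names with the highest relative importance.

    Each row is expected to start with (variable, relative_importance, ...).
    """
    ranked = sorted(varimp_rows, key=lambda row: row[1], reverse=True)
    return [row[0] for row in ranked[:n]]


def report_and_export(model, pojo_dir: str) -> dict:
    """Collect validation AUC and top variables, then write the POJO source."""
    auc = model.auc(valid=True)  # area under the curve on the validation frame
    top = top_variables(model.varimp(use_pandas=False))
    # Writes a Java source file for the model into pojo_dir.
    pojo = model.download_pojo(path=pojo_dir)
    return {"validation_auc": auc, "top_variables": top, "pojo": pojo}
```

The exported Java file is what would be handed to developers for integration into an existing application.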


As you're thinking about potentially using H2O to model your customer churn, I'd highly encourage you to give us a try. Hopefully this demo helped show you how you can quickly and easily get H2O installed and use one of our example Flows as a template for developing your own work. Thanks for your time and I hope to see you soon. Bye.