

H2O & LiblineaR: A tale of L2-LR


By Team | minute read | October 10, 2013


tl;dr: H2O and LiblineaR have nearly identical predictive performance. 


In this blog, we examine the single-node implementations of L2-regularized logistic regression (LR) in H2O and LiblineaR.
Both LiblineaR and H2O are driven from the R console on the same hardware and evaluated on the same datasets. We compare regression coefficients and predictive behavior (AUC, Precision, Recall, F1) on hold-out data. Before starting on the performance comparison, let’s discuss some of the differences between the two packages.
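For readers less familiar with the hold-out metrics named above, here is a minimal Python sketch of how they are commonly defined. It is not the R code used for this comparison; the function names and the toy labels are hypothetical, and the AUC is computed with the simple pairwise (Mann-Whitney) formulation, which is only practical for small test sets.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg),
    so it is meant for illustration only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy hold-out data (hypothetical, not taken from the datasets in this post).
labels = [1, 0, 1, 0]
predicted_scores = [0.9, 0.1, 0.4, 0.6]
toy_auc = auc(labels, predicted_scores)
```

Precision, Recall and F1 depend on the threshold used to turn scores into class labels, whereas AUC depends only on how the scores rank the test instances.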

Implementation Differences

Whoa… there shouldn’t be any modeling differences, right? Well, no, but there can be subtle implementation differences! Here we explain a few of the implementation details of H2O’s GLM and of LiblineaR.


While we don’t focus on the distributed aspects of H2O here, it should be acknowledged that a distributed H2O GLM returns the same results as if the model had been built on a single machine, so it retains the higher-quality single-machine results! H2O’s state-of-the-art GLM uses Stephen Boyd’s ADMM solver, allows any combination of L1 and L2 penalties, performs automatic factor expansion (easily handling factors with thousands of levels) and cross-validation, and optionally performs a grid search over the parameters. H2O’s GLM also reports all sorts of model evaluation metrics: AUC, AIC, error, by-class error, and deviances.

How does H2O distribute GLM? 

A Gram matrix is built in a parallel and distributed way. The algorithm is essentially a two-step iterative process: build a Gram matrix, solve for the betas, and repeat until the betas converge. In a distributed setting with N nodes, each node computes a Gram over its own data. The Grams are then reduced together, and the result is bit-for-bit identical to computing it all locally. If you want more, we have slides on what we implemented, and the implementation itself is available in our git repository.
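The reduction step can be illustrated with a small Python sketch. This is not H2O's implementation: the function names, the way rows are split across "nodes", and the toy data are all hypothetical, and only the Gram-building half of the iteration is shown (not the solve for the betas). The toy data uses small integer values so that the floating-point sums are exact and the two results can be compared with `==`.

```python
from typing import List

Matrix = List[List[float]]


def gram(rows: Matrix) -> Matrix:
    """Return X^T X for the rows of X held by one node."""
    p = len(rows[0])
    g = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            for j in range(p):
                g[i][j] += row[i] * row[j]
    return g


def reduce_grams(grams: List[Matrix]) -> Matrix:
    """Element-wise sum of the per-node Gram matrices."""
    p = len(grams[0])
    total = [[0.0] * p for _ in range(p)]
    for g in grams:
        for i in range(p):
            for j in range(p):
                total[i][j] += g[i][j]
    return total


def split(rows: Matrix, n_nodes: int) -> List[Matrix]:
    """Cut the rows into contiguous chunks, one per node (hypothetical scheme)."""
    size = -(-len(rows) // n_nodes)  # ceiling division
    return [rows[k:k + size] for k in range(0, len(rows), size)]


# Toy design matrix: an intercept column plus two features.
X = [
    [1.0, 2.0, 0.0],
    [1.0, 0.0, 3.0],
    [1.0, 4.0, 1.0],
    [1.0, 5.0, 2.0],
    [1.0, 1.0, 1.0],
]

local = gram(X)                                        # everything on one machine
distributed = reduce_grams([gram(c) for c in split(X, 2)])  # two "nodes"
```

Because X^T X is a sum of per-row outer products, each node only has to ship a small p-by-p matrix, regardless of how many rows it holds.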


LiblineaR is also open source: it is an R interface to the LIBLINEAR C/C++ library for linear models. It is discussed extensively elsewhere, so we only point out that it too has grid search capabilities and cross-validation.

In order to make fair comparisons, we match the input parameters between H2O and LiblineaR. Note that the cost parameter in LiblineaR is inversely proportional to the lambda used in H2O, scaled inversely by the number of features in the model:

$$C = \cfrac{1}{(\ell \times \lambda)}$$

where $$C$$ is the cost parameter in LiblineaR, $$\ell$$ is the number of features, and $$\lambda$$ is the shrinkage parameter.
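As a quick check of the formula above, here is a minimal Python sketch of the conversion in both directions. The helper names are hypothetical; the only thing taken from this post is the relationship between $$C$$, $$\ell$$ and $$\lambda$$.

```python
def cost_from_lambda(lam: float, n_features: int) -> float:
    """LiblineaR cost C for a given H2O lambda, using C = 1 / (l * lambda)."""
    if lam <= 0 or n_features <= 0:
        raise ValueError("lambda and the number of features must be positive")
    return 1.0 / (n_features * lam)


def lambda_from_cost(cost: float, n_features: int) -> float:
    """H2O lambda for a given LiblineaR cost C (the same formula, inverted)."""
    if cost <= 0 or n_features <= 0:
        raise ValueError("cost and the number of features must be positive")
    return 1.0 / (n_features * cost)


# With 3 features and a cost of 100, lambda comes out at roughly 0.0033333.
lam = lambda_from_cost(100.0, 3)
cost = cost_from_lambda(lam, 3)
```

With three features, a cost of 100 corresponds to a lambda of about 0.0033333, and converting back recovers the cost up to floating-point rounding.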

Hardware, Software, & Datasets



All comparisons were performed on a single machine with the following attributes (from /proc/cpuinfo):

processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
microcode : 0x710
cpu MHz : 1200.000
cache size : 20480 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 47
initial apicid : 47
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5199.90
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual


We used R version 3.0.2 “Frisbee Sailing” to interface with both LiblineaR (version 1.93) and H2O (build 1064).

Driving H2O from within R is easy! Check out this blog and some slides from a recent meetup on the subject; and of course this is all documented.


We used three datasets: Prostate, Sample Airlines (years 1987 – 2008), and Full Airlines (years 1987 – 2013). These data are publicly available to download. The parameters and models built on these datasets are as follows:

| | Prostate | Sample Airlines (’87 – ’08) | Full Airlines (’87 – ’13) |
| --- | --- | --- | --- |
| Features in Model | 6 | 3 | 3 |
| Number of Training Instances | 306 | 24,442 | 128,654,471 |
| Number of Testing Instances | 76 | 2,692 | 14,290,947 |



Prostate: capsule ~ gleason + dpros + psa + dcaps + age + vol

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 1 / 700 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |



Sample Airlines (years 1987 – 2008, sampled): isdepdelayed ~ deptime + arrtime + distance

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |



Full Airlines (years 1987 – 2013): isdepdelayed ~ deptime + arrtime + distance

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |



Numerical Performance





Prostate

Mean relative difference: 0.01601093

| Test Evaluation | AUC | Precision | Recall | F1 Score |



Sample Airlines (years 1987 – 2008 sampled)



Mean relative difference: 0.01759207

| Test Evaluation | AUC | Precision | Recall | F1 Score |



Full Airlines (years 1987 – 2013)



Mean relative difference: 0.006942185

| Test Evaluation | AUC | Precision | Recall | F1 Score |



Remarks & Conclusions

We can see that H2O and LiblineaR do not vary much from one another: the mean relative differences reported above are all small, ranging from about 0.7% to 1.8%. Typically, we would expect the objective functions being minimized to match exactly, while allowing for differences in the coefficients (we see here that the betas are usually within $$10^{-3}$$). What is emphasized here is the similarity in predictive power, and we note that the AUCs above are all nearly identical.
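The post does not say how the "Mean relative difference" figures were computed. Their wording matches what R's `all.equal` prints for numeric vectors, so the Python sketch below assumes that definition: the sum of absolute differences divided by the sum of absolute target values, taken over the entries that are not exactly equal. The function name and the example coefficients are hypothetical.

```python
from typing import Sequence


def mean_relative_difference(target: Sequence[float],
                             current: Sequence[float]) -> float:
    """Relative difference in the style assumed above.

    Only entries that differ are considered; returns 0.0 when the
    two vectors are exactly equal.
    """
    if len(target) != len(current):
        raise ValueError("vectors must have the same length")
    pairs = [(t, c) for t, c in zip(target, current) if t != c]
    if not pairs:
        return 0.0
    scale = sum(abs(t) for t, _ in pairs)
    if scale == 0.0:
        raise ValueError("relative difference is undefined for an all-zero target")
    return sum(abs(t - c) for t, c in pairs) / scale


# Hypothetical coefficient vectors from two fits of the same model.
betas_a = [1.0, 2.0]
betas_b = [1.0, 2.5]
diff = mean_relative_difference(betas_a, betas_b)
```

Under this definition the measure is not symmetric: swapping the two vectors changes the denominator, so it matters which fit is treated as the target.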

It would be informative to involve a third reference (e.g. glmnet) to bolster the comparisons here. As this is a first stab at comparing H2O and LiblineaR, it is by no means complete. We will continue to add other datasets fit for comparison to this blog, along with benchmark characteristics.

We have also skipped over a couple of obvious things: no categoricals were used here, and the models aren’t very good. For this comparison, we stripped things down to the bare minimum and studied non-categorical data only (expanding categoricals for LiblineaR will be tackled in the future). All modeling was done by first setting the cost parameter to 100 and proceeding from there (there is nothing magic about $$C = 100$$).


The data are here: 
And the R scripts are here: 
