

Improving NLP Model Performance with Context-Aware Feature Extraction


By Jo-Fai Chow | October 08, 2021


I would like to share with you a simple yet very effective trick to improve feature engineering for text analytics. After reading this article, you will be able to follow the exact steps and try it yourself using our H2O AI Cloud.

First of all, let’s have a look at the off-the-shelf natural language processing (NLP) recipes in H2O Driverless AI (one of our AI Cloud’s AutoML products). We have some standard text transformation recipes like Term Frequency-Inverse Document Frequency (TF-IDF) as well as some complex ones like Convolutional Neural Network (CNN), Bi-directional Gated Recurrent Unit (BiGRU), and Bidirectional Encoder Representations from Transformers (BERT). You can find the full list of available text transformers here.

Off-the-shelf NLP recipes in H2O Driverless AI

In other words, we already have many general-purpose NLP recipes to cover the most common text analytics use cases. But we don’t stop there. We know that it is possible to further improve predictive performance with smart and, more importantly, domain-specific feature extraction. That’s why we make the NLP capabilities in Driverless AI extensible via custom recipes. We can leverage state-of-the-art NLP models from the research community and perform context-aware feature extraction with minimal effort in Driverless AI.

Let me show you how.

A Quick Tutorial – Airline Twitter Sentiment

The Airline Twitter Sentiment dataset was scraped in 2015, and contributors were asked to classify each tweet as positive, negative, or neutral. You can find out more about the dataset and download it from here. Out of the 20 columns available in the dataset, we are only interested in text (the single feature) and airline_sentiment (the target).

Airline Twitter Sentiment Dataset

Step 1 – Split the Data

Follow these steps to import the airline dataset into Driverless AI. Since the Airline Twitter Sentiment dataset is just a single CSV without a dedicated test dataset, we can split the dataset into airline_train and airline_test using the dataset splitter as shown below.

Dataset splitter interface in Driverless AI
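If you prefer to see the idea in code, here is a minimal sketch of a reproducible random split in plain Python. It is not how the Driverless AI splitter is implemented; the function name split_rows, the 80/20 ratio, and the seed are my own assumptions for illustration.

```python
import random

def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy rows standing in for (text, airline_sentiment) pairs
rows = [(f"tweet {i}", "neutral") for i in range(10)]
airline_train, airline_test = split_rows(rows)
print(len(airline_train), len(airline_test))
```

Fixing the seed means the same rows end up in airline_test every time, which keeps later model comparisons fair.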

Step 2 – Build a Baseline Model

Now we are ready to train our first model using airline_train and then evaluate the out-of-sample performance with airline_test. For the first baseline model, we are going to leave most settings as default. Since we are only using the text column as a single feature for this exercise, we need to remove the rest (see dropped columns settings below) before we launch the experiment.

Driverless AI model training settings for the baseline model
Remember to drop everything but text in the dropped columns setting

As we haven’t switched on complex text transformation (e.g. CNN, BiGRU, BERT), the transformed features from this simple experiment are all TF-IDF-based. We can certainly improve this baseline model with more complex transformation so let’s move on to the next step.

The most important features for the baseline model are TF-IDF-based text features
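To give a feel for what a TF-IDF feature is, here is a small self-contained sketch. It uses the textbook form (relative term frequency times log(N / df)) with whitespace tokenization; the exact weighting, smoothing, and tokenization used inside Driverless AI are not described in this article, so treat the function tfidf below as an illustration only.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = relative term frequency * log(N / document frequency).
    Production libraries usually add smoothing and normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

docs = ["flight delayed again", "great flight crew", "bag lost again"]
features = tfidf(docs)
print(features[0])
```

Words that appear in many tweets (like "flight" here) get a lower weight than words that are specific to one tweet (like "delayed"), but the weights know nothing about word order or context.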

Step 3 – Improve the Baseline with CNN and BiGRU Feature Transformation

In order to switch on more complex text transformation, we need to change two values in expert settings as shown below. This will activate word-based CNN and BiGRU text transformation in the automatic feature engineering pipeline. As a result, we can see that the dominant features in the experiment are created based on CNN and BiGRU (instead of TF-IDF-based features in the baseline model). We can also see an improvement in model performance (i.e. lower logloss and error rate). Can we further improve this? Read on.

Enable word-based CNN and BiGRU models in NLP expert settings
New Features from CNN and BiGRU lead to better predictive performance

Enter the Hugging Face Model Hub

Before we get to the next step, let me introduce a fantastic platform called Hugging Face. Here is the statement on their website:

“We are helping the community work together towards the goal of advancing Artificial Intelligence. Not one company, even the Tech Titans, will be able to “solve AI” by themselves – the only way we’ll achieve this is by sharing knowledge and resources. On the Hugging Face Hub we are building the largest collection of models, datasets and metrics in order to democratize and advance AI for everyone. The Hugging Face Hub works as a central place where anyone can share and explore models and datasets.” (Source)

For our Airline Twitter Sentiment exercise, we are going to find a relevant transformer on Hugging Face so that we can perform better feature extraction than those from the general-purpose text transformers in Driverless AI.

Find out more on Hugging Face’s website

Step 4 – Find a Domain-Specific Transformer

From a quick search on Hugging Face using the keyword twitter, we can find the twitter-roberta-base-sentiment model from the Cardiff NLP group. The model was trained on a large collection of tweets and fine-tuned for sentiment analysis. That sounds relevant to our use case here, so let’s give it a try!

Searching for domain-specific models on Hugging Face
Example outputs of the twitter-roberta-base-sentiment model that can be used as new features
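For readers who want to see how such outputs could become features outside Driverless AI, here is a hedged sketch using the Hugging Face transformers library. The helper names (softmax, logits_to_features, extract_features) are my own, and the label order (0 = negative, 1 = neutral, 2 = positive) is taken from the model card. How Driverless AI consumes the model internally may well differ.

```python
import math

MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"
# Label order as documented on the model card (indices 0, 1, 2)
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_to_features(logits):
    """Map one row of model logits to named probability features."""
    return dict(zip(LABELS, softmax(logits)))

def extract_features(texts):
    """Score tweets with the model (downloads weights on first use)."""
    # Imported lazily so the helpers above work without these packages
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    encoded = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**encoded).logits
    return [logits_to_features(row.tolist()) for row in logits]

# The pure-Python helper can be tried without downloading anything
print(logits_to_features([-1.0, 0.0, 3.0]))
```

Calling extract_features(["@airline my flight was cancelled again"]) would return one dictionary of three probabilities per tweet, which is the kind of column a downstream model can use directly.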

Step 5 – Extract Context-Aware Features with the Twitter-Roberta-based Transformer

Now, this is the most important step. If you get this right, you will be able to import many more transformers from Hugging Face.

First, we need to write a simple Python script that imports the twitter-roberta-base-sentiment transformer into Driverless AI. The most important parts of this script are MODEL_NAME and the class name. Replace them with those of another transformer from Hugging Face and you will be able to import many other transformers into Driverless AI.

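from h2oaicore.systemutils import config
from h2oaicore.transformer_utils import CustomTransformer
from h2oaicore.transformers_nlp import BERTTransformer

# Hugging Face model ID of the transformer to import
MODEL_NAME = 'cardiffnlp/twitter-roberta-base-sentiment'

class TwitterRoberta(BERTTransformer, CustomTransformer):
    _mojo = False

    @staticmethod
    def get_default_properties():
        # Apply this transformer to text columns; add further properties as required
        return dict(col_type="text")

    @staticmethod
    def get_parameter_choices():
        # Restrict the model choice to the selected transformer; add further parameters as required
        return dict(model_type=[MODEL_NAME])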

Once we have the script ready, we can go to the recipes tab in expert settings and upload the script as shown below. You will also need to enable it by selecting TwitterRoberta in the specific transformers setting. After that, you should be able to see TwitterRoberta in the feature engineering search space.

Adding new feature transformation via custom recipe
Twitter-Roberta-based transformation is now available for the feature engineering pipeline

As expected, we can get better predictive performance with domain-specific features from the twitter-roberta-base-sentiment model.

Twitter-Roberta-based features further improve the predictive performance

Quick Recap

In short, we start with a simple baseline model using the standard text transformations like TF-IDF and then improve the performance with CNN/BiGRU feature transformations. In order to perform context-aware and domain-specific feature extraction, we import the twitter-roberta-base-sentiment transformer and further improve the model performance.

Comparing model performance based on various text transformations
(score = logloss, lower = better)
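If logloss is new to you, the sketch below shows the standard multiclass definition: the average negative log of the probability the model gave to the true class. The function name multiclass_logloss and the toy predictions are made up for illustration and are not taken from the experiments above.

```python
import math

def multiclass_logloss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip to avoid log(0) when a model is confidently wrong
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)

y_true = ["negative", "positive"]
confident = [
    {"negative": 0.9, "neutral": 0.05, "positive": 0.05},
    {"negative": 0.1, "neutral": 0.1, "positive": 0.8},
]
uncertain = [
    {"negative": 0.4, "neutral": 0.3, "positive": 0.3},
    {"negative": 0.3, "neutral": 0.3, "positive": 0.4},
]
print(multiclass_logloss(y_true, confident))
print(multiclass_logloss(y_true, uncertain))
```

Both toy models pick the right class, yet the confident one scores a lower logloss. That is why logloss can keep improving even when the error rate barely moves.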

Your Turn to Try!

It is possible to improve the model even further (see screenshot below). I am not going to reveal the exact procedure but I am sure you can figure it out fairly quickly. Here are a few hints:

Mix and match different text transformers. Yes, you can do better than this!

Key Takeaways

With custom recipes, it is possible to extend and improve text transformation in Driverless AI using state-of-the-art models from the AI community. Thus, we already have the technology in place to future-proof our automatic feature engineering pipeline. We are excited to see what our users can do with different transformers. For example, could you extract predictive features with BioBert for health care use cases? Could you get a competitive edge in the stock market with features from FinBert? The possibilities are endless. We hope that our technology will enable our users to benefit from the latest transformers with minimal effort for many years to come.

How to Get Started?

H2O AI Cloud is the best way to get free, hands-on experience. No installation. All you need is a web browser. Request a demo today. 


The advanced text analytics feature discussed in this article is brought to you by Sudalai Rajkumar, Maximilian Jeblick, and Trushant Kalyanpur.


Jo-Fai Chow

Jo-fai (or Joe) has multiple roles (data scientist / evangelist / community manager) at H2O.ai. Since joining the company in 2016, Joe has delivered H2O talks/workshops in 40+ cities around Europe, US, and Asia. Nowadays, he is best known as the H2O #360Selfie guy. He is also the co-organiser of H2O's EMEA meetup groups including London Artificial Intelligence & Deep Learning - one of the biggest data science communities in the world with more than 11,000 members.