Easing into natural language processing with Hugging Face Transformers
Posted in #machinelearning #datascience
Advancements in AI have drawn a lot of attention to a number of subdomains within this vast field. One of the most interesting is natural language processing (NLP).
What is a Hugging Face Transformer?
Why don’t we let their pretrained models answer this question?
Transformers provides general-purpose architectures for Natural Language Understanding (NLU) and Natural Language Generation (NLG) with over 32+ pretrained models in 100+ languages . The library currently contains PyTorch, Tensorflow and Flax implementations, pretrained model weights, usage scripts and conversion utilities for the following models .
Not bad, AI. Not bad at all. The quote above is the output of a pretrained model, run through a summarization pipeline, on the contents of the Hugging Face Transformers documentation.
These pipelines allow pretty much anybody to get started with natural language processing without much insight into the PyTorch or TensorFlow back end.
How to use Hugging Face Text Summarization
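First, install the transformers package for Python. The pipelines also need a deep learning back end such as PyTorch or TensorFlow to run the models, so install one of those too if you don’t already have it.
pip3 install transformers torch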
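The transformers package on its own does not pull in PyTorch, TensorFlow, or Flax (the back ends named in the quoted summary above), and the pipelines will not run without one of them. A quick check like the one below can save some confusion; available_backends is a hypothetical helper name, and it only tests whether each package can be imported, not whether its version is compatible.

```python
import importlib.util


def available_backends():
    """Return the deep learning back ends that are importable in this environment."""
    candidates = ("torch", "tensorflow", "flax")
    # find_spec returns None (rather than raising) when a top-level package is missing
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


backends = available_backends()
if backends:
    print("Found back end(s):", ", ".join(backends))
else:
    print("No back end found; install one, e.g. pip3 install torch")
```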
Once that is installed, it is a simple matter of importing pipeline, specifying the task you want to run (in this case, summarization), and then passing it the content to summarize.
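    from transformers import pipeline
    # Replace this with the text you want summarized
    text = "Insert a wall of text here"
    # Load a summarization pipeline (downloads a pretrained model on first use)
    summarization = pipeline("summarization")
    # The pipeline returns a list with one dict per input; grab the summary string
    summary_text = summarization(text)[0]["summary_text"]
    print(summary_text)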
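Summarization models can only read a limited number of tokens at a time (around 1,024 for BART-based models), so a genuine wall of text may be cut off or cause an error. One simple workaround, my own suggestion rather than anything built into the pipeline, is to split the text into chunks, summarize each, and join the results. This is a minimal sketch: chunk_words and summarize_long_text are hypothetical helper names, and splitting on a fixed word count (400 here) is only a rough stand-in for counting tokens.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive pieces of at most max_words words."""
    # Word count is only a rough proxy for the model's token limit (assumed value)
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text, max_words=400):
    """Summarize each chunk separately and join the partial summaries."""
    # Imported here so chunk_words can be used without transformers installed
    from transformers import pipeline

    summarizer = pipeline("summarization")
    summaries = []
    for chunk in chunk_words(text, max_words):
        # max_length/min_length bound the generated summary, measured in tokens
        result = summarizer(chunk, max_length=130, min_length=30, do_sample=False)
        summaries.append(result[0]["summary_text"])
    return " ".join(summaries)
```

With a long string in text, print(summarize_long_text(text)) prints one combined summary. The max_length, min_length, and do_sample arguments are passed through to the model’s text generation; the values here are arbitrary and worth tweaking.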
For beginners and experts
The simplicity of these libraries means you can get started quickly. You can do a lot with them right out of the gate, but you’ll also quickly notice the limitations of the vanilla models. Don’t get me wrong, they are amazing, but if you want to fine-tune them, expect to do some reading in the documentation.
If you want to see how these models come together, I’d suggest picking a community-contributed model that seems interesting and reverse engineering it.
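A good first step in reverse engineering is unpacking the summarization pipeline into its rough stages: a tokenizer turns text into token IDs, a sequence-to-sequence model generates new IDs, and the tokenizer decodes them back into text. The sketch below assumes PyTorch is installed and uses sshleifer/distilbart-cnn-12-6 purely as an example model ID; any summarization model from the Hub should slot in. summarize_manually is a hypothetical helper name, and the generation lengths are arbitrary.

```python
def summarize_manually(text, model_id="sshleifer/distilbart-cnn-12-6"):
    """Rough, step-by-step equivalent of what the summarization pipeline does."""
    # Imported inside the function so defining it does not require transformers
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # 1. Download (or load from cache) the tokenizer and model weights
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # 2. Text -> PyTorch tensor of token IDs, cut to the model's max input length
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    # 3. Generate summary token IDs (lengths are in tokens, chosen arbitrarily)
    output_ids = model.generate(**inputs, max_length=130, min_length=30)

    # 4. Token IDs -> text, dropping special markers such as <s> and </s>
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Printing model.config inside the function is an easy way to see the architecture settings the model’s author chose.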
Ultimately, I believe Hugging Face is, in a sense, democratizing NLP for developers. It is much easier to apply pretrained models to common tasks such as sentiment analysis, text summarization, and even question generation!
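Sentiment analysis follows the same pattern as summarization: name the task, then pass in text. The sketch below assumes the default English sentiment model, whose labels are typically POSITIVE or NEGATIVE. pair_results and classify_sentiment are hypothetical helper names; keeping the formatting step separate just makes it easy to check without downloading a model.

```python
def pair_results(texts, results):
    """Pair each input text with the label and rounded score returned for it."""
    return [(text, r["label"], round(r["score"], 3)) for text, r in zip(texts, results)]


def classify_sentiment(texts):
    """Run the default sentiment-analysis pipeline over a list of strings."""
    # Imported here so pair_results can be used without transformers installed
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    # The pipeline returns one {"label": ..., "score": ...} dict per input text
    return pair_results(texts, classifier(texts))
```

Calling classify_sentiment(["I love this library!", "The docs confused me."]) returns a list of (text, label, score) tuples, one per input.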
It also gives NLP and AI practitioners a way to get involved by contributing models and improving the quality of the output, which enthusiasts like me can then enjoy without poring over documentation and tuning parameters, since that isn’t my day job!
Give these transformers and pretrained models a try and let me know what you think! Have you found interesting uses for these on any projects?