Given good enough architecture, the larger the model, the more learning capacity it has. Thus, these new models have huge learning capacity and are trained on very, very large datasets. Because of that, they learn the entire distribution of the datasets they are trained on. One can say that they encode compressed knowledge of these datasets. This allows these models to be used for very interesting applications—the most common one being transfer learning. Transfer learning is fine-tuning pre-trained models on custom datasets/tasks, which requires far less data, and models converge very quickly compared to training from scratch. Read: [How machines see: everything you need to know about computer vision]

How pre-trained models are the algorithms of the future

Although pre-trained models are also used in computer vision, this article will focus on their cutting-edge use in the natural language processing (NLP) domain. Transformer architecture is the most common and most powerful architecture that is being used in these models.

Although BERT started the NLP transfer learning revolution, we will explore GPT-2 and T5 models. These models are pre-trained—fine-tuning them on specific applications will result in much better evaluation metrics, but we will be using them out of the box, i.e., with no fine-tuning.

Pre-trained NLP models: OpenAI’s GPT-2

GPT-2 created quite a controversy when it was released back in 2019. Since it was very good at generating text, it attracted quite the media attention and raised a lot of questions regarding the future of AI. Trained on 40 GB of textual data, GPT-2 is a very large model containing a massive amount of compressed knowledge from a cross-section of the internet. GPT-2 has a lot of potential use cases. It can be used to predict the probability of a sentence. This, in turn, can be used for text autocorrection. Next, word prediction can be directly used to build an autocomplete component for an IDE (like Visual Studio Code or PyCharm) for writing code as well as general text writing. We will use it for automatic text generation, and a large corpus of text can be used for natural language analysis.

Text generation

The ability of a pre-trained model like GPT-2 to generate coherent text is very impressive. We can give it a prefix text and ask it to generate the next word, phrase, or sentence. An example use case is generating a product reviews dataset to see which type of words are generally used in positive reviews versus negative reviews. Let’s look at some examples, starting with what we get if we start with the positive prefix, “Really liked this movie!” As you can see, the word review was not anywhere in the prefix, but as most reviews are titles followed by the body of the review, this forced the model to adapt to that distribution. Also notice the reference to Batman v Superman. Let’s see another example. Instead of a movie review, we’ll try to generate a product review using the negative prefix, “A trash product! Do not buy.” Again, the prefix can be inferred as the title of a product review, so the model starts generating text following that pattern. GPT-2 can generate any type of text like this. A Google Colab notebook is ready to be used for experiments, as is the “Write With Transformer” live demo.

Question answering

Yes, since GPT-2 is trained on the web, it “knows” a lot of human knowledge that has been published online up till 2019. It can work for contextual questions as well, but we will have to follow the explicit format of “Question: X, Answer:” before letting it attempt to autocomplete. But if we force the model to answer our question, it may output a pretty vague answer. Here’s what happens trying to force it to answer open-ended questions to test its knowledge: As we can see, the pre-trained model gave a pretty detailed answer to the first question. For the second, it tried its best, but it does not compare with Google Search. It’s clear that GPT-2 has huge potential. Fine-tuning it, it can be used for the above-mentioned examples with much higher accuracy. But even the pre-trained GPT-2 we are evaluating is still not that bad.

Pre-trained NLP models: Google’s T5

Google’s T5 is one of the most advanced natural language models to date. It builds on top of previous work on Transformer models in general. Unlike BERT, which had only encoder blocks, and GPT-2, which had only decoder blocks, T5 uses both. GPT-2 being trained on 40 GB of text data was already impressive, but T5 was trained on a 7 TB dataset. Even though it was trained for a very, very large number of iterations, it could not go through all the text. Although T5 can do text generation like GPT-2, we will use it for more interesting business use cases.

Summarization

Let’s start with a simple task: text summarization. For those AI development companies wanting to build an app that summarizes a news article, T5 is perfectly suited for the task. For example, giving this article to T5, here are three different summaries it produced: As we can see, it has done a pretty nifty job of summarizing the article. Also, each summary is different from the others. Summarizing using pre-trained models has huge potential applications. One interesting use case could be to generate a summary of every article automatically and put that at the start for readers who just want a synopsis. It could be taken further by personalizing the summary for each user. For example, if some users have smaller vocabularies, they could be served a summary with less complicated word choices. This is a very simple example, yet it demonstrates the power of this model. Another interesting use case could be to use such summaries in the SEO of a website. Although T5 can be trained to generate very high-quality SEO automatically, using a summary might help out of the box, without retraining the model.

Reading comprehension

T5 can also be used for reading comprehension, e.g., answering questions from a given context. This application has very interesting use cases we will see later. But let’s start with a few examples: There is no explicit mention that Darwin invented the theory, but the model used its existing knowledge along with some context to reach the right conclusion. How about a very small context? Okay, that was pretty easy. How about a philosophical question? Although we know the answer to this question is very complicated, T5 tried to come up with a very close, yet sensible answer. Kudos! Let us take it further. Let’s ask a few questions using the previously mentioned Engadget article as the context. As you can see, the contextual question answering of T5 is very good. One business use case could be to build a contextual chatbot for websites that answers queries relevant to the current page. Another use case could be to search for some information from documents, e.g., ask questions like, “Is it a breach of contract to use a company laptop for a personal project?” using a legal document as context. Although T5 has its limits, it is pretty well-suited for this type of task. Readers may wonder, Why not use specialized models for each task? It’s a good point: The accuracy would be much higher and the deployment cost of specialized models would be much lower than T5’s pre-trained NLP model. But the beauty of T5 is precisely that it is “one model to rule them all,” i.e., you can use one pre-trained model for almost any NLP task. Plus, we want to use these models out of the box, without retraining or fine-tuning. So for developers creating an app that summarizes different articles, as well as an app that does contextual question answering, the same T5 model can do both of them.

Pre-trained models: the deep learning models that will soon be ubiquitous

In this article, we explored pre-trained models and how to use them out of the box for different business use cases. Just like a classical sorting algorithm is used almost everywhere for sorting problems, these pre-trained models will be used as standard algorithms. It’s pretty clear that what we explored was just scratching the surface of NLP applications, and there is a lot more that can be done by these models. Pre-trained deep learning models like StyleGAN-2 and DeepLabv3 can power, in a similar fashion, applications of computer vision. The Toptal Engineering Blog is a hub for in-depth development tutorials and new technology announcements created by professional software engineers in the Toptal network. You can read the original piece written by Nauman Mustafa here. Follow the Toptal Design Blog on Twitter and LinkedIn.

How to use pre trained models in your next business project - 42How to use pre trained models in your next business project - 60How to use pre trained models in your next business project - 97