What are Large Language Models?
Are you wondering what Large Language Models are and why they have been in the news so much recently? Even if you haven’t heard the acronym LLM or its full form, Large Language Model, you have probably heard about ChatGPT and its impressive capabilities.
LLMs are changing many things in our daily lives and at work. In this article, we explain what Large Language Models are and what they can do. So grab a coffee and read on.
Natural Language Processing
Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that tries to solve language-related problems using machine learning (ML) and/or deep learning (DL) techniques.
If you don’t know about machine learning or deep learning, don’t worry. Here is a quick read on Introduction to AI.
For example, sentiment analysis is an NLP problem that can be solved using machine learning or deep learning.
Here the goal is to work out the sentiment of a piece of text, such as a tweet. Until a few years ago, people trained a text-based ML/DL model from scratch for each such task. This approach has several problems:
- The model has only a limited understanding of the language
- A very large amount of training data is needed to get good results
- The same model cannot be reused for other, similar NLP tasks
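To make the "train from scratch" approach concrete, here is a minimal sketch of a word-counting sentiment classifier (a small Naive Bayes model). The tweets, labels and function names are made up for illustration and are not from any real dataset or library. The point is that the model only knows the words it saw during training.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical tweets, for illustration only).
TRAIN = [
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("this movie was amazing", "positive"),
    ("i hate waiting in line", "negative"),
    ("what a terrible service", "negative"),
    ("this update is awful", "negative"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {"positive": Counter(), "negative": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Naive Bayes with add-one smoothing over the training vocabulary."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_examples = sum(label_counts.values())

    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Start from the prior probability of the label.
        score = math.log(label_counts[label] / total_examples)
        for word in text.split():
            if word not in vocab:
                continue  # words never seen in training carry no signal
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAIN)
print(predict("i love this great phone", word_counts, label_counts))
print(predict("i hate this awful update", word_counts, label_counts))
```

A sentence made up only of unseen words, such as "not good at all", gives the model nothing to work with, so it falls back on the label prior. That is the limited language understanding described above, and fixing it this way means collecting far more labeled data.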
How Do Humans Learn Language?
But humans don’t learn and understand language this way. Humans learn language as children and take some years to understand it properly. They can then apply this understanding to many different language tasks. For example, a human doesn't need a huge amount of data to do sentiment analysis. They can start on the task with few or no examples.
If you wonder how humans can do NLP tasks so efficiently, you need to look at how humans learn. We start learning our mother tongue very early, and we spend many years learning to speak it properly. We are exposed to a large amount of speech and text when learning a language as a child, or when learning a new language as an adult. All this learning gives us a good understanding of the language, which we can apply to perform well on different NLP tasks.
Language Models
So AI researchers have worked on building deep learning models that understand language and can adapt to different NLP tasks with only a few examples of each task, just like humans do. These AI models are called Language Models (LMs for short). They resemble humans in several ways:
- They are trained on huge amounts of data
- They have a good understanding of the language
- They can apply what they have learned to different NLP tasks
Language Models became popular after Google introduced the Transformer architecture. Transformer-based models showed the best performance on NLP tasks, and many organisations have since created new Transformer models with different architectures and sizes. This led to rapid growth in the adoption of NLP in industry.
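The idea of adapting to a task "with few examples" is often put into practice by placing the examples directly in the text given to the model. Below is a minimal sketch of how such a prompt could be assembled for sentiment analysis. The function name, the wording of the instruction and the example tweets are all assumptions made for illustration. No model is called here, since the code only builds the text.

```python
def build_few_shot_prompt(examples, new_text):
    """Format a few labeled examples plus one unlabeled input as a single prompt."""
    lines = ["Classify the sentiment of each tweet as Positive or Negative.", ""]
    for text, label in examples:
        lines.append(f"Tweet: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")  # blank line between examples
    # The final entry is left unlabeled so the model can complete it.
    lines.append(f"Tweet: {new_text}")
    lines.append("Sentiment:")
    return "\n".join(lines)


# Hypothetical examples, made up for illustration.
examples = [
    ("I love the new update!", "Positive"),
    ("Worst customer service ever.", "Negative"),
]
prompt = build_few_shot_prompt(examples, "The battery life is fantastic.")
print(prompt)
```

Swapping in different examples and a different instruction line turns the same model into a tool for another task, without any retraining.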
Large Language Models
Have you ever wondered why some species are more intelligent and have more advanced communication skills, such as humans, other primates, dolphins, whales, elephants, parrots and crows? They generally have bigger brains, more neurons relative to brain size and better connections between neurons.
Guess what? Deep learning models use neural networks, which are loosely modelled on the structure of the brain. Since nature suggests that bigger brains with more and better-connected neurons help with intelligence, increasing the size of deep learning models to improve their performance is a natural choice. This applies to Language Models too. Such scaled-up models are called Large Language Models (LLMs for short).
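The "size" of a neural network is usually measured by its number of parameters, which are the weights on the connections between neurons plus one bias per neuron. As a rough sketch, here is how the count grows for a simple fully connected network. Real LLMs use Transformer layers rather than this plain structure, so the layer sizes below are only illustrative assumptions.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the number of neurons in each layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection between the layers
        total += n_out         # one bias per neuron in the receiving layer
    return total


# Hypothetical networks: same depth, four times wider layers.
small = count_parameters([512, 512, 512])
large = count_parameters([2048, 2048, 2048])
print(small, large, large / small)
```

Making each layer four times wider multiplies the parameter count by roughly sixteen, which is why model size grows so quickly as networks are scaled up.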
Training Large Language Models was very hard a few years ago. Increasing the model size means increasing the compute power and data required. The compute power required for training these models is very high and very costly, and that is still the case. But a few big companies have trained LLMs with the help of advanced compute infrastructure. Examples are the GPT family of models from OpenAI and the Llama family of models from Meta (formerly Facebook).
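To get a feel for why the compute cost is so high, a commonly cited rule of thumb estimates training compute as about 6 x (number of parameters) x (number of training tokens) floating point operations. The sketch below applies that approximation. The model size, token count, GPU speed and utilization are hypothetical values chosen for illustration, not figures for any real model or chip.

```python
SECONDS_PER_DAY = 86_400


def estimate_training_flops(n_params, n_tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per parameter per training token."""
    return 6 * n_params * n_tokens


def estimate_training_days(total_flops, flops_per_second_per_gpu, n_gpus, utilization):
    """Convert total FLOPs into wall-clock days on a cluster of GPUs."""
    effective_rate = flops_per_second_per_gpu * n_gpus * utilization
    return total_flops / effective_rate / SECONDS_PER_DAY


# Hypothetical setup: 7 billion parameters trained on 1 trillion tokens,
# on 1000 GPUs that each deliver 1e14 FLOP/s at 40% utilization.
flops = estimate_training_flops(7_000_000_000, 1_000_000_000_000)
days = estimate_training_days(flops, 1e14, 1000, 0.4)
print(f"{flops:.2e} FLOPs, about {days:.1f} days")
```

Even under these assumptions, a mid-sized model keeps a thousand GPUs busy for days, and doubling both the model size and the data multiplies the compute by four.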
Some of the advanced large language models developed by these companies are not made available to the public. Instead, they are used exclusively within the companies' own products and services. One example is Microsoft's Turing-NLG.
Nevertheless, numerous Large Language Models are accessible to the public for general use, allowing developers to create AI features that harness the power of LLMs efficiently.
Open Source LLM vs Closed Source LLM
Open source large language models are models whose architectures, pre-trained weights, and code are publicly available, allowing developers, researchers, and businesses to use, modify, and build upon them. Examples of open source LLMs include OpenAI's GPT-2, Meta's Llama 2 and Google's BERT models.
Closed source large language models are proprietary models developed and owned by private companies or organizations. They are available for public use via an API. Examples include OpenAI's GPT-3 and GPT-4, and Google's PaLM 2.
Finetuned LLM
Finetuning is the process of adapting a pre-trained LLM to perform a specific task by training it further on task-specific data. This enables developers to create AI features for specific use cases without having to train a model from scratch. It also lets developers and businesses build differentiated AI solutions (a competitive moat) by leveraging the power of LLMs.
Many companies that provide access to their LLMs allow us to finetune them via an API. One example is OpenAI's GPT-3 finetuning.
We can also finetune open source LLMs ourselves. For example, we can finetune Meta's Llama 2 on our own infrastructure.
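Whichever route we take, finetuning starts with a file of task examples. Here is a minimal sketch of preparing sentiment examples as JSON Lines, with one JSON object per line. The "prompt" and "completion" field names follow the format used by OpenAI's original GPT-3 finetuning. Other providers and open source toolkits expect different layouts, so treat the field names, the prompt wording and the example tweets as assumptions to check against the documentation of the tool you use.

```python
import json


def to_jsonl(examples):
    """Turn (text, label) pairs into JSON Lines training records."""
    lines = []
    for text, label in examples:
        record = {
            "prompt": f"Tweet: {text}\nSentiment:",
            # Leading space so the label starts as a separate word.
            "completion": f" {label}",
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical training examples, made up for illustration.
examples = [
    ("I love the new update!", "Positive"),
    ("Worst customer service ever.", "Negative"),
]
print(to_jsonl(examples))
```

In practice the output would be saved to a file and uploaded to the finetuning service, or loaded by a training script when finetuning an open source model.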