Text Completion
Understanding How LLMs Actually Work
AI models generally accept an input and return an output. The types of input and output differ between models. For example, an image recognition model takes an image as input and returns a class label as output. An image generation model takes a text prompt as input and returns an image as output. In Large Language Models, both the input and the output are text.
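As a rough sketch, the difference between these model families is just the type signature. The function names below are hypothetical placeholders to illustrate the input/output types, not any real library's API:

```python
# Hypothetical signatures; these illustrate input/output types, not real APIs.

def image_recognition_model(image: bytes) -> int:
    """Image in, class-label number out."""
    ...

def image_generation_model(prompt: str) -> bytes:
    """Text prompt in, image data out."""
    ...

def large_language_model(prompt: str) -> str:
    """Text in, text out."""
    ...
```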
Generate the next word
Imagine you're sitting with a friend who has memorized every book in the world. You give them some text (we call this the prompt), and they try to predict what comes next (we call this the completion).
Here's the fundamental truth: ALL an LLM does is look at the prompt and generate the next word. That's it! Let's see this prediction game in action:
Prompt: Students opened their
Likely completions: book, laptop, notepad
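Here's a minimal sketch of that single prediction step, using a hand-written probability table in place of a real network (the words and numbers are made up for illustration):

```python
import random

# Toy stand-in for the model's output distribution over next words.
# A real LLM computes these probabilities from billions of parameters.
NEXT_WORD_PROBS = {
    "Students opened their": {"book": 0.55, "laptop": 0.25, "notepad": 0.15, "minds": 0.05},
}

def predict_next_word(prompt: str) -> str:
    """Sample one next word according to the (toy) model's probabilities."""
    dist = NEXT_WORD_PROBS[prompt]
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

print(predict_next_word("Students opened their"))  # most often prints "book"
```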
Repeated Generation
For real-life use cases, generating just one word is not enough, so we need the LLM to generate multiple words. After the LLM generates the first word, we append it to the end of the prompt and predict the next word. Repeating this process produces a stream of text.
Initial Prompt: "Students opened their"
LLM generates: "book" (high probability)
New prompt becomes: "Students opened their book"
LLM generates: "and" (medium probability)
New prompt becomes: "Students opened their book and"
LLM generates: "began" (medium probability)
And so on...
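Here's a sketch of that loop. So the loop can keep going past the first word, this toy table keys on the last word only; a real LLM conditions on the entire text generated so far:

```python
import random

# Toy word-to-word probabilities (made-up numbers), just enough to run the loop.
NEXT_WORD_PROBS = {
    "their": {"book": 0.7, "laptop": 0.3},
    "book": {"and": 0.8, "quietly": 0.2},
    "and": {"began": 0.9, "smiled": 0.1},
    "began": {"reading.": 1.0},
}

def predict_next_word(text: str) -> str:
    """Sample a next word given the text so far (here: just the last word)."""
    last_word = text.split()[-1]
    dist = NEXT_WORD_PROBS.get(last_word, {"<end>": 1.0})
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

def generate(prompt: str, max_new_words: int = 10) -> str:
    """The repeated-generation loop: predict one word, append it, repeat."""
    text = prompt
    for _ in range(max_new_words):
        next_word = predict_next_word(text)
        if next_word == "<end>":  # stopping is covered in the next section
            break
        text += " " + next_word
    return text

print(generate("Students opened their"))
# e.g. "Students opened their book and began reading."
```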
This is crucial to understand: The LLM isn't "thinking" about students or books. It's just looking at patterns: after the sequence "Students opened their", the word "book" appeared frequently in its training data. Even complex tasks are just text completion.
Tip
It's ALL just the LLM predicting what word typically follows the given text.
Stopping the Generation
LLMs need to know when to stop generating text. This happens in two ways:
- Built-in Stop Token: During training, the LLM learns to emit a special end token when its response is complete.
Human: Tell me a joke
Assistant: Why did the chicken cross the road? To get to the other side! <end_of_response>  # LLM knows to stop here
- User-Defined Stop Words: LLMs expose several generation settings that can be configured. One of them is stop_words: during generation, the LLM stops as soon as it produces one of the configured stop words.
Prompt: "Write product names until you see 'STOP':"
Output: "SuperPhone MegaLaptop UltraWatch STOP"  # Generation halts here
In the repeated generation process, we stop as soon as we encounter either the end token or one of the stop words. This is crucial because without stop conditions, the LLM would keep generating forever, always predicting the next most likely word!
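Here's a sketch of how both stop conditions plug into the generation loop, assuming a predict_next_word function like the toy one above and using "<end_of_response>" to stand in for the model's end token:

```python
END_TOKEN = "<end_of_response>"

def generate(prompt, predict_next_word, max_new_words=50, stop_words=()):
    """Repeated generation with both stop conditions (plus a hard cap)."""
    text = prompt
    for _ in range(max_new_words):
        next_word = predict_next_word(text)
        if next_word == END_TOKEN:   # 1. built-in stop token learned in training
            break
        if next_word in stop_words:  # 2. user-defined stop words
            break
        text += " " + next_word
    return text

# Usage: generate("Write product names until you see 'STOP':",
#                 predict_next_word, stop_words=("STOP",))
```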
Temperature Setting
Temperature is one of the most important LLM generation settings. It controls the randomness of the generated text: a lower temperature gives more deterministic output, a higher temperature gives more random output.
Randomness Dial
Think of temperature like this:
Low Temperature (0.1):
"The cat sat on the"
Almost always completes as: "mat"
Because low temperature makes it almost always pick the most probable next word
High Temperature (0.8):
"The cat sat on the"
Might complete as:
- "windowsill" (less common but plausible)
- "keyboard" (getting creative)
- "spaceship" (getting wild)
Multiple Attempts
Here's where it gets interesting. Let's try the same prompt multiple times with different temperatures:
Temperature = 0.1 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Innovation for the future"
Try 2: "Innovation for the future"
Try 3: "Innovation for the future"
(Boring but consistent)
Temperature = 0.7 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Disrupting tomorrow, today"
Try 2: "Code your dreams into reality"
Try 3: "Where bytes meet brilliance"
(More creative, varied results)
Temperature = 1.0 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Quantum puppies in the metaverse"
Try 2: "Dancing with digital dragons"
Try 3: "Cybernetic sunflowers bloom at midnight"
(Wild and unpredictable)
Tip
Picking the right temperature for the task is crucial. For example, to generate creative text for a marketing campaign, use a higher temperature; to generate deterministic text for a question-answering task, use a lower temperature.
Performance Differences
Different LLMs = Different Completion Qualities
Just like different students might complete the sentence "The capital of France is __" differently:
- A good student: "Paris"
- A confused student: "London"
- An uncertain student: "I think it's Paris but maybe Berlin"
LLMs vary in how well they can predict appropriate completions:
- Some are excellent at completing code but struggle with creative writing
- Others excel at factual completions but generate messy code
- Some give consistent completions but lack creativity
- Others provide creative completions but can be unreliable
Note
The performance of an LLM depends on multiple factors, such as the model architecture, the training data, and the data quality.
Key Takeaway
Everything an LLM does is just text completion:
- Questions → completing what an answer would look like
- Conversations → completing what the next response would be
- Code → completing what the solution would look like
- Stories → completing what happens next
The magic (and limitation) is that it's ALL just predicting what text should come next based on the prompt you provide!
Understanding this fundamental mechanism helps you:
- Write better prompts (they're the setup for completion)
- Understand LLM behavior
- Handle limitations
- Use LLMs more effectively by understanding what they really do