Text Completion
Understanding How LLMs Actually Work
AI models generally accept an input and return an output. The types of input and output differ between models. For example, an image recognition model takes an image as input and returns a class label as output. An image generation model takes a text prompt as input and returns an image as output. In Large Language Models, both the input and the output are text.
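As a rough sketch, the difference between these model families is just the type signature. The function names below are hypothetical placeholders to illustrate the input/output types, not any real library's API:

```python
# Hypothetical signatures; these illustrate input/output types, not real APIs.

def image_recognition_model(image: bytes) -> int:
    """Image in, class-label number out."""
    ...

def image_generation_model(prompt: str) -> bytes:
    """Text prompt in, image data out."""
    ...

def large_language_model(prompt: str) -> str:
    """Text in, text out."""
    ...
```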
Generate the next word
Imagine you're sitting with a friend who has memorized every book in the world. You give them some text (we call this the prompt), and they try to predict what comes next (we call this the completion).
Here's the fundamental truth: ALL an LLM does is look at the prompt and generate the next word. That's it! Let's see this prediction game in action:
Prompt: Students opened their
Likely completions: book, laptop, notepad
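Here's a minimal sketch of that single prediction step, using a hand-written probability table in place of a real network (the words and numbers are made up for illustration):

```python
import random

# Toy stand-in for the model's output distribution over next words.
# A real LLM computes these probabilities from billions of parameters.
NEXT_WORD_PROBS = {
    "Students opened their": {"book": 0.55, "laptop": 0.25, "notepad": 0.15, "minds": 0.05},
}

def predict_next_word(prompt: str) -> str:
    """Sample one next word according to the (toy) model's probabilities."""
    dist = NEXT_WORD_PROBS[prompt]
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

print(predict_next_word("Students opened their"))  # most often prints "book"
```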
Repeated Generation
For real-life use cases, generating just one word is not enough, so we need the LLM to generate multiple words. After the LLM generates the first word, we append it to the end of the prompt and predict the next word. Repeating this process produces a stream of text.
Initial Prompt: "Students opened their"
LLM generates: "book" (high probability)
New prompt becomes: "Students opened their book"
LLM generates: "and" (medium probability)
New prompt becomes: "Students opened their book and"
LLM generates: "began" (medium probability)
And so on...
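Here's a sketch of that loop. So the loop can keep going past the first word, this toy table keys on the last word only; a real LLM conditions on the entire text generated so far:

```python
import random

# Toy word-to-word probabilities (made-up numbers), just enough to run the loop.
NEXT_WORD_PROBS = {
    "their": {"book": 0.7, "laptop": 0.3},
    "book": {"and": 0.8, "quietly": 0.2},
    "and": {"began": 0.9, "smiled": 0.1},
    "began": {"reading.": 1.0},
}

def predict_next_word(text: str) -> str:
    """Sample a next word given the text so far (here: just the last word)."""
    last_word = text.split()[-1]
    dist = NEXT_WORD_PROBS.get(last_word, {"<end>": 1.0})
    words, probs = zip(*dist.items())
    return random.choices(words, weights=probs, k=1)[0]

def generate(prompt: str, max_new_words: int = 10) -> str:
    """The repeated-generation loop: predict one word, append it, repeat."""
    text = prompt
    for _ in range(max_new_words):
        next_word = predict_next_word(text)
        if next_word == "<end>":  # stopping is covered in the next section
            break
        text += " " + next_word
    return text

print(generate("Students opened their"))
# e.g. "Students opened their book and began reading."
```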
This is crucial to understand: The LLM isn't "thinking" about students or books. It's just looking at patterns: after the sequence "Students opened their", the word "book" appeared frequently in its training data. Even complex tasks are just text completion.
Tip
It's ALL just the LLM predicting what word typically follows the given text.
Stopping the Generation
LLMs need to know when to stop generating text. This happens in two ways:
- Built-in Stop Token: During training, the LLM learns to emit a special end token when its response is complete.
Human: Tell me a joke
Assistant: Why did the chicken cross the road? To get to the other side! <end_of_response>  # LLM knows to stop here
- User-Defined Stop Words: LLMs expose several generation settings that can be configured. One of them is stop_words: during generation, the LLM stops as soon as it produces one of the configured stop words.
Prompt: "Write product names until you see 'STOP':"
Output: "SuperPhone MegaLaptop UltraWatch STOP"  # Generation halts here
In the repeated generation process, we stop as soon as we encounter either the end token or one of the stop words. This is crucial because without stop conditions, the LLM would keep generating forever, always predicting the next most likely word!
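Here's a sketch of how both stop conditions plug into the generation loop, assuming a predict_next_word function like the toy one above and using "<end_of_response>" to stand in for the model's end token:

```python
END_TOKEN = "<end_of_response>"

def generate(prompt, predict_next_word, max_new_words=50, stop_words=()):
    """Repeated generation with both stop conditions (plus a hard cap)."""
    text = prompt
    for _ in range(max_new_words):
        next_word = predict_next_word(text)
        if next_word == END_TOKEN:   # 1. built-in stop token learned in training
            break
        if next_word in stop_words:  # 2. user-defined stop words
            break
        text += " " + next_word
    return text

# Usage: generate("Write product names until you see 'STOP':",
#                 predict_next_word, stop_words=("STOP",))
```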
Temperature Setting
Temperature is one of the most important LLM generation settings. It controls the randomness of the generated text: a lower temperature gives more deterministic output, a higher temperature gives more random output.
Randomness Dial
Think of temperature like this:
Low Temperature (0.1):
"The cat sat on the"
Almost always completes as: "mat"
Because low temperature makes it almost always pick the most probable next word
High Temperature (0.8):
"The cat sat on the"
Might complete as:
- "windowsill" (less common but plausible)
- "keyboard" (getting creative)
- "spaceship" (getting wild)
Multiple Attempts
Here's where it gets interesting. Let's try the same prompt multiple times with different temperatures:
Temperature = 0.1 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Innovation for the future"
Try 2: "Innovation for the future"
Try 3: "Innovation for the future"
(Boring but consistent)
Temperature = 0.7 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Disrupting tomorrow, today"
Try 2: "Code your dreams into reality"
Try 3: "Where bytes meet brilliance"
(More creative, varied results)
Temperature = 1.0 (3 tries):
Prompt: "Write a slogan for a tech startup"
Try 1: "Quantum puppies in the metaverse"
Try 2: "Dancing with digital dragons"
Try 3: "Cybernetic sunflowers bloom at midnight"
(Wild and unpredictable)
Tip
Picking the right temperature for the task is crucial. For example, to generate creative text for a marketing campaign, use a higher temperature; to generate deterministic text for a question-answering task, use a lower temperature.
Performance Differences
Different LLMs = Different Completion Qualities
Just like different students might complete the sentence "The capital of France is __" differently:
- A good student: "Paris"
- A confused student: "London"
- An uncertain student: "I think it's Paris but maybe Berlin"
LLMs vary in how well they can predict appropriate completions:
- Some are excellent at completing code but struggle with creative writing
- Others excel at factual completions but generate messy code
- Some give consistent completions but lack creativity
- Others provide creative completions but can be unreliable
Note
The performance of an LLM depends on multiple factors, such as the model architecture, the training data, and the data quality.
Key Takeaway
Everything an LLM does is just text completion:
- Questions → completing what an answer would look like
- Conversations → completing what the next response would be
- Code → completing what the solution would look like
- Stories → completing what happens next
The magic (and limitation) is that it's ALL just predicting what text should come next based on the prompt you provide!
Understanding this fundamental mechanism helps you:
- Write better prompts (they're the setup for completion)
- Understand LLM behavior
- Handle limitations
- Use LLMs more effectively by understanding what they really do