How are AI responses generated?
About the AI
You don’t need to understand this to use Questsmith.
This article is intended to be a technical explanation, and may be a bit confusing for most people. We put this here to help users understand Model Settings, experiment with the AI, and do more advanced Troubleshooting.
This breakdown walks through what happens behind the scenes each time you submit an action on Questsmith.
Turning your input into a coherent piece of story text involves five distinct phases:
The Five Phases of Generation
Payload Compilation
Your interface transmits your latest action to our secure servers. The system immediately aggregates this input with your active Plot Essentials, triggered Story Cards, memory blocks, and past narrative history to compile a single, unified data payload.
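As a rough illustration of the aggregation step, the sketch below joins the context sources into one text payload. Everything in it is hypothetical: the `StoryCard` structure, the trigger-matching rule, the function name, and the ordering of sections are assumptions made for the example, not a description of how Questsmith's servers actually work.

```python
from dataclasses import dataclass


@dataclass
class StoryCard:
    # Hypothetical structure: a card fires when any trigger word
    # appears in the recent text.
    triggers: list
    entry: str


def compile_payload(action, plot_essentials, story_cards, memory, history):
    """Join every context source into one text payload (illustrative order only)."""
    # Assumption: only the last few history entries plus the new action
    # are scanned for trigger words.
    recent_text = " ".join(history[-3:] + [action]).lower()
    triggered = [
        card.entry
        for card in story_cards
        if any(trigger.lower() in recent_text for trigger in card.triggers)
    ]
    sections = [plot_essentials, *triggered, memory, *history, action]
    # Skip empty sections so the payload has no blank lines.
    return "\n".join(section for section in sections if section)
```

Calling `compile_payload` with a story card whose trigger is "dragon" includes that card's entry only when "dragon" appears in the recent text, which mirrors the idea of a "triggered" card.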
Tokenization
Large language models do not process raw text. A specialized program called a Tokenizer breaks your text down into numerical units called tokens. For English text, one token works out to roughly four characters on average. The length limits in your advanced configuration settings are counted in these tokens, not in characters or words.
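To make the idea concrete, here is a minimal sketch of a toy tokenizer plus a token estimate based on the four-characters-per-token rule of thumb. The word-level vocabulary is a simplification chosen for readability; production tokenizers split text into subword pieces, and all function names here are hypothetical.

```python
import math
import re

CHARS_PER_TOKEN = 4  # rule of thumb only; real counts vary by tokenizer and language
PIECE_PATTERN = re.compile(r"\w+|[^\w\s]")  # words, or single punctuation marks


def build_vocab(corpus):
    """Assign an integer ID to every distinct piece in the corpus."""
    pieces = sorted(set(PIECE_PATTERN.findall(corpus.lower())))
    return {piece: index for index, piece in enumerate(pieces)}


def encode(text, vocab):
    """Turn text into a list of token IDs (raises KeyError on unknown pieces)."""
    return [vocab[piece] for piece in PIECE_PATTERN.findall(text.lower())]


def estimate_tokens(text):
    """Estimate the token count without running a real tokenizer."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)
```

An estimate like this is only useful for judging whether a piece of text is likely to fit within a token limit; the exact count depends on the tokenizer the model uses.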
Neural Network Processing
The tokens are fed into a deep neural network containing billions of numerical parameters called weights. These weights encode the linguistic patterns the AI learned during training on terabytes of public literature, books, and scripts. The network uses them to predict which token is most likely to come next in your story.
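A real network is far too large to show here, but the core idea of learning patterns from text and then predicting what comes next can be sketched with a much simpler stand-in. The example below counts which token follows which in a training text and turns those counts into next-token probabilities. This bigram counter is an analogy chosen for illustration, not the architecture the AI uses; its counts play the role that weights play in a neural network.

```python
from collections import Counter, defaultdict


def train_bigram_counts(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def next_token_probabilities(counts, current):
    """Convert the follow-up counts for one token into probabilities."""
    followers = counts[current]
    total = sum(followers.values())
    # An unseen token has no followers, so this returns an empty dict.
    return {token: n / total for token, n in followers.items()}
```

Trained on "the cat sat on the mat the cat ran", the counter learns that "the" is followed by "cat" twice and "mat" once, so it predicts "cat" with probability 2/3.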
Statistical Sampling
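The AI does not write a whole sentence at once. At each step it calculates a probability for every possible next token, picks one token from that distribution, appends it to the text, and repeats. This sampling phase is heavily influenced by your custom adjustments:
- Temperature controls randomness: higher values produce more varied, creative output, while lower values produce more predictable, consistent output.
- Top K and Top P filter out unlikely tokens before one is picked. Top K keeps only the K most likely candidates; Top P keeps the smallest set of candidates whose combined probability reaches P.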
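The sampling step can be sketched in a few lines. The code below is a minimal illustration using the textbook definitions of temperature, Top K, and Top P; the function names, default values, and the order in which the filters are applied are assumptions, and the actual sampler may differ. Temperature must be greater than zero.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def filter_top_k_top_p(probs, top_k, top_p):
    """Return {token_id: renormalized probability} after both filters."""
    # Top K: keep only the K most likely tokens.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    # Top P: keep the smallest prefix whose cumulative probability reaches P.
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_next_token(logits, temperature=1.0, top_k=50, top_p=0.95, rng=random):
    """Pick one token ID at random, weighted by the filtered probabilities."""
    candidates = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    return rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
```

With `top_k=1` the sampler always returns the single most likely token, which is the most predictable setting; raising temperature, Top K, or Top P widens the pool of tokens that can be chosen.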
Detokenization and Refinement
Once the generation loop finishes, the numerical tokens are translated back into human-readable text. Our servers then apply final post-processing to ensure the output ends on a clean, completed sentence, and cache alternative variations to power your Retry options. The text then renders on your device, ready for your next move.
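One simple way to make output end on a completed sentence is to cut the text after its last sentence terminator. The sketch below shows that idea; the function name and the regular expression are assumptions for illustration, and the actual post-processing is likely more involved (for example, this version would treat the period in an abbreviation such as "Dr." as a sentence end).

```python
import re

# A sentence terminator, optionally followed by closing quotes or brackets.
SENTENCE_END = re.compile(r'[.!?]["\')\]]*')


def trim_to_last_sentence(text):
    """Drop any trailing fragment after the last complete sentence."""
    matches = list(SENTENCE_END.finditer(text))
    if not matches:
        # No terminator found: return the text as-is, minus trailing whitespace.
        return text.rstrip()
    return text[:matches[-1].end()]
```

For example, a generation that stops mid-thought, such as "The door creaks open. You step in and", would be trimmed back to "The door creaks open."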