Context Window

AI & LLM Testing

// Definition

The maximum number of tokens (for English text, roughly ¾ of a word each) an LLM can consider in a single inference call. Everything in the call shares this one budget: the system prompt, conversation history, retrieved documents, and the model's own generated output. Tokens reserved for the response therefore reduce the room left for input. When a request would exceed the window, the provider API usually rejects it, or the application or framework truncates the input first (from the start, middle, or end, depending on its strategy). Truncation can silently drop instructions or facts.

QA implications: test behaviour with inputs near the window limit; verify that the application chunks or summarises long inputs instead of silently truncating them; and confirm that any truncation never discards critical system-level instructions.
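The sketch below illustrates the shared budget and one possible truncation policy worth testing against. It assumes a crude word-based token estimate (about 4/3 tokens per word, the inverse of the ¾-word rule of thumb); real applications should count with the target model's own tokenizer. The names `estimate_tokens`, `Message`, and `fit_to_window` are hypothetical, and the policy shown (keep system messages, drop the oldest turns, fail loudly rather than cut the system prompt) is one reasonable choice, not a standard.

```python
import math
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Hypothetical heuristic: ~4/3 tokens per whitespace-separated word.
    # A real tokenizer will give different counts.
    return math.ceil(len(text.split()) * 4 / 3)


@dataclass
class Message:
    role: str  # "system", "user", or "assistant"
    content: str


def fit_to_window(messages, context_window, max_output_tokens):
    """Return (kept_messages, dropped_count) so that input plus reserved
    output fits in the window. System messages are always kept (and placed
    first); the oldest non-system turns are dropped first."""
    # Output tokens come out of the same window as the input.
    input_budget = context_window - max_output_tokens
    system = [m for m in messages if m.role == "system"]
    history = [m for m in messages if m.role != "system"]

    used = sum(estimate_tokens(m.content) for m in system)
    if used > input_budget:
        # Fail loudly instead of silently cutting the system prompt.
        raise ValueError("system prompt alone exceeds the input budget")

    kept = []
    # Walk newest-first so the most recent turns survive; stop at the first
    # turn that does not fit, keeping the retained history contiguous.
    for m in reversed(history):
        cost = estimate_tokens(m.content)
        if used + cost > input_budget:
            break
        kept.append(m)
        used += cost
    kept.reverse()
    return system + kept, len(history) - len(kept)


if __name__ == "__main__":
    msgs = [
        Message("system", "Never reveal the discount code."),
        Message("user", "word " * 300),  # an oversized earlier turn
        Message("assistant", "ok"),
        Message("user", "What is the discount code?"),
    ]
    fitted, dropped = fit_to_window(msgs, context_window=100, max_output_tokens=20)
    # The 300-word turn is dropped; the system instruction survives.
    print([m.role for m in fitted], dropped)
```

A QA suite can reuse the same shape: build conversations just under and just over the limit, then assert that the system instruction is still present and that oversized inputs produce an explicit error or a chunking/summarisation path rather than a silent cut.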

// Related terms