21 Oct 2025

AI Glossary

Disclaimer: This glossary is for explanatory purposes only. Examples of AI systems, products or companies are included illustratively and do not imply endorsement or recommendation. The entries are designed to provide neutral definitions and context, not guidance on how or whether to use specific tools in editorial practice.

accountability: occurs when two parties are in a relationship where one is answerable for their actions to the other. Regulatory or social bodies can make AI designers and providers accountable for their actions. See also traceability.
In practice: accountability means there is a clear line of responsibility. For example, if an AI tool makes biased suggestions, the company that built it may need to explain why.

agent: a type of AI system that can carry out tasks using different tools and, if designed to have a memory, will retain or adapt to information about the user. Unlike chatbots, which respond mainly within a conversation, agents are designed to decide on and perform a sequence of steps toward a goal. They can use external tools or data, manage multi-step workflows and operate with a higher level of autonomy. Some are built into commercial platforms, while others are custom-made for in-house use only. Eg frameworks such as LangChain agents, or platform-integrated agents like Microsoft Copilot Researcher and similar task automation tools. See also chatbot and virtual assistant.
In practice: an agent might take a goal such as ‘book travel’ and break it into steps like finding flights, comparing prices and reserving tickets, using different tools along the way.

algorithm: a defined set of instructions that a computer system follows to transform its inputs into outputs.
In practice: spellcheck in Microsoft Word uses an algorithm to identify misspelt words and suggest corrections.
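
As a rough illustration of 'instructions that transform inputs into outputs', the sketch below takes a word as input and returns suggested corrections as output. It is a toy example built on Python's standard `difflib` module, not a description of how Microsoft Word works; the word list and the function name `suggest_corrections` are hypothetical.

```python
import difflib

# Hypothetical word list; a real spellchecker would use a full dictionary.
KNOWN_WORDS = ["receive", "separate", "definitely", "editor"]


def suggest_corrections(word, known_words=KNOWN_WORDS):
    """Return close matches for a word that is not in the word list."""
    if word.lower() in known_words:
        return []  # spelt correctly, so there is nothing to suggest
    # get_close_matches ranks candidates by similarity, best match first.
    return difflib.get_close_matches(word.lower(), known_words, n=3, cutoff=0.6)


print(suggest_corrections("recieve"))
```

The same input always produces the same output, which is what makes this an algorithm rather than a learned model.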

artificial general intelligence (AGI): a hypothetical type of AI system able to match or outperform humans on all kinds of human tasks (intellectual, physical and emotional). Related terms include strong AI, broad AI, general AI and general-purpose AI. Compare artificial narrow intelligence. See also large language model and foundation model.
In practice: AGI is what people mean when they talk about science-fiction AI that could match human intelligence in everything.

artificial intelligence (AI): (1) a broad field of scientific research, focussed on computerised systems which perform tasks associated with human cognitive abilities (making choices, recognising patterns, processing language, solving problems etc). AI systems can show some degree of adaptability (to new data or environments) and autonomy (ability to act or make decisions without explicit human programming) during their development and/or deployment; (2) such a computerised system. When used in this sense, ‘AI’ is often accompanied by the word ‘system’ to help avoid attribution of human characteristics. See also machine learning and generative AI.
In practice: a plagiarism checker that learns to identify possibly suspicious patterns is a form of AI, as is a medical imaging system that highlights possible tumours in X-rays.

artificial narrow intelligence (ANI): a type of AI system trained to perform one or a few specific tasks. Narrow AI systems typically cannot generalise beyond these tasks; in their specialised domain they tend to outperform more generalised AI systems. Also known as weak AI. Compare artificial general intelligence. See also small language model.
In practice: spam filters, predictive text and autocorrect are examples of narrow AI. Each is very good at a single task or a narrow range of related tasks.

bias: typically, unfair treatment of an individual or group, often based on an attribute such as gender or ethnicity. A bias may be deliberate or unintentional. AI systems can reflect, reproduce or reinforce human biases when they are trained on data which is itself unrepresentative or imbalanced. AI systems can also introduce biases when their algorithms connect data in unanticipated ways. See also ethical AI.
In practice: if an AI writing tool always suggests male pronouns for doctors, it reflects bias in its training data.

chatbot (bot): conversational AI that can converse with human users in natural language inside a single app or chat. Chatbots react and respond to user prompts, which can include user-uploaded files, but do not typically act on files or systems outside the conversation. Chatbots are generally less complex than agents and often follow a scripted flow or single-task design, although some advanced ones can connect to tools for specific functions. Eg ChatGPT and Claude. See also virtual assistant and agent.
In practice: typing questions into tools like ChatGPT or Claude to get answers is an example of how a chatbot works.

confidentiality: shows respect for access or disclosure restrictions imposed on information, typically of a sensitive nature.
In practice: confidentiality depends on how a tool processes, stores and deletes submitted text, which may vary by provider.

context window: a finite quantity of previously seen information which an AI system uses as context for processing a new piece of information.
In practice: chatbots can keep track of only a certain amount of text from earlier in a conversation. Once the context window is exceeded, the oldest parts drop out.
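
The idea of older material dropping out can be sketched as follows. This is a simplified illustration, not how any particular chatbot manages its context: the function name `fit_to_context_window` is hypothetical, and counting whitespace-separated words is only a crude stand-in for a real token count.

```python
def fit_to_context_window(messages, max_tokens):
    """Keep the most recent messages whose combined length fits the budget."""
    kept = []
    used = 0
    # Walk backwards so that the newest messages are considered first.
    for message in reversed(messages):
        cost = len(message.split())  # assumption: one word counts as one token
        if used + cost > max_tokens:
            break  # this message and everything older drops out
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


conversation = [
    "Please edit chapter one.",
    "Use British spelling throughout.",
    "Now summarise the changes.",
]
# Each message is four words long, so a budget of eight keeps only the last two.
print(fit_to_context_window(conversation, max_tokens=8))
```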

corpus (corpora): a collection of information which has been assembled for training an AI system. A corpus typically contains material on topics which are relevant and representative of the data an AI system will work on in deployment. See also data, training data and dataset.
In practice: a language corpus might be built from novels, news articles and essays so an AI can learn to generate natural-sounding text.

data: any form of information which can be collected, processed and/or interpreted. Data may be structured (organised and consistently formatted) or unstructured, whether sourced from real life or synthetic. AI systems are typically trained on large amounts of data. See also corpus.
In practice: a tabulated list of customer addresses is structured data, while an unsorted collection of meeting notes or interview transcripts may be considered unstructured data.

dataset: a collection of related data, used for training, testing or validating an AI system. A corpus is a specific and more structured type of dataset. See also data, training data and corpus.
In practice: a dataset of academic articles might be used to train an AI that helps researchers with literature reviews.

deepfake: an image (still or moving) or audio piece which simulates real people and situations to show something which did not happen. See also disinformation, misinformation and hallucination.
In practice: a fabricated video of a politician saying something they never said is a deepfake.

disinformation: information which is false, inaccurate or manipulated and has been produced or shared with the deliberate intention to mislead. See also deepfake, misinformation and hallucination.
In practice: a deepfake leveraged to sway public opinion is an example of disinformation.

ethical AI: follows a framework of moral principles, societal norms and regulatory frameworks during the design, development and deployment of an AI system. See also responsible AI.
In practice: ethical AI might include designing a chatbot so it avoids generating harmful content, making sure users know when they are interacting with AI rather than a person, or training a tool on licensed and representative data rather than scraping unlicensed text.

explainability: the degree to which the external context and internal functionality of a system can be expressed and/or justified in a reasonably understandable way. Compare transparency.
In practice: explainability means being able to show how an AI decided on a suggestion, such as why it flagged a sentence as unclear.

fine-tuning: a machine learning technique that retrains a large language model on additional data so that it performs a more context-specific task. See also large language model, foundation model, artificial narrow intelligence and small language model.
In practice: fine-tuning could involve retraining a model on thousands of documents edited to a specific style guide, so it learns to apply those conventions automatically. Lighter changes can be made in systems which apply preset instructions without changing the underlying model, such as OpenAI’s Custom GPTs, Anthropic’s Claude Projects and Microsoft’s Copilot Studio.

foundation model: a type of generative AI model trained on a large dataset which can perform a large number of tasks. Foundation models can be adapted to perform various more specialised tasks. See also fine-tuning, generative AI, large language model and artificial general intelligence.
In practice: some foundation models, such as GPT-4, have been adapted into products like OpenAI’s ChatGPT and Microsoft Copilot.

generative AI (genAI): a subfield of machine learning concerned with systems that create new content (text, images, audio, code, video) based on patterns they learn in training data and prompts. The term is also used as an umbrella label for applications built on these models, such as chatbots, agents and virtual assistants. See also artificial intelligence, machine learning and large language model.
In practice: generative AI is what powers tools that can generate text, images and audio, like OpenAI’s ChatGPT (text) and DALL-E (images) and ElevenLabs products for text-to-audio and music.

guardrail: a restriction placed on an AI system to safeguard against potentially harmful or undesirable outputs.
In practice: a grammar-checking AI might have a guardrail that prevents it from generating offensive words or private data when asked for examples.

hallucination: an output from an AI system that appears plausible or true but is false or unsupported by the data it was trained on. Hallucinations can result from biased training data, incomplete training data or incorrect assumptions made by the system. See also disinformation, misinformation and deepfake.
In practice: an AI tool that generates a reference to a non-existent journal article is hallucinating.

human in the loop (HITL): processes by which humans critically monitor and review outputs to correct and improve an AI system. A related idea is Expert in the Lead (XITL), where the human is the expert and retains the final authority over the AI’s decisions. See also ethical AI and accountability.
In practice: HITL means a human checks outputs; for example, a person reviewing AI suggestions. XITL is when the human remains the decision-maker; for example, approving or rejecting AI changes before they take effect.

large language model (LLM): a type of machine learning model trained on a vast text dataset and designed for tasks involving natural language. LLMs work using natural language processing, building associations between text snippets. Compare small language model. See also machine learning, foundation model, natural language processing and training data.
In practice: well-known families of LLMs include OpenAI’s GPT models, Anthropic’s Claude, Google DeepMind’s Gemini, Meta’s Llama, Mistral’s Mixtral, Cohere’s Command R and xAI’s Grok.

machine learning (ML): a subfield of artificial intelligence, concerned with machines that learn patterns from training data to make predictions on new data, with minimal human intervention. Such machines can learn to improve their performance over time. See also artificial intelligence, generative AI and training data.
In practice: predictive text in email is powered by machine learning, as it improves based on the words people most often type.

misinformation: information which is false or inaccurate and has been produced or shared without deliberate intention. See also deepfake, disinformation and hallucination.
In practice: if an AI incorrectly states that a book was published in 2018 instead of 2015, that is misinformation.

natural language processing (NLP): a subfield of machine learning concerned with systems which analyse, interpret, synthesise and generate language in a way that is both meaningful and useful to humans. NLP requires machines to be trained on natural language training data. See also large language model and natural language query.
In practice: spellcheckers, translation software and chatbots all use NLP.

natural language query (NLQ): a type of prompt that allows users to express their information needs to an AI system in everyday conversational language. See also natural language processing and prompt.
In practice: asking ‘check this text for grammar’ in plain English is a natural language query.

prompt: an input or instruction provided to an AI system to generate an output or response. The format of a prompt can be natural language, code, visual or other. See also prompt engineering.
In practice: typing ‘summarise this article in 100 words’ into a chatbot is giving it a prompt. Asking a chatbot to write a block of code or giving an image for analysis are also prompts.

prompt engineering: the practice of crafting and refining prompts to get an AI system to consistently generate desired outputs. See also prompt.
In practice: specifying ‘generate a meal plan for someone who is vegetarian and gluten-free’ shows prompt engineering.

responsible AI: See ethical AI.

retrieval-augmented generation (RAG): a method where an AI system supplements what it learned from its training data with up-to-date information retrieved from a database or the web before generating a response. This can improve accuracy and reduce hallucinations. See also training data, hallucination, generative AI and large language model.
In practice: some search-based tools, such as Perplexity AI or Copilot built into the Microsoft Edge browser, use RAG so that, alongside generated answers, they provide links or citations to the documents they retrieve.
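
The retrieve-then-generate pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Perplexity or Copilot work: the function names, the sample documents and the keyword-overlap ranking are all hypothetical, and the final step of sending the prompt to a language model is left out.

```python
def words(text):
    """Lower-case a text and strip basic punctuation from each word."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query (toy retrieval)."""
    scored = [(len(words(query) & words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Documents with no overlap at all are not returned.
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


def build_prompt(query, documents):
    """Combine the retrieved sources and the question into one prompt."""
    sources = retrieve(query, documents)
    context = "\n".join(f"- {source}" for source in sources)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


# Hypothetical in-house documents standing in for a database or the web.
documents = [
    "The style guide prefers serial commas in lists.",
    "Numbers below ten are spelt out in full.",
]
print(build_prompt("Does the style guide use serial commas?", documents))
```

Because the retrieved sources are included in the prompt, a tool built this way can also show them to the user as links or citations.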

small language model (SLM): a type of machine learning model specialised for natural language tasks, and trained on a smaller, more specialised dataset. Compare large language model. See also artificial narrow intelligence, fine-tuning and generative AI.
In practice: an SLM might be trained only on medical text so that it can work more efficiently in that domain.

tagging: annotating parts (or all) of a phrase with information such as part of speech, grammatical or semantic relationships, cultural views or other information. Tagging is often used to prepare training data for AI models.
In practice: in the sentence ‘She will run every morning’, tagging might look like <VERB>run</VERB> to show that ‘run’ is a verb, as opposed to the sentence ‘She will go for a run every morning’, where tagging might look like <NOUN>run</NOUN>. Editors may also see tagging when AI systems highlight words for grammar, style or sentiment.
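
The two ‘run’ examples can be reproduced with a short sketch. It assumes the labels are supplied by hand, keyed by word position; real tagging tools infer them automatically, and the function name `tag_sentence` is hypothetical.

```python
def tag_sentence(words, labels):
    """Wrap each labelled word in a tag such as <VERB>...</VERB>."""
    tagged = []
    for index, word in enumerate(words):
        label = labels.get(index)  # None when a word carries no annotation
        tagged.append(f"<{label}>{word}</{label}>" if label else word)
    return " ".join(tagged)


# Labels are hand-assigned by word position (counting from zero).
print(tag_sentence(["She", "will", "run", "every", "morning"], {2: "VERB"}))
# She will <VERB>run</VERB> every morning
print(tag_sentence(["She", "will", "go", "for", "a", "run", "every", "morning"], {5: "NOUN"}))
# She will go for a <NOUN>run</NOUN> every morning
```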

token: a chunk of text, usually a word or part of a word, that an AI model processes as a unit. Tokens are the units in which AI systems measure input length, memory (the context window) and cost. See also context window, training data and dataset.
In practice: the word ‘editing’ may be treated as a single token or split into two tokens: ‘edit’ and ‘ing’, depending on the model’s tokeniser. The sentence ‘Editing is essential for clarity.’ might be split into seven tokens: ‘Edit’, ‘ing’, ‘is’, ‘essential’, ‘for’, ‘clarity’, ‘.’ since parts of words and punctuation marks are counted as tokens too.
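
A toy tokeniser that produces the seven-token split for the example sentence might look like the sketch below. It is illustrative only: real tokenisers use vocabularies of word pieces learned from data rather than a fixed suffix rule, and the function name `toy_tokenise` and its `suffixes` parameter are hypothetical.

```python
import re


def toy_tokenise(text, suffixes=("ing",)):
    """Split text into words and punctuation, then split off known suffixes."""
    tokens = []
    # \w+ matches a run of letters or digits; [^\w\s] matches one punctuation mark.
    for piece in re.findall(r"\w+|[^\w\s]", text):
        for suffix in suffixes:
            if piece.endswith(suffix) and len(piece) > len(suffix):
                tokens.extend([piece[: -len(suffix)], suffix])
                break
        else:
            tokens.append(piece)  # no suffix matched, so keep the piece whole
    return tokens


print(toy_tokenise("Editing is essential for clarity."))
# ['Edit', 'ing', 'is', 'essential', 'for', 'clarity', '.']
```

In this sketch spaces only separate the pieces and are not counted as tokens themselves; how a real tokeniser treats spaces varies by model.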

traceability: the extent to which it is possible to transparently follow and monitor a system’s decision-making steps and processes. See also accountability and transparency.
In practice: some AI tools provide traceability by showing the sources or data they drew from.

training data: the information used to teach an AI system during its development. Training data is fed into machine learning models so that they can learn patterns and associations before being applied to new inputs. It can be structured or unstructured, and may include text, images, audio or other formats. The quality, representativeness and size of training data strongly affect how well an AI system performs. See also dataset, corpus and token.
In practice: a language model might be trained on millions of books, articles and websites as its training data, which it uses to learn how to generate coherent sentences.

transparency: appropriate, easily understandable disclosure of information about a model’s external context, such as how people’s data is processed or what assessments were undertaken to confirm compliance. Compare explainability.
In practice: a tool might include a note saying ‘this AI has been trained on licensed text data up to 2023’ so users understand its limits and data sources.

virtual assistant: a type of AI system that interacts with users and performs tasks in response to queries. Virtual assistants combine conversational ability with access to specific applications or services, so they can complete tasks beyond the chat itself. Eg Microsoft Copilot in Word/Excel, Siri with Apple Intelligence, Alexa+. See also chatbot and agent.
In practice: a virtual assistant might schedule a meeting, read out a definition or draft an email on request, depending on the application it is built into.