Vocabulary

Agnaistic has several common terms which may not immediately be understood by new users. Below is a rough attempt to outline and define some of these terms.

Note: if a term you’re curious about is not listed below, please reach out on our Discord and we can update this guide.

Terminology Definition
Advanced Settings Could refer to the “Advanced” section of a character card, or the “Advanced” tab in the preset of a chat. In the context of a character card, advanced settings let you override the system prompt, post-conversation history instruction, and other advanced features of the character. In the context of the preset for a chat, the Advanced tab allows you to control a number of parameters that are sent to the LLM / inference engine / API of your choice.
Author’s Note See “Insert / Depth Prompt”
Character Book A memory book, specific to the character. Can contain things this character has experienced. Used across chats.
Character Cards The greeting, persona, persona schema, example dialog, picture, name, lore book (character book), and other information about a specific character. Character cards can be found around the internet on websites like Chub, on Reddit boards, and in Discord servers. Oftentimes, character cards are shared by embedding information within images, so if someone shares their character with you by sending you an image, it’s likely that image itself is the character card.
Character System Prompt The system prompt is the initial instruction sent to the language model as part of the prompt template. Oftentimes, the system prompt is something like “Respond with the next message in a fictional role-play between {{char}} and {{user}}.” Overriding this system prompt allows you to make more advanced character cards.
Chat Embeddings A chat embedding is a previous chat message that has been “indexed” in a special “embeddings” database. When you load a chat in Agnaistic, the messages that load on the screen get embedded in a database within your browser. This database then allows Agnaistic to look up “similar” or “relevant” chat messages and add them (aka embed them) into your prompt. This semantic retrieval enables a sort of “long term memory” without wasting context and confusing the model with irrelevant chat messages. This is a great way to have messages from longer chats “remembered” by the character.
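
Conceptually, the retrieval step behaves something like the sketch below (a toy illustration only; Agnaistic’s actual embedding engine, similarity measure, and storage work differently under the hood):

import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a fixed-size vector.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def retrieve_relevant(history: list[str], latest: str, k: int = 3) -> list[str]:
    # Rank every stored chat message by similarity to the latest message and
    # return the k most relevant ones to inject back into the prompt.
    q = toy_embed(latest)
    return sorted(history, key=lambda m: cosine(toy_embed(m), q), reverse=True)[:k]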
Chat History The previous messages you’ve sent to the character in the chat. Chat history fills the remaining context after all other context calculations are considered.
Context The tokens that make up the prompt sent to the model. Think of this as information relevant to the current situation. This includes everything – the character description, the scene, previous chat messages, memory, etc. Anything that is not within the context of the current prompt is simply not considered by the LLM when generating the response. Responses are generated entirely based on the context.
Context Limit Due to limitations inherent in the underlying technology that powers language models, context is limited for a number of reasons. Some models are trained on specific context sizes and do not behave well beyond that range. Other models have a dynamic context limit, but as the context size increases, the performance of the model (its ability to generate coherent replies and to access data within the context) decreases. Most concerning is that higher context sizes require more physical hardware to process and thus ultimately cost more to use. This is why 4k models can be free but 8k models require a subscription. The larger the language model, the higher the cost of additional context – this is why the 70b model currently has a context limit of 6k. Context limits take into account not only your chat history, but also memory, prompt template, system prompt, character card, and the new response tokens – the higher you set your max new tokens, the less context you’ll have available for memory. This can cause problems with larger character cards (i.e. > 500 tokens) on the free models where you only have 4000 tokens of context available.
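
As a rough illustration of how that budget gets carved up (the numbers below are made up for the example, not Agnaistic’s actual defaults):

# Hypothetical numbers, purely to show how the context budget is divided.
max_context        = 4096  # total tokens the model accepts
max_new_tokens     = 300   # reserved for the model's reply
character_card     = 550   # persona, scenario, example dialog, etc.
prompt_scaffolding = 150   # system prompt, template text, instructions
memory_budget      = 400   # memory book / embedding entries

chat_history_budget = (max_context - max_new_tokens - character_card
                       - prompt_scaffolding - memory_budget)
print(chat_history_budget)  # 2696 tokens left for recent chat messages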
Eta Eta (η) in Mirostat acts as the learning rate for the algorithm. It determines how quickly Mirostat adapts the decoding parameters (for example, “k” in top-k) based on the current perplexity and the target value. A higher learning rate leads to faster adjustments but might be more prone to fluctuations, while a lower rate makes for smoother adaptation but potentially slower convergence towards the target perplexity.
Example Chats / Example Dialog The example chats, aka example dialog, provide guidance as to how the bot should respond. These are a strong indicator for chat styles and can be very helpful in getting specific types of responses. Going too heavy here will eat up your token count, however, and could reduce overall creativity. Putting these like:

{{user}}: I say blah blah blah
{{char}}: Oh, I love it when you say blah blah blah!

will let you group user messages with character responses. Separate distinct topics in example dialog with a blank line.
Frequency Penalty A penalty applied to all repeated tokens, increasing as the number of repetitions increases. If P'(t) is the new probability, P(t) the old probability of a token, N the number of repetitions, and α the frequency penalty, then P'(t) = P(t) - (α * N).
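
A minimal sketch of that formula (applied directly to probabilities as written above; real inference engines typically apply the penalty to logits instead):

def frequency_penalty(prob: float, repetitions: int, alpha: float) -> float:
    # P'(t) = P(t) - (alpha * N): the more often a token has already appeared,
    # the larger the subtraction, so heavily repeated tokens are suppressed.
    return max(prob - alpha * repetitions, 0.0)

print(frequency_penalty(0.30, repetitions=4, alpha=0.05))  # ~0.10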
Greeting An option for the first message the bot sends when the conversation starts. This is also a good way to “set the stage” for the conversation. It can either serve as a literal “greeting” message, or it can provide backstory that gets injected into the context as part of the chat history. Some people make the greeting quite elaborate. You can also add multiple greetings and one will be randomly chosen when the conversation starts.
Group Chats In a chat with a character, if you go to the chat settings icon, you will see “Participants”. You can then either add a temporary character (a character you create just for this chat), another character you’ve previously created, or even invite another user to chat with you.
Insert / Depth Prompt An instruction inserted into your prompt a certain distance above the last message in chat history. It has a softer effect than a UJB, but a stronger effect than a system prompt or scenario.
Insert Depth The number of messages, starting from the bottom message (i.e. the most recent). For example, if this is “1”, then the “Insert / Depth Prompt” will be inserted just above the most recent message in the chat history.
LLM / Models Large Language Model (LLM), or simply “Model”, is the mathematical tool used to generate text. Different models have different technical structures, use different numbers (aka weights), and interpret your prompt differently resulting in variations in the output generation. LLMs work on tokens and their output can be adjusted by altering the probability of the tokens via advanced settings (for example, temperature).
Library In the main navigation, there is a page called “Library”. In here, you can see all of your memory books, scenarios, prompt templates, and embeddings. The Library is the only place you can create scenarios with events.
Longterm Memory See “Chat Embeddings”
Lore book See “Memory Book”
Memory A section of the preset for a chat that allows configuration of various memory specific settings. “Memory” refers to a combination of Memory Book entries, Character Book entries, Chat Embeds (long term memory), and User Embeds. See also “Memory Context Limit”
Memory Book A set of manually added memory entries. Each memory entry has a set of keywords that the memory triggers on. Unlike chat embeds and user embeds, memory entries are only added to the prompt when their keywords are triggered. It’s best to keep memory entries short so that you can fit more memories in a smaller amount of context. Since memories are retrieved by keyword, keeping relevant details behind separate keywords allows you to maximize the value gained from the dynamic nature of these entries. Memories are given names, as are the memory books, and memories can be toggled on and off. A disabled memory is not included in the prompt.
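
Conceptually, keyword triggering works like the simplified sketch below (the field names are illustrative, not Agnaistic’s internal schema):

def triggered_memories(entries: list[dict], recent_text: str) -> list[dict]:
    # An entry is added to the prompt only if it is enabled and at least one of
    # its keywords appears in the recent chat text.
    text = recent_text.lower()
    hits = [e for e in entries
            if e["enabled"] and any(kw.lower() in text for kw in e["keywords"])]
    # Higher-weight entries come first; in practice they are added to the prompt
    # until the memory context limit is reached.
    return sorted(hits, key=lambda e: e["weight"], reverse=True)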
Max Context Limit An advanced setting to control the number of tokens used in prompts to the model. See “Context Limit”
Max New Tokens The maximum number of tokens that the language model responds with. Max new tokens eat into your context limit just like other parts of your prompt template, so this is a useful consideration with longer character cards.
Memory Context Limit The number of tokens in context to use for adding entries from the memory book or character book. The memory context limit will be filled with the highest priority memory entries and those entries will be ordered by their weight.
Chat Embed Context Limit The number of tokens to reserve from the max context limit for Chat Embeddings. Chat embeddings are retrieved based on semantic relevance and injected into the prompt until this context limit is reached. See “Chat Embeddings”
User Embed Context Limit The number of tokens to reserve from the max context limit for User Embeddings. User embeddings are retrieved based on semantic relevance and injected into the prompt until this context limit is reached. See “User Embeddings”
Mirostat Mirostat is an algorithm for text generation that aims to create high-quality text by directly controlling its “surprisingness” or “interestingness.” This is measured by a metric called perplexity, which basically reflects how unexpected a sequence of words is. Mirostat adjusts several “creativity knobs” depending on the implementation, including Top K and possibly others.
Narrator Bots A character card designed to tell a story or role-play from the perspective of multiple characters. Best named something like “Narrator” or “Playmaster” or “Director”. Helpful to have an advanced system prompt that instructs the model to talk from the perspective of each of the characters in the story, listed out by name. Good to have a character book which has persona information about each character, and sufficient memory context limit for the character book entries to be inserted alongside other memories and embeddings. See the documentation on narrator bots and group chats for additional information.
Persona A prompt template tag. Inserts “personality” from the character card. See “Personality”
Persona Schema The format for the persona. Attributes, W++, and Plain Text are the most common. Attributes are simply “key: value1, value2, value3” pairings. Plain text is more free-form and could be one or more paragraphs or a custom attribute format. W++ is a legacy format that is not recommended, as it’s a strange combination of parentheses and + signs with other notational components that eats into your context limit and does not always behave as expected.
Personality The way your character behaves. Influenced by many factors including your prompt template, chat history, embeddings, advanced settings, and character card (among other variables). See “persona” and the documentation on chat settings and character creation for more information.
Post-conversation History Instruction A section on the character card’s “advanced” tab. A sort of character-level UJB (See “User Jail Break (UJB)”). Conflicts with the UJB defined in the chat. If the character card’s post-conversation history instruction is set, it overrides the UJB in the chat unless the “override character UJB” switch is toggled.
Presence Penalty A flat penalty applied to all repeated tokens. If P'(t) is the new probability, P(t) the old probability, and α the presence penalty, then P'(t) = P(t) - α, where t is a token that has been repeated at least once.
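
A sketch of the same idea, for contrast with the frequency penalty above (again applied to probabilities for simplicity):

def presence_penalty(prob: float, repeated: bool, alpha: float) -> float:
    # P'(t) = P(t) - alpha: one flat subtraction once a token has appeared at
    # least once, no matter how many times it has repeated.
    return max(prob - alpha, 0.0) if repeated else prob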
Preset A group of settings applied to the chat. Includes model selection, context limits, advanced settings, prompt templates, and more.
Prompt The message sent to the LLM. The prompt contains all of the context for the current conversation including your most recent chat message.
Prompt Template The structure of the message to send to the LLM. Several prompt template formats exist, including Vicuna and Alpaca. Advanced prompt template features are implemented with placeholder tags; these placeholder tags are replaced during prompt generation with relevant content from the message, memory books, embeddings, character, and elsewhere.
Vicuna Useful for the 70b model. Vicuna is the base model for the 70b that follows a specific prompt format which looks like:

USER: [the message from the user, or the instruction the model should follow]
ASSISTANT:

Note that the Assistant response is left blank to tell the LLM where it should continue the conversation.
Alpaca Useful for the 7b and 13b models. Alpaca is a base model that follows a specific prompt format which looks like:

### Instruction:
{instruction}

### Input:
{user}: {message}

### Response:

Note that the response is left blank to tell the LLM where it should continue the conversation.
Repetition Penalty An exponential penalty which increases rapidly while factoring in both range and slope. If P'(t) is the new probability, P(t) the old probability, N the number of repetitions, and α the repetition penalty, then P'(t) = P(t) * exp(-α * N) – as N increases, exp(-α * N) gets rapidly smaller and thus P'(t) decreases as well. Repetitions are only counted over the repetition penalty range.
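
A sketch of that exponential decay (illustrative only; inference engines differ in exactly how repetitions are counted and applied):

import math

def repetition_penalty(prob: float, repetitions: int, alpha: float) -> float:
    # P'(t) = P(t) * exp(-alpha * N): each repetition counted within the
    # penalty range shrinks the probability multiplicatively.
    return prob * math.exp(-alpha * repetitions)

print(repetition_penalty(0.30, repetitions=3, alpha=0.5))  # ~0.067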
Repetition Penalty Range The number of tokens to look at, starting from the bottom of the context and counting backwards, when calculating repetition. A higher range can result in higher repetition penalties for commonly used tokens.
Repetition Penalty Slope What percentage of the context receives the full repetition penalty. When set to 0, penalties are applied uniformly to all tokens. When the slope is between 0 and 1, the tokens closest to the end of the context receive the full penalty, and the penalty falls off for tokens farther back in context. A smaller slope means a slower drop-off in penalties; a higher slope weights the more recent tokens considerably higher, with a sharper drop-off as you go farther back in context.
Repetition Range See “Repetition Penalty Range”
Scenario Global scenarios are different from character scenarios in that global scenarios can have prompt text, user instructions, scenario events, and state. Chats can move through states in various ways, and triggered events are linked to the various states defined in the scenario.
Character Scenario Character scenarios are simplified scenarios with only the scenario text. This helps “set the stage” for a conversation. Global scenarios assigned to a chat can optionally override character scenarios.
System Prompt Typically the first message sent as part of your prompt template. This is usually a bare instruction that tells the LLM how to interpret the rest of the context. In a role-play chat, it is typical to see a system prompt like “Respond with the next message in a fictional role-play between {{char}} and {{user}}. Speak only as {{char}} and only describe what {{char}} is doing, seeing, saying, etc. Respond only once and respond with extremely detailed and extremely long replies.”
Tau Tau (τ) in Mirostat serves as the target perplexity level. It essentially defines the desired level of “surprisingness” or “interestingness” you want the generated text to have. Mirostat dynamically adjusts the text generation process to bring the actual perplexity as close to this target value as possible.
Temperature Essentially, the temperature changes the shape of the probability distribution. As temperature increases, differences in probability are reduced, resulting in more “random” output from the model. This manifests in a LLM as more “creative” output. Conversely, a lower temperature makes the output more deterministic. Too much creativity will have the model talking nonsense, going off script, and responding with strange symbols and gibberish.
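
A minimal sketch of how temperature reshapes the distribution by scaling the logits before the softmax:

import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Dividing logits by the temperature flattens the distribution when
    # temperature > 1 (more random, more "creative") and sharpens it when
    # temperature < 1 (more deterministic).
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])
print(apply_temperature(logits, 0.5))  # sharper: most of the mass on the top token
print(apply_temperature(logits, 1.5))  # flatter: probabilities move closer together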
Tokens A token is the fundamental unit of text that a model processes and generates. It’s a way of breaking down natural language into manageable pieces that the model can understand and manipulate. While tokens can be individual words, they can also represent: subwords (e.g., “playing” being split into “play” and “ing”), characters, punctuation marks, special symbols (e.g., “[CLS]” for classification tasks), and numbers. The process of dividing text into tokens is called tokenization. In summary, tokens are the building blocks that language models use to understand and process language. By breaking down text into tokens, models can effectively analyze, generate, and manipulate language for various tasks.
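
For a concrete feel for tokenization, the snippet below uses OpenAI’s tiktoken tokenizer as a stand-in; the models Agnaistic serves each use their own tokenizers, so the exact splits and counts will differ:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Role-playing with Agnaistic is fun!")
print(len(ids))                        # number of tokens, which is not the number of words
print([enc.decode([i]) for i in ids])  # the individual token strings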
Top K A k-value of 40 or 50 is common. Top-K sampling works like this: 1. Order the tokens in descending order of probability. 2. Select the first K tokens to create a new distribution. 3. Sample from those tokens. – increasing top k gives you more tokens to sample from and can help with repetition and creativity.
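
A minimal sketch of those three steps (assuming probs is the model’s probability distribution over the vocabulary):

import numpy as np

def top_k_sample(probs: np.ndarray, k: int = 40) -> int:
    # 1. Order tokens by probability, 2. keep the top k, 3. renormalize and
    # sample one token id from what remains.
    top = np.argsort(probs)[::-1][:k]
    renorm = probs[top] / probs[top].sum()
    return int(np.random.choice(top, p=renorm))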
Tail-Free Sampling / TFS A more advanced alternative way of sampling using derivatives and complex math as opposed to simple probability thresholds. Tail-free sampling transcends traditional sampling methods by directly targeting rare events. It bypasses explicit probability calculations and instead leverages the second derivative of the distribution. This “surprise” metric identifies tokens that induce the sharpest curvature shifts, leading to the discovery of unexpected yet highly relevant elements within the long tail. This curvature-based approach fosters novelty and diversity in outputs, leading to less predictable and more engaging outcomes across various applications. For a better understanding of TFS, see https://www.trentonbricken.com/Tail-Free-Sampling
Top A This setting allows for a more flexible version of sampling, where the number of words chosen from the most likely options is automatically determined based on the likelihood distribution of the options, but instead of choosing the top P or K words, it chooses all words with probabilities above a certain threshold. Not all models support Top A, and it is highly experimental. Recommended to disable.
Top P / Nucleus sampling This strategy (also called nucleus sampling) is similar to Top-K, but instead of picking a certain number of tokens, we select enough tokens to “cover” a certain amount of probability defined by the parameter p in the following manner: 1. Order the tokens in descending order of probability. 2. Select the smallest number of top tokens such that their cumulative probability is at least p. 3. Sample from those tokens.
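
A minimal sketch of nucleus sampling under the same assumption (probs is the distribution over the vocabulary):

import numpy as np

def top_p_sample(probs: np.ndarray, p: float = 0.9) -> int:
    # Keep the smallest set of highest-probability tokens whose cumulative
    # probability reaches p, renormalize, then sample from that "nucleus".
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    nucleus = order[:cutoff]
    renorm = probs[nucleus] / probs[nucleus].sum()
    return int(np.random.choice(nucleus, p=renorm))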
Min p The minimum probability a token must have, relative to the probability of the most likely token, to be considered. Tokens that fall below this threshold are filtered out.
User Jail Break / Ultimate Jailbreak / UJB Usually one of the last bits of information included in your prompt. Given its location at the bottom of the default prompt template, the UJB is a heavily weighted instruction. It can be used to completely change the response of the model, and is often used to add special adjectives, command additional details, or trigger specific behaviors. The UJB is located in the Chat Settings -> Preset -> Prompt panel. If “Override Character Jailbreak” is not enabled, then any “Post Conversation History Instruction” included in the character card will override the UJB.
User Embeddings Similar to chat embeddings, user embeddings are semantically retrieved groups of text. They’re inserted dynamically into the context. User embeddings are added like memories – either through the library or in the Memory setting in chat. By providing a URL or a supported embeddable file, the contents are parsed and split on new-lines. These individual chunks are then loaded into the embedding engine and used to fill the assigned User Embed Context Limit portion of your prompt template. Note that User Embeds may not be present in your prompt template, so if you’re trying to use user embeds, make sure there’s a placeholder for them in your prompt template.