AI Credits in SQAI are calculated from the input and output tokens consumed by each Large Language Model (LLM) used by the SQAI services.
The LLMs chosen for your instance depend on several parameters and are optimized by our orchestration.
For example:
In your region, only Anthropic Claude is available for interactions, but we can also use Mistral for processing documents and building the knowledge base. Our orchestration is managed to get the best results out of the LLMs available in your region.
Every input (documents, prompts, ...) and output (scripts, cases, ...) is calculated per LLM and results in a cost that counts toward the AI credits in your package.
Let's have a look at the different possible calculations:
Token Definition Per Model
1. OpenAI Models (GPT-3, GPT-3.5, GPT-4):
Token Definition: OpenAI models use Byte Pair Encoding (BPE) to break text into tokens. Each token can be a word, subword, or a part of a word. Common words may be a single token, while complex words or symbols may split into multiple tokens.
Average Tokens per Word: On average, 1 English word is about 1-2 tokens.
Context Limits:
GPT-3.5: 4,096 tokens
GPT-4 (8k): 8,192 tokens
GPT-4 (32k): 32,768 tokens
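Since all of the models below average roughly 1-2 tokens per English word, a rough token estimate can be derived from a word count. The sketch below uses a 1.3 tokens-per-word factor, which is an assumption within that range; exact counts require the model's own tokenizer (for example, OpenAI's tiktoken library).

```python
import re

def estimate_tokens(text, tokens_per_word=1.3):
    """Rough token estimate from a word count.

    The 1.3 factor is an assumption within the 1-2 tokens-per-word
    range quoted above; it is not an exact count.
    """
    words = re.findall(r"\S+", text)
    return round(len(words) * tokens_per_word)

print(estimate_tokens("Tokenization splits text into subword units."))
```

Use this only for ballpark budgeting; billing is always based on the tokenizer's actual count.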
2. Anthropic Models (Claude):
Token Definition: Claude models, like GPT, use subword tokenization, where common words are fewer tokens, and uncommon words are split into smaller components.
Average Tokens per Word: Similar to GPT models.
Context Limits:
Claude offers up to 100k tokens in the largest context windows, making it ideal for very long documents or conversations.
3. Mistral Models:
Token Definition: Mistral also uses subword tokenization via BPE or similar tokenization methods, where words are split into common subwords and less frequent characters for efficient processing.
Average Tokens per Word: Roughly similar to GPT models, with 1-2 tokens per English word.
Context Limits: Mistral models often support around 4,096 tokens, with some models aiming for higher limits.
4. Cohere Models:
Token Definition: Cohere models tokenize text similarly, breaking words into subword units. Cohere's tokenization is optimized for different languages and use cases.
Average Tokens per Word: Similar to GPT and other large language models, 1-2 tokens per word.
Context Limits: Cohere typically supports between 2,048 and 4,096 tokens.
5. Meta (LLaMA):
Token Definition: LLaMA models use Byte Pair Encoding (BPE) like OpenAI, where text is split into subwords. This makes it efficient for longer text generation and language understanding.
Average Tokens per Word: Same, around 1-2 tokens per English word.
Context Limits: LLaMA-2 supports context lengths of up to 4,096 tokens, though future releases may support longer contexts.
Typical Token Pricing Examples Per Model
1. OpenAI Models:
GPT-3.5 (Turbo):
Input Tokens: ~$0.0015 per 1,000 tokens
Output Tokens: ~$0.002 per 1,000 tokens
GPT-4 (8k context):
Input Tokens: ~$0.03 per 1,000 tokens
Output Tokens: ~$0.06 per 1,000 tokens
GPT-4 (32k context):
Input Tokens: ~$0.06 per 1,000 tokens
Output Tokens: ~$0.12 per 1,000 tokens
Example:
Prompt: 500 tokens
Response: 1,500 tokens
Total tokens = 2,000 tokens
Cost for GPT-4 (8k context):
Input: 500 × $0.03/1,000 = $0.015
Output: 1,500 × $0.06/1,000 = $0.09
Total = $0.105
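The arithmetic above can be wrapped in a small helper. The rates here are the approximate GPT-4 (8k) prices quoted above and will drift as vendors update their pricing.

```python
def token_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD, given per-1,000-token rates for input and output."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1000

# GPT-4 (8k) example from above: 500 prompt tokens + 1,500 response tokens
cost = token_cost(500, 1500, input_rate=0.03, output_rate=0.06)
print(f"${cost:.3f}")  # $0.105
```

The same helper applies to every model below; only the two rates change.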
2. Anthropic Models (Claude):
Claude 1:
Input Tokens: ~$0.04 per 1,000 tokens
Output Tokens: ~$0.06 per 1,000 tokens
Claude 2 (larger model with higher token limit):
Input Tokens: ~$0.02-0.04 per 1,000 tokens
Output Tokens: ~$0.03-0.06 per 1,000 tokens
Example:
Prompt: 1,000 tokens
Response: 2,000 tokens
Total tokens = 3,000 tokens
Cost for Claude 1:
Input: 1,000 × $0.04/1,000 = $0.04
Output: 2,000 × $0.06/1,000 = $0.12
Total = $0.16
3. Mistral Models:
Mistral is typically open-source or available at competitive prices on cloud platforms.
Token Pricing: For custom deployments, pricing can vary based on infrastructure, but general estimates are:
~$0.002-0.003 per 1,000 tokens (input and output combined).
Example:
Prompt: 500 tokens
Response: 1,000 tokens
Total tokens = 1,500 tokens
Cost = 1,500 × $0.002/1,000 = $0.003
4. Cohere Models:
Cohere’s Large Models (similar to GPT-3):
Input Tokens: ~$0.002 per 1,000 tokens
Output Tokens: ~$0.002 per 1,000 tokens
Smaller models may have reduced pricing, closer to $0.0015 per 1,000 tokens.
Example:
Prompt: 600 tokens
Response: 1,200 tokens
Total tokens = 1,800 tokens
Cost: 1,800 × $0.002/1,000 = $0.0036
5. Meta (LLaMA):
LLaMA 2 (Open-Source):
Pricing is often lower or free when self-hosted, but cloud-hosted versions on platforms like Hugging Face or other cloud services may incur infrastructure fees.
Token Pricing on Cloud Platforms:
Typically ~$0.0015 to $0.0025 per 1,000 tokens, depending on the platform.
Example:
Prompt: 300 tokens
Response: 700 tokens
Total tokens = 1,000 tokens
Cost = 1,000 × $0.0015/1,000 = $0.0015
Summary Table of Token Pricing (Approximate):
| Model | Input Tokens (per 1,000) | Output Tokens (per 1,000) | Context Limit (tokens) |
| --- | --- | --- | --- |
| GPT-3.5 | $0.0015 | $0.002 | 4,096 |
| GPT-4 (8k) | $0.03 | $0.06 | 8,192 |
| GPT-4 (32k) | $0.06 | $0.12 | 32,768 |
| Claude 1 | $0.04 | $0.06 | 100,000 |
| Claude 2 | $0.02-0.04 | $0.03-0.06 | 100,000 |
| Mistral | $0.002-0.003 | $0.002-0.003 | 4,096 |
| Cohere | $0.002 | $0.002 | 2,048-4,096 |
| LLaMA 2 | $0.0015-0.0025 | $0.0015-0.0025 | 4,096 |
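The summary table can be turned into a small lookup for comparing the same request across models. The rates below are the approximate fixed-rate entries from the table (models with price ranges are omitted); they are illustrative only and should be checked against your provider's current price list.

```python
# Approximate per-1,000-token rates from the summary table above
# (illustrative only; actual rates change over time).
PRICING = {
    "gpt-3.5":   {"input": 0.0015, "output": 0.002},
    "gpt-4-8k":  {"input": 0.03,   "output": 0.06},
    "gpt-4-32k": {"input": 0.06,   "output": 0.12},
    "claude-1":  {"input": 0.04,   "output": 0.06},
    "cohere":    {"input": 0.002,  "output": 0.002},
}

def estimate_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request against the table above."""
    rates = PRICING[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1000

# The same 2,000-token request (500 in, 1,500 out) costed across models
for model in PRICING:
    print(f"{model:>10}: ${estimate_cost(model, 500, 1500):.4f}")
```

This makes the cost spread visible at a glance: the same request can differ by more than an order of magnitude depending on which model the orchestration selects.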