OpenAI · Fast · Ultra

GPT-5.5

OpenAI's newest flagship model, with a 1.05M token context window and 128K max output tokens. It supports cached inputs at a 10× discount and improves on the GPT-5.4 series in reasoning, coding, and multimodal performance.

5 credits
per request
1.05M token context window
128K max output tokens
Cached input pricing (10× cheaper)
Adjustable reasoning effort
Function calling, web search, file search, computer use
Native vision support

Model Specifications

Context Window: 1.05M tokens
Max Output: 128K tokens
Training Cutoff: 2025-12
Compatible SDK: OpenAI

Capabilities

Vision
Function Calling
Streaming
JSON Mode
System Prompt

Token Pricing (per 1M tokens)

Token Type      Credits   USD Equivalent
Input Tokens    10,000    $10.00
Output Tokens   60,000    $60.00
Cached Tokens   1,000     $1.00

* 1 credit ≈ $0.001 (actual charges may vary based on usage)
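
The table rates can be turned into a rough per-request estimate. The sketch below is illustrative only: `estimate_credits` is a hypothetical helper, and it assumes that cached tokens are counted separately from uncached input tokens and billed at the cached rate. Neither assumption is stated on this page, and the flat per-request credit figure listed at the top is not included.

```python
# Rates copied from the Token Pricing table (credits per 1M tokens).
CREDITS_PER_MILLION = {"input": 10_000, "output": 60_000, "cached": 1_000}
USD_PER_CREDIT = 0.001  # approximate, per the table footnote


def estimate_credits(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate credits for one request (hypothetical helper).

    Assumes `input_tokens` counts only uncached input and that
    `cached_tokens` are billed at the cached rate instead.
    """
    usage = {"input": input_tokens, "output": output_tokens, "cached": cached_tokens}
    return sum(usage[kind] * CREDITS_PER_MILLION[kind] / 1_000_000 for kind in usage)


credits = estimate_credits(input_tokens=200_000, output_tokens=8_000)
print(f"{credits:,.0f} credits (about ${credits * USD_PER_CREDIT:.2f})")
```

For 200,000 uncached input tokens and 8,000 output tokens this comes to 2,480 credits, about $2.48 at the footnote's approximate rate.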

Quick Start

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a research assistant. Cite sources by ID."
    },
    {
      "role": "user",
      "content": "Given the attached corpus of internal docs, summarize the key risks discussed and propose mitigations grouped by severity."
    }
  ],
  "reasoning_effort": "high",
  "max_completion_tokens": 8000
}'
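
For callers who prefer Python over curl, the same request can be assembled with the standard library alone. This is a minimal sketch, not an official client: `build_request` is a hypothetical helper, the URL, headers, and body fields are copied from the curl call above, and the user message is shortened for readability. The request is built but not sent, and the response format is not documented on this page.

```python
import json
import urllib.request

API_URL = "https://api.core.today/llm/openai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


payload = {
    "model": "gpt-5.5",
    "messages": [
        {"role": "system", "content": "You are a research assistant. Cite sources by ID."},
        {"role": "user", "content": "Summarize the key risks discussed and propose mitigations grouped by severity."},
    ],
    "reasoning_effort": "high",
    "max_completion_tokens": 8000,
}
request = build_request("cdt_your_api_key", payload)

# Sending needs a real API key and network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```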

Parameters

Parameter              Type     Required  Default  Description
messages               array    Yes       -        Array of message objects with role and content
model                  string   Yes       gpt-5.5  Model identifier
max_completion_tokens  integer  No        4096     Maximum tokens in the response (up to 128000). Use max_completion_tokens, not max_tokens
reasoning_effort       string   No        medium   Reasoning effort level: none, low, medium, high, or xhigh
temperature            float    No        1.0      Sampling temperature (0-2)
stream                 boolean  No        false    Enable Server-Sent Events streaming
top_p                  float    No        1.0      Nucleus sampling threshold (0-1)
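
The documented ranges can be checked on the client before a request is sent. The following sketch uses a hypothetical `build_payload` helper; the defaults and limits come from the parameter table, while the non-empty `messages` check and the lower bound of 1 on `max_completion_tokens` are assumptions. Whether the API itself rejects out-of-range values is not stated here.

```python
ALLOWED_EFFORT = {"none", "low", "medium", "high", "xhigh"}
MAX_OUTPUT_TOKENS = 128_000  # upper limit from the parameter table


def build_payload(messages, *, model="gpt-5.5", max_completion_tokens=4096,
                  reasoning_effort="medium", temperature=1.0, top_p=1.0, stream=False):
    """Return a request body after checking the documented ranges (hypothetical helper)."""
    if not messages:
        # Assumption: an empty messages array is treated as invalid.
        raise ValueError("messages must be a non-empty list")
    if not 1 <= max_completion_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError("max_completion_tokens must be between 1 and 128000")
    if reasoning_effort not in ALLOWED_EFFORT:
        raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORT)}")
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")
    return {
        "model": model,
        "messages": messages,
        # The table calls for max_completion_tokens rather than max_tokens.
        "max_completion_tokens": max_completion_tokens,
        "reasoning_effort": reasoning_effort,
        "temperature": temperature,
        "top_p": top_p,
        "stream": stream,
    }


payload = build_payload(
    [{"role": "user", "content": "Refactor the auth middleware to use the new session API."}],
    reasoning_effort="high",
    temperature=0.3,
)
```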

Examples

Long-context RAG

Process up to 1M tokens of context with GPT-5.5

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a research assistant. Cite sources by ID."
    },
    {
      "role": "user",
      "content": "Given the attached corpus of internal docs, summarize the key risks discussed and propose mitigations grouped by severity."
    }
  ],
  "reasoning_effort": "high",
  "max_completion_tokens": 8000
}'

Cached Repeated Context

Reuse a large system prompt at the 10× cheaper cached input rate

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "<large repeated system prompt or codebase context>"
    },
    {
      "role": "user",
      "content": "Refactor the auth middleware to use the new session API."
    }
  ],
  "temperature": 0.3,
  "max_completion_tokens": 4000
}'
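
To see what the cached rate can mean for a repeated prefix, the arithmetic can be sketched as follows. `shared_context_credits` is a hypothetical helper using the rates from the Token Pricing table. It assumes the first call pays the full input rate and every later call is a complete cache hit on the shared prefix; how this service actually decides cache hits is not described on this page.

```python
INPUT_RATE = 10_000  # credits per 1M uncached input tokens (pricing table)
CACHED_RATE = 1_000  # credits per 1M cached tokens (pricing table)


def shared_context_credits(context_tokens: int, requests: int, cached: bool) -> float:
    """Credits spent on the shared prefix alone across `requests` calls.

    Assumes the first call pays the full input rate and, when `cached` is
    True, every later call is billed entirely at the cached rate.
    """
    if requests <= 0:
        return 0.0
    later_rate = CACHED_RATE if cached else INPUT_RATE
    first = context_tokens * INPUT_RATE / 1_000_000
    later = (requests - 1) * context_tokens * later_rate / 1_000_000
    return first + later


without_cache = shared_context_credits(100_000, requests=20, cached=False)
with_cache = shared_context_credits(100_000, requests=20, cached=True)
print(f"uncached: {without_cache:,.0f} credits, cached: {with_cache:,.0f} credits")
```

Under these assumptions, a 100,000-token prefix reused across 20 requests costs 20,000 credits uncached and 2,900 credits with caching.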

Tips & Best Practices

1. Use cached inputs ($1.00/M) for repeated system prompts and RAG context
2. 1.05M context: chunk only when content exceeds the window
3. Use reasoning_effort 'high' or 'xhigh' for the most complex tasks
4. 128K output enables single-shot long-form generation
5. Lower temperature (0.2-0.5) for coding and analytical tasks
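
The chunking tip can be expressed as a small planning check. This is a sketch under stated assumptions: `plan_chunks` is a hypothetical helper, token counts are supplied by the caller (no tokenizer is included), and it assumes the prompt, the content, and the reserved output must all fit inside the context window together, which this page does not confirm.

```python
import math

CONTEXT_WINDOW = 1_050_000  # tokens, from the model specification
MAX_OUTPUT = 128_000        # tokens, from the model specification


def plan_chunks(content_tokens: int, prompt_tokens: int = 0, reserved_output: int = 8_000) -> int:
    """Return how many requests are needed to cover the content (1 means no chunking).

    Assumes prompt, content, and reserved output share one context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and 128000")
    budget = CONTEXT_WINDOW - prompt_tokens - reserved_output
    if budget <= 0:
        raise ValueError("no room left for content in the context window")
    return max(1, math.ceil(content_tokens / budget))


print(plan_chunks(900_000))    # fits in one request
print(plan_chunks(2_500_000))  # needs to be split
```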

Use Cases

Frontier reasoning and analysis
Long document processing up to 1M tokens
Advanced agentic workflows with tool use
Repeated context with cached inputs (RAG, codebases)
Enterprise-grade content generation