OpenAI · Fast · Ultra

GPT-5.5

OpenAI's newest flagship model, with a 1.05M token context window and 128K max output tokens. Supports cached inputs at a 10× discount and offers improved reasoning, coding, and multimodal performance over the GPT-5.4 series.

5 credits per request

1.05M token context window

128K max output tokens

Cached input pricing (10× cheaper)

Adjustable reasoning effort

Function calling, web search, file search, computer use

Native vision support

Model Specifications

Specification	Value
Context Window	1.05M tokens
Max Output	128K tokens
Training Cutoff	2025-12
Compatible SDK	OpenAI

Capabilities

Vision

Function Calling

Streaming

JSON Mode

System Prompt

Token Pricing (per 1M tokens)

Token Type	Credits	USD Equivalent
Input Tokens	10,000	$10.00
Output Tokens	60,000	$60.00
Cached Tokens	1,000	$1.00

* 1 credit ≈ $0.001 (actual charges may vary based on usage)
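The table above can be turned into a quick cost estimate. The following is a minimal sketch, not an official billing formula: the function names are hypothetical, and it assumes that cached tokens are billed at the cached rate instead of the input rate and that `input_tokens` counts only non-cached input. How the per-request figure listed near the top of the page combines with token charges is not stated, so the sketch leaves it out.

```python
# Rates copied from the pricing table above (credits per 1M tokens).
CREDITS_PER_MILLION = {"input": 10_000, "output": 60_000, "cached": 1_000}
USD_PER_CREDIT = 0.001  # approximate, per the table footnote


def estimate_credits(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate token charges in credits.

    Assumption: input_tokens counts only non-cached input, and cached
    tokens are billed at the cached rate instead of the input rate.
    """
    total = (
        input_tokens * CREDITS_PER_MILLION["input"]
        + output_tokens * CREDITS_PER_MILLION["output"]
        + cached_tokens * CREDITS_PER_MILLION["cached"]
    )
    return total / 1_000_000


def credits_to_usd(credits: float) -> float:
    """Convert credits to an approximate USD amount."""
    return credits * USD_PER_CREDIT


if __name__ == "__main__":
    credits = estimate_credits(input_tokens=200_000, output_tokens=8_000)
    print(f"{credits:.0f} credits, about ${credits_to_usd(credits):.2f}")
```

Under these assumptions, a request with 200,000 input tokens and 8,000 output tokens comes to 2,480 credits, or about $2.48.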

Quick Start

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a research assistant. Cite sources by ID."
    },
    {
      "role": "user",
      "content": "Given the attached corpus of internal docs, summarize the key risks discussed and propose mitigations grouped by severity."
    }
  ],
  "reasoning_effort": "high",
  "max_completion_tokens": 8000
}'
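For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper names (`build_request`, `send`, `extract_text`) are hypothetical, and the response shape read by `extract_text` is an assumption based on the endpoint following the OpenAI chat completions format; check an actual response before relying on it.

```python
import json
import urllib.request

API_URL = "https://api.core.today/llm/openai/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the curl command, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 300.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_text(reply: dict) -> str:
    """Assumption: the reply follows the OpenAI chat completions shape."""
    return reply["choices"][0]["message"]["content"]


# Usage (not executed here, since it needs a real API key):
#   payload = {
#       "model": "gpt-5.5",
#       "messages": [{"role": "user", "content": "Hello"}],
#       "reasoning_effort": "high",
#       "max_completion_tokens": 8000,
#   }
#   print(extract_text(send(build_request("cdt_your_api_key", payload))))
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any network traffic happens.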

Parameters

Parameter	Type	Required	Default	Description
`messages`	array	Yes	-	Array of message objects with role and content
`model`	string	Yes	gpt-5.5	Model identifier
`max_completion_tokens`	integer	No	4096	Maximum tokens in the response (up to 128,000). Use `max_completion_tokens`, not `max_tokens`
`reasoning_effort`	string	No	medium	Reasoning effort level: none, low, medium, high, or xhigh
`temperature`	float	No	1.0	Sampling temperature (0-2)
`stream`	boolean	No	false	Enable Server-Sent Events streaming
`top_p`	float	No	1.0	Nucleus sampling threshold (0-1)

Examples

Long-context RAG

Process up to 1M tokens of context with GPT-5.5

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "You are a research assistant. Cite sources by ID."
    },
    {
      "role": "user",
      "content": "Given the attached corpus of internal docs, summarize the key risks discussed and propose mitigations grouped by severity."
    }
  ],
  "reasoning_effort": "high",
  "max_completion_tokens": 8000
}'
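The request above refers to an attached corpus, but the message body has to carry that text itself. One way to assemble it is sketched below, with each document labeled by an ID so the model can cite it as the system prompt asks. All helper names are hypothetical, the four-characters-per-token figure is a crude heuristic rather than the model's tokenizer, and the idea that prompt and output share the 1.05M window is an assumption.

```python
CONTEXT_WINDOW = 1_050_000  # tokens, from the model specifications
CHARS_PER_TOKEN = 4  # crude heuristic (assumption), not the real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_corpus_message(docs: dict, question: str) -> str:
    """Label each document with its ID and append the question at the end."""
    sections = [f"[{doc_id}]\n{text.strip()}" for doc_id, text in docs.items()]
    return "\n\n".join(sections) + f"\n\nQuestion: {question}"


def fits_context(prompt: str, max_completion_tokens: int) -> bool:
    """Assumption: prompt and output tokens share the context window."""
    return estimate_tokens(prompt) + max_completion_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    docs = {
        "DOC-1": "Vendor lock-in is a risk for the billing service.",
        "DOC-2": "Key rotation is still a manual process.",
    }
    message = build_corpus_message(docs, "Summarize the key risks by severity.")
    print(message)
    print("fits:", fits_context(message, max_completion_tokens=8000))
```

The resulting string would go in the `content` field of the user message. For a real budget check, a proper tokenizer should replace the character heuristic.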

Cached Repeated Context

Reuse a large system prompt at cached input pricing (10× cheaper than standard input)

curl -X POST "https://api.core.today/llm/openai/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gpt-5.5",
  "messages": [
    {
      "role": "system",
      "content": "<large repeated system prompt or codebase context>"
    },
    {
      "role": "user",
      "content": "Refactor the auth middleware to use the new session API."
    }
  ],
  "temperature": 0.3,
  "max_completion_tokens": 4000
}'
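To get a feel for what the cached rate saves, the arithmetic can be sketched as follows. This page does not describe how caching is triggered, so the sketch assumes the first request pays the full input rate for the shared prefix, every later request pays the cached rate for it, and the prefix must stay identical across requests to be reused. Both function names are hypothetical.

```python
INPUT_RATE = 10_000  # credits per 1M input tokens, from the pricing table
CACHED_RATE = 1_000  # credits per 1M cached tokens, from the pricing table


def prefix_cost_credits(prefix_tokens: int, requests: int, cached: bool) -> float:
    """Cost in credits of sending the same prefix `requests` times.

    Assumption: with caching, the first request pays the full input rate
    and each later request pays the cached rate for the whole prefix.
    """
    if requests <= 0:
        return 0.0
    full = prefix_tokens * INPUT_RATE / 1_000_000
    if not cached:
        return full * requests
    reused = prefix_tokens * CACHED_RATE / 1_000_000
    return full + reused * (requests - 1)


def make_messages(shared_prefix: str, question: str) -> list:
    """Keep the shared prefix unchanged and put the varying text last."""
    return [
        {"role": "system", "content": shared_prefix},
        {"role": "user", "content": question},
    ]


if __name__ == "__main__":
    without = prefix_cost_credits(100_000, 50, cached=False)
    with_cache = prefix_cost_credits(100_000, 50, cached=True)
    print(f"uncached: {without:.0f} credits, cached: {with_cache:.0f} credits")
```

Under these assumptions, a 100,000-token prefix sent 50 times costs 50,000 credits without caching and 5,900 credits with it. Output tokens and the varying user message are billed separately and are not included here.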