Google · Fast · Standard

Gemini 3.1 Flash Lite Preview

An ultra-lightweight variant of Gemini 3.1 Flash and the most cost-effective Gemini model, with support for cached input and audio input. Suited to high-throughput, budget-conscious applications.

100 credits per request

Most cost-effective Gemini model

1,048,576-token context window

65,536 max output tokens

Multimodal input: text, image, audio, video, PDF

Cached input token support

Function calling, structured outputs, thinking, search grounding, code execution


Model Specifications

Specification	Value
Context Window	1,048,576 tokens (≈1.0M)
Max Output	65,536 tokens (≈66K)
Training Cutoff	January 2025
Compatible SDK	OpenAI, Google AI
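Because input and output share the 1,048,576-token context window while output alone is capped at 65,536 tokens, the largest usable `max_tokens` depends on prompt length. A minimal bash sketch of that arithmetic, assuming the prompt's token count is already known; `max_output_budget` is a hypothetical helper name, not part of any SDK:

```shell
#!/usr/bin/env bash
# Limits taken from the model specification above.
CONTEXT_WINDOW=1048576   # input + output tokens combined
MAX_OUTPUT=65536         # hard cap on output tokens

# max_output_budget INPUT_TOKENS (hypothetical helper)
# Prints the largest max_tokens value that satisfies both limits,
# or returns 1 if the input alone fills the context window.
max_output_budget() {
  local input_tokens=$1
  local remaining=$(( CONTEXT_WINDOW - input_tokens ))
  if (( remaining <= 0 )); then
    echo "input of ${input_tokens} tokens leaves no room for output" >&2
    return 1
  fi
  # Whichever limit is tighter wins.
  if (( remaining < MAX_OUTPUT )); then
    echo "$remaining"
  else
    echo "$MAX_OUTPUT"
  fi
}
```

For a 1,000-token prompt the output cap is the binding limit (65,536); for a 1,000,000-token prompt only 48,576 tokens remain for output.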

Capabilities

Vision

Function Calling

Streaming

JSON Mode

System Prompt

Token Pricing (per 1M tokens)

Token Type	Credits	USD Equivalent
Input Tokens	250	$0.25
Output Tokens	1,500	$1.50

* 1 credit ≈ $0.001 (actual charges may vary based on usage)
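At these rates the token portion of a request costs (input_tokens × 250 + output_tokens × 1,500) / 1,000,000 credits, or roughly that many credits × $0.001 in USD. A rough bash sketch of the arithmetic; it covers only the two per-token rates in this table, not the per-request figure listed at the top of the page or any cached-input pricing, and `estimate_cost` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Per-1M-token rates in credits, from the pricing table above.
INPUT_RATE=250
OUTPUT_RATE=1500
USD_PER_CREDIT=0.001   # approximate, per the table footnote

# estimate_cost INPUT_TOKENS OUTPUT_TOKENS (hypothetical helper)
# Prints "<credits> credits (~$<usd>)" for the token portion of one request.
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_rate="$INPUT_RATE" -v out_rate="$OUTPUT_RATE" -v usd="$USD_PER_CREDIT" '
    BEGIN {
      credits = (in_tok * in_rate + out_tok * out_rate) / 1000000
      printf "%.4f credits (~$%.6f)\n", credits, credits * usd
    }'
}
```

For example, 2,000 input tokens and 500 output tokens come to 1.25 credits, about $0.00125.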

Quick Start

curl -X POST "https://api.core.today/llm/gemini/v1beta/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gemini-3.1-flash-lite-preview",
  "messages": [
    {
      "role": "system",
      "content": "Classify the following text as: spam, not_spam. Respond with only the label."
    },
    {
      "role": "user",
      "content": "Congratulations! You have been selected for a special prize. Click here to claim now!"
    }
  ],
  "max_tokens": 50,
  "temperature": 0
}'
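Embedding the message text directly in the `-d '...'` string breaks as soon as the text contains a quote. A minimal bash sketch that escapes the user text before building the same request body and reads the key from an environment variable. The names `json_escape`, `build_classify_payload`, `classify`, and `CORE_TODAY_API_KEY` are hypothetical; the escaping covers only backslashes, double quotes, newlines, carriage returns, and tabs, so a real JSON tool such as `jq` is safer for arbitrary input.

```shell
#!/usr/bin/env bash
API_URL="https://api.core.today/llm/gemini/v1beta/openai/chat/completions"

# json_escape TEXT: minimal JSON string escaping (\, ", newline, CR, tab).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslashes first, so later escapes are not doubled
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\r'/\\r}
  s=${s//$'\t'/\\t}
  printf '%s' "$s"
}

# build_classify_payload TEXT: prints the Quick Start request body for TEXT.
build_classify_payload() {
  local text
  text=$(json_escape "$1")
  cat <<EOF
{
  "model": "gemini-3.1-flash-lite-preview",
  "messages": [
    {"role": "system", "content": "Classify the following text as: spam, not_spam. Respond with only the label."},
    {"role": "user", "content": "${text}"}
  ],
  "max_tokens": 50,
  "temperature": 0
}
EOF
}

# classify TEXT: sends the request; CORE_TODAY_API_KEY (hypothetical name) must be set.
classify() {
  if [[ -z ${CORE_TODAY_API_KEY:-} ]]; then
    echo "CORE_TODAY_API_KEY is not set" >&2
    return 1
  fi
  build_classify_payload "$1" | curl -sS -X POST "$API_URL" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${CORE_TODAY_API_KEY}" \
    --data-binary @-
}
```

`--data-binary @-` makes curl read the body from standard input, so the payload never passes through shell quoting a second time.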

Parameters

Parameter	Type	Required	Default	Description
`model`	string	Yes	-	Model ID: `gemini-3.1-flash-lite-preview`.
`messages`	array	Yes	-	Array of message objects (OpenAI format). Supports text, image, audio, video, and PDF inputs.
`temperature`	float	No	1	Sampling temperature (0-2). Lower values produce more deterministic outputs.
`top_p`	float	No	0.95	Nucleus sampling parameter (0-1).
`max_tokens`	integer	No	-	Maximum output tokens. Max: 65,536. Context window (input + output): 1,048,576 tokens.
`stop`	string or array	No	-	Up to 4 sequences at which the model stops generating.
`response_format`	object	No	-	Output format constraint. Use `{"type": "json_object"}` for structured JSON output.
`presence_penalty`	float	No	0	Range -2.0 to 2.0. Positive values penalize tokens that have already appeared in the text, nudging the model toward new content.
`frequency_penalty`	float	No	0	Range -2.0 to 2.0. Positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition.
`seed`	integer	No	-	Seed for deterministic sampling (best-effort).
`stream`	boolean	No	false	Enable Server-Sent Events streaming.
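With `"stream": true`, the response arrives as Server-Sent Events. The sketch below assumes OpenAI-style framing (each chunk on a `data:` line, with the stream possibly ending in `data: [DONE]`) and only strips that framing, printing one JSON chunk per line for a JSON tool to parse. `sse_chunks` is a hypothetical name.

```shell
#!/usr/bin/env bash
# sse_chunks (hypothetical helper): read an SSE stream on stdin and print the
# payload of each "data:" line, stopping at the "[DONE]" sentinel if one is sent.
sse_chunks() {
  local line payload
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%$'\r'}                  # tolerate CRLF line endings
    [[ $line == data:* ]] || continue   # skip blank lines, comments, other fields
    payload=${line#data:}
    payload=${payload# }                # SSE allows one optional space after the colon
    [[ $payload == "[DONE]" ]] && break
    printf '%s\n' "$payload"
  done
}
```

When piping `curl` output into it, pass `-N` (`--no-buffer`) so curl does not hold chunks back, e.g. `curl -sS -N ... | sse_chunks`.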

Examples

Quick Classification

Lightweight text classification with Flash Lite. The request is identical to the Quick Start example above: the system message fixes the label set, `"temperature": 0` keeps the output as deterministic as possible, and `"max_tokens": 50` leaves ample room for a one-word label.
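Even at `temperature` 0, "respond with only the label" is an instruction rather than a guarantee, so it is worth validating the reply before acting on it. A minimal bash sketch, assuming the assistant text has already been extracted from `choices[0].message.content` (for example with `jq -r '.choices[0].message.content'`); `normalize_label` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Labels named in the system prompt.
ALLOWED_LABELS=(spam not_spam)

# normalize_label TEXT (hypothetical helper): lowercases, removes all
# whitespace and one trailing period, then prints the label if it is
# allowed; otherwise reports it on stderr and returns 1.
normalize_label() {
  local label allowed
  label=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  label=${label%.}
  for allowed in "${ALLOWED_LABELS[@]}"; do
    if [[ $label == "$allowed" ]]; then
      printf '%s\n' "$label"
      return 0
    fi
  done
  echo "unexpected label: $1" >&2
  return 1
}
```

Rejecting anything outside the label set, rather than guessing, keeps a malformed reply from being silently counted as a valid classification.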