Core.Today | Google, Fast, Standard

Gemini 3.1 Flash Lite Preview

Ultra-lightweight variant of Gemini 3.1 Flash. The most cost-effective Gemini model with support for cached input and audio input. Ideal for high-throughput, budget-conscious applications.

100 credits per request
Most cost-effective Gemini model
1,048,576 token context window
65,536 max output tokens
Multimodal input: text, image, audio, video, PDF
Cached input token support
Function calling, structured outputs, thinking, search grounding, code execution

Model Specifications

Context Window: 1.0M tokens
Max Output: 66K tokens
Training Cutoff: January 2025
Compatible SDK: OpenAI, Google AI

Capabilities

Vision
Function Calling
Streaming
JSON Mode
System Prompt

Token Pricing (per 1M tokens)

| Token Type | Credits | USD Equivalent |
|---|---|---|
| Input Tokens | 250 | $0.25 |
| Output Tokens | 1,500 | $1.50 |

* 1 credit ≈ $0.001 (actual charges may vary based on usage)
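
The per-token rates above can be turned into a rough cost estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, it uses only the token rates from the table and the approximate credit-to-USD conversion, and it does not account for the separately listed 100 credits per request, since the page does not say how the two combine.

```shell
#!/bin/sh
# Rough token-cost estimate from the listed rates:
#   input:  250 credits per 1M tokens
#   output: 1,500 credits per 1M tokens
#   1 credit is approximately 0.001 USD
# estimate_cost is a hypothetical helper, not part of any Core.Today tooling.
estimate_cost() {
  input_tokens=$1
  output_tokens=$2
  # LC_ALL=C keeps the decimal separator as "." regardless of locale.
  LC_ALL=C awk -v i="$input_tokens" -v o="$output_tokens" 'BEGIN {
    credits = i / 1000000 * 250 + o / 1000000 * 1500
    printf "%.3f credits (~%.6f USD)\n", credits, credits * 0.001
  }'
}

# Example: a 2,000-token prompt with a 50-token reply.
estimate_cost 2000 50
```

Actual charges may differ, as the note above states, so treat the result as an order-of-magnitude figure.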

Quick Start

curl -X POST "https://api.core.today/llm/gemini/v1beta/openai/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer cdt_your_api_key" \
  -d '{
  "model": "gemini-3.1-flash-lite-preview",
  "messages": [
    {
      "role": "system",
      "content": "Classify the following text as: spam, not_spam. Respond with only the label."
    },
    {
      "role": "user",
      "content": "Congratulations! You have been selected for a special prize. Click here to claim now!"
    }
  ],
  "max_tokens": 50,
  "temperature": 0
}'

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| messages | array | Yes | - | Array of message objects (OpenAI format). Supports text, image, audio, video, and PDF inputs. |
| temperature | float | No | 1 | Sampling temperature (0-2). Lower values produce more deterministic outputs. |
| top_p | float | No | 0.95 | Nucleus sampling parameter (0-1). |
| max_tokens | integer | No | - | Maximum output tokens. Max: 65,536. Context window (input + output): 1,048,576 tokens. |
| stop | string or array | No | - | Up to 4 sequences where the model stops generating. |
| response_format | object | No | - | Output format constraint. Use `{ type: 'json_object' }` for structured JSON output. |
| presence_penalty | float | No | 0 | Penalty (-2.0 to 2.0) for repeating tokens already present in the text. |
| frequency_penalty | float | No | 0 | Penalty (-2.0 to 2.0) for tokens proportional to their frequency in the text. |
| seed | integer | No | - | Seed for deterministic sampling (best-effort). |
| stream | boolean | No | false | Enable Server-Sent Events streaming. |
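
As an illustration of how several of these parameters fit together, the sketch below builds a request body that asks for JSON output via `response_format` and fixes `temperature` and `seed` for more repeatable results. The helper names `build_payload` and `send_request`, the `CORE_API_KEY` environment variable, and the prompt text are assumptions made for this example. The user text is inserted verbatim, so it must not contain characters that need JSON escaping.

```shell
#!/bin/sh
# build_payload: print a JSON-mode request body (hypothetical helper).
# $1: user text, assumed to contain no quotes, backslashes, or newlines.
build_payload() {
  cat <<EOF
{
  "model": "gemini-3.1-flash-lite-preview",
  "messages": [
    {"role": "system", "content": "Extract the sender and topic. Reply as a JSON object with keys sender and topic."},
    {"role": "user", "content": "$1"}
  ],
  "response_format": {"type": "json_object"},
  "max_tokens": 200,
  "temperature": 0,
  "seed": 42
}
EOF
}

# send_request: POST the body to the endpoint from the Quick Start (hypothetical helper).
# CORE_API_KEY is an assumed environment variable holding a cdt_... key.
send_request() {
  curl -sS -X POST "https://api.core.today/llm/gemini/v1beta/openai/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${CORE_API_KEY}" \
    -d "$(build_payload "$1")"
}

# Print the body for inspection; call send_request with the same argument to submit it.
build_payload "From Dana: the Q3 invoice is attached."
```

Separating body construction from the network call makes it easy to inspect the payload before spending credits.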

Tips & Best Practices

1. Most affordable Gemini model at $0.25 (input) / $1.50 (output) per 1M tokens
2. Max output tokens: 65,536; set max_tokens up to this limit
3. Context window: 1,048,576 tokens; leave room for output when packing inputs
4. Use cached input tokens for repeated context to further reduce costs
5. Ideal for high-volume classification and routing tasks
6. Supports audio input for voice-based applications
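
The advice about leaving room for output comes down to simple arithmetic: the context window covers input plus output, so the input budget is the window minus the requested `max_tokens`. A minimal sketch, where `max_input_tokens` is a hypothetical helper and the two limits are taken from this page:

```shell
#!/bin/sh
CONTEXT_WINDOW=1048576   # input + output tokens
MAX_OUTPUT=65536         # upper limit for max_tokens

# max_input_tokens: given a desired max_tokens value, print how many
# input tokens still fit in the context window (hypothetical helper).
max_input_tokens() {
  requested=$1
  if [ "$requested" -gt "$MAX_OUTPUT" ]; then
    echo "max_tokens must be <= $MAX_OUTPUT" >&2
    return 1
  fi
  echo $((CONTEXT_WINDOW - requested))
}

# With the largest allowed output, 983,040 tokens remain for input.
max_input_tokens 65536
```

Token counts for a given prompt depend on the model's tokenizer, so keep some margin below the computed budget.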

Use Cases

High-volume text processing
Real-time chat applications
Quick classification and routing
Lightweight data extraction
Audio transcription and understanding

Model Info

Provider: Google
Version: 3.1-preview
Category: LLM
Price: 100 credits

API Endpoint

POST /llm/gemini/v1beta/openai/chat/completions