Configuration

gpt2giga reads two groups of settings:

proxy settings with the GPT2GIGA_ prefix;
GigaChat SDK settings with the GIGACHAT_ prefix.

Keep secrets in .env, environment variables, or a secrets manager. Do not pass credentials via CLI flags in production: process arguments can be visible through ps.

Settings sources

The CLI accepts an explicit env file:

gpt2giga --env-path .env

You can pass structured CLI flags:

gpt2giga \
  --proxy.host 127.0.0.1 \
  --proxy.port 8090 \
  --proxy.pass-model true \
  --gigachat.model GigaChat-2-Max

Full CLI reference:

gpt2giga --help

An env template to copy: .env.example.

How to read this document

.env.example is intentionally a copy-paste template: it shows all the keys side by side, but not the trade-offs between them. This document explains what each setting means and which combinations are safe.

A practical rule:

for a local run, start with the minimal block of GPT2GIGA_* and GIGACHAT_*;
for production, first configure security, then the backend mode, then optional traffic logs, metrics, and observability;
enable experimental/admin/debug flags only when you understand who has access to them and where payloads will be stored.

Quick profiles

A minimal local .env:

GPT2GIGA_MODE=DEV
GPT2GIGA_HOST=0.0.0.0
GPT2GIGA_PORT=8090
GPT2GIGA_ENABLE_API_KEY_AUTH=True
GPT2GIGA_API_KEY="<local-proxy-api-key>"

GIGACHAT_CREDENTIALS="<your-gigachat-credentials>"
GIGACHAT_SCOPE=GIGACHAT_API_PERS
GIGACHAT_MODEL=GigaChat-2-Max
GIGACHAT_VERIFY_SSL_CERTS=True

A minimum for production:

GPT2GIGA_MODE=PROD
GPT2GIGA_HOST=0.0.0.0
GPT2GIGA_PORT=8090
GPT2GIGA_ENABLE_API_KEY_AUTH=True
GPT2GIGA_API_KEY="<strong-random-secret>"
GPT2GIGA_CORS_ALLOW_ORIGINS='["https://your-app.example.com"]'
GPT2GIGA_LOG_LEVEL=INFO
GPT2GIGA_LOG_REDACT_SENSITIVE=True

GIGACHAT_CREDENTIALS="<your-gigachat-credentials>"
GIGACHAT_SCOPE=GIGACHAT_API_PERS
GIGACHAT_MODEL=GigaChat-2-Max
GIGACHAT_VERIFY_SSL_CERTS=True
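
The production recommendations in this document can be checked before a deploy. The following is a minimal sketch, not part of gpt2giga: the function name check_production_env is hypothetical, it assumes booleans are written as True/False or 1/0, and it treats an unset GIGACHAT_VERIFY_SSL_CERTS as acceptable because the SDK default applies.

```python
import json


def check_production_env(env: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a production-style env mapping."""

    def is_true(name: str, default: str = "False") -> bool:
        # Assumption: booleans are spelled True/False or 1/0.
        return env.get(name, default).strip().lower() in ("true", "1")

    problems: list[str] = []
    if not is_true("GPT2GIGA_ENABLE_API_KEY_AUTH") or not env.get("GPT2GIGA_API_KEY"):
        problems.append("API-key auth must be enabled with a non-empty key")
    # Lists are given as JSON strings; the documented default is ["*"].
    origins = json.loads(env.get("GPT2GIGA_CORS_ALLOW_ORIGINS", '["*"]'))
    if "*" in origins:
        problems.append("CORS origins should list specific hosts, not '*'")
    if env.get("GPT2GIGA_LOG_LEVEL", "INFO").upper() == "DEBUG":
        problems.append("avoid GPT2GIGA_LOG_LEVEL=DEBUG in production")
    if not is_true("GIGACHAT_VERIFY_SSL_CERTS", default="True"):
        problems.append("keep GIGACHAT_VERIFY_SSL_CERTS=True")
    return problems
```

An empty list means the mapping matches the recommendations; in practice you would pass dict(os.environ) or the parsed contents of .env.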

Explicit selection of GigaChat v2 on the client side:

GPT2GIGA_GIGACHAT_API_MODE=v1

With this value, root routes without /v1 or /v2 stay on v1, while a client can choose v2 via base_url="http://localhost:8090/v2". The /v1 prefix always forces the GigaChat v1 contract, and the /v2 prefix always forces the GigaChat v2 contract. If you want root routes to also use v2, set GPT2GIGA_GIGACHAT_API_MODE=v2.

Value format

Pydantic Settings reads environment variable names case-insensitively, but the documentation and examples use uppercase.

Specify lists and dictionaries as JSON strings:

GPT2GIGA_CORS_ALLOW_ORIGINS='["https://app.example.com"]'
GPT2GIGA_MODEL_MAX_CONNECTIONS='{"GigaChat-2-Max":5}'
GPT2GIGA_TRAFFIC_LOG_REDACT_EXTRA_KEYS='["customer_token"]'

GPT2GIGA_TRAFFIC_LOG_SINKS accepts a JSON array or a comma-separated list:

GPT2GIGA_TRAFFIC_LOG_SINKS=postgres,opensearch
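
The two accepted forms for GPT2GIGA_TRAFFIC_LOG_SINKS can be illustrated with a small parser. This is a sketch under stated assumptions rather than the proxy's actual parsing code: parse_list_setting is a hypothetical name, and trimming whitespace around comma-separated items is an assumption.

```python
import json


def parse_list_setting(raw: str) -> list[str]:
    """Parse a list-valued setting given as a JSON array or a comma-separated string."""
    text = raw.strip()
    if not text:
        return []
    if text.startswith("["):
        # JSON form, e.g. '["postgres","opensearch"]'
        value = json.loads(text)
        if not isinstance(value, list):
            raise ValueError("expected a JSON array")
        return [str(item) for item in value]
    # Comma-separated form, e.g. 'postgres,opensearch'
    return [part.strip() for part in text.split(",") if part.strip()]
```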

CLI flags are convenient for non-secret local overrides. For secrets, use .env, environment variables, or a secrets manager, because CLI arguments can be visible to other processes.

Core proxy settings

Variable	Default	Purpose
`GPT2GIGA_MODE`	`DEV`	`DEV` or `PROD`. `PROD` disables `/docs`, `/redoc`, `/openapi.json`, and `/logs*`.
`GPT2GIGA_HOST`	`localhost`	Local server host.
`GPT2GIGA_PORT`	`8090`	Local server port.
`GPT2GIGA_USE_HTTPS`	`False`	Built-in HTTPS. For production, TLS at a reverse proxy is usually better.
`GPT2GIGA_HTTPS_KEY_FILE` / `GPT2GIGA_HTTPS_CERT_FILE`	empty	Local key/cert files for built-in HTTPS.
`GPT2GIGA_ENABLE_API_KEY_AUTH`	`False`	Require proxy API-key authentication for public API routes. Mandatory in `PROD`.
`GPT2GIGA_API_KEY`	empty	Proxy API key. For shared environments, use a strong random value.
`GPT2GIGA_PASS_MODEL`	`True`	Pass the `model` from the request to GigaChat. Set `False` to always use the configured GigaChat model.
`GPT2GIGA_PASS_TOKEN`	`False`	Parse the client `Authorization` header as GigaChat credentials for per-request upstream authorization.
`GPT2GIGA_PASS_TOKEN_CLIENT_CACHE_SIZE`	`32`	Maximum idle credential-specific clients retained for connection reuse.
`GPT2GIGA_EMBEDDINGS`	`EmbeddingsGigaR`	Default embeddings model when the model from the request is not used.
`GPT2GIGA_ENABLE_IMAGES`	`True`	Accept and translate supported image inputs. Disable when the deployment must reject image processing.
`GPT2GIGA_DEFAULT_MAX_TOKENS`	empty	Add this positive `max_tokens` value only when the client did not provide `max_tokens`, `max_completion_tokens`, or `max_output_tokens`. Empty means no injected limit.
`GPT2GIGA_MAX_REQUEST_BODY_BYTES`	`10485760`	Maximum HTTP request body size.
`GPT2GIGA_LOG_LEVEL`	`INFO`	Runtime log level. Avoid `DEBUG` in production.
`GPT2GIGA_LOG_FILENAME`	`gpt2giga.log`	Runtime log file.
`GPT2GIGA_LOG_MAX_SIZE`	`10485760`	Maximum log file size before rotation.
`GPT2GIGA_LOG_REDACT_SENSITIVE`	`True`	Redact secrets in runtime logs.
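
The GPT2GIGA_DEFAULT_MAX_TOKENS rule from the table can be read as the following sketch. It is illustrative only: apply_default_max_tokens is a hypothetical helper, and treating an explicit null limit as "not provided" is an assumption.

```python
from typing import Any, Optional

# Client-side limit fields named in the GPT2GIGA_DEFAULT_MAX_TOKENS description.
LIMIT_FIELDS = ("max_tokens", "max_completion_tokens", "max_output_tokens")


def apply_default_max_tokens(payload: dict[str, Any], default: Optional[int]) -> dict[str, Any]:
    """Return a copy of payload with max_tokens injected only when no limit was sent."""
    if default is None or default <= 0:
        return dict(payload)  # empty setting: no injected limit
    # Assumption: a field that is present but null counts as not provided.
    if any(payload.get(field) is not None for field in LIMIT_FIELDS):
        return dict(payload)  # the client's own limit is kept untouched
    return {**payload, "max_tokens": default}
```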

Authentication and security

The proxy API key protects public API routes (/chat/completions, /responses, /messages, /models, /embeddings, /model/info, and their versioned variants). Clients can pass the key in two ways:

Authorization: Bearer <GPT2GIGA_API_KEY>
x-api-key: <GPT2GIGA_API_KEY>
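
A server-side check that accepts either header might look like the sketch below. It is not the proxy's implementation: extract_api_key and is_authorized are hypothetical names, and giving Authorization precedence over x-api-key when both are present is an assumption.

```python
import hmac
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the key from 'Authorization: Bearer ...' or 'x-api-key', if any."""
    lowered = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return token.strip()
    # Assumption: x-api-key is consulted only when no Bearer token is present.
    return lowered.get("x-api-key") or None


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Compare the provided key with the configured one in constant time."""
    provided = extract_api_key(headers)
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode(), expected_key.encode())
```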

GPT2GIGA_MODE=PROD requires a configured API key and disables the interactive docs and log routes. Admin/debug endpoints use a separate GPT2GIGA_ADMIN_API_KEY.

GPT2GIGA_PASS_TOKEN=True is needed only for scenarios where each client must pass its own GigaChat credentials. The following prefixes are supported in the Authorization header:

giga-cred-<credentials>:<scope> for GigaChat authorization key credentials;
giga-auth-<access_token> for a ready access token;
giga-user-<user>:<password> for username/password authorization.
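
The three prefixes can be told apart with a simple parser. The sketch below is illustrative and not the proxy's code: UpstreamAuth and parse_pass_token are hypothetical names, accepting an optional Bearer scheme is an assumption, and so is splitting on the first colon (which lets a password contain colons).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpstreamAuth:
    kind: str                    # "credentials", "access_token", or "user"
    value: str                   # credentials, access token, or username
    extra: Optional[str] = None  # scope or password, when present


def parse_pass_token(authorization: str) -> Optional[UpstreamAuth]:
    """Classify an Authorization value by its giga-* prefix; None if unrecognized."""
    token = authorization.strip()
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()  # assumption: scheme is optional
    if token.startswith("giga-cred-"):
        credentials, _, scope = token[len("giga-cred-"):].partition(":")
        return UpstreamAuth("credentials", credentials, scope or None)
    if token.startswith("giga-auth-"):
        return UpstreamAuth("access_token", token[len("giga-auth-"):])
    if token.startswith("giga-user-"):
        user, _, password = token[len("giga-user-"):].partition(":")
        return UpstreamAuth("user", user, password or None)
    return None
```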

For a regular deployment, it is simpler and safer to keep upstream credentials on the server via GIGACHAT_*. When pass-through is enabled, clients are pooled by credential and closed on least-recently-used eviction or shutdown; active streaming requests are never evicted.

GigaChat settings

Common upstream settings:

Variable	Default	Purpose
`GIGACHAT_CREDENTIALS`	empty	Authorization key credentials.
`GIGACHAT_SCOPE`	SDK default	GigaChat API scope, for example `GIGACHAT_API_PERS`.
`GIGACHAT_USER` / `GIGACHAT_PASSWORD`	empty	Alternative username/password authorization.
`GIGACHAT_ACCESS_TOKEN`	empty	Alternative authorization via a ready access token.
`GIGACHAT_MODEL`	SDK default	Default model used when the proxy does not pass the client's model, for example when `GPT2GIGA_PASS_MODEL=False`.
`GIGACHAT_PROFANITY_CHECK`	SDK default	Upstream profanity check flag.
`GIGACHAT_VERIFY_SSL_CERTS`	SDK default	Keep `True` in production.
`GIGACHAT_TIMEOUT`	SDK default	Upstream request timeout.
`GIGACHAT_MAX_CONNECTIONS`	SDK default	Global SDK/HTTP connection cap.
`GIGACHAT_MAX_RETRIES`	SDK default	SDK retry count for transient upstream errors.

GigaChat also supports TLS client certificate settings: GIGACHAT_CA_BUNDLE_FILE, GIGACHAT_CERT_FILE, GIGACHAT_KEY_FILE, GIGACHAT_KEY_FILE_PASSWORD.

GPT2GIGA_PASS_MODEL=False is often useful for OpenAI-compatible clients that send a non-GigaChat model name. The upstream model is then taken from GIGACHAT_MODEL.

Reasoning and structured output

Reasoning:

GPT2GIGA_ENABLE_REASONING=False
GPT2GIGA_DISABLE_REASONING=False

GPT2GIGA_ENABLE_REASONING=True adds reasoning_effort="high" if the client did not pass an explicit reasoning setting.
GPT2GIGA_DISABLE_REASONING=True removes reasoning and reasoning_effort, including explicit client fields and extra_body passthrough.
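
The two flags can be read as a small request transformation. The sketch below is an interpretation, not the proxy's code: apply_reasoning_flags is a hypothetical helper, the flat extra_body dictionary shape is an assumption, and so is the choice that the disable flag wins when both flags are True.

```python
from typing import Any

REASONING_KEYS = ("reasoning", "reasoning_effort")


def apply_reasoning_flags(payload: dict[str, Any], enable: bool, disable: bool) -> dict[str, Any]:
    """Return a copy of payload with the reasoning flags applied."""
    result = dict(payload)
    if disable:
        # Assumption: disable takes precedence over enable.
        for key in REASONING_KEYS:
            result.pop(key, None)
        extra = result.get("extra_body")
        if isinstance(extra, dict):
            # Also strip reasoning fields passed through extra_body.
            result["extra_body"] = {k: v for k, v in extra.items() if k not in REASONING_KEYS}
        return result
    if enable and not any(key in result for key in REASONING_KEYS):
        # No explicit client setting: add the default effort.
        result["reasoning_effort"] = "high"
    return result
```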

Structured output:

GPT2GIGA_STRUCTURED_OUTPUT_MODE=function_call

Values:

function_call: a compatibility fallback through function calling;
native: passes the JSON Schema through GigaChat response_format if the model/API supports it.

Both modes are designed for schema-based structured output. OpenAI response_format.type=json_object and Gemini responseMimeType=application/json without responseJsonSchema / responseSchema are not proxied to GigaChat, because the upstream does not support a separate schema-less JSON mode.

Backend API mode

GPT2GIGA_GIGACHAT_API_MODE=v1

`GPT2GIGA_GIGACHAT_API_MODE`	`/chat/completions` backend	`/responses` backend
`v1`	`v1`	`v1`
`v2`	`v2`	`v2`

Root URLs (/chat/completions, /responses, /messages) without /v1 or /v2 use this flag. Versioned prefixes are an explicit per-request override:

/v1/chat/completions, /v1/responses, /v1/messages use the GigaChat v1 contract;
/v2/chat/completions, /v2/responses, /v2/messages use the GigaChat v2 contract.
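
The routing rule reduces to "a version prefix wins, otherwise the flag decides". A minimal sketch, with resolve_backend_version as a hypothetical helper name:

```python
def resolve_backend_version(path: str, api_mode: str = "v1") -> str:
    """Pick the GigaChat contract for a request path.

    api_mode stands in for the GPT2GIGA_GIGACHAT_API_MODE value.
    """
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in ("v1", "v2"):
        return first_segment  # versioned prefix: explicit per-request override
    return api_mode           # root route: fall back to the configured mode
```

For example, resolve_backend_version("/v2/responses", "v1") yields "v2", while resolve_backend_version("/responses", "v1") yields "v1".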

Built-in tool mapping can be disabled separately:

GPT2GIGA_DISABLE_BUILTIN_TOOL_MAPPING=False

When set to True, OpenAI/Anthropic/Gemini provider built-in tools (web_search*, code_execution*, urlContext, and similar) are not mapped to GigaChat v2 built-in tools and are ignored. User function/local tools remain enabled.

Normalized layer flags

Experimental flags control the OpenAI Chat Completions normalized path and by default keep the legacy behavior for this route:

GPT2GIGA_EXPERIMENTAL_NORMALIZED_LAYER=False
GPT2GIGA_NORMALIZATION_MODE=off
GPT2GIGA_LEGACY_CHAT_FALLBACK=True

GPT2GIGA_NORMALIZATION_MODE values:

off: OpenAI Chat Completions goes through the legacy path;
shadow: builds normalized diagnostics alongside the legacy OpenAI Chat handling without changing client responses;
on: switches OpenAI Chat to the normalized path, with a legacy fallback before the response starts, if the fallback is enabled.

Gemini GenerateContent uses its own dedicated Gemini-to-normalized adapter and GigaChat provider path independently of these OpenAI Chat flags. OpenAI Responses and Anthropic Messages stay on the legacy execution paths, but use a normalized representation for observability/debug helpers where possible.

A detailed description of the models and current execution paths: Normalized messages architecture.

Conversation stitching

Conversation stitching is an opt-in, in-memory state store for stateless chat-like clients that pass a stable conversation identifier. It is disabled by default, so it does not affect compatibility unless you turn it on. It supports OpenAI Chat Completions, Anthropic Messages, and Gemini GenerateContent. The conversation identifier is taken from conversation, metadata.conversation_id, x-gpt2giga-conversation-id or, if enabled, x-session-id.

GPT2GIGA_CONVERSATION_STITCHING_ENABLED=False
GPT2GIGA_CONVERSATION_TTL_SECONDS=3600
GPT2GIGA_CONVERSATION_MAX_MESSAGES=40
GPT2GIGA_CONVERSATION_USE_SESSION_ID=False
GPT2GIGA_CONVERSATION_ON_DIVERGENCE=client_wins

Variable	Default	Purpose
`GPT2GIGA_CONVERSATION_STITCHING_ENABLED`	`False`	Enable local stitching state.
`GPT2GIGA_CONVERSATION_TTL_SECONDS`	`3600`	How long to keep idle conversation state.
`GPT2GIGA_CONVERSATION_MAX_MESSAGES`	`40`	Maximum retained messages sent upstream.
`GPT2GIGA_CONVERSATION_USE_SESSION_ID`	`False`	Allow `x-session-id` as a conversation key when there is no explicit key.
`GPT2GIGA_CONVERSATION_ON_DIVERGENCE`	`client_wins`	`client_wins` replaces the state after success, `fork` creates a revision-suffixed branch.

The state is kept in process memory. With several workers or pods, use sticky routing or leave stitching disabled.
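
The identifier lookup described at the start of this section can be sketched as follows. This is illustrative only: resolve_conversation_id is a hypothetical name, treating the listed order as a precedence order is an assumption, and so is requiring each candidate to be a non-empty string.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_conversation_id(
    body: Mapping[str, Any],
    headers: Mapping[str, str],
    use_session_id: bool = False,
) -> Optional[str]:
    """Return the first usable conversation identifier, or None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    metadata = body.get("metadata")
    candidates = [
        body.get("conversation"),
        metadata.get("conversation_id") if isinstance(metadata, Mapping) else None,
        lowered.get("x-gpt2giga-conversation-id"),
    ]
    if use_session_id:
        # Mirrors GPT2GIGA_CONVERSATION_USE_SESSION_ID=True: last-resort key.
        candidates.append(lowered.get("x-session-id"))
    for candidate in candidates:
        if isinstance(candidate, str) and candidate.strip():
            return candidate.strip()
    return None
```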

Per-model concurrency

GIGACHAT_MAX_CONNECTIONS is a global SDK/HTTP cap. The proxy can also limit concurrent upstream model calls by effective model:

GIGACHAT_MAX_CONNECTIONS=7
GPT2GIGA_MODEL_MAX_CONNECTIONS='{"GigaChat-2-Max":5}'
GPT2GIGA_MODEL_MAX_CONNECTIONS_DEFAULT=1
GPT2GIGA_MODEL_MAX_CONNECTIONS_ACQUIRE_TIMEOUT=30

Semantics:

an empty GPT2GIGA_MODEL_MAX_CONNECTIONS and an empty GPT2GIGA_MODEL_MAX_CONNECTIONS_DEFAULT disable the limiter;
an explicit model limit takes precedence over the default;
GPT2GIGA_MODEL_MAX_CONNECTIONS_ACQUIRE_TIMEOUT=0 returns a local 429 immediately when no slot is free;
streaming calls hold a slot until the stream completes or the client disconnects;
limits work within a single process, so workers/pods multiply the effective capacity.
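
These semantics resemble a per-model semaphore with an acquire timeout. The sketch below illustrates the idea with asyncio and is not the proxy's limiter: ModelLimiter and LimitExceeded are hypothetical names, and mapping LimitExceeded to HTTP 429 is left to the caller.

```python
import asyncio
from typing import Optional


class LimitExceeded(Exception):
    """No slot became free in time; a server would answer with a local 429."""


class ModelLimiter:
    def __init__(self, limits: dict[str, int], default: Optional[int], acquire_timeout: float):
        self._limits = limits
        self._default = default
        self._timeout = acquire_timeout
        self._semaphores: dict[str, asyncio.Semaphore] = {}

    def _semaphore(self, model: str) -> Optional[asyncio.Semaphore]:
        limit = self._limits.get(model, self._default)  # explicit limit wins
        if limit is None:
            return None  # no explicit limit and no default: limiter is off
        if model not in self._semaphores:
            self._semaphores[model] = asyncio.Semaphore(limit)
        return self._semaphores[model]

    async def acquire(self, model: str) -> None:
        sem = self._semaphore(model)
        if sem is None:
            return
        if self._timeout <= 0:
            # Timeout 0: fail fast instead of queueing behind other requests.
            if sem.locked():
                raise LimitExceeded(model)
            await sem.acquire()
            return
        try:
            await asyncio.wait_for(sem.acquire(), self._timeout)
        except asyncio.TimeoutError:
            raise LimitExceeded(model) from None

    def release(self, model: str) -> None:
        sem = self._semaphore(model)
        if sem is not None:
            sem.release()


async def demo() -> list[str]:
    limiter = ModelLimiter({"GigaChat-2-Max": 1}, default=None, acquire_timeout=0)
    events = []
    await limiter.acquire("GigaChat-2-Max")
    events.append("first acquired")
    try:
        await limiter.acquire("GigaChat-2-Max")
    except LimitExceeded:
        events.append("second rejected")
    limiter.release("GigaChat-2-Max")  # e.g. the stream finished
    await limiter.acquire("GigaChat-2-Max")
    events.append("third acquired")
    return events
```

A streaming handler would call release in a finally block so the slot is returned when the stream completes or the client disconnects.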

CORS

The defaults are convenient for local development:

GPT2GIGA_CORS_ALLOW_ORIGINS='["*"]'
GPT2GIGA_CORS_ALLOW_METHODS='["*"]'
GPT2GIGA_CORS_ALLOW_HEADERS='["*"]'

In production, set specific origins and headers:

GPT2GIGA_CORS_ALLOW_ORIGINS='["https://your-app.example.com"]'
GPT2GIGA_CORS_ALLOW_METHODS='["GET","POST","OPTIONS"]'
GPT2GIGA_CORS_ALLOW_HEADERS='["authorization","content-type","x-api-key"]'

HTTP body and attachment limits

The global body limit is checked before JSON parsing:

GPT2GIGA_MAX_REQUEST_BODY_BYTES=10485760

Additional limits protect attachment processing:

Variable	Purpose
`GPT2GIGA_MAX_AUDIO_FILE_SIZE_BYTES`	Maximum size of a single audio file.
`GPT2GIGA_MAX_IMAGE_FILE_SIZE_BYTES`	Maximum size of a single image.
`GPT2GIGA_MAX_TEXT_FILE_SIZE_BYTES`	Maximum size of a single text file.
`GPT2GIGA_MAX_AUDIO_IMAGE_TOTAL_SIZE_BYTES`	Combined limit of audio and images in one request.

Lower these values if the proxy is available to a wide range of clients or sits in front of an expensive storage/upload path.

Runtime logs and `/logs*`

Runtime logs are written to stdout and the log file:

GPT2GIGA_LOG_LEVEL=INFO
GPT2GIGA_LOG_FILENAME=gpt2giga.log
GPT2GIGA_LOG_MAX_SIZE=10485760
GPT2GIGA_LOG_REDACT_SENSITIVE=True

/logs/{last_n_lines}, /logs/stream, and /logs/html are available only in DEV. In PROD they are not mounted. If API-key authentication is enabled, /logs* also requires the proxy key.

An IP allowlist for /logs*:

GPT2GIGA_LOGS_IP_ALLOWLIST='["10.0.0.1"]'
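
An allowlist check of this kind can be sketched with the standard ipaddress module. The sketch makes two assumptions that this document does not confirm: an empty allowlist means no IP restriction, and entries may be CIDR networks as well as single addresses. The name is_ip_allowed is hypothetical.

```python
import ipaddress


def is_ip_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Return True when client_ip matches an allowlist entry."""
    if not allowlist:
        return True  # assumption: empty allowlist disables the IP check
    address = ipaddress.ip_address(client_ip)
    for entry in allowlist:
        # A bare address such as "10.0.0.1" parses as a single-host network.
        network = ipaddress.ip_network(entry, strict=False)
        if address in network:
            return True
    return False
```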

Do not use GPT2GIGA_LOG_LEVEL=DEBUG in production: debug output may contain operational context that should not end up in shared logs.

Traffic logs

Traffic logs are structured request/response records. They are disabled by default:

GPT2GIGA_TRAFFIC_LOG_ENABLED=False
GPT2GIGA_TRAFFIC_LOG_SINK=noop

Local JSONL:

GPT2GIGA_TRAFFIC_LOG_ENABLED=True
GPT2GIGA_TRAFFIC_LOG_SINK=jsonl
GPT2GIGA_TRAFFIC_LOG_JSONL_PATH=traffic_logs.jsonl

Postgres:

GPT2GIGA_TRAFFIC_LOG_ENABLED=True
GPT2GIGA_TRAFFIC_LOG_SINK=postgres
GPT2GIGA_TRAFFIC_LOG_POSTGRES_DSN=postgresql://user:password@localhost:5432/gpt2giga

Postgres plus an OpenSearch mirror:

GPT2GIGA_TRAFFIC_LOG_ENABLED=True
GPT2GIGA_TRAFFIC_LOG_SINKS=postgres,opensearch
GPT2GIGA_TRAFFIC_LOG_POSTGRES_DSN=postgresql://user:password@localhost:5432/gpt2giga
GPT2GIGA_OPENSEARCH_URL=http://localhost:9200

Variable	Default	Purpose
`GPT2GIGA_TRAFFIC_LOG_SINK`	`noop`	Single sink: `noop`, `jsonl`, `postgres`, `opensearch`.
`GPT2GIGA_TRAFFIC_LOG_SINKS`	`[]`	Ordered mirror sinks, for example `postgres,opensearch`. If empty, the single sink is used.
`GPT2GIGA_TRAFFIC_LOG_CAPTURE_CONTENT`	`False`	Store request/response bodies after redaction.
`GPT2GIGA_TRAFFIC_LOG_QUEUE_SIZE`	`10000`	Maximum queued events.
`GPT2GIGA_TRAFFIC_LOG_BATCH_SIZE`	`500`	Maximum events in a storage batch.
`GPT2GIGA_TRAFFIC_LOG_FLUSH_INTERVAL_MS`	`2000`	Best-effort flush interval.
`GPT2GIGA_TRAFFIC_LOG_DROP_ON_BACKPRESSURE`	`True`	Drop events instead of blocking the request path when the queue is full.
`GPT2GIGA_TRAFFIC_LOG_REDACT_SENSITIVE`	`True`	Redact sensitive fields before storage.
`GPT2GIGA_TRAFFIC_LOG_REDACT_EXTRA_KEYS`	`[]`	Additional case-insensitive keys for redaction.
`GPT2GIGA_TRAFFIC_LOG_RETENTION_DAYS`	`30`	Retention for Postgres traffic logs.
`GPT2GIGA_TRAFFIC_LOG_PURGE_INTERVAL_SECONDS`	`3600`	Best-effort retention purge interval.

Enable content capture only after resolving storage encryption, retention, redaction, and admin-endpoint access. All storage backends, including JSONL, use the bounded background queue. JSONL writes are batched so a request does not wait for a file open and worker-thread handoff.
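
The bounded queue and the GPT2GIGA_TRAFFIC_LOG_DROP_ON_BACKPRESSURE policy follow a common pattern, sketched below with the standard queue module. This is a simplified illustration, not the proxy's implementation: BoundedEventBuffer is a hypothetical class, and the real writer runs in the background rather than being drained by hand.

```python
import queue


class BoundedEventBuffer:
    def __init__(self, max_size: int, drop_on_backpressure: bool = True):
        self._queue: "queue.Queue[dict]" = queue.Queue(maxsize=max_size)
        self._drop = drop_on_backpressure
        self.dropped = 0

    def submit(self, event: dict) -> bool:
        """Enqueue an event; return False if it was dropped because the queue is full."""
        try:
            if self._drop:
                self._queue.put_nowait(event)  # never blocks the request path
            else:
                self._queue.put(event)  # blocks the caller until there is room
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain_batch(self, batch_size: int) -> list[dict]:
        """Take up to batch_size queued events for one storage write."""
        batch: list[dict] = []
        while len(batch) < batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```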

Postgres and OpenSearch helper variables

The Compose profiles use several helper variables that are not direct ProxySettings fields but are needed for the storage services:

Variable	Purpose
`GPT2GIGA_POSTGRES_DB`	Database name for `deploy/postgres.yaml`.
`GPT2GIGA_POSTGRES_USER`	Postgres user for the Compose service.
`GPT2GIGA_POSTGRES_PASSWORD`	Postgres password. Set a strong value.
`GPT2GIGA_POSTGRES_PORT`	Host port for local Postgres.
`GPT2GIGA_OPENSEARCH_PORT`	Host port for local OpenSearch.

OpenSearch runtime settings:

Variable	Default	Purpose
`GPT2GIGA_OPENSEARCH_URL`	`http://localhost:9200`	OpenSearch endpoint.
`GPT2GIGA_OPENSEARCH_USERNAME` / `GPT2GIGA_OPENSEARCH_PASSWORD`	empty	Optional authorization.
`GPT2GIGA_OPENSEARCH_INDEX`	`gpt2giga-traffic`	Index or data stream name.
`GPT2GIGA_OPENSEARCH_DATA_STREAM`	`True`	Use data stream bulk create semantics.
`GPT2GIGA_OPENSEARCH_BULK_SIZE`	`500`	Bulk batch size.
`GPT2GIGA_OPENSEARCH_FLUSH_INTERVAL_MS`	`2000`	Best-effort flush interval.

Metrics

The Prometheus-compatible endpoint is disabled by default:

GPT2GIGA_METRICS_ENABLED=False
GPT2GIGA_METRICS_PATH=/metrics

When enabled, the endpoint is mounted at GPT2GIGA_METRICS_PATH. If proxy API-key authentication is enabled, the metrics endpoint also requires the proxy key.

Metrics labels are limited to a bounded set of operational fields and do not include the prompt, response content, API keys, request ids, or trace ids.

Observability / Phoenix

OpenTelemetry/OpenInference observability is disabled by default:

GPT2GIGA_OBSERVABILITY_ENABLED=False
GPT2GIGA_OBSERVABILITY_BACKEND=phoenix
GPT2GIGA_PHOENIX_COLLECTOR_ENDPOINT=http://localhost:4317
GPT2GIGA_PHOENIX_PROJECT_NAME=gpt2giga
GPT2GIGA_PHOENIX_API_KEY=

The unprefixed PHOENIX_COLLECTOR_ENDPOINT, PHOENIX_PROJECT_NAME, and PHOENIX_API_KEY names remain supported as fallbacks for compatibility with Phoenix tooling. Prefer the GPT2GIGA_PHOENIX_* names in gateway-specific configuration so ownership is explicit.

The capture flags are independent and disabled by default; the block below also shows the sampling, queue, content-length, and redaction defaults:

GPT2GIGA_OBSERVABILITY_SAMPLE_RATE=1.0
GPT2GIGA_OBSERVABILITY_QUEUE_SIZE=10000
GPT2GIGA_OBSERVABILITY_DROP_ON_BACKPRESSURE=True
GPT2GIGA_OBSERVABILITY_CAPTURE_CONTENT=False
GPT2GIGA_OBSERVABILITY_CAPTURE_MESSAGES=False
GPT2GIGA_OBSERVABILITY_CAPTURE_TOOL_ARGS=False
GPT2GIGA_OBSERVABILITY_CAPTURE_RESPONSES=False
GPT2GIGA_OBSERVABILITY_MAX_CONTENT_LENGTH=8000
GPT2GIGA_OBSERVABILITY_REDACTION_ENABLED=True

The bounded queue exports spans outside the request path. With the default backpressure policy, new events are dropped rather than delaying responses when the exporter cannot keep up.

Payload content reaches spans only when the corresponding capture flags are explicitly enabled. For production, capture is usually left disabled or enabled for a short time under access control.

Compose helper variables for Phoenix and mitmproxy:

Variable	Purpose
`PHOENIX_PORT`	Host port for the Phoenix UI.
`PHOENIX_GRPC_PORT`	Host port for the OTLP gRPC collector.
`MITMPROXY_PORT`	Host port for the mitmproxy proxy listener in the Compose overlay.
`MITMPROXY_WEB_PORT`	Host port for the mitmproxy web UI in the Compose overlay.

Details on the emitted spans and metric names: Operations.

Admin and debug endpoints

The Admin API is disabled by default:

GPT2GIGA_ADMIN_API_ENABLED=False
GPT2GIGA_ADMIN_API_KEY="<strong-admin-secret>"

When enabled, the /_admin/logs* endpoints use GPT2GIGA_ADMIN_API_KEY, not the public proxy key. Replay requires a separate opt-in:

GPT2GIGA_REPLAY_ENABLED=True

The debug translation endpoints are enabled separately:

GPT2GIGA_DEBUG_TRANSLATE_ENABLED=True
GPT2GIGA_ADMIN_API_KEY="<strong-admin-secret>"

These endpoints are intended for local debugging and protected admin workflows. Do not enable them publicly without reverse-proxy controls and a separate admin key.

GPT2GIGA_UI_ENABLED is reserved for a future built-in UI. For now, do not use it as a security control.
