Normalized messages architecture
The normalized layer is an internal contract between the public API formats and
the upstream providers. It is not a new public API. Clients keep sending OpenAI
Chat Completions, OpenAI Responses, Anthropic Messages, or Gemini GenerateContent,
and the gateway brings the compatible parts of the payload to canonical models
from gpt2giga/protocols/normalized/. Gemini GenerateContent already uses a
dedicated Gemini-to-normalized adapter in the main execution path.
Current status
GPT2GIGA_NORMALIZATION_MODE=off: OpenAI Chat Completions goes through legacy transforms.GPT2GIGA_NORMALIZATION_MODE=shadow: OpenAI Chat builds a normalized request alongside the legacy path and stores a safe diagnostic shape hash without prompt content.GPT2GIGA_NORMALIZATION_MODE=on: OpenAI Chat Completions is executed through the normalized path andGigaChatProviderAdapter; a legacy fallback is available before the response starts viaGPT2GIGA_LEGACY_CHAT_FALLBACK=True.- OpenAI Responses and Anthropic Messages are still executed through legacy route transforms, but observability and debug translation already use a normalized representation where possible.
- Gemini GenerateContent and streamGenerateContent are executed through
GeminiProtocolAdapter, normalized models, andGigaChatProviderAdapterindependently of the OpenAI Chat normalization flags. - Debug endpoints can translate between the
openai,anthropic,normalized, andgigachatformats for protected admin workflows.
Core models
The normalized request envelope:
NormalizedChatRequest:protocol,operation,model,stream,messages,tools,tool_choice,response_format,generation_config,user,metadata.NormalizedMessage:role,content,name,tool_call_id,tool_calls.NormalizedContentPart: a generic content part withtype,text,data,mime_type,detail.NormalizedTool: a flattened tool/function contract withname,description,parameters.NormalizedGenerationConfig: common generation knobs:temperature,top_p,max_tokens, penalties,stop,seed.
Normalized output:
NormalizedResponse: a provider-independent non-streaming response:choices,usage,error,metadata,provider_metadata.NormalizedChoice:messageordelta,finish_reason,index.NormalizedUsage:input_tokens,output_tokens,total_tokens.NormalizedStreamEvent: canonical stream events:message_start,content_delta,reasoning_delta,tool_call_start,tool_call_delta,usage,message_end,error,heartbeat.
All normalized models inherit two extension buckets:
raw_extensions: fields of the original public protocol that the gateway must keep but not promote into the canonical model.provider_metadata: provider-specific data, for example GigaChatadditional_fieldsor safe metadata from the upstream response.
OpenAI Chat flow
OpenAI Chat Completions in normalized mode goes like this:
gpt2giga/routers/openai/chat_completions.pyreads the payload and request context.OpenAIProtocolAdapterfromgpt2giga/protocols/openai/adapter.pybuilds aNormalizedChatRequest.GigaChatProviderAdapterfromgpt2giga/providers/gigachat/adapter.pyexecutes the normalized request through the current GigaChat SDK path.- The provider adapter returns a
NormalizedResponseor aNormalizedStreamEvent. - OpenAI response adapters map the result back into an OpenAI Chat Completions payload or SSE chunks.
- Observability receives the normalized request/response and builds safe OpenInference-style span attributes.
Inside GigaChatProviderAdapter, the normalized request is currently
reconstructed into an OpenAI-like payload, after which the existing
RequestTransformer for the GigaChat v1/v2 SDK is used. This is a transitional
layer: the normalized contract is already separated from the router, but part of
the GigaChat-specific preparation still reuses the legacy code.
Differences from OpenAI Chat Completions
OpenAI Chat Completions is the public wire format. Normalized messages are the internal gateway contract.
Main differences:
- OpenAI stores tool schemas as
{"type": "function", "function": {...}}; the normalized layer storesNormalizedToolwith flatname,description,parameters. - OpenAI
tool_callscontains nestedfunction.arguments; the normalized layer storesNormalizedToolCall.nameandargumentsdirectly, while the nested provider fields remain inraw_extensions. - OpenAI content parts use concrete fields such as
text,image_url,file; the normalized content part has a genericdataand optional metadata. - OpenAI top-level parameters are mixed in one object; the normalized layer groups
generation knobs in
generation_config, structured output inresponse_format, and unknown/compatibility fields inraw_extensions. - OpenAI usage is called
prompt_tokensandcompletion_tokens; the normalized layer uses provider-neutralinput_tokensandoutput_tokens. - The OpenAI response
id/object/created/system_fingerprintare formed only on the way out of the normalized response adapter.
Differences from OpenAI Responses
The OpenAI Responses API has a different public contract: input, instructions,
output items, previous_response_id, stateful response ids, built-in tool
progress events, and text.format.
The normalized layer currently describes Responses as a chat-like exchange only for observability:
responses_request_to_normalized()builds aNormalizedChatRequestwithoperation="responses".inputandinstructionsare turned into normalized messages.max_output_tokensis mapped togeneration_config.max_tokens.text.formatis mapped toNormalizedResponseFormat.- Responses output items are collapsed into an assistant message and tool calls for LLM spans.
Execution of /responses stays in the legacy route path:
gpt2giga/routers/openai/responses.py uses the existing GigaChat v1/v2
request transformers and response processor. So the normalized Responses helper
is currently needed for consistent observability, not for the main execution path.
Differences from Gemini GenerateContent
Gemini GenerateContent is a separate public protocol with contents, parts,
systemInstruction, generationConfig, tools.functionDeclarations,
toolConfig.functionCallingConfig, candidates, and its own SSE response shape.
The normalized layer differs as follows:
contents[].partsare turned into normalized messages/content parts.systemInstructionbecomes a normalized system message.generationConfig.temperature,topP,maxOutputTokens, penalties,seed, andstopSequencesare mapped toNormalizedGenerationConfig.functionDeclarationsare turned intoNormalizedTool; supported provider tools are kept as GigaChat-compatible built-in tool metadata, while unsupported tools remain inraw_extensionsfor diagnostics.toolConfig.functionCallingConfigapplies to function declarations and does not force the built-in provider tools.- Gemini candidates, finish reasons, and usage metadata are formed on the way out of the normalized response/stream adapters.
The Gemini Files/Batches router modules are prepared but not mounted in the public API surface; they are not part of the current normalized execution path.
Differences from the GigaChat format
GigaChat is the upstream provider format that the gateway calls through the SDK. Its
v1/v2 contracts, SDK models, function-call state ids, attachments, and
additional_fields differ from the public OpenAI/Anthropic shapes.
The normalized layer differs as follows:
- it does not depend on
gigachat.models.Messagesor the v2ChatMessage; - it stores provider-neutral roles/messages/tools/usage/errors;
- it does not expose GigaChat authorization, SDK contextvars, and transport details;
- it keeps GigaChat-specific passthrough in
provider_metadata["gigachat"]; - it filters response headers before moving them into metadata and does not store
authorization,x-api-key,cookie; - it normalizes the GigaChat
function_callintoNormalizedToolCalland the finish reasonfunction_callintotool_calls.
The provider adapter is responsible for the reverse side: it takes the normalized request, prepares the GigaChat payload, calls the upstream, and returns the normalized response/events.
Differences from Anthropic Messages
Anthropic Messages is a separate public protocol with a top-level system,
content blocks, max_tokens, stop_sequences, tool_use, tool_result,
thinking, and its own streaming event names.
The normalized layer differs as follows:
systembecomes a regular normalizedsystemmessage.- Anthropic text/image blocks are translated into a normalized
contentstring or content parts. tool_usebecomes assistanttool_calls.tool_resultbecomes a normalized message withrole="tool"andtool_call_id.max_tokensis stored ingeneration_config.max_tokens, andstop_sequencesingeneration_config.stop.thinking/reasoning content is not a separate canonical field and is kept as a controlled extension, for examplereasoning_content.- Anthropic
usage.input_tokensandusage.output_tokensalready match the normalized naming, andtotal_tokensis computed when both values are present.
Currently the Anthropic execution path stays legacy: the Anthropic payload is first brought to an OpenAI-like payload, then the common GigaChat route transform is used. Debug translation and observability can build a normalized representation on top of this path.
Observability
LLM observability is intentionally built on top of normalized shapes:
- Chat Completions spans get request/response attributes from
NormalizedChatRequestandNormalizedResponse. - Responses and Anthropic helpers bring their public payloads to a normalized chat-like representation before building span attributes.
- The Gemini GenerateContent route already produces observability from the
normalized request/response and uses the root span
Gemini-Content. - Streaming milestones are built from
NormalizedStreamEventwhen the route already uses the normalized stream path. - Content capture stays disabled by default; messages, tool args, and responses require a separate opt-in and go through redaction.
This makes it possible to add new protocols/providers without copying all the OpenInference/Phoenix attribute logic for each wire format.
Debugging
For a local check, enable protected debug translation:
GPT2GIGA_DEBUG_TRANSLATE_ENABLED=True
GPT2GIGA_ADMIN_API_KEY="<strong-admin-secret>"
Useful endpoints:
POST /_debug/translate/openai-to-normalizedPOST /_debug/translate/anthropic-to-normalizedPOST /_debug/translate/normalized-to-gigachatPOST /_debug/translate/gigachat-to-openaiPOST /_debug/translatefor a genericfrom/toenvelope
Shadow diagnostics do not write prompt or response content. They store the route, status, warnings/errors, and the shape hash of the normalized payload.