openai-oxide
A high-performance, feature-complete OpenAI client for Rust, Node.js, and Python.
openai-oxide implements the full Responses API, Chat Completions, and 20+ other endpoints with performance primitives like persistent WebSockets, hedged requests, and early-parsing for function calls.
Why openai-oxide?
- Zero-Overhead Streaming — Custom zero-copy SSE parser, TTFT ~670ms
- WebSocket Mode — Persistent wss:// connections, 37% faster agent loops
- Stream FC Early Parse — Execute tools ~400ms before the response finishes
- SIMD JSON — Opt-in AVX2/NEON for microsecond parsing
- Hedged Requests — 50-96% P99 tail latency reduction
- WASM First-Class — Full streaming in Cloudflare Workers and browsers
Packages
| Package | Registry | Install |
|---|---|---|
| openai-oxide | crates.io | cargo add openai-oxide |
| openai-oxide | npm | npm install openai-oxide |
| openai-oxide | PyPI | pip install openai-oxide |
OpenAI Compatibility
Parameter names match the official Python SDK exactly. If the OpenAI docs show model="gpt-5.4", use .model("gpt-5.4") in Rust or {model: "gpt-5.4"} in Node.js.
See the OpenAI Docs Mapping for a complete cross-reference.
Installation
openai-oxide is available for three platforms. Pick your language:
All packages share the same Rust core for consistent behavior and performance.
Rust Installation
Add to Cargo.toml
cargo add openai-oxide tokio --features tokio/full
Or manually:
[dependencies]
openai-oxide = "0.9"
tokio = { version = "1", features = ["full"] }
Feature Flags
Every API endpoint is behind a feature flag. All enabled by default.
# Minimal: only Responses API
openai-oxide = { version = "0.9", default-features = false, features = ["responses"] }
Available features: chat, responses, embeddings, images, audio, files, fine-tuning, models, moderations, batches, uploads, beta
Ecosystem features: websocket, websocket-wasm, simd, macros
Configuration
#![allow(unused)]
fn main() {
use openai_oxide::OpenAI;
// From environment variable (recommended)
let client = OpenAI::from_env()?; // Uses OPENAI_API_KEY
// Explicit key
let client = OpenAI::new("sk-...");
// Custom config
use openai_oxide::config::ClientConfig;
let client = OpenAI::with_config(
ClientConfig::new("sk-...").base_url("https://...").timeout_secs(30)
);
// Azure
use openai_oxide::azure::AzureConfig;
let client = OpenAI::azure(
AzureConfig::new().azure_endpoint("https://my.openai.azure.com").api_key("...")
)?;
}
API Reference
Full API docs: docs.rs/openai-oxide
Node.js Installation
Install
npm install openai-oxide
# or
pnpm add openai-oxide
# or
yarn add openai-oxide
Prebuilt native binaries for: macOS (x64, arm64), Linux (x64, arm64, glibc & musl), Windows (x64).
Setup
const { Client } = require("openai-oxide");
// Uses OPENAI_API_KEY from environment
let client = new Client();
// Explicit key
client = new Client("sk-...");
Available Methods
| Method | Description |
|---|---|
| createResponse(params) | Full Responses API call |
| createText(model, input) | Fast path — returns text only |
| createStoredResponseId(model, input) | Fast path — returns response ID |
| createTextFollowup(model, input, prevId) | Multi-turn fast path |
| createStream(model, input) | Streaming responses |
| wsSession() | WebSocket persistent connection |
npm Package
npmjs.com/package/openai-oxide
Python Installation
Install
pip install openai-oxide
# or
uv pip install openai-oxide
# or
uv add openai-oxide
No Rust toolchain required — prebuilt wheels available.
Setup
from openai_oxide import Client
# Uses OPENAI_API_KEY from environment
client = Client()
# Explicit key
client = Client("sk-...")
Available Methods
| Method | Description |
|---|---|
| await client.create(model, input) | Basic request |
| await client.create_stream(model, input) | Streaming |
| await client.create_structured(model, input, name, schema) | Structured output |
| await client.create_with_tools(model, input, tools) | Function calling |
PyPI Package
pypi.org/project/openai-oxide
Quick Start
Set your API key:
export OPENAI_API_KEY="sk-..."
Rust
use openai_oxide::{OpenAI, types::responses::*};
#[tokio::main]
async fn main() -> Result<(), openai_oxide::OpenAIError> {
let client = OpenAI::from_env()?;
let response = client.responses().create(
ResponseCreateRequest::new("gpt-5.4-mini")
.input("Explain quantum computing in one sentence.")
.max_output_tokens(100)
).await?;
println!("{}", response.output_text());
Ok(())
}
Node.js
const { Client } = require("openai-oxide");
const client = new Client();
const text = await client.createText("gpt-5.4-mini", "Hello from Node!");
console.log(text);
Python
import asyncio, json
from openai_oxide import Client
async def main():
client = Client()
res = json.loads(await client.create("gpt-5.4-mini", "Hello from Python!"))
print(res["text"])
asyncio.run(main())
Drop-in Migration
Switch from the official OpenAI SDK by changing one import line; the rest of your code stays the same.
Python
- from openai import AsyncOpenAI
+ from openai_oxide.compat import AsyncOpenAI
Full working example (mirrors official openai examples/parsing.py):
#!/usr/bin/env python3
"""
Drop-in replacement for official openai SDK parsing example.
Change: `from openai import AsyncOpenAI` → `from openai_oxide.compat import AsyncOpenAI`
"""
import asyncio
from typing import List
from pydantic import BaseModel
# ── One-line change from official SDK ──
# from openai import AsyncOpenAI
from openai_oxide.compat import AsyncOpenAI
class Step(BaseModel):
explanation: str
output: str
class MathResponse(BaseModel):
steps: List[Step]
final_answer: str
async def main():
client = AsyncOpenAI()
completion = await client.chat.completions.parse(
model="gpt-5.4-mini",
messages=[
{"role": "system", "content": "You are a helpful math tutor."},
{"role": "user", "content": "solve 8x + 31 = 2"},
],
response_format=MathResponse,
)
message = completion.choices[0].message
if message.parsed:
for step in message.parsed.steps:
print(f" {step.explanation} → {step.output}")
print("answer:", message.parsed.final_answer)
else:
print("refusal:", message.refusal)
asyncio.run(main())
Node.js
- const OpenAI = require('openai');
+ const { OpenAI } = require('openai-oxide/compat');
Full working example (mirrors official openai SDK):
/**
* Drop-in replacement for official openai SDK demo.
* Change: `const OpenAI = require('openai')` → `const { OpenAI } = require('openai-oxide/compat')`
*/
// ── One-line change from official SDK ──
// const OpenAI = require('openai');
const { OpenAI } = require('openai-oxide/compat');
async function main() {
const client = new OpenAI();
// Non-streaming:
console.log("----- standard request -----");
const completion = await client.chat.completions.create({
model: "gpt-5.4-mini",
messages: [{ role: "user", content: "Say this is a test" }],
});
console.log(completion.choices[0].message.content);
// Streaming:
console.log("----- streaming request -----");
const stream = await client.chat.completions.create({
model: "gpt-5.4-mini",
messages: [{ role: "user", content: "How do I list files in a directory using Node.js?" }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices?.[0]?.delta?.content;
if (content) process.stdout.write(content);
}
console.log();
}
main();
Guides
Step-by-step guides for common tasks with openai-oxide.
Each guide shows code in all three languages (Rust, Node.js, Python) and links to the relevant OpenAI documentation.
Chat Completions
Send messages to GPT models and receive completions. This is the most common API for conversational AI.
See the official Chat Completions guide and API reference.
Rust
//! Basic chat completion example.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example chat`
use openai_oxide::OpenAI;
use openai_oxide::types::chat::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
let request = ChatCompletionRequest::new(
"gpt-4o-mini",
vec![
ChatCompletionMessageParam::System {
content: "You are a helpful assistant.".into(),
name: None,
},
ChatCompletionMessageParam::User {
content: UserContent::Text("What is the capital of France?".into()),
name: None,
},
],
);
let response = client.chat().completions().create(request).await?;
for choice in &response.choices {
println!(
"[{}] {}",
choice.finish_reason,
choice.message.content.as_deref().unwrap_or("")
);
}
if let Some(usage) = &response.usage {
println!(
"\nTokens: {} prompt + {} completion = {} total",
usage.prompt_tokens.unwrap_or(0),
usage.completion_tokens.unwrap_or(0),
usage.total_tokens.unwrap_or(0),
);
}
Ok(())
}
Run: OPENAI_API_KEY=sk-... cargo run --example chat
Next Steps
- Streaming — Stream chat completion tokens as they arrive
- Function Calling — Let the model call your functions
- Structured Output — Get JSON responses matching a schema
Responses API
The Responses API is OpenAI’s latest endpoint for generating text, replacing Chat Completions for new projects. It supports built-in tools, multi-turn conversations via previous_response_id, and structured output.
See the official Responses API reference.
Rust
//! Responses API example — web search + function tools.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example responses_api`
use openai_oxide::OpenAI;
use openai_oxide::types::responses::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
// Simple text input
let request = ResponseCreateRequest::new("gpt-4o")
.input("What are the latest developments in Rust programming?")
.instructions("Be concise and cite sources when possible.")
.tools(vec![ResponseTool::WebSearch {
search_context_size: Some("medium".into()),
user_location: None,
}])
.temperature(0.7)
.max_output_tokens(1024)
.store(true);
let response = client.responses().create(request).await?;
println!("Response ID: {}", response.id);
println!("Status: {:?}", response.status);
println!("\n{}", response.output_text());
if let Some(usage) = &response.usage {
println!(
"\nTokens: {} in + {} out = {} total",
usage.input_tokens.unwrap_or(0),
usage.output_tokens.unwrap_or(0),
usage.total_tokens.unwrap_or(0),
);
}
// Multi-turn with previous_response_id
let follow_up = ResponseCreateRequest::new("gpt-4o")
.input("Can you elaborate on the async ecosystem?")
.previous_response_id(&response.id);
let response2 = client.responses().create(follow_up).await?;
println!("\n--- Follow-up ---\n{}", response2.output_text());
Ok(())
}
Run: OPENAI_API_KEY=sk-... cargo run --example responses_api
Next Steps
- Streaming — Stream response events in real time
- Function Calling — Use tools with the Responses API
- WebSocket Sessions — Persistent connections for agent loops
Streaming
Stream tokens and events as they are generated, reducing time-to-first-token (TTFT) and enabling real-time UI updates.
See the official Streaming documentation for event types and behavior.
Stream Helpers (recommended)
High-level wrapper with typed events and automatic accumulation — no manual chunk stitching.
//! Streaming chat completion example.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example chat_stream`
use futures_util::StreamExt;
use openai_oxide::OpenAI;
use openai_oxide::types::chat::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
let request = ChatCompletionRequest::new(
"gpt-4o-mini",
vec![
ChatCompletionMessageParam::System {
content: "You are a helpful assistant.".into(),
name: None,
},
ChatCompletionMessageParam::User {
content: UserContent::Text("Write a haiku about Rust programming.".into()),
name: None,
},
],
);
let mut stream = client.chat().completions().create_stream(request).await?;
while let Some(result) = stream.next().await {
match result {
Ok(chunk) => {
for choice in &chunk.choices {
if let Some(content) = &choice.delta.content {
print!("{content}");
}
if choice.finish_reason.is_some() {
println!();
}
}
}
Err(e) => {
eprintln!("\nStream error: {e}");
break;
}
}
}
Ok(())
}
Event Types
| Event | When | Fields |
|---|---|---|
| Chunk | Every SSE chunk | Raw ChatCompletionChunk |
| ContentDelta | New text fragment | delta, snapshot (accumulated) |
| ContentDone | Text complete | content (full text) |
| ToolCallDelta | Argument fragment | index, name, arguments_delta, arguments_snapshot |
| ToolCallDone | Tool call complete | index, call_id, name, arguments |
| RefusalDelta/Done | Model refuses | delta / refusal |
| Done | Stream finished | finish_reason |
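The delta/snapshot pairing above can be illustrated outside the SDK. This is a plain-Python sketch of what the stream helpers accumulate for you — the ContentDelta shape here mirrors the table, it is not the library's actual type:

```python
# Illustrative sketch (not the SDK): stream helpers fold raw text deltas
# into events that carry both the fragment and a running snapshot.
from dataclasses import dataclass

@dataclass
class ContentDelta:
    delta: str      # the new text fragment from this chunk
    snapshot: str   # everything accumulated so far

def accumulate(deltas):
    """Turn a sequence of raw fragments into ContentDelta events."""
    snapshot = ""
    events = []
    for d in deltas:
        snapshot += d
        events.append(ContentDelta(delta=d, snapshot=snapshot))
    return events

events = accumulate(["Hel", "lo, ", "world"])
assert events[-1].snapshot == "Hello, world"
```

With helpers, your handler reads `snapshot` when it wants the text so far and `delta` when it only needs the increment — no manual chunk stitching.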
Node.js (drop-in replacement)
Same syntax as official openai package — for await over stream:
/**
* Drop-in replacement for official openai SDK demo.
* Change: `const OpenAI = require('openai')` → `const { OpenAI } = require('openai-oxide/compat')`
*/
// ── One-line change from official SDK ──
// const OpenAI = require('openai');
const { OpenAI } = require('openai-oxide/compat');
async function main() {
const client = new OpenAI();
// Non-streaming:
console.log("----- standard request -----");
const completion = await client.chat.completions.create({
model: "gpt-5.4-mini",
messages: [{ role: "user", content: "Say this is a test" }],
});
console.log(completion.choices[0].message.content);
// Streaming:
console.log("----- streaming request -----");
const stream = await client.chat.completions.create({
model: "gpt-5.4-mini",
messages: [{ role: "user", content: "How do I list files in a directory using Node.js?" }],
stream: true,
});
for await (const chunk of stream) {
const content = chunk.choices?.[0]?.delta?.content;
if (content) process.stdout.write(content);
}
console.log();
}
main();
Python (drop-in replacement)
Same syntax as official openai package — async for over stream:
#!/usr/bin/env python3
"""
Drop-in replacement for official openai SDK demo.
Change: `from openai import AsyncOpenAI` → `from openai_oxide.compat import AsyncOpenAI`
"""
import asyncio
# ── One-line change from official SDK ──
# from openai import AsyncOpenAI
from openai_oxide.compat import AsyncOpenAI
async def main():
client = AsyncOpenAI()
# Non-streaming:
print("----- standard request -----")
completion = await client.chat.completions.create(
model="gpt-5.4-mini",
messages=[
{
"role": "user",
"content": "Say this is a test",
},
],
)
print(completion.choices[0].message.content)
# Streaming:
print("----- streaming request -----")
stream = await client.chat.completions.create(
model="gpt-5.4-mini",
messages=[
{
"role": "user",
"content": "How do I output all files in a directory using Python?",
},
],
stream=True,
)
async for event in stream:
if event.get("type") == "OutputTextDelta":
print(event.get("delta", ""), end="")
elif event.get("delta"):
print(event.get("delta", ""), end="")
print()
asyncio.run(main())
Responses API Streaming
Typed events for the Responses API:
#![allow(unused)]
fn main() {
use futures_util::StreamExt;
use openai_oxide::types::responses::{ResponseCreateRequest, ResponseStreamEvent};
let mut stream = client.responses()
.create_stream(ResponseCreateRequest::new("gpt-5.4-mini").input("Hi"))
.await?;
while let Some(Ok(event)) = stream.next().await {
match event {
ResponseStreamEvent::OutputTextDelta { delta, .. } => print!("{delta}"),
ResponseStreamEvent::ResponseCompleted { response } => {
println!("\nDone: {}", response.output_text());
}
_ => {}
}
}
}
Next Steps
- Function Calling — Stream with early tool-call parsing
- WebSocket Sessions — Even lower latency with persistent connections
- Structured Output — Type-safe responses
Function Calling
Let the model invoke your functions by defining tools. openai-oxide supports early-parsing of function call arguments during streaming, allowing you to execute tools ~400ms before the response finishes.
See the official Function Calling guide for tool schema definitions.
Rust
//! Tool calling example — model calls a function, we return the result.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example tool_calling`
use openai_oxide::OpenAI;
use openai_oxide::types::chat::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
let tools = vec![Tool {
type_: "function".into(),
function: FunctionDef {
name: "get_weather".into(),
description: Some("Get current weather for a city".into()),
parameters: Some(serde_json::json!({
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
},
"required": ["city"]
})),
strict: Some(true),
},
}];
// Step 1: Send message with tools
let request = ChatCompletionRequest::new(
"gpt-4o-mini",
vec![ChatCompletionMessageParam::User {
content: UserContent::Text("What's the weather in Tokyo?".into()),
name: None,
}],
)
.tools(tools.clone())
.tool_choice(ToolChoice::Mode("auto".into()));
let response = client.chat().completions().create(request).await?;
let message = &response.choices[0].message;
if let Some(tool_calls) = &message.tool_calls {
for tc in tool_calls {
println!(
"Tool call: {} ({})",
tc.function.name, tc.function.arguments
);
// Step 2: Simulate function result
let result = r#"{"temperature": 22, "condition": "Sunny", "unit": "celsius"}"#;
// Step 3: Send tool result back
let follow_up = ChatCompletionRequest::new(
"gpt-4o-mini",
vec![
ChatCompletionMessageParam::User {
content: UserContent::Text("What's the weather in Tokyo?".into()),
name: None,
},
ChatCompletionMessageParam::Assistant {
content: None,
name: None,
tool_calls: Some(tool_calls.clone()),
refusal: None,
},
ChatCompletionMessageParam::Tool {
content: result.into(),
tool_call_id: tc.id.clone(),
},
],
)
.tools(tools.clone());
let final_response = client.chat().completions().create(follow_up).await?;
println!(
"\nAssistant: {}",
final_response.choices[0]
.message
.content
.as_deref()
.unwrap_or("")
);
}
} else {
println!("Assistant: {}", message.content.as_deref().unwrap_or(""));
}
Ok(())
}
Run: OPENAI_API_KEY=sk-... cargo run --example tool_calling
Next Steps
- Streaming — Stream function call arguments as they arrive
- Structured Output — Combine tools with structured responses
WebSocket Sessions
Persistent WebSocket connections eliminate per-request TLS handshakes and HTTP overhead, achieving 37% faster round-trip times for agent loops and multi-turn conversations.
See the official Realtime API guide for session configuration.
Rust
//! WebSocket Responses API example — persistent connection for multi-turn.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example websocket --features websocket`
use futures_util::StreamExt;
use openai_oxide::OpenAI;
use openai_oxide::types::responses::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
// --- Example 1: Simple send/receive ---
println!("=== Example 1: Simple request ===\n");
let mut session = client.ws_session().await?;
let request = ResponseCreateRequest::new("gpt-4o-mini")
.input("What is the capital of France?")
.max_output_tokens(256);
let response = session.send(request).await?;
println!("Response: {}", response.output_text());
if let Some(usage) = &response.usage {
println!(
"Tokens: {} in + {} out",
usage.input_tokens.unwrap_or(0),
usage.output_tokens.unwrap_or(0),
);
}
// --- Example 2: Multi-turn via same session ---
println!("\n=== Example 2: Multi-turn ===\n");
let follow_up = ResponseCreateRequest::new("gpt-4o-mini")
.input("What about Germany?")
.previous_response_id(&response.id);
let response2 = session.send(follow_up).await?;
println!("Follow-up: {}", response2.output_text());
// --- Example 3: Streaming events ---
println!("\n=== Example 3: Streaming ===\n");
let stream_request = ResponseCreateRequest::new("gpt-4o-mini")
.input("Count from 1 to 5, one number per line.")
.max_output_tokens(128);
let mut stream = session.send_stream(stream_request).await?;
while let Some(event) = stream.next().await {
let event = event?;
use openai_oxide::types::responses::ResponseStreamEvent::*;
match event {
ResponseOutputTextDelta(evt) => print!("{}", evt.delta),
ResponseCompleted(_) => println!("\n\n[completed]"),
_ => {} // Other events: created, output_item.added, etc.
}
}
// --- Clean up ---
session.close().await?;
println!("\nSession closed.");
Ok(())
}
Run: OPENAI_API_KEY=sk-... cargo run --example websocket --features websocket
When to Use WebSockets
- Agent loops with 3+ sequential LLM calls
- Real-time conversational UIs
- High-throughput batch processing where latency matters
Known Issues
Decimal temperature causes silent close (code=1000)
Status: OpenAI bug as of March 2026
Sending temperature as a decimal (e.g. 0.7, 1.2) over WebSocket causes the server to immediately close the connection with code=1000 and an empty reason — no error event is returned. Integer values (0, 1, 2) work fine. The same decimal values work normally over HTTP.
Workaround: Omit temperature from WebSocket requests (the API uses model default ~1.0), or round to integer.
Tracking: OpenAI Community #1375536
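Until the bug is fixed, the workaround can be applied as a small pre-send step. This is a hypothetical helper in plain Python — not part of the SDK — that keeps integer temperatures and omits decimal ones:

```python
# Hypothetical pre-send helper (not part of the SDK) implementing the
# workaround: integer temperatures pass through, decimals are dropped.
def sanitize_ws_request(params: dict) -> dict:
    out = dict(params)
    t = out.get("temperature")
    if isinstance(t, float):
        if t.is_integer():
            out["temperature"] = int(t)  # 1.0 -> 1 is safe to send
        else:
            # Decimal values trigger the silent close; omit the field and
            # let the API fall back to the model default (~1.0).
            out.pop("temperature")
    return out

print(sanitize_ws_request({"model": "gpt-5.4-mini", "temperature": 0.7}))
# {'model': 'gpt-5.4-mini'}
```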
Structured Output
Force the model to return JSON matching a specific schema. Guarantees valid, parseable output without prompt engineering tricks.
See the official Structured Outputs guide for schema format and limitations.
Rust — parse::<T>() (recommended)
Derive JsonSchema on your struct and call parse::<T>(). The SDK auto-generates the schema and deserializes the response.
Requires feature structured: cargo add openai-oxide --features structured
#![allow(unused)]
fn main() {
// Live test for all new features — requires OPENAI_API_KEY
//
// cargo run --example live_features_test --features structured
use futures_util::StreamExt;
use openai_oxide::OpenAI;
use openai_oxide::stream_helpers::ChatStreamEvent;
use openai_oxide::types::chat::{ChatCompletionMessageParam, ChatCompletionRequest, UserContent};
#[derive(Debug, serde::Deserialize, schemars::JsonSchema)]
struct MathAnswer {
steps: Vec<Step>,
final_answer: String,
}
#[derive(Debug, serde::Deserialize, schemars::JsonSchema)]
struct Step {
explanation: String,
output: String,
}
#[derive(Debug, serde::Deserialize, schemars::JsonSchema)]
struct Sentiment {
sentiment: String,
confidence: f64,
}
fn msg(text: &str) -> Vec<ChatCompletionMessageParam> {
vec![ChatCompletionMessageParam::User {
content: UserContent::Text(text.into()),
name: None,
}]
}
}
Rust — Manual Schema
For full control, construct the schema yourself:
//! Structured output with JSON Schema — model returns a validated JSON object.
//!
//! Run with: `OPENAI_API_KEY=sk-... cargo run --example structured_output`
use openai_oxide::OpenAI;
use openai_oxide::types::chat::*;
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let client = OpenAI::from_env()?;
let request = ChatCompletionRequest::new(
"gpt-4o-mini",
vec![
ChatCompletionMessageParam::System {
content: "Extract structured data from user messages.".into(),
name: None,
},
ChatCompletionMessageParam::User {
content: UserContent::Text(
"My name is Alice, I'm 30, and I work as a software engineer at Acme Corp."
.into(),
),
name: None,
},
],
)
.response_format(ResponseFormat::JsonSchema {
json_schema: JsonSchema {
name: "person_info".into(),
description: Some("Extracted person information".into()),
schema: Some(serde_json::json!({
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"},
"occupation": {"type": "string"},
"company": {"type": "string"}
},
"required": ["name", "age", "occupation", "company"],
"additionalProperties": false
})),
strict: Some(true),
},
});
let response = client.chat().completions().create(request).await?;
let content = response.choices[0]
.message
.content
.as_deref()
.unwrap_or("{}");
let parsed: serde_json::Value = serde_json::from_str(content)?;
println!("Extracted data:");
println!(" Name: {}", parsed["name"]);
println!(" Age: {}", parsed["age"]);
println!(" Occupation: {}", parsed["occupation"]);
println!(" Company: {}", parsed["company"]);
Ok(())
}
Node.js (drop-in replacement)
Same syntax as official openai package — change one import:
/**
* Drop-in replacement for official openai SDK structured output example.
* Change: `const OpenAI = require('openai')` → `const { OpenAI } = require('openai-oxide/compat')`
*
* For Zod support: npm install zod zod-to-json-schema
*/
// ── One-line change from official SDK ──
// const OpenAI = require('openai');
const { OpenAI } = require('openai-oxide/compat');
async function main() {
const client = new OpenAI();
// JSON Schema (works without Zod)
const MathResponseSchema = {
type: "object",
properties: {
steps: {
type: "array",
items: {
type: "object",
properties: {
explanation: { type: "string" },
output: { type: "string" },
},
required: ["explanation", "output"],
additionalProperties: false,
},
},
final_answer: { type: "string" },
},
required: ["steps", "final_answer"],
additionalProperties: false,
};
const result = await client.chat.completions.parse({
model: "gpt-5.4-mini",
messages: [
{ role: "system", content: "You are a helpful math tutor." },
{ role: "user", content: "solve 8x + 31 = 2" },
],
response_format: {
type: "json_schema",
json_schema: {
name: "MathResponse",
schema: MathResponseSchema,
strict: true,
},
},
});
const message = result.choices[0].message;
const parsed = JSON.parse(message.content);
for (const step of parsed.steps) {
console.log(` ${step.explanation} → ${step.output}`);
}
console.log("answer:", parsed.final_answer);
}
main();
Python (drop-in replacement)
Same syntax as official openai package — change one import:
#!/usr/bin/env python3
"""
Drop-in replacement for official openai SDK parsing example.
Change: `from openai import AsyncOpenAI` → `from openai_oxide.compat import AsyncOpenAI`
"""
import asyncio
from typing import List
from pydantic import BaseModel
# ── One-line change from official SDK ──
# from openai import AsyncOpenAI
from openai_oxide.compat import AsyncOpenAI
class Step(BaseModel):
explanation: str
output: str
class MathResponse(BaseModel):
steps: List[Step]
final_answer: str
async def main():
client = AsyncOpenAI()
completion = await client.chat.completions.parse(
model="gpt-5.4-mini",
messages=[
{"role": "system", "content": "You are a helpful math tutor."},
{"role": "user", "content": "solve 8x + 31 = 2"},
],
response_format=MathResponse,
)
message = completion.choices[0].message
if message.parsed:
for step in message.parsed.steps:
print(f" {step.explanation} → {step.output}")
print("answer:", message.parsed.final_answer)
else:
print("refusal:", message.refusal)
asyncio.run(main())
Next Steps
- Function Calling — Combine structured output with tool use
- Streaming — Stream with typed events
- Responses API — Full parameter reference
Hedged Requests
Hedged requests race multiple identical API calls and return the first successful response. This technique reduces P99 tail latency by 50-96% at the cost of additional API usage.
This is an openai-oxide exclusive feature not available in the official SDKs.
Rust
#![allow(unused)]
fn main() {
use openai_oxide::{OpenAI, types::responses::*};
let client = OpenAI::from_env()?;
// Race 2 identical requests, return whichever finishes first
let response = client.responses().hedged_request(
ResponseCreateRequest::new("gpt-5.4-mini")
.input("Quick question: what is 2+2?"),
2, // number of concurrent requests
).await?;
}
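The underlying racing pattern is plain concurrency, independent of the SDK. A minimal asyncio sketch of the technique — `make_call` is any zero-argument coroutine factory; this is illustrative, not the crate's implementation:

```python
# Illustrative hedging sketch: race n identical calls, return the first
# success, cancel the rest.
import asyncio
import random

async def hedged(make_call, n=2):
    """Start n copies of the call; return the first successful result."""
    pending = {asyncio.ensure_future(make_call()) for _ in range(n)}
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # winner; losers cancelled below
        raise RuntimeError("all hedged requests failed")
    finally:
        for task in pending:
            task.cancel()  # stop the slower copies

async def fake_api():
    # Stand-in for an API call with variable latency.
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return "ok"

print(asyncio.run(hedged(fake_api, n=2)))
# ok
```

The P99 win comes from the minimum of n latency samples being much tighter than any single sample; the cost is up to n times the tokens, since each copy may be billed.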
When to Use
- Latency-sensitive applications (real-time UIs, voice assistants)
- Short, deterministic prompts where cost of duplicates is low
- Production systems with strict P99 SLA requirements
Trade-offs
- Uses N times the tokens (one request per hedge)
- Best for short prompts where the latency gain outweighs cost
- Not recommended for long-running completions with max_output_tokens > 1000
Webhook Verification
Verify OpenAI webhook signatures to ensure payloads are authentic and not replayed.
Requires feature webhooks: cargo add openai-oxide --features webhooks
See the official Webhooks documentation for setup.
Usage
#![allow(unused)]
fn main() {
use openai_oxide::resources::webhooks::Webhooks;
// Initialize with your webhook secret (from OpenAI dashboard)
let wh = Webhooks::new("whsec_YOUR_WEBHOOK_SECRET")?;
// In your HTTP handler — extract headers and body
let signature = headers.get("webhook-signature").unwrap();
let timestamp = headers.get("webhook-timestamp").unwrap();
// Verify and parse in one call
let event: serde_json::Value = wh.unwrap(body_bytes, signature, timestamp)?;
// Or verify only (without parsing)
wh.verify(body_bytes, signature, timestamp)?;
}
Security
- HMAC-SHA256 signature validation
- Timestamp replay protection (5-minute tolerance)
- Supports multiple signature versions in header
- Base64-encoded whsec_ secrets (prefix auto-stripped)
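The verification steps above can be sketched in plain Python. The exact signed-content layout is an assumption here (timestamp-dot-body, `v1,<base64>` signature entries) — consult the official Webhooks docs for the real wire format:

```python
# Illustrative sketch of the checks listed above (not the crate's exact
# wire format): HMAC-SHA256, base64 whsec_ secret, 5-minute replay window.
import base64
import hashlib
import hmac
import time

TOLERANCE_SECS = 5 * 60  # replay-protection window

def verify(secret: str, body: bytes, signature_header: str, timestamp: str) -> bool:
    # whsec_ prefix wraps a base64-encoded key; strip and decode it.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    # Reject stale timestamps to prevent replay attacks.
    if abs(time.time() - int(timestamp)) > TOLERANCE_SECS:
        return False
    # Sign "<timestamp>.<body>" (assumed layout) and compare.
    signed = f"{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()
    # The header may carry several space-separated "v1,<sig>" entries.
    for part in signature_header.split():
        version, _, sig = part.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Note the constant-time comparison (`hmac.compare_digest`) — comparing signatures with `==` leaks timing information.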
Next Steps
- API Reference — Full method signatures
WASM / Cloudflare Workers
openai-oxide compiles to WebAssembly for use in Cloudflare Workers, Deno Deploy, and browser environments. Full streaming support is included.
Setup
[dependencies]
openai-oxide = { version = "0.9", default-features = false, features = ["responses", "websocket-wasm"] }
Disable default features to exclude tokio and native TLS, which are not available in WASM.
Cloudflare Worker Example
use openai_oxide::OpenAI;
use openai_oxide::types::responses::ResponseCreateRequest;
use worker::*;
#[event(fetch)]
async fn main(req: Request, env: Env, _ctx: Context) -> Result<Response> {
let client = OpenAI::new(env.secret("OPENAI_API_KEY")?.to_string());
let response = client.responses().create(
ResponseCreateRequest::new("gpt-5.4-mini")
.input("Hello from the edge!")
).await.map_err(|e| worker::Error::from(e.to_string()))?;
Response::ok(response.output_text())
}
Limitations
- No filesystem access (audio file uploads require bytes, not paths)
- WebSocket mode uses the websocket-wasm feature instead of websocket
- The SIMD feature is not available on WASM targets
Benchmarks
All benchmarks: median of 3 runs × 5 iterations each. Model: gpt-5.4. Environment: macOS (M-series), release mode, warm HTTP/2 connections.
Rust Ecosystem (openai-oxide vs async-openai vs genai)
| Test | openai-oxide | async-openai | genai |
|---|---|---|---|
| Plain text | 1011ms | 960ms | 835ms |
| Structured output | 1331ms | N/A | 1197ms |
| Function calling | 1192ms | 1748ms | 1030ms |
| Multi-turn (2 reqs) | 2362ms | 3275ms | 1641ms |
| Streaming TTFT | 645ms | 685ms | 670ms |
| Parallel 3x | 1165ms | 1053ms | 866ms |
WebSocket mode (openai-oxide only)
| Test | WebSocket | HTTP | Improvement |
|---|---|---|---|
| Plain text | 710ms | 1011ms | -29% |
| Multi-turn (2 reqs) | 1425ms | 2362ms | -40% |
| Rapid-fire (5 calls) | 3227ms | 5807ms | -44% |
median of medians, 3×5 iterations. Model: gpt-5.4. macOS (M-series), release mode, warm HTTP/2 connections.
Reproduce: cargo run --example benchmark --features responses --release
Python Ecosystem (openai-oxide-python vs openai)
openai-oxide wins 10/12 tests. Native PyO3 bindings vs the official openai package (v2.29.0).
| Test | openai-oxide | openai | Winner |
|---|---|---|---|
| Plain text | 845ms | 997ms | OXIDE (+15%) |
| Structured output | 1367ms | 1379ms | OXIDE (+1%) |
| Function calling | 1195ms | 1230ms | OXIDE (+3%) |
| Multi-turn (2 reqs) | 2260ms | 3089ms | OXIDE (+27%) |
| Web search | 3157ms | 3499ms | OXIDE (+10%) |
| Nested structured | 5377ms | 5339ms | python (+1%) |
| Agent loop (2-step) | 4570ms | 5144ms | OXIDE (+11%) |
| Rapid-fire (5 calls) | 5667ms | 6136ms | OXIDE (+8%) |
| Prompt-cached | 4425ms | 5564ms | OXIDE (+20%) |
| Streaming TTFT | 626ms | 638ms | OXIDE (+2%) |
| Parallel 3x | 1184ms | 1090ms | python (+9%) |
| Hedged (2x race) | 893ms | 995ms | OXIDE (+10%) |
median of medians, 3×5 iterations. Model: gpt-5.4.
Reproduce: cd openai-oxide-python && uv run python ../examples/bench_python.py
Node.js Ecosystem (openai-oxide vs openai)
openai-oxide wins 8/8 tests. Native napi-rs bindings vs official openai npm.
| Test | openai-oxide | openai | Winner |
|---|---|---|---|
| Plain text | 1075ms | 1311ms | OXIDE (+18%) |
| Structured output | 1370ms | 1765ms | OXIDE (+22%) |
| Function calling | 1725ms | 1832ms | OXIDE (+6%) |
| Multi-turn (2 reqs) | 2283ms | 2859ms | OXIDE (+20%) |
| Rapid-fire (5 calls) | 6246ms | 6936ms | OXIDE (+10%) |
| Streaming TTFT | 534ms | 580ms | OXIDE (+8%) |
| Parallel 3x | 1937ms | 1991ms | OXIDE (+3%) |
| WebSocket hot pair | 2181ms | N/A | OXIDE |
Median of medians, 3×5 iterations. Model: gpt-5.4.
Reproduce: cd openai-oxide-node && BENCH_ITERATIONS=5 node examples/bench_node.js
How to run
Rust
cargo run --example benchmark --features responses --release
Python
cd openai-oxide-python && uv run python ../examples/bench_python.py
Node.js
cd openai-oxide-node && BENCH_ITERATIONS=5 node examples/bench_node.js
Methodology
- Warm connections: First request is a warmup (not measured). All subsequent requests reuse HTTP/2 connections with keep-alive.
- Median of medians: Each test runs 5 iterations per run, 3 runs total. We report the median of the 3 median values.
- Same prompts: Both clients send identical requests to the same model.
- Release mode: Rust benchmarks compiled with --release. Python and Node use prebuilt native extensions.
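The median-of-medians scoring described above can be sketched in a few lines of std-only Rust. median_of_medians is an illustrative helper, not part of the benchmark harness:

```rust
// Each test: 5 iterations per run, 3 runs; report the median of the
// 3 per-run medians. Assumes odd-length runs so the middle element
// is the median.
fn median(samples: &mut [f64]) -> f64 {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    samples[samples.len() / 2]
}

fn median_of_medians(runs: &[Vec<f64>]) -> f64 {
    let mut medians: Vec<f64> = runs
        .iter()
        .map(|run| {
            let mut r = run.clone();
            median(&mut r)
        })
        .collect();
    median(&mut medians)
}

fn main() {
    // Three runs of five latency samples each (milliseconds).
    let runs = vec![
        vec![650.0, 645.0, 700.0, 640.0, 660.0],
        vec![655.0, 648.0, 645.0, 690.0, 642.0],
        vec![700.0, 645.0, 644.0, 646.0, 651.0],
    ];
    println!("reported: {} ms", median_of_medians(&runs)); // prints "reported: 648 ms"
}
```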
Updating benchmarks
- Edit benchmarks/results.json with new numbers
- Run python3 benchmarks/generate.py to regenerate tables
- Docs and README include the tables from the generated files
API Reference
Full API documentation for each platform:
| Platform | Documentation |
|---|---|
| Rust | docs.rs/openai-oxide |
| Node.js | npmjs.com/package/openai-oxide |
| Python | pypi.org/project/openai-oxide |
Rust API
The Rust crate provides the most complete API surface. All endpoints are accessed through the OpenAI client via resource methods:
| Resource | Access | Docs |
|---|---|---|
| Chat Completions | client.chat().completions() | docs.rs |
| Responses | client.responses() | docs.rs |
| Embeddings | client.embeddings() | docs.rs |
| Images | client.images() | docs.rs |
| Audio | client.audio() | docs.rs |
| Files | client.files() | docs.rs |
| Fine-tuning | client.fine_tuning() | docs.rs |
| Models | client.models() | docs.rs |
| Moderations | client.moderations() | docs.rs |
| Batches | client.batches() | docs.rs |
| Uploads | client.uploads() | docs.rs |
| Assistants (beta) | client.beta().assistants() | docs.rs |
| Threads (beta) | client.beta().threads() | docs.rs |
| Runs (beta) | client.beta().runs() | docs.rs |
| Vector Stores (beta) | client.beta().vector_stores() | docs.rs |
OpenAI Docs → openai-oxide
openai-oxide has 1:1 parity with the official Python SDK. Use OpenAI’s documentation as your primary reference — the same concepts, parameter names, and patterns apply.
Endpoint Mapping
| OpenAI Guide | Rust | Node.js | Python |
|---|---|---|---|
| Chat Completions | client.chat().completions().create() | client.createResponse({model, input}) | await client.create(model, input) |
| Responses API | client.responses().create() | client.createText(model, input) | await client.create(model, input) |
| Streaming | client.responses().create_stream() | client.createStream(model, input) | await client.create_stream(model, input) |
| Function Calling | client.responses().create_stream_fc() | client.createResponse({model, input, tools}) | await client.create_with_tools(model, input, tools) |
| Structured Output | ResponseCreateRequest::new(model).text_format(schema) | client.createResponse({model, input, text}) | await client.create_structured(model, input, name, schema) |
| Embeddings | client.embeddings().create() | via createResponse() raw | via create_raw() |
| Image Generation | client.images().generate() | via createResponse() raw | via create_raw() |
| Text-to-Speech | client.audio().speech().create() | via createResponse() raw | via create_raw() |
| Speech-to-Text | client.audio().transcriptions().create() | via createResponse() raw | via create_raw() |
| Fine-tuning | client.fine_tuning().jobs().create() | via createResponse() raw | via create_raw() |
| Realtime API | client.ws_session() | client.wsSession() | — |
| Assistants | client.beta().assistants() | via createResponse() raw | via create_raw() |
Node.js and Python have typed helpers for the top 5 endpoints. All other endpoints work via raw JSON methods.
Parameter Names
Parameter names match the Python SDK exactly:
| OpenAI Python | Rust | Node.js |
|---|---|---|
model="gpt-5.4" | .model("gpt-5.4") | { model: "gpt-5.4" } |
max_output_tokens=100 | .max_output_tokens(100) | { maxOutputTokens: 100 } |
temperature=0.7 | .temperature(0.7) | { temperature: 0.7 } |
stream=True | create_stream() | createStream() |
store=True | .store(true) | { store: true } |
openai-oxide Exclusive Features
These features are not available in the official SDKs:
| Feature | API | Description |
|---|---|---|
| WebSocket Sessions | client.ws_session() | Persistent connection, 37% faster agent loops |
| Hedged Requests | hedged_request() | Race redundant requests, cut P99 latency |
| Stream FC Early Parse | create_stream_fc() | Execute tools 400ms before response finishes |
| SIMD JSON | features = ["simd"] | AVX2/NEON accelerated parsing |
| WASM | default-features = false | Full streaming in Cloudflare Workers |
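As a concept sketch, a hedged request is just a race between redundant attempts where the first response wins and the loser is discarded. The simulation below uses plain threads and sleeps in place of real HTTP calls; the actual hedged_request() API also handles cancellation and delay before firing the second attempt, which this sketch omits:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Fire two identical "requests" and return whichever answers first.
fn hedged<F>(make_request: F) -> String
where
    F: Fn(u64) -> String + Send + Copy + 'static,
{
    let (tx, rx) = mpsc::channel();
    for id in 0..2u64 {
        let tx = tx.clone();
        thread::spawn(move || {
            let _ = tx.send(make_request(id)); // loser's send is ignored
        });
    }
    rx.recv().unwrap() // first response wins
}

fn main() {
    let answer = hedged(|id| {
        // Simulate one slow and one fast upstream.
        let delay = if id == 0 { 300 } else { 30 };
        thread::sleep(Duration::from_millis(delay));
        format!("response from request {id}")
    });
    println!("{answer}");
}
```

The tail-latency win comes from the fact that the race's latency is the minimum of two samples, which pulls in the P99.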
Migration Guides
Switching to openai-oxide from another OpenAI client? These guides cover the key differences and provide side-by-side code comparisons.
- From openai-python — Migrate from the official Python SDK to Rust
- From async-openai — Migrate from the async-openai Rust crate
Migrating from openai-python
openai-oxide uses the same parameter names and resource structure as the official openai-python SDK. If you know the Python API, you already know openai-oxide.
Key Differences
| Python | Rust (openai-oxide) |
|---|---|
client = OpenAI() | let client = OpenAI::from_env()?; |
client.chat.completions.create(...) | client.chat().completions().create(...).await? |
client.responses.create(...) | client.responses().create(...).await? |
stream=True parameter | Separate create_stream() method |
| Dict / Pydantic models | Typed request/response structs |
None for optional fields | Option<T> with builder methods |
| Exception handling | Result<T, OpenAIError> |
Pattern: Python to Rust
# Python
response = client.responses.create(
model="gpt-5.4",
input="Hello",
max_output_tokens=100,
temperature=0.7,
)
#![allow(unused)]
fn main() {
// Rust
let response = client.responses().create(
ResponseCreateRequest::new("gpt-5.4")
.input("Hello")
.max_output_tokens(100)
.temperature(0.7)
).await?;
}
See the OpenAI Docs Mapping for a complete endpoint cross-reference.
Migrating from async-openai
openai-oxide and async-openai are both Rust OpenAI clients, but differ in API design, feature set, and architecture.
Key Differences
| async-openai | openai-oxide |
|---|---|
Client::new() | OpenAI::from_env()? |
CreateChatCompletionRequestArgs::default()...build()? | ChatCompletionRequest::new("model")... |
| Derive-macro builders | Manual builder methods (no proc macros) |
backoff crate for retries | Built-in configurable retry policy |
| No WebSocket support | Native WebSocket sessions |
| No WASM support | First-class WASM target |
| No hedged requests | Built-in hedged request support |
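The built-in retry row above can be illustrated with a minimal sketch of the general shape: bounded attempts with an exponentially growing delay. The policy surface shown here (max attempts, fixed base delay) is an assumption for illustration, not openai-oxide's actual configuration API:

```rust
use std::time::Duration;

// Retry a fallible operation up to `max_attempts` times, sleeping
// with exponential backoff between failures.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = Duration::from_millis(100);
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e); // out of attempts: surface the last error
                }
                std::thread::sleep(delay);
                delay *= 2; // 100ms, 200ms, 400ms, ...
            }
        }
    }
}

fn main() {
    let mut calls = 0;
    let result: Result<&str, &str> = retry_with_backoff(3, || {
        calls += 1;
        if calls < 3 { Err("rate limited") } else { Ok("ok") }
    });
    println!("{result:?} after {calls} calls");
}
```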
Pattern: async-openai to openai-oxide
#![allow(unused)]
fn main() {
// async-openai
let client = Client::new();
let request = CreateChatCompletionRequestArgs::default()
.model("gpt-5.4")
.messages(vec![ChatCompletionRequestUserMessageArgs::default()
.content("Hello")
.build()?
.into()])
.build()?;
let response = client.chat().create(request).await?;
}
#![allow(unused)]
fn main() {
// openai-oxide
let client = OpenAI::from_env()?;
let response = client.chat().completions().create(
ChatCompletionRequest::new("gpt-5.4")
.messages(vec![ChatMessage::user("Hello")])
).await?;
}
The main wins from switching: simpler builder API (no .build()? calls), WebSocket support, WASM compatibility, hedged requests, and feature-flag granularity.