Cut AI API costs.
Extend conversations.
One URL change.
Drop-in proxy for OpenAI, Anthropic, Google, xAI, DeepSeek and Meta. Session-aware compression that reduces token costs as conversations grow.
```shell
export OPENAI_BASE_URL=https://api.lexisaas.com/v1
# That's it. Your existing code just works.
```
Models supported
Providers, one endpoint
Per 1M tokens, flat
To integrate
Get started in minutes
No SDK. No code rewrite. Just one line.
Create an account
Sign up and get your API key. 5 million tokens free. No credit card required.
Change one line
Point your client to api.lexisaas.com/v1 — that's it.
Save on every call
Compression kicks in automatically. The longer the conversation, the more you save.
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lexisaas.com/v1",
    api_key="your-lexi-key"
)

# Everything else stays exactly the same
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)
```
JavaScript / TypeScript
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.lexisaas.com/v1",
  apiKey: "your-lexi-key"
});

// Your existing code. No changes.
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages
});
```
Calculate your savings
See exactly what LEXI saves you.
Compression increases with conversation length.
Without LEXI
$125.00
/ month
With LEXI
$75.00
/ month
You save
$50.00
40% savings
LEXI fee: $0.50 per 1M tokens processed. Compression rate scales with conversation length.
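The arithmetic behind the example above can be sketched in a few lines. The token volume, provider rate, and compression rate below are hypothetical, and the sketch assumes the $0.50 fee applies to all tokens processed — an illustration of the math, not a statement of LEXI's exact billing:

```python
def monthly_cost_with_lexi(tokens_m, provider_rate_per_m, compression_rate,
                           lexi_fee_per_m=0.50):
    # Compressed tokens are billed at the provider's rate; LEXI's flat fee
    # is assumed here to apply to all tokens processed (an assumption).
    compressed_m = tokens_m * (1 - compression_rate)
    return compressed_m * provider_rate_per_m + tokens_m * lexi_fee_per_m

# Hypothetical figures matching the calculator above:
tokens_m = 50       # 50M tokens/month (hypothetical volume)
rate = 2.50         # $2.50 per 1M tokens (hypothetical provider rate)

without_lexi = tokens_m * rate                                    # $125.00
with_lexi = monthly_cost_with_lexi(tokens_m, rate, compression_rate=0.60)
savings_pct = 100 * (without_lexi - with_lexi) / without_lexi
```

At a 60% compression rate, provider spend drops from $125 to $50, and LEXI's $25 fee brings the total to $75 — the 40% savings shown above.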
Simple, transparent pricing
$0.50 per 1M tokens. 5 million tokens free to start.
One flat rate. No tiers. No overage fees.
$2.50 credit. No credit card required.
Use your existing API keys from any provider. We never store or access your keys beyond proxying.
Rate limits grow with you
Spend more, get more. Automatically.
| Lifetime Spend | Rate Limit |
|---|---|
| $0 (free credits) | 120 req/min |
| $5+ | 2,000 req/min |
| $50+ | 10,000 req/min |
| $500+ | 30,000 req/min |
| $1,000+ | 60,000 req/min |
Need more? Contact us for custom limits.
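Since limits are enforced per minute, a client that bursts past its tier will see HTTP 429 responses. A minimal retry sketch with exponential backoff (the retry counts and delays are illustrative, not LEXI-specific; note that the official OpenAI SDK already retries rate-limit errors by default):

```python
import time

def with_backoff(call, is_rate_limited, max_retries=5, base_delay=1.0,
                 sleep=time.sleep):
    """Retry `call` with exponential backoff when the failure is a 429.

    Illustrative helper: `call` would wrap something like
    client.chat.completions.create(...), and `is_rate_limited` would
    check for the SDK's RateLimitError.
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as exc:
            if not is_rate_limited(exc) or attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```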
How it works under the hood
Not a prompt hack. A proprietary compression architecture that learns your conversation and removes redundancy in real time.
Session-Aware Compression
Each session builds a semantic profile of your conversation. The system identifies what the model already knows and removes redundancy, sending only what matters for each turn.
Adaptive Activation
Compression only activates when it saves tokens. The system compares compressed vs. original on every turn. If compression wouldn't help, your request passes through unmodified.
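The pass-through decision described above has a simple shape: compress, compare, and only send the compressed form if it is actually smaller. This is a toy illustration of that idea, not LEXI's implementation — `compress` and `count_tokens` are stand-ins for whatever compressor and tokenizer are in play:

```python
def maybe_compress(messages, compress, count_tokens):
    """Send the compressed form only when it saves tokens; otherwise
    pass the original request through unmodified. Toy illustration --
    `compress` and `count_tokens` are hypothetical stand-ins."""
    compressed = compress(messages)
    if count_tokens(compressed) < count_tokens(messages):
        return compressed
    return messages
```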
Constant Resource Usage
Memory and compute stay flat regardless of conversation length or session count. Your costs are predictable and bounded. No surprises at scale.
Intelligent Context Recall
A proprietary indexing system retrieves relevant context from past interactions instantly. The system learns what matters over time. Context persists across sessions.
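The general shape of context recall — rank past turns by relevance to the current query and surface the best matches — can be shown with a toy keyword-overlap index. This is nothing like the proprietary system, just an illustration of the retrieval idea:

```python
def recall(query, past_turns, top_k=2):
    """Rank past turns by word overlap with the query and return the
    most relevant ones. Toy illustration of context recall, not
    LEXI's proprietary index."""
    query_words = set(query.lower().split())
    scored = sorted(
        past_turns,
        key=lambda turn: len(query_words & set(turn.lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```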
Compression scales with conversation length
From internal testing. Actual compression varies by message content and topic.
Start saving on AI costs today
5 million tokens free on signup. No credit card required. One URL change and your existing code just works.