Streaming

Egret supports real-time token streaming via Server-Sent Events (SSE), so users see responses as they're generated.

Enabling streaming

Add "stream": true to your query request:

POST /api/v1/rag/query/
Content-Type: application/json
Authorization: Api-Key ek_live_...

{
  "query": "What are the key GDPR principles?",
  "domain": "gdpr",
  "stream": true
}
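
As a rough illustration, the request above could be assembled for fetch as follows. This is a sketch, not the official client: BASE_URL is a placeholder host, and buildStreamRequest and the StreamQuery type are hypothetical names introduced here. Only the path, headers, and body fields come from the example above.

```typescript
// Placeholder host (assumption); substitute the base URL of your Egret deployment.
const BASE_URL = "https://egret.example.com";

// Hypothetical type covering the request fields shown in the example above.
interface StreamQuery {
  query: string;
  domain: string;
}

interface StreamRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build fetch arguments for a streaming query. The helper name is illustrative.
function buildStreamRequest(q: StreamQuery, apiKey: string): StreamRequest {
  return {
    url: `${BASE_URL}/api/v1/rag/query/`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Api-Key ${apiKey}`,
    },
    // "stream": true switches the response to Server-Sent Events.
    body: JSON.stringify({ ...q, stream: true }),
  };
}

// Usage (not executed here):
//   const { url, ...init } = buildStreamRequest(
//     { query: "What are the key GDPR principles?", domain: "gdpr" },
//     apiKey,
//   );
//   const response = await fetch(url, init);
```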

SSE event format

The response is sent with Content-Type text/event-stream and carries three event types:

Token event

event: token
data: {"text": "The key principles"}

Citation event

event: citation
data: {"document": "gdpr-principles.pdf", "section": "Art. 5", "text": "..."}

Done event

event: done
data: {"credits_used": 1, "session_id": "sess_xyz789"}
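
To make the format concrete, here is a minimal sketch of turning raw SSE text into typed events. The StreamEvent union is inferred from the three examples above and is an assumption, not a published schema; parseFrame and parseSse are hypothetical helper names. The field parsing follows the general SSE rules (frames separated by a blank line, "field: value" lines, lines starting with ":" treated as comments).

```typescript
// Event shapes inferred from the examples above (assumption, not an official schema).
type StreamEvent =
  | { type: "token"; text: string }
  | { type: "citation"; document: string; section: string; text: string }
  | { type: "done"; credits_used: number; session_id: string };

// Parse one SSE frame, i.e. the lines between two blank-line separators.
function parseFrame(frame: string): StreamEvent | null {
  let eventName = "message"; // SSE default when no "event:" field is present
  const dataLines: string[] = [];
  for (const line of frame.split(/\r?\n/)) {
    if (line.startsWith(":")) continue; // comment or keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
  }
  if (dataLines.length === 0) return null;
  // Ignore event types this sketch does not know about.
  if (eventName !== "token" && eventName !== "citation" && eventName !== "done") {
    return null;
  }
  const payload = JSON.parse(dataLines.join("\n"));
  return { ...payload, type: eventName } as StreamEvent;
}

// Parse a complete SSE body into a list of events.
function parseSse(text: string): StreamEvent[] {
  return text
    .split(/\r?\n\r?\n/)
    .map((frame) => parseFrame(frame))
    .filter((e): e is StreamEvent => e !== null);
}
```

Tagging each event with a type field lets callers narrow on event.type with an ordinary if or switch statement.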

Client implementation

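The request shown above is a POST with an Authorization header, which the browser's native EventSource API cannot send (it issues GET requests only and does not allow custom headers). Use fetch with a streamed response body, or our typed client: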

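import { createClient } from "@egret/api";

// Create a client authenticated with your API key.
const client = createClient({ apiKey: "ek_live_..." });

// streamQuery yields events as they arrive, so it can be consumed with for await.
for await (const event of client.streamQuery({
  query: "What are the key GDPR principles?",
  domain: "gdpr",
})) {
  // Print each token as it arrives; other event types are ignored here.
  if (event.type === "token") {
    process.stdout.write(event.text);
  }
}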
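
If you are not using the typed client, the stream can also be read by hand. The sketch below shows one way to split incoming bytes into complete SSE frames while buffering any partial frame that straddles a chunk boundary. sseFrames is a hypothetical helper, not part of @egret/api, and it assumes the byte chunks are available as an async iterable.

```typescript
// Split a stream of byte chunks into complete SSE frames.
// A frame ends at a blank line; text after the last blank line is kept in the
// buffer until more data arrives. An unterminated frame left over when the
// stream ends is dropped, as the SSE specification requires.
async function* sseFrames(
  chunks: AsyncIterable<Uint8Array>,
): AsyncGenerator<string> {
  const decoder = new TextDecoder();
  let buffer = "";
  for await (const chunk of chunks) {
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    buffer += decoder.decode(chunk, { stream: true });
    const parts = buffer.split(/\r?\n\r?\n/);
    buffer = parts.pop() ?? ""; // last part may be an incomplete frame
    for (const frame of parts) {
      if (frame.trim() !== "") yield frame;
    }
  }
}
```

In Node 18 and later, the body of a fetch response is async-iterable and can be passed to a function like this directly. In runtimes where ReadableStream does not support async iteration, wrap response.body.getReader() in a small async generator first. Each yielded frame still contains raw "event:" and "data:" lines that need to be parsed.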

Why SSE over WebSockets?

LLM responses are unidirectional: the client sends one request and receives a stream of tokens. SSE runs over plain HTTP, so it works through proxies and CDNs and needs no protocol upgrade or dedicated socket infrastructure on the server. Clients built on EventSource also reconnect automatically.