Quickstart
Run the Runtime API locally and trace one Turn from request to inspection.
This quickstart runs a local end-to-end flow:
- start the Runtime API
- create a Conversation
- send a user message
- inspect the resulting Inference Request telemetry
1) Install
pnpm install
2) Configure environment
Create apps/server/.env:
DATABASE_URL=file:local.db
CORS_ORIGIN=http://localhost:5173
OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=openai/gpt-4o-mini
Notes:
- OPENROUTER_MODEL is optional (defaults to openai/gpt-4o-mini).
- OPENAI_MODEL is also supported as a fallback model env var.
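The notes above imply a lookup order for the model name. The following is a minimal sketch of one plausible order, not the server's actual code: the helper name `resolveModel` is hypothetical, and the precedence (OPENROUTER_MODEL first, then OPENAI_MODEL, then the default) is an assumption drawn from the word "fallback".

```typescript
// Hypothetical helper: the Runtime API's real lookup order is not shown in
// this quickstart, so the precedence below is an assumption.
const DEFAULT_MODEL = "openai/gpt-4o-mini";

type Env = Record<string, string | undefined>;

function resolveModel(env: Env): string {
  // Treat blank values as unset so an empty line in .env does not win.
  const pick = (value: string | undefined): string | undefined =>
    value !== undefined && value.trim() !== "" ? value.trim() : undefined;

  return pick(env.OPENROUTER_MODEL) ?? pick(env.OPENAI_MODEL) ?? DEFAULT_MODEL;
}
```

In a Node process this would be called as resolveModel(process.env).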
3) Run the Runtime API
pnpm dev:server
Server starts on http://localhost:3000.
4) (Optional) sanity check the SDK demo CLI
This exercises @tardis/ai directly, outside the Runtime API:
pnpm -F demo start -- "Explain Turn vs Inference Request"
5) Create a Conversation
curl -s -X POST http://localhost:3000/conversations \
-H 'content-type: application/json'
Copy conversation.id from the response.
6) Continue the Conversation with a user Message
curl -s -X POST http://localhost:3000/conversations/<conversationId>/messages \
-H 'content-type: application/json' \
-d '{"content":"Summarize canonical stream events in one sentence."}'
Response includes:
- turn (the completed Turn)
- message (the Committed Assistant Message)
- inferenceRequest (the persisted Inference Request record)
Copy inferenceRequest.id.
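When scripting this step instead of copying the id by hand, the response can be typed loosely and the inspection URLs built from it. This is a sketch under stated assumptions: only the three top-level keys and inferenceRequest.id come from this quickstart, and the names `MessageResponse` and `inspectionUrls` are hypothetical.

```typescript
// Assumed response shape: only the three top-level keys and the `id` field
// on inferenceRequest are documented here, so everything else is `unknown`.
interface MessageResponse {
  turn: unknown;
  message: unknown;
  inferenceRequest: { id: string };
}

// Hypothetical helper: builds the full-inspection and metrics URLs
// for the Inference Request carried by a message response.
function inspectionUrls(baseUrl: string, response: MessageResponse) {
  const id = encodeURIComponent(response.inferenceRequest.id);
  // Drop trailing slashes so the joined URL has exactly one separator.
  const root = `${baseUrl.replace(/\/+$/, "")}/inference-requests/${id}`;
  return { full: root, metrics: `${root}/metrics` };
}
```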
7) Inspect the Inference Request
Full inspection:
curl -s http://localhost:3000/inference-requests/<inferenceRequestId>
Latency/summary view:
curl -s http://localhost:3000/inference-requests/<inferenceRequestId>/metrics
You should see canonical lifecycle events and derived timing fields (firstTokenLatencyMs, totalDurationMs).
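To make the relationship between lifecycle events and the derived timing fields concrete, here is an illustrative sketch of how such fields could be computed. It is not the Runtime API's implementation. The event shape (`type`, `timestampMs`), the `deriveTimings` name, and the rule that the earliest event marks the request start are all assumptions; only the event names assistant_delta and completed and the two output field names appear in this quickstart.

```typescript
// Hypothetical event shape: field names are assumptions for illustration.
interface LifecycleEvent {
  type: string;        // e.g. "assistant_delta" or "completed"
  timestampMs: number; // assumed: epoch milliseconds
}

interface DerivedTimings {
  firstTokenLatencyMs: number | null;
  totalDurationMs: number | null;
}

function deriveTimings(events: LifecycleEvent[]): DerivedTimings {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...events].sort((a, b) => a.timestampMs - b.timestampMs);
  const first = sorted[0];
  if (first === undefined) {
    return { firstTokenLatencyMs: null, totalDurationMs: null };
  }
  // Assumption: the earliest recorded event marks the start of the request.
  const start = first.timestampMs;
  const firstDelta = sorted.find((e) => e.type === "assistant_delta");
  const completed = sorted.find((e) => e.type === "completed");
  return {
    firstTokenLatencyMs: firstDelta ? firstDelta.timestampMs - start : null,
    totalDurationMs: completed ? completed.timestampMs - start : null,
  };
}
```

Returning null rather than 0 keeps a missing event distinguishable from a genuinely instantaneous one.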
8) (Optional) stream a Turn over NDJSON
curl -N -X POST http://localhost:3000/conversations/<conversationId>/messages/stream \
-H 'content-type: application/json' \
-d '{"content":"Stream this response."}'
The stream emits newline-delimited JSON events such as assistant_delta, then completed.
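A client consuming this stream programmatically has to handle JSON lines that are split across network chunks. The sketch below shows one common way to do that with an incremental decoder. The class name `NdjsonDecoder` is hypothetical, and events are returned as `unknown` because their fields are not specified in this quickstart.

```typescript
// Incremental NDJSON decoder: feed it text chunks, get back parsed events.
class NdjsonDecoder {
  private buffer = "";

  // Returns every complete JSON line found in the data received so far.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line, or "" after a trailing newline.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim())
      .filter((line) => line !== "")
      .map((line) => JSON.parse(line) as unknown);
  }

  // Call once the stream ends to parse a final line with no trailing newline.
  flush(): unknown[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest === "" ? [] : [JSON.parse(rest) as unknown];
  }
}
```

With fetch, the chunks would typically come from reading response.body and decoding each Uint8Array with a TextDecoder using the { stream: true } option, so that multi-byte characters split across chunks are decoded correctly.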