Quickstart

This quickstart runs a local end-to-end flow:

start the Runtime API
create a Conversation
send a user message
inspect the resulting Inference Request telemetry

1) Install

pnpm install

2) Configure environment

Create apps/server/.env:

DATABASE_URL=file:local.db
CORS_ORIGIN=http://localhost:5173
OPENROUTER_API_KEY=your_key_here
OPENROUTER_MODEL=openai/gpt-4o-mini

Notes:

OPENROUTER_MODEL is optional (defaults to openai/gpt-4o-mini).
OPENAI_MODEL is also supported as a fallback model env var.
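The model settings in these notes can be read as a simple precedence chain. The resolver below is a hypothetical sketch, not the server's actual code. It assumes OPENROUTER_MODEL is checked before OPENAI_MODEL, and that blank values count as unset.

```typescript
// Hypothetical resolver illustrating the documented model settings.
// Assumed precedence: OPENROUTER_MODEL, then OPENAI_MODEL, then the default.
const DEFAULT_MODEL = "openai/gpt-4o-mini";

type Env = Record<string, string | undefined>;

// Returns a trimmed, non-empty value, or undefined (blank counts as unset; an assumption).
function nonEmpty(value: string | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

function resolveModel(env: Env): string {
  return nonEmpty(env.OPENROUTER_MODEL) ?? nonEmpty(env.OPENAI_MODEL) ?? DEFAULT_MODEL;
}
```

In Node you would call it as resolveModel(process.env).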

3) Run the Runtime API

pnpm dev:server

Server starts on http://localhost:3000.

4) (Optional) Sanity-check the SDK demo CLI

This exercises @tardis/ai directly, outside the Runtime API:

pnpm -F demo start -- "Explain Turn vs Inference Request"

5) Create a Conversation

curl -s -X POST http://localhost:3000/conversations \
  -H 'content-type: application/json' \
  -d '{}'

Copy conversation.id from the response.
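If you would rather script this step than copy the id by hand, a minimal sketch might look like the following. createConversation and FetchLike are hypothetical names. The only part of the response the docs confirm is conversation.id, and sending {} as the request body is an assumption.

```typescript
// Only conversation.id is documented; the rest of the payload is ignored here.
interface CreateConversationResponse {
  conversation?: { id?: unknown };
}

// Minimal fetch-compatible signature (Node 18+'s global fetch satisfies it),
// so the sketch does not depend on DOM type definitions.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST /conversations and return the new conversation id.
async function createConversation(baseUrl: string, fetchImpl: FetchLike): Promise<string> {
  const res = await fetchImpl(`${baseUrl}/conversations`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: "{}", // assumption: an empty JSON object is accepted as the request body
  });
  if (!res.ok) throw new Error(`POST /conversations failed with HTTP ${res.status}`);
  const data = (await res.json()) as CreateConversationResponse | null;
  const id = data?.conversation?.id;
  if (typeof id !== "string") throw new Error("response did not include conversation.id");
  return id;
}
```

Under Node 18+ you can pass the global fetch, for example createConversation("http://localhost:3000", fetch).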

6) Continue the Conversation with a user Message

curl -s -X POST http://localhost:3000/conversations/<conversationId>/messages \
  -H 'content-type: application/json' \
  -d '{"content":"Summarize canonical stream events in one sentence."}'

Response includes:

turn (the completed Turn)
message (the Committed Assistant Message)
inferenceRequest (the persisted Inference Request record)

Copy inferenceRequest.id.
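The same request can be made from code. Building the body with JSON.stringify avoids the shell-quoting pitfalls of hand-written JSON in curl. sendUserMessage is a hypothetical helper; turn and message are left untyped because their fields are not documented on this page, so only inferenceRequest.id is checked.

```typescript
// Documented top-level keys; only inferenceRequest.id is relied on below.
interface SendMessageResponse {
  turn: unknown; // the completed Turn
  message: unknown; // the Committed Assistant Message
  inferenceRequest: { id: string; [key: string]: unknown };
}

// Minimal fetch-compatible signature; Node 18+'s global fetch satisfies it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST a user message and return the parsed response.
async function sendUserMessage(
  baseUrl: string,
  conversationId: string,
  content: string,
  fetchImpl: FetchLike,
): Promise<SendMessageResponse> {
  const url = `${baseUrl}/conversations/${encodeURIComponent(conversationId)}/messages`;
  const res = await fetchImpl(url, {
    method: "POST",
    headers: { "content-type": "application/json" },
    // JSON.stringify escapes quotes and newlines in the user's text.
    body: JSON.stringify({ content }),
  });
  if (!res.ok) throw new Error(`POST ${url} failed with HTTP ${res.status}`);
  const data = (await res.json()) as Partial<SendMessageResponse> | null;
  if (typeof data?.inferenceRequest?.id !== "string") {
    throw new Error("response did not include inferenceRequest.id");
  }
  return data as SendMessageResponse;
}
```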

7) Inspect the Inference Request

Full inspection:

curl -s http://localhost:3000/inference-requests/<inferenceRequestId>

Latency/summary view:

curl -s http://localhost:3000/inference-requests/<inferenceRequestId>/metrics

You should see canonical lifecycle events and derived timing fields (firstTokenLatencyMs, totalDurationMs).
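To read the timing fields programmatically, a sketch like this fetches the metrics view. It assumes firstTokenLatencyMs and totalDurationMs are top-level numeric fields of the /metrics response, which this page does not confirm; getTimings is a hypothetical name. Missing or non-numeric fields come back as undefined instead of throwing.

```typescript
interface Timings {
  firstTokenLatencyMs?: number;
  totalDurationMs?: number;
}

// Minimal fetch-compatible signature; Node 18+'s global fetch satisfies it.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Keep a value only if it is a finite number.
function asNumber(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Hypothetical helper: GET the metrics view and pull out the two timing fields.
async function getTimings(baseUrl: string, inferenceRequestId: string, fetchImpl: FetchLike): Promise<Timings> {
  const url = `${baseUrl}/inference-requests/${encodeURIComponent(inferenceRequestId)}/metrics`;
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`GET ${url} failed with HTTP ${res.status}`);
  // Assumption: both fields are top-level keys of the metrics payload.
  const data = ((await res.json()) ?? {}) as Record<string, unknown>;
  return {
    firstTokenLatencyMs: asNumber(data.firstTokenLatencyMs),
    totalDurationMs: asNumber(data.totalDurationMs),
  };
}
```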

8) (Optional) Stream a Turn over NDJSON

curl -N -X POST http://localhost:3000/conversations/<conversationId>/messages/stream \
  -H 'content-type: application/json' \
  -d '{"content":"Stream this response."}'

The stream emits newline-delimited JSON events, such as assistant_delta, followed by completed. The -N flag disables curl's output buffering so each event prints as it arrives.
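NDJSON lines can be split arbitrarily across network chunks, so a client should buffer partial lines before parsing. The sketch below does that and also records client-side timings. It assumes each event carries its kind in a type field (the page names assistant_delta and completed but not the field that holds them), and NdjsonDecoder and consumeTurnStream are hypothetical names. Client-side timings include network overhead, so they will not exactly match the server's firstTokenLatencyMs and totalDurationMs.

```typescript
// Assumed event shape: the kind lives in a `type` field (not confirmed by the docs).
interface StreamEvent {
  type?: string;
  [key: string]: unknown;
}

// Splits NDJSON text into parsed objects. A chunk may end mid-line, so the
// trailing partial line is buffered until the next chunk or the final flush.
class NdjsonDecoder {
  private buffer = "";

  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last piece may be incomplete
    const events: StreamEvent[] = [];
    for (const line of lines) events.push(...this.parse(line));
    return events;
  }

  flush(): StreamEvent[] {
    const rest = this.buffer;
    this.buffer = "";
    return this.parse(rest);
  }

  private parse(line: string): StreamEvent[] {
    const trimmed = line.trim(); // also drops a trailing "\r" and skips blank lines
    return trimmed ? [JSON.parse(trimmed) as StreamEvent] : [];
  }
}

interface StreamSummary {
  eventTypes: string[];
  firstDeltaMs?: number; // client-side time until the first assistant_delta
  totalMs: number; // client-side time until the stream ended
}

// Consumes text chunks; the clock is injectable so the timing logic is testable.
async function consumeTurnStream(
  chunks: AsyncIterable<string>,
  now: () => number = () => Date.now(),
): Promise<StreamSummary> {
  const start = now();
  const decoder = new NdjsonDecoder();
  const summary: StreamSummary = { eventTypes: [], totalMs: 0 };
  const record = (events: StreamEvent[]): void => {
    for (const event of events) {
      const type = typeof event.type === "string" ? event.type : "unknown";
      summary.eventTypes.push(type);
      if (type === "assistant_delta" && summary.firstDeltaMs === undefined) {
        summary.firstDeltaMs = now() - start;
      }
    }
  };
  for await (const chunk of chunks) record(decoder.push(chunk));
  record(decoder.flush()); // handle a final line without a trailing newline
  summary.totalMs = now() - start;
  return summary;
}
```

With Node 18+, one way to obtain text chunks is to pipe the fetch response body through TextDecoderStream; the resulting web stream is async-iterable and can be passed as chunks.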
