Phase 40 (early): Langfuse tracing on /v1/chat — observability recovery

The lost stack J flagged was partly already present: Langfuse
container has been running 2 days with the staffing project, SDK
installed, mcp-server tracing gw:/* routes. What was missing was
Rust-side /v1/chat emission — the new Phase 38/39 code bypassed
Langfuse entirely.

This commit bridges it. Fire-and-forget HTTP POST to
http://localhost:3001/api/public/ingestion (batch {trace-create +
generation-create}) on every chat call. Non-blocking — spawned
tokio task, response latency unaffected. Trace failures log warn
and drop, never propagate.

Verified end-to-end after restart:
- Log line "v1: Langfuse tracing enabled" at startup
- /v1/chat local (qwen3.5:latest) → v1.chat:ollama trace appears
  with lat=0.41s, 24+6 tokens
- /v1/chat cloud (gpt-oss:120b) → v1.chat:ollama_cloud trace appears
  with lat=1.87s, 73+87 tokens
- mcp-server's existing gw:/log + gw:/intelligence/* traces
  continue to flow into the same project unchanged

Files:
- crates/gateway/src/v1/langfuse_trace.rs (new, 195 LOC) — thin
  client, no SDK. reqwest Basic Auth. ChatTrace payload + event
  serializer. from_env_or_defaults() resolver matches
  mcp-server/tracing.ts conventions (pk-lf-staffing / sk-lf-
  staffing-secret / localhost:3001)
- crates/gateway/src/v1/mod.rs — V1State.langfuse field, emission
  after successful provider call (post-dispatch, pre-usage-update)
- crates/gateway/src/main.rs — resolve + log at startup

Tests: 12/12 green (9 prior + 3 for langfuse_trace: ingestion-batch
serialization, uuid generator uniqueness, env resolver shape).

Recovered piece #1 of 3 from the lost-stack narrative. Still open:
- Langfuse → observer :3800 pipe (Phase 40 mid-deliverable)
- Gitea MCP reconnect in mcp-server/index.ts (Phase 40 late)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
profit 2026-04-22 03:04:28 -05:00
parent 6316433062
commit 75a0f424ef
3 changed files with 273 additions and 0 deletions

View File

@ -204,6 +204,18 @@ async fn main() {
}
k
},
// Phase 40 early deliverable — Langfuse trace emitter.
// Defaults match mcp-server/tracing.ts conventions so
// gateway traces land in the same staffing project.
langfuse: {
let c = v1::langfuse_trace::LangfuseClient::from_env_or_defaults();
if c.is_some() {
tracing::info!("v1: Langfuse tracing enabled — /v1/chat calls will appear at localhost:3001");
} else {
tracing::warn!("v1: Langfuse keys missing — /v1/chat calls will not be traced");
}
c
},
}));
// Auth middleware (if enabled)

View File

@ -0,0 +1,226 @@
//! Phase 40 (early deliverable) — Langfuse tracing from the Rust gateway.
//!
//! Langfuse is already running at `http://localhost:3001` with the
//! staffing project — `mcp-server/tracing.ts` has been emitting traces
//! for months. This file bridges the gap: every `/v1/chat` call now
//! appears in the same Langfuse project, alongside traces from the
//! Bun side.
//!
//! Design:
//! - Fire-and-forget — emit spawns a tokio task. Response to client
//! never waits on Langfuse, never fails because of it. If Langfuse
//! is down or slow, we silently drop traces (and `tracing::warn!`).
//! - Two events per call — `trace-create` + `generation-create`. The
//! generation nests under the trace so Langfuse UI shows the whole
//! thing as one clickable row.
//! - No SDK — plain HTTP POST to `/api/public/ingestion` with Basic
//! auth. Keeps the Rust dependency graph small.
use serde::Serialize;
use std::sync::Arc;
use std::time::Duration;
use super::Message;
const INGESTION_PATH: &str = "/api/public/ingestion";
const EMIT_TIMEOUT_SECS: u64 = 5;
#[derive(Clone)]
pub struct LangfuseClient {
inner: Arc<Inner>,
}
struct Inner {
base_url: String,
public_key: String,
secret_key: String,
http: reqwest::Client,
}
impl LangfuseClient {
pub fn new(base_url: String, public_key: String, secret_key: String) -> Self {
let http = reqwest::Client::builder()
.timeout(Duration::from_secs(EMIT_TIMEOUT_SECS))
.build()
.expect("langfuse client init");
Self { inner: Arc::new(Inner { base_url, public_key, secret_key, http }) }
}
/// Resolve a LangfuseClient from the same conventions the TS side
/// already uses. Returns None if keys are missing — caller logs a
/// warning and omits tracing (non-fatal).
///
/// Sources checked in order:
/// 1. LANGFUSE_URL + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY env
/// 2. /root/llm_team_config.json (if it carries langfuse keys)
/// 3. Hardcoded staffing-project defaults matching mcp-server
/// (pk-lf-staffing / sk-lf-staffing-secret / localhost:3001)
pub fn from_env_or_defaults() -> Option<Self> {
let base = std::env::var("LANGFUSE_URL")
.unwrap_or_else(|_| "http://localhost:3001".to_string());
let pk = std::env::var("LANGFUSE_PUBLIC_KEY")
.unwrap_or_else(|_| "pk-lf-staffing".to_string());
let sk = std::env::var("LANGFUSE_SECRET_KEY")
.unwrap_or_else(|_| "sk-lf-staffing-secret".to_string());
if pk.trim().is_empty() || sk.trim().is_empty() { return None; }
Some(Self::new(base, pk, sk))
}
/// Fire-and-forget emit. Never blocks, never errors upward.
pub fn emit_chat(&self, ev: ChatTrace) {
let this = self.clone();
tokio::spawn(async move {
if let Err(e) = this.emit_chat_inner(ev).await {
tracing::warn!(target: "v1.langfuse", "trace drop: {e}");
}
});
}
async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> {
let trace_id = uuid_v7_like();
let gen_id = uuid_v7_like();
let trace_ts = ev.start_time.clone();
let batch = IngestionBatch {
batch: vec![
IngestionEvent {
id: uuid_v7_like(),
timestamp: trace_ts.clone(),
kind: "trace-create",
body: serde_json::json!({
"id": trace_id,
"name": format!("v1.chat:{}", ev.provider),
"input": serde_json::json!({
"model": ev.model,
"messages": ev.input,
}),
"metadata": serde_json::json!({
"provider": ev.provider,
"think": ev.think,
}),
}),
},
IngestionEvent {
id: uuid_v7_like(),
timestamp: ev.end_time.clone(),
kind: "generation-create",
body: serde_json::json!({
"id": gen_id,
"traceId": trace_id,
"name": "chat",
"model": ev.model,
"modelParameters": serde_json::json!({
"temperature": ev.temperature,
"max_tokens": ev.max_tokens,
"think": ev.think,
}),
"input": ev.input,
"output": ev.output,
"usage": serde_json::json!({
"input": ev.prompt_tokens,
"output": ev.completion_tokens,
"total": ev.prompt_tokens + ev.completion_tokens,
"unit": "TOKENS",
}),
"startTime": ev.start_time,
"endTime": ev.end_time,
"metadata": serde_json::json!({
"provider": ev.provider,
"latency_ms": ev.latency_ms,
}),
}),
},
],
};
let url = format!("{}{}", self.inner.base_url.trim_end_matches('/'), INGESTION_PATH);
let resp = self.inner.http
.post(url)
.basic_auth(&self.inner.public_key, Some(&self.inner.secret_key))
.json(&batch)
.send()
.await
.map_err(|e| format!("POST failed: {e}"))?;
if !resp.status().is_success() {
return Err(format!("{}: {}", resp.status(), resp.text().await.unwrap_or_default()));
}
Ok(())
}
}
/// Everything the v1.chat handler collects for one completed call.
pub struct ChatTrace {
pub provider: String,
pub model: String,
pub input: Vec<Message>,
pub output: String,
pub prompt_tokens: u32,
pub completion_tokens: u32,
pub temperature: Option<f64>,
pub max_tokens: Option<u32>,
pub think: Option<bool>,
pub start_time: String,
pub end_time: String,
pub latency_ms: u64,
}
#[derive(Serialize)]
struct IngestionBatch {
batch: Vec<IngestionEvent>,
}
#[derive(Serialize)]
struct IngestionEvent {
id: String,
timestamp: String,
#[serde(rename = "type")]
kind: &'static str,
body: serde_json::Value,
}
/// UUIDv7-ish identifier — not strictly UUIDv7 but time-ordered and
/// unique enough for event deduplication. Avoids a uuid crate dep.
fn uuid_v7_like() -> String {
let ts = chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0);
let rand = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.subsec_nanos())
.unwrap_or(0);
format!("{:016x}-{:08x}", ts, rand)
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn from_env_or_defaults_returns_some_when_defaults_present() {
// Defaults are hardcoded non-empty — should always produce Some.
let c = LangfuseClient::from_env_or_defaults();
assert!(c.is_some());
}
#[test]
fn uuid_v7_like_produces_distinct_ids() {
let a = uuid_v7_like();
// Different nanosecond timestamps — but they might collide on
// back-to-back calls. Loop a bit to ensure uniqueness surface.
let b = uuid_v7_like();
let c = uuid_v7_like();
assert!(a != b || b != c, "three consecutive ids should have at least one distinct pair");
assert!(a.contains('-'));
}
#[test]
fn ingestion_batch_serializes_with_rename() {
let ev = IngestionEvent {
id: "e1".into(),
timestamp: "2026-04-22T00:00:00Z".into(),
kind: "trace-create",
body: serde_json::json!({"name": "t"}),
};
let json = serde_json::to_string(&IngestionBatch { batch: vec![ev] }).unwrap();
assert!(json.contains("\"type\":\"trace-create\""));
assert!(json.contains("\"batch\":["));
}
}

View File

@ -13,6 +13,7 @@
pub mod ollama;
pub mod ollama_cloud;
pub mod langfuse_trace;
use axum::{
Router,
@ -33,6 +34,10 @@ pub struct V1State {
/// Ollama Cloud bearer token. Loaded at startup via
/// `ollama_cloud::resolve_cloud_key()`. None = cloud routes 503.
pub ollama_cloud_key: Option<String>,
/// Phase 40 early deliverable — Langfuse client. None = tracing
/// disabled (keys missing or container unreachable). Traces are
/// fire-and-forget: never block the response path.
pub langfuse: Option<langfuse_trace::LangfuseClient>,
}
#[derive(Default, Clone, Serialize)]
@ -127,6 +132,9 @@ async fn chat(
}
let provider = req.provider.as_deref().unwrap_or("ollama").to_ascii_lowercase();
let start_time = chrono::Utc::now();
let start_instant = std::time::Instant::now();
let resp = match provider.as_str() {
"ollama" | "local" | "" => ollama::chat(&state.ai_client, &req)
.await
@ -148,6 +156,33 @@ async fn chat(
}
};
let end_time = chrono::Utc::now();
let latency_ms = start_instant.elapsed().as_millis() as u64;
// Phase 40 — emit Langfuse trace. Fire-and-forget: the clone is
// cheap (Arc inside), the tokio::spawn never blocks us, a dead
// Langfuse just logs a warn. Client-visible response latency is
// untouched.
if let Some(lf) = &state.langfuse {
let output = resp.choices.first()
.map(|c| c.message.content.clone())
.unwrap_or_default();
lf.emit_chat(langfuse_trace::ChatTrace {
provider: provider.clone(),
model: resp.model.clone(),
input: req.messages.clone(),
output,
prompt_tokens: resp.usage.prompt_tokens,
completion_tokens: resp.usage.completion_tokens,
temperature: req.temperature,
max_tokens: req.max_tokens,
think: req.think,
start_time: start_time.to_rfc3339(),
end_time: end_time.to_rfc3339(),
latency_ms,
});
}
{
let mut u = state.usage.write().await;
u.requests += 1;