Agentic Orchestration · Compare

LangSmith vs Helicone vs Phoenix vs Langfuse: The 2026 LLM Observability Buyer's Guide

Four LLM observability platforms, two ownership shifts in 2026, and a buying decision that depends more on framework affinity and corporate posture than on feature lists. Here is the head-to-head that matters in the post-acquisition landscape.

The bottom line

These four tools dominate LLM observability shortlists in 2026, but the category shifted materially in the first quarter of the year. Langfuse was acquired by ClickHouse in January 2026 as part of the ClickHouse $400M Series D, and Helicone was acquired by Mintlify in March 2026 and entered maintenance mode shortly after. LangChain raised a $125M Series B at a $1.25B valuation in October 2025, accelerating LangSmith investment. Arize Phoenix did not change ownership and continues on its OpenTelemetry-native trajectory. If a search result tells you otherwise, it is older than the current state of the market.

The picking heuristic is simple. LangChain shop: LangSmith. Want true open-source with enterprise polish: Langfuse. OpenTelemetry-purist or LlamaIndex-heavy: Arize Phoenix. Need zero-code instrumentation right now, accept maintenance-mode risk: Helicone.

Key Takeaways

LangChain shop → LangSmith. Anything else → reconsider. — LangSmith is the tightest LangChain/LangGraph integration in the category, with the deepest agent-tracing UX inside that ecosystem. Outside LangChain, the framework-affinity gap evaporates and the other three become more attractive. LangChain raised a $125M Series B at a $1.25B valuation in October 2025, which has accelerated investment in LangSmith but has not made it less coupled to the parent framework.
Langfuse is now a ClickHouse property. That changes the calculus. — ClickHouse acquired Langfuse in January 2026 as part of its $400M Series D. The MIT license and self-host parity remain, and the v3 release shipped ClickHouse-backed analytics, prompt experiments, and Datasets v2. For teams that want true open-source with enterprise polish, Langfuse is now the safest single bet — but factor in that the acquirer is an OLAP database company, which shapes the roadmap toward analytics depth over agent-UX depth.
Helicone is in maintenance mode. Treat it as a tactical pick, not strategic. — Mintlify acquired Helicone in March 2026 and the product entered maintenance mode shortly after. The drop-in HTTP proxy is still the fastest zero-code instrumentation in the category, and the OSS Apache 2.0 license means existing deployments are safe. But new strategic commitments should price in that active feature development has slowed materially. Use Helicone for fast cost visibility today; do not build a multi-year observability roadmap around it.
Phoenix is the OTel-native, framework-neutral pick. — Arize Phoenix is the strongest fit if you are committed to OpenTelemetry and want a vendor-neutral tracing layer that handles LangChain, LlamaIndex, and DSPy through OpenInference auto-instrumentors. Phoenix Evals is the most mature open evaluation library in the category. The self-host tier (Elastic License 2.0) is free; the closed Arize AX tier covers production observability at scale.

At-a-glance comparison

	LangSmith	Helicone	Arize Phoenix	Langfuse
Vendor	LangChain, Inc.	Mintlify (acquired Mar 2026)	Arize AI	ClickHouse (acquired Jan 2026)
License	Proprietary; closed	Apache 2.0 (OSS, maintenance)	Elastic License 2.0 (source-available)	MIT (true OSS)
Self-host	Enterprise tier only	Helm (slowed roadmap)	Free, full parity	Free, full parity
Integration model	SDK-native (LangChain/LangGraph), OTel, OpenAI wrapper	HTTP proxy + AI gateway	OpenTelemetry / OpenInference	OpenTelemetry + SDKs (Python, JS)
Framework affinity	LangChain-first	Framework-agnostic	LlamaIndex, LangChain, DSPy	Framework-agnostic; deep LangChain + LiteLLM
Free tier	Developer: 5k traces/mo, 14-day retention	10k requests/mo	Phoenix self-host free; AX Free 25k spans/mo	Hobby: 50k units/mo
Paid entry	Plus $39/seat/mo + $2.50/1k overage	Pro $79/mo	AX Pro $50/mo (50k spans, 10GB)	Core $29/mo, Pro $199/mo
Latest funding / event	Series B $125M, $1.25B valuation (Oct 2025)	Acquired by Mintlify (Mar 2026)	Series C $70M (2024); no 2026 round disclosed	Acquired by ClickHouse (Jan 2026)
2025-2026 launch	LangGraph Platform GA, Agent Evals, Studio	AI Gateway 2.0 (pre-acquisition)	Phoenix 6.x with session-level tracing, eval templates	v3 release: ClickHouse analytics, prompt experiments, Datasets v2
One-line differentiator	Default for LangChain/LangGraph stacks	Drop-in proxy = zero-code instrumentation	OTel-native with the strongest open eval library	True open-source with ClickHouse-backed analytics

What the 2026 ownership shifts actually mean

Langfuse → ClickHouse (January 2026)

ClickHouse picked up Langfuse as part of its $400M Series D, valuing the combined company at $15B. The MIT license stayed in place, the self-host story stayed intact, and the v3 release shipped ClickHouse-backed analytics, prompt experiments, and Datasets v2 within weeks of the acquisition closing. The strategic read: ClickHouse wanted an LLM-native ingestion product on top of its OLAP engine, and Langfuse wanted infrastructure depth and a corporate parent. For users, this is a low-risk acquisition — the OSS commitment is credible and the roadmap is moving toward more analytics, not less.

Helicone → Mintlify (March 2026)

Mintlify, the developer-documentation company, picked up Helicone in March 2026. Terms were undisclosed. The Apache 2.0 OSS code remains, but active development entered maintenance mode shortly after the acquisition. Existing Helicone deployments are safe and the proxy model continues to work; the strategic concern is that new feature work has slowed and the AI Gateway product is now best treated as feature-complete rather than evolving. Use Helicone for tactical fast wins on cost visibility; do not build a multi-year observability roadmap around it.

LangChain Series B (October 2025)

LangChain raised $125M led by IVP at a $1.25B valuation in October 2025, bringing total funding to roughly $260M. The capital is funding LangGraph Platform commercialization, Agent Evals, and LangSmith Studio. The trade-off for buyers: more investment in LangSmith depth, more pressure to standardize on the LangChain ecosystem. For teams already deep on LangChain this is unambiguously good. For teams considering whether to commit, the pressure is now higher than it was a year ago.

When to pick each one

Pick LangSmith when

Your stack is built on LangChain, LangGraph, or both
You want the tightest agent-tracing UX inside the LangChain ecosystem
You are willing to standardize on LangChain as the orchestration layer
You prefer a managed-SaaS commercial model over self-hosting

Pick Langfuse when

You want true open-source with a credible corporate parent (ClickHouse)
You need full self-host parity to the cloud product
Your stack is multi-framework (LangChain, LlamaIndex, OpenAI SDK, LiteLLM, custom)
You value MIT licensing for compliance or distribution reasons

Pick Arize Phoenix when

You are committed to OpenTelemetry as the tracing standard
You run both traditional ML and LLM agents and want one observability stack
LlamaIndex or DSPy are first-class in your codebase
Eval depth (Phoenix Evals) matters as much as trace visualization

Pick Helicone when

You need cost visibility on raw API calls today with no instrumentation code
You accept that the product is in maintenance mode after the Mintlify acquisition
You want a proxy-based deployment that doesn't touch your application code
You will revisit the choice in 12 months if active development does not resume

Frequently asked questions

LangSmith vs Helicone vs Phoenix vs Langfuse — which should I pick?

Pick LangSmith if your stack is built on LangChain or LangGraph and you want the tightest agent-tracing UX inside that ecosystem. Pick Langfuse if you want true open-source (MIT) with full self-host parity and you are comfortable with ClickHouse as the new owner. Pick Arize Phoenix if you are committed to OpenTelemetry and want a framework-neutral tracing layer, especially for LlamaIndex or DSPy workloads. Pick Helicone if you need zero-code instrumentation today and can accept that the product is now in maintenance mode following the Mintlify acquisition.

What is the difference between LangSmith and Langfuse?

LangSmith is LangChain's proprietary observability product, deeply coupled to the LangChain and LangGraph ecosystem, with SaaS pricing starting at $39/seat/month on the Plus tier. Langfuse is the open-source (MIT) alternative — framework-agnostic, with full self-host parity to its cloud version, and now backed by ClickHouse following the January 2026 acquisition. The most practical difference is what happens when you change frameworks: LangSmith becomes awkward outside LangChain; Langfuse stays useful. The most practical similarity is that both produce excellent trace visualization for agent reasoning chains.

Is Langfuse better than LangSmith for LLM observability?

For teams deep on LangChain, no — LangSmith's native integration is hard to beat. For teams that use multiple frameworks (LangChain alongside LlamaIndex, custom OpenAI SDK code, LiteLLM, or vendor SDKs), yes — Langfuse is usually the better long-term bet because of its framework-agnostic design, MIT license, and full self-hostable parity. The ClickHouse acquisition adds operational confidence; the Hobby tier (free, 50k events/month) and Core tier ($29/month) make it accessible at the smallest team sizes.

What does Arize Phoenix observe that the others miss?

Phoenix is built around ML observability broadly, not just LLMs. It covers LLM traces, retrieval quality (critical for RAG agents), embedding drift, and model performance monitoring in one place. The OpenInference instrumentors give it first-class coverage of LangChain, LlamaIndex, and DSPy without forcing you to commit to any one framework. Phoenix Evals is the most mature open evaluation library, which matters once you move from "is the agent working?" to "is the agent improving?" For teams running both traditional ML and LLM agents in the same stack, Phoenix is the natural choice.

Why is Helicone different from the other three?

Helicone is an HTTP proxy, not an SDK. You change the base URL in your environment and every LLM call gets logged automatically — no instrumentation code, no decorators, no SDK wrappers. That makes it the fastest tool in the category to deploy. The trade-off is visibility depth: because it works at the HTTP layer, it sees requests and responses but has less native context on agent reasoning chains than tools that use SDK instrumentation. Best for teams that need immediate cost visibility on raw API calls with minimal setup. Worst for teams that need deep agent-step tracing.

LangSmith vs Helicone vs Phoenix vs Langfuse pricing comparison?

LangSmith: Free Developer (5k traces/month, 14-day retention); Plus $39/seat/month + $2.50 per 1k overage; Enterprise custom. Helicone: Free 10k requests/month; Pro $79/month; Team $799/month; Enterprise custom. Arize Phoenix: self-host free (Elastic License 2.0); Arize AX cloud has Free 25k spans/month, Pro $50/month for 50k spans, Enterprise custom. Langfuse: Hobby free (50k units/month); Core $29/month; Pro $199/month; Enterprise $2,499/month; overage $8 per 100k units. Self-host is free across Phoenix and Langfuse; the cloud SaaS pricing is the more comparable axis for most teams.

Which has the best agent templates and prebuilt evals?

For prebuilt evaluation templates, Arize Phoenix Evals is the strongest open library in the category, with reference implementations for hallucination, relevance, toxicity, faithfulness, and RAG-specific metrics. Langfuse Datasets v2 (shipped 2026 alongside the ClickHouse integration) is closing the gap and is the strongest option for teams that want eval workflows tightly integrated with the same tool they use for tracing. LangSmith's eval library is good inside LangChain but does not extend as cleanly to non-LangChain code. Helicone is not the right pick if eval depth is a priority.

Self-host or cloud — which makes sense?

Self-host when data sovereignty, regulatory compliance, or token-level data sensitivity rules out a third-party cloud. Langfuse (MIT) and Arize Phoenix (Elastic License 2.0) are the credible self-host options; both ship full feature parity to their cloud versions. Helicone is self-hostable via Helm but feature investment has slowed post-acquisition. LangSmith self-host exists on Enterprise tier only and is closed-source. Cloud is the right answer for most early-stage teams: faster setup, no infrastructure overhead, and the free tiers on all four tools are generous enough for evaluation and small production workloads.

How does LangSmith compare to Helicone, Phoenix, and Langfuse for cost tracking?

All four track token spend per call, per session, and per user. The meaningful differences are in alerting and aggregation. Helicone's proxy model makes per-route cost visibility trivial. Langfuse has the strongest cost-per-experiment view (helpful when iterating prompts). Phoenix surfaces cost in the same plane as quality metrics, which is useful when trade-offs are explicit. LangSmith's cost tracking is good inside LangChain runs but less granular for hybrid stacks. For preventing a runaway retry loop from burning your monthly budget overnight, Helicone's alerting is the most mature; Langfuse v3 closed much of the gap in 2026.

Agent Observability: Langfuse vs LangSmith vs Arize Phoenix vs Helicone — the broader pillar piece introducing the category
Agent Framework Comparison — picking the orchestration layer that sits beneath observability
Agent Topology Patterns — how the agent architecture shapes what you need to observe

The bottom line

Key Takeaways

At-a-glance comparison

What the 2026 ownership shifts actually mean

Langfuse → ClickHouse (January 2026)

Helicone → Mintlify (March 2026)

LangChain Series B (October 2025)

When to pick each one

Pick LangSmith when

Pick Langfuse when

Pick Arize Phoenix when

Pick Helicone when

Frequently asked questions

LangSmith vs Helicone vs Phoenix vs Langfuse — which should I pick?

What is the difference between LangSmith and Langfuse?

Is Langfuse better than LangSmith for LLM observability?

What does Arize Phoenix observe that the others miss?

Why is Helicone different from the other three?

LangSmith vs Helicone vs Phoenix vs Langfuse pricing comparison?

Which has the best agent templates and prebuilt evals?

Self-host or cloud — which makes sense?

How does LangSmith compare to Helicone, Phoenix, and Langfuse for cost tracking?

Related

Tech meets endurance

No comments yet. Be the first!