Enter response times for multiple API endpoints to compare performance side by side, classify latency quality, rank endpoints, and benchmark against industry standards for REST, GraphQL, gRPC, and Chat APIs.
| API Type | Excellent | Good | Acceptable | Slow | Critical |
|---|---|---|---|---|---|
| REST (simple read) | < 50ms | 50–150ms | 150–500ms | 500ms–1s | > 1s |
| REST (complex / DB) | < 100ms | 100–300ms | 300ms–1s | 1s–3s | > 3s |
| GraphQL | < 100ms | 100–300ms | 300ms–1s | 1s–3s | > 3s |
| gRPC | < 20ms | 20–50ms | 50–200ms | 200–500ms | > 500ms |
| WebSocket (message) | < 10ms | 10–50ms | 50–150ms | 150–500ms | > 500ms |
| Chat / LLM API (TTFT) | < 300ms | 300–800ms | 800ms–2s | 2s–5s | > 5s |
| Authentication API | < 50ms | 50–150ms | 150–400ms | 400ms–1s | > 1s |
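The bands in the table above can be expressed as a small threshold lookup. A minimal sketch for the REST (simple read) row, with cutoffs copied directly from the table (the band names and values mirror the table, not any external standard):

```python
# Classify a latency measurement against the REST (simple read) bands
# from the table above. Thresholds are upper bounds in milliseconds.
REST_SIMPLE_READ_BANDS = [
    (50, "Excellent"),
    (150, "Good"),
    (500, "Acceptable"),
    (1000, "Slow"),
]

def classify_latency(ms: float, bands=REST_SIMPLE_READ_BANDS) -> str:
    for upper, label in bands:
        if ms < upper:
            return label
    return "Critical"

print(classify_latency(42))    # Excellent
print(classify_latency(250))   # Acceptable
print(classify_latency(1500))  # Critical
```

Swapping in a different row of the table (e.g. gRPC's tighter cutoffs) only requires a different `bands` list.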
- Compare latency across different endpoints of the same API to identify which routes are slow and need optimization.
- Compare REST vs GraphQL vs gRPC against benchmarks appropriate to each type: a 200ms gRPC response is slow, but a 200ms REST response is good.
- Compare time-to-first-token between LLM providers. OpenAI, Anthropic, Gemini, and Mistral have different latency profiles depending on model size and load.
- Enter the same endpoint with old and new latency values to measure the impact of caching, query optimization, or infrastructure changes.
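The before/after comparison reduces to a percent change. A minimal sketch (the 480ms and 120ms figures are illustrative, not from the source):

```python
def latency_improvement(old_ms: float, new_ms: float) -> float:
    """Percent reduction in latency after an optimization."""
    return (old_ms - new_ms) / old_ms * 100

# e.g. caching cut a 480ms endpoint to 120ms:
print(f"{latency_improvement(480, 120):.0f}% faster")  # 75% faster
```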
Large Language Model (LLM) APIs have fundamentally different latency profiles than traditional REST APIs. Key metrics to compare include Time to First Token (TTFT), the latency before streaming begins, and total generation time, which depends on output length.
For real-time chat applications, TTFT is the most important metric as it determines perceived responsiveness. A TTFT under 500ms feels fast even if total generation takes several seconds. When comparing chat APIs, measure TTFT separately from total response time and consider streaming support as a critical feature for reducing perceived latency.
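Measuring TTFT separately from total time just means starting one timer and recording it twice: once at the first streamed chunk and once at the end. A minimal sketch, using a simulated token stream in place of a real provider SDK:

```python
import time

def measure_stream(token_iter):
    """Return (ttft_ms, total_ms) for a streaming response.

    `token_iter` is any iterable of tokens/chunks. TTFT is the time
    until the first item arrives; total is the time to exhaustion.
    """
    start = time.perf_counter()
    ttft_ms = None
    for _ in token_iter:
        if ttft_ms is None:
            ttft_ms = (time.perf_counter() - start) * 1000
    total_ms = (time.perf_counter() - start) * 1000
    return ttft_ms, total_ms

# Simulated LLM stream: first token after ~300ms, then 20 quick tokens.
def fake_stream():
    time.sleep(0.3)
    yield "Hello"
    for _ in range(20):
        time.sleep(0.01)
        yield "tok"

ttft, total = measure_stream(fake_stream())
print(f"TTFT: {ttft:.0f}ms, total: {total:.0f}ms")
```

With a real streaming API, `fake_stream()` would be replaced by the provider's chunk iterator; the timing logic is unchanged.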
For interactive web applications, API calls that users are waiting for should complete in under 200ms. Background API calls can tolerate up to 1000ms. Anything over 300ms for a user-triggered action starts to feel sluggish. The Nielsen Norman Group defines 100ms as the threshold for "instantaneous" response.
Percentile latency metrics describe the distribution of response times. p50 (median) is the latency that 50% of requests complete within. p95 means 95% of requests are faster than this value. p99 captures the slowest 1% of requests. p99 is critical for SLA monitoring because it surfaces outliers that affect real users even if average latency looks good.
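A minimal sketch of computing these percentiles with the nearest-rank convention (one common definition; libraries differ slightly at small sample sizes). The sample data is illustrative:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p% of samples are at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Nine fast requests and one 900ms outlier:
latencies_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 900]
print(percentile(latencies_ms, 50))  # 14  (median looks great)
print(percentile(latencies_ms, 99))  # 900 (p99 exposes the outlier)
```

This is exactly why p99 matters for SLAs: the mean of this sample is over 100ms and the median is 14ms, yet 1 in 10 requests took nearly a second.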
Latency is the time for a single request to complete (measured in milliseconds). Throughput is the number of requests a system can handle per second (RPS — requests per second). An API can have low latency but low throughput (fast but not scalable), or high throughput but high latency (scalable but slow per request). Optimizing for both requires different strategies.
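The latency/throughput relationship can be made concrete with Little's law: with N requests in flight and L seconds per request, sustainable throughput is N / L. A minimal sketch:

```python
def max_throughput_rps(concurrency: int, latency_s: float) -> float:
    """Little's law: N concurrent workers, each taking latency_s per
    request, sustain at most N / latency_s requests per second."""
    return concurrency / latency_s

# One worker at 50ms per request: low latency, but only 20 RPS.
print(max_throughput_rps(1, 0.050))    # 20.0
# 100 workers at 500ms each: slow per request, but 200 RPS.
print(max_throughput_rps(100, 0.500))  # 200.0
```

The two examples are the two cases from the paragraph above: fast but not scalable, and scalable but slow per request.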
Light travels through fiber optic cables at about 200,000 km/s, adding roughly 5ms per 1,000km of round-trip distance. A request from London to a US East Coast server adds ~70ms of unavoidable network latency. CDNs and edge computing reduce this by serving responses from servers physically closer to the user.
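The physical floor on round-trip time follows directly from the fiber speed quoted above. A minimal sketch (the ~5,500km London-to-US-East-Coast distance is an assumption for illustration):

```python
FIBER_SPEED_KM_PER_MS = 200.0  # ~200,000 km/s, about 2/3 the speed of light

def min_rtt_ms(one_way_km: float) -> float:
    """Lower bound on round-trip time from fiber propagation alone.
    Real RTTs are higher due to routing, queuing, and processing."""
    return 2 * one_way_km / FIBER_SPEED_KM_PER_MS

# London to the US East Coast, roughly 5,500 km one way (assumption):
print(f"{min_rtt_ms(5500):.0f} ms")  # 55 ms physical floor
```

The gap between this ~55ms floor and the ~70ms observed in practice is the routing and processing overhead that CDNs and edge computing cannot eliminate but can relocate closer to the user.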