Export Retry Mechanism

The Logfire SDK includes a robust retry mechanism for handling failed telemetry exports. This page explains how it works and what the warning messages mean.

Understanding the Warning Message

You may occasionally see a warning like this in your logs:

    logfire - WARNING - Currently retrying 1 failed export(s) (7877 bytes)

This message indicates that the SDK failed to send telemetry data to the Logfire servers and has handed off the export to a background retry system. This is normal behavior when there are transient network issues, brief connectivity blips, or temporary server load.
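
If you want to make these messages quieter or more prominent, you can adjust the logger level. This sketch assumes the SDK emits the warning through Python's standard logging module under the "logfire" logger name, as the message prefix shown above suggests:

    import logging

    # Assumption: the retry warnings come from a stdlib logger named "logfire"
    # (inferred from the "logfire - WARNING - ..." prefix above). Raise the
    # level to suppress the warnings, or lower it for more detail.
    logging.getLogger("logfire").setLevel(logging.ERROR)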

How the Retry Mechanism Works

When the SDK fails to send telemetry data, it follows this process:

  1. Immediate retry: Waits 1 second and retries once
  2. Disk-based retry: If the immediate retry also fails, the payload is saved to disk and retried in a background daemon thread using exponential backoff

The disk-based retry system:

  • Saves failed exports to a temporary directory to conserve memory
  • Uses exponential backoff starting at 1 second, doubling on each failure up to a maximum of 128 seconds (see the sketch after this list)
  • Adds proportional jitter to spread out retry attempts
  • Logs warnings at most once per minute to avoid flooding your logs
  • Stores up to 512 MB of failed exports before dropping new ones
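
The exact schedule is internal to the SDK, but a minimal sketch of the documented policy (1 second, doubling to a cap of 128 seconds, with proportional jitter; the ±20% jitter fraction here is an assumption, not a documented value) looks like this:

    import random

    MAX_DELAY = 128.0  # seconds, the documented cap

    def backoff_delays():
        """Yield jittered retry delays: 1s, 2s, 4s, ... capped at 128s."""
        delay = 1.0
        while True:
            # Proportional jitter spreads retries out; ±20% is an assumed fraction.
            yield delay * random.uniform(0.8, 1.2)
            delay = min(delay * 2, MAX_DELAY)

    delays = backoff_delays()
    for attempt in range(1, 10):
        print(f"attempt {attempt}: wait ~{next(delays):.1f}s")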

When Is This a Problem?

| Scenario | Interpretation |
| --- | --- |
| Occasional warnings with `retrying 1 failed export(s)` | Normal: exports are failing occasionally but recovering |
| The retry count grows (2, 3, 5+) | Investigate: exports have been consistently failing for multiple minutes |
| A `dropping an export` error message | Action needed: the 512 MB disk buffer is full and data is being lost |

Non-Blocking Design

The retry mechanism is designed to minimize impact on your application:

  • Background thread: Retries run in a daemon thread, so they do not block your application's main thread or async event loop
  • Data persistence: Failed exports are saved to disk, so data won't be lost even if retries take a while
  • Automatic recovery: Once connectivity is restored, the backlog is sent automatically
  • Graceful shutdown: The daemon thread won't prevent your application from exiting
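
The non-blocking behavior follows the standard daemon-thread pattern. This sketch illustrates the pattern itself, not the SDK's actual implementation:

    import threading
    import time

    def retry_worker():
        """Placeholder loop standing in for the SDK's disk-based retry logic."""
        while True:
            # ... load a saved export from disk and attempt to resend it ...
            time.sleep(1)

    # daemon=True means the thread never blocks interpreter shutdown,
    # so the application can exit even with retries still pending.
    thread = threading.Thread(target=retry_worker, daemon=True, name="retry-worker")
    thread.start()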

Troubleshooting

If you're seeing frequent retry warnings:

  1. Check network connectivity: Verify that outbound HTTPS requests to Logfire servers are not being blocked by firewalls or network policies

  2. Check for DNS issues: Ensure DNS resolution is working correctly for Logfire endpoints (see the connectivity check after this list)

  3. Review resource usage: High CPU or memory usage can cause network timeouts

  4. Upgrade the SDK: Newer SDK versions have improved retry logic that may reduce the frequency of these warnings:

    pip install --upgrade logfire
    
  5. Adjust timeout settings: If you're calling force_flush() (common in serverless environments), you can reduce the worst-case blocking time by lowering the OTLP timeout:

    export OTEL_EXPORTER_OTLP_TIMEOUT=5000  # value in milliseconds: 5 seconds instead of the default 10
    
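To quickly test DNS resolution and TCP reachability from the affected host, a small standalone check like the following can help. The hostname below is a placeholder; substitute the Logfire endpoint your SDK is actually configured to use:

    import socket

    HOST = "logfire-api.example.com"  # placeholder; use your actual Logfire endpoint

    try:
        # DNS check: resolve the hostname
        addresses = socket.getaddrinfo(HOST, 443, proto=socket.IPPROTO_TCP)
        print(f"DNS OK: {len(addresses)} address(es) resolved")
        # Reachability check: open a TCP connection on the HTTPS port
        with socket.create_connection((HOST, 443), timeout=5):
            print("TCP connection on port 443 succeeded")
    except OSError as exc:
        print(f"Network check failed: {exc}")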

Serverless Environments

In serverless environments like AWS Lambda, the SDK typically calls force_flush() at the end of each invocation. This is a blocking call that waits for exports to complete, so if exports are failing it can block for up to the configured timeout.
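
As a sketch, a handler might flush pending exports in a finally block so the flush always runs. The handler shape here is illustrative; logfire.force_flush() is assumed to behave as described above (a blocking call bounded by the OTLP timeout):

    import logfire

    logfire.configure()  # done once, at module import time

    def handler(event, context):
        try:
            with logfire.span("handle invocation"):
                ...  # your business logic
            return {"statusCode": 200}
        finally:
            # Blocks until pending exports complete, or until the OTLP
            # timeout (see OTEL_EXPORTER_OTLP_TIMEOUT above) elapses.
            logfire.force_flush()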

Configuration

The retry mechanism uses these default values:

| Setting | Value | Description |
| --- | --- | --- |
| Max retry delay | 128 seconds | Maximum time between retry attempts |
| Max disk buffer | 512 MB | Maximum bytes of failed exports to store |
| Log interval | 60 seconds | Minimum time between warning messages |

These values are not currently configurable but are designed to work well for most use cases.