Retry connections | Lattice Developers

Production integrations send thousands of requests over networks that can drop connections, throttle traffic, or return transient errors. A reusable retry utility lets you apply one consistent backoff policy across every API interaction instead of duplicating retry logic per call.

Before you begin

Complete the steps in Set up to configure your environment and install the SDK.
Decide which protocol you are using, REST or gRPC, as the error types and retry-eligible codes differ between protocols.

gRPC authentication

If you are using gRPC with client credentials, set up the token refresh module before running the examples on this page.

What the SDK retries

Before building your own utility, understand what the SDK already handles so you do not retry the same error twice.

The Lattice REST SDKs automatically retry HTTP 5xx, 408, 409, and 429 responses. Between attempts, the SDK applies exponential backoff that starts at 1 second, caps at 60 seconds, and adds ±20% jitter. It also honors the Retry-After, retry-after-ms, and X-RateLimit-Reset response headers. By default, the SDK retries each call up to 2 times; override this per call with request_options={"max_retries": N}.
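
To make that schedule concrete, the sketch below computes the delay the documented policy would produce before each retry. It illustrates the parameters above and is not the SDK's source: sdk_style_delay is a hypothetical helper, and applying the 60-second cap before the jitter is an assumption.

```python
import random


def sdk_style_delay(retry_number, *, initial=1.0, maximum=60.0, jitter=0.2, rng=random):
    """Return the delay in seconds before retry `retry_number` (0 for the first retry).

    Hypothetical helper that mirrors the documented policy: exponential backoff
    from 1 s, capped at 60 s, with ±20% jitter. Capping before jitter is an assumption.
    """
    base = min(initial * (2 ** retry_number), maximum)
    # Scale the capped delay by a random factor in [1 - jitter, 1 + jitter].
    return base * rng.uniform(1 - jitter, 1 + jitter)


if __name__ == "__main__":
    # Nominal delays before jitter: 1, 2, 4, 8, 16, 32, 60, 60 seconds.
    for n in range(8):
        print(f"retry {n}: {sdk_style_delay(n):.2f} s")
```

With the SDK default of 2 retries, only the first two delays apply: roughly 1 second and 2 seconds, each randomized by up to 20%.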

The gRPC SDKs do not retry automatically, so you must implement retries yourself for gRPC integrations.

Avoid double retries

If you build a retry utility on top of the Lattice REST SDKs, set request_options={"max_retries": 0} on each SDK call so that your utility owns the entire retry policy. Otherwise, both layers retry independently, and a call that keeps failing with a retryable error can send up to (sdk_retries + 1) × utility_attempts requests.
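
The multiplication adds up quickly. The following sketch counts the worst case for a single call, assuming the SDK keeps its default of 2 retries and the utility makes 3 attempts, as it does with MAX_RETRIES = 3 in the listing on this page. worst_case_requests is a hypothetical helper.

```python
def worst_case_requests(sdk_max_retries, utility_attempts):
    """Upper bound on HTTP requests for one call that keeps failing with a retryable error.

    Hypothetical helper: each utility attempt is one SDK call, and each SDK call
    sends the original request plus up to sdk_max_retries retries.
    """
    return (sdk_max_retries + 1) * utility_attempts


print(worst_case_requests(2, 3))  # → 9: SDK default of 2 retries, 3 utility attempts
print(worst_case_requests(0, 3))  # → 3: SDK retries disabled, the utility alone decides
```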

Build a retry utility

A single helper function that wraps any API call gives you one place to tune backoff parameters, adjust error classification, and add observability across your integration.

Define the retry function

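The following utility makes up to max_retries attempts in total, sleeping for initial_backoff × 2^attempt seconds between attempts (1 second, then 2 seconds, with the defaults). It re-raises immediately on errors that retrying will not fix, and re-raises the last error once all attempts fail.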

from anduril import Lattice, \
    Aliases, MilView, Location, Position, Ontology, Provenance
from anduril.core import ApiError
from datetime import datetime, timezone, timedelta
import asyncio
import httpx
import math
import os
import sys
from uuid import uuid4

lattice_endpoint = os.getenv('LATTICE_ENDPOINT')
client_id = os.getenv('LATTICE_CLIENT_ID')
client_secret = os.getenv('LATTICE_CLIENT_SECRET')

# Remove sandboxes_token from the following statements if you are not developing on Sandboxes.
sandboxes_token = os.getenv('SANDBOXES_TOKEN')
if not client_id or not client_secret or not lattice_endpoint or not sandboxes_token:
    print("Missing required environment variables.")
    sys.exit(1)

client = Lattice(
    base_url=f"https://{lattice_endpoint}",
    client_id=client_id,
    client_secret=client_secret,
    # Remove the following header if you are not developing on Sandboxes.
    headers={ "anduril-sandbox-authorization": f"Bearer {sandboxes_token}" }
)

# Total number of attempts per call, including the first one.
MAX_RETRIES = 3
INITIAL_BACKOFF_SECONDS = 1.0

# Retryable 4xx codes; any 5xx response is also retryable.
RETRYABLE_STATUS_CODES = {404, 408, 409, 429}


def is_retryable_status(status_code):
    # Every other code, including 400, 401, 403, and 413, is terminal.
    return status_code is not None and (
        status_code in RETRYABLE_STATUS_CODES or 500 <= status_code <= 599
    )


async def retry_with_backoff(operation, *, max_retries=MAX_RETRIES, initial_backoff=INITIAL_BACKOFF_SECONDS):
    for attempt in range(max_retries):
        try:
            # operation is a synchronous SDK call wrapped in a zero-argument callable.
            return operation()
        except ApiError as e:
            # Terminal errors mean the request itself is wrong; retrying will not help.
            if not is_retryable_status(e.status_code) or attempt == max_retries - 1:
                raise
        except (httpx.ConnectError, httpx.RemoteProtocolError):
            # Transport failures are retryable until the attempts run out.
            if attempt == max_retries - 1:
                raise
        # Exponential backoff: 1 s, 2 s, 4 s, ... with the default initial_backoff.
        await asyncio.sleep(initial_backoff * (2 ** attempt))


entity_id = str(uuid4())
radius_degrees = 0.1
creation_time = datetime.now(timezone.utc)
count = 0.0


def publish_track():
    global count
    count += 0.1
    t = math.radians(count)
    latest_timestamp = datetime.now(timezone.utc)

    client.entities.publish_entity(
        entity_id=entity_id,
        description="Friendly airplane",
        is_live=True,
        aliases=Aliases(
            name="Airplane 1"
        ),
        created_time=creation_time,
        expiry_time=latest_timestamp + timedelta(minutes=5),
        ontology=Ontology(
            template="TEMPLATE_TRACK",
            platform_type="Airplane"
        ),
        mil_view=MilView(
            disposition="DISPOSITION_FRIENDLY",
            environment="ENVIRONMENT_AIR"
        ),
        location=Location(
            position=Position(
                latitude_degrees=50.91402185768586 + (radius_degrees * math.cos(t)),
                longitude_degrees=0.79203612077257 + (radius_degrees * math.sin(t))
            )
        ),
        provenance=Provenance(
            integration_name="your_integration_name",
            data_type="your_data_type",
            source_update_time=latest_timestamp
        ),
        # The utility owns the retry policy, so disable the SDK's built-in retry layer to avoid double-retrying.
        request_options={"max_retries": 0}
    )


async def app():
    try:
        while True:
            await retry_with_backoff(publish_track)
            print(f"Published track with entity ID: {entity_id}")
            await asyncio.sleep(5)
    except asyncio.CancelledError:
        print(">>>Exiting...")
    except Exception as error:
        print(f"Exception: {error}")


if __name__ == "__main__":
    asyncio.run(app())
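
The listing above waits exactly initial_backoff × 2^attempt seconds, so clients that fail together also retry together. A common refinement is to randomize each delay. The following dependency-free sketch applies a technique often called full jitter (a random delay between 0 and the capped exponential backoff) and takes the error classification as a parameter. TransientError, is_retryable, retry_with_jitter, and flaky_operation are hypothetical names for illustration; in a Lattice integration you would classify ApiError and httpx exceptions as the listing above does.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 503."""


def is_retryable(error):
    # Hypothetical classifier; replace with checks on ApiError.status_code and httpx errors.
    return isinstance(error, TransientError)


async def retry_with_jitter(operation, *, max_attempts=3, initial_backoff=1.0,
                            max_backoff=60.0, classify=is_retryable, sleep=asyncio.sleep):
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            # Re-raise terminal errors immediately and the last error once attempts run out.
            if not classify(error) or attempt == max_attempts - 1:
                raise
        # Full jitter: pick a delay uniformly between 0 and the capped exponential backoff.
        cap = min(max_backoff, initial_backoff * (2 ** attempt))
        await sleep(random.uniform(0, cap))


# Usage: an operation that fails twice with a transient error, then succeeds.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporarily unavailable")
    return "ok"


if __name__ == "__main__":
    print(asyncio.run(retry_with_jitter(flaky_operation, initial_backoff=0.01)))  # → ok
```

Passing sleep as a parameter lets tests substitute a fake that records delays instead of waiting. Full jitter gives up a predictable schedule in exchange for spreading retries out in time; the SDK's ±20% jitter is a milder form of the same idea.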

Classify which errors to retry

Not all errors benefit from a retry. Retrying a malformed request or an unauthorized call wastes time and can obscure bugs. Classify errors into two buckets, retryable and terminal, before deciding whether to back off or re-raise immediately.

The REST SDKs

This utility catches two broad error categories:

anduril.core.ApiError with a status_code: Inspect the code to decide whether to retry.
httpx.ConnectError and httpx.RemoteProtocolError: Low-level transport failures that are always retryable.

Retry these status codes:

404 Not Found: The resource might not yet be visible because of eventual consistency.
408 Request Timeout: The server did not receive the complete request in time, often because of transient network latency.
409 Conflict: A concurrent write conflicted with this request, so retrying after a short backoff often succeeds.
429 Too Many Requests: The client has been rate-limited, so back off before retrying.
5xx Server Error: The server encountered a transient failure that might clear on a subsequent attempt.
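
Because request_options={"max_retries": 0} turns off the SDK's retry layer, the SDK no longer waits out a 429 response for you. If your utility can read the response headers, it can honor the server's Retry-After value instead of guessing. The following sketch parses both forms HTTP allows for that header: a number of seconds or an HTTP-date. retry_after_seconds is a hypothetical helper; how you read headers from an ApiError depends on your SDK version, so that step is left out, and the sketch ignores the retry-after-ms and X-RateLimit-Reset headers that the SDK also honors.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(headers, now=None):
    """Return the delay in seconds requested by a Retry-After header, or None.

    Hypothetical helper. `headers` is any mapping of header names to values.
    """
    # HTTP header names are case-insensitive.
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, a non-negative integer such as "120".
    if re.fullmatch(r"[0-9]+", value):
        return float(value)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # HTTP-dates are always in GMT; treat a naive result as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

Under these assumptions, one reasonable policy is to wait for the larger of the Retry-After value and the backoff your utility computed.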

Do not retry these status codes:

400 Bad Request: The payload is malformed; fix the request instead.
401 Unauthorized: Refresh credentials before sending the request again.
403 Forbidden: Retrying will not change permissions.
413 Payload Too Large: Reduce the payload size.

The gRPC SDKs

In Go, extract the gRPC status code with status.Code(err) and look it up in a map[codes.Code]bool of retryable codes.

Retry these codes:

UNAVAILABLE: The server is temporarily unreachable.
DEADLINE_EXCEEDED: The call timed out before a response was received.
RESOURCE_EXHAUSTED: The client has been rate-limited or has exceeded a quota.
INTERNAL: The server encountered an unspecified failure that might be transient.
ABORTED: A concurrency conflict occurred, so retrying after a short backoff often succeeds.
UNKNOWN: The server returned an unrecognized error, so treat it as transient.
NOT_FOUND: The resource might not yet be visible because of eventual consistency.

Do not retry these codes:

INVALID_ARGUMENT: Fix the request before sending it again.
PERMISSION_DENIED: Retrying will not grant access.
UNAUTHENTICATED: Refresh the token before retrying.
FAILED_PRECONDITION: A required precondition is not met.
ALREADY_EXISTS: The resource already exists.
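
The same allowlist carries over to other languages. The following Python sketch keys the retryable list above by status-code name so that it runs without dependencies. With grpcio, an exception caught as grpc.RpcError from a call typically also implements grpc.Call, so error.code().name yields names such as "UNAVAILABLE". RETRYABLE_GRPC_CODES and is_retryable_grpc_code are hypothetical names.

```python
# Codes from the retryable list above; any code not listed is treated as terminal.
RETRYABLE_GRPC_CODES = frozenset({
    "UNAVAILABLE",
    "DEADLINE_EXCEEDED",
    "RESOURCE_EXHAUSTED",
    "INTERNAL",
    "ABORTED",
    "UNKNOWN",
    "NOT_FOUND",
})


def is_retryable_grpc_code(code_name):
    """Return True if a gRPC status code name is worth retrying.

    Hypothetical helper: with grpcio you would pass error.code().name
    for a caught grpc.RpcError.
    """
    return code_name in RETRYABLE_GRPC_CODES


for name in ("UNAVAILABLE", "INVALID_ARGUMENT", "CANCELLED"):
    print(name, is_retryable_grpc_code(name))
```

Codes that appear in neither list, such as CANCELLED or UNIMPLEMENTED, fall through as terminal, which matches a Go map[codes.Code]bool lookup returning false for a missing key.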

Apply the utility to any API call

Wrap the API call in a zero-argument callable and pass it to retry_with_backoff. The utility handles timing and error classification, so the callable only needs to make the request.
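
If the call takes arguments, bind them before passing it to the utility. The sketch below contrasts two ways to build zero-argument callables in a loop: a lambda looks up loop variables when it runs, not when it is created, whereas functools.partial stores the argument values immediately. publish_entity_stub is a hypothetical stand-in for an SDK method such as client.entities.publish_entity.

```python
import functools


def publish_entity_stub(entity_id, description):
    # Hypothetical stand-in for an SDK call; returns what it was asked to publish.
    return f"{entity_id}: {description}"


entity_ids = ["track-1", "track-2", "track-3"]

# Pitfall: each lambda reads entity_id when it runs, after the loop has finished,
# so all three callables publish "track-3".
late_bound = [lambda: publish_entity_stub(entity_id, "Friendly airplane") for entity_id in entity_ids]

# functools.partial stores the argument values when the callable is created.
bound = [functools.partial(publish_entity_stub, entity_id, "Friendly airplane") for entity_id in entity_ids]

print([op() for op in late_bound])
print([op() for op in bound])
```

Each element of bound is a zero-argument callable, so you can pass it directly, for example await retry_with_backoff(bound[0]). Writing the lambda as lambda entity_id=entity_id: ... also works, because default values are evaluated when the lambda is created.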

A single utility keeps retry behavior consistent across your integration, so tuning the backoff policy or adding logging requires a change in only one place.

What’s next

Learn how to publish entities to Lattice.
Review Authentication to understand how credentials are refreshed.
Check Choose a protocol to decide between REST and gRPC for your integration.