Persistent End Game via SQS
By default, game servers send EndGame requests directly to the Pragma backend via RPC. This requires the backend to be available at the moment the game ends. If the platform is down for maintenance, game servers must either wait or drain active games before the maintenance window.
With the SQS-based end game flow, game servers post EndGame requests to an Amazon SQS queue via an API Gateway. The Pragma backend reads from this queue asynchronously, allowing game servers to continue posting end-game data even when the platform is temporarily unavailable.
Provisioned Infrastructure
Contact your Pragma CSM to enable the SQS end game flow on your shard. Once the infrastructure is updated, your shard will have the following:
- SQS queue: the main queue where game servers send EndGame messages.
- API Gateway: an HTTP endpoint that fronts the SQS queue. Game servers POST to this endpoint with an API key for authentication.
- Dead Letter Queue (DLQ): a secondary queue that receives messages that still fail processing after repeated retries.
SQS retention policy
The SQS queue is configured with the following retention policy:
| Setting | Value |
|---|---|
| Visibility timeout | 5 minutes |
| Max receive count (before moving to DLQ) | 100 |
| Message retention (main queue) | 4 days |
| Message retention (DLQ) | 14 days |
Messages that exceed the maximum receive count are moved to the Dead Letter Queue. Nothing on the Pragma side reads from the DLQ automatically; see Dead Letter Queue (DLQ) for how it is used.
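As a rough worked example of these settings: a message that fails on every delivery stays hidden for one visibility timeout per receive, so it is received 100 times over roughly 100 times 5 minutes before SQS moves it to the DLQ. The Python sketch below does that arithmetic. It assumes the reader re-polls as soon as the message becomes visible again and ignores polling latency; `min_time_until_dlq` is a hypothetical helper, not part of Pragma.

```python
from datetime import timedelta

# Values from the retention table.
VISIBILITY_TIMEOUT = timedelta(minutes=5)
MAX_RECEIVE_COUNT = 100
MAIN_QUEUE_RETENTION = timedelta(days=4)


def min_time_until_dlq(visibility: timedelta = VISIBILITY_TIMEOUT,
                       max_receives: int = MAX_RECEIVE_COUNT) -> timedelta:
    """Approximate time a message that always fails spends in the main queue.

    Each failed receive hides the message for one visibility timeout before SQS
    makes it visible again, so it is received max_receives times over roughly
    max_receives visibility timeouts before it is moved to the DLQ. Polling
    latency is ignored.
    """
    return visibility * max_receives


budget = min_time_until_dlq()
print(budget)                         # 8:20:00
print(budget < MAIN_QUEUE_RETENTION)  # True: it reaches the DLQ long before expiring
```

In other words, a message that never succeeds keeps retrying for roughly eight hours, giving transient outages several hours to clear, and still reaches the DLQ well within the 4-day main-queue retention.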
Architecture overview
The SQS end game flow works as follows:
- During LinkV1 or CreateAndLinkV1, the platform includes SqsConnectionInfo (invoke URL, API key, queue URL, and region) in the response if the SqsConnectionInfoConfig shared config is set. Pragma infrastructure sets these config values automatically when the SQS feature is turned on; reach out to your Pragma CSM to enable SQS for your shards.
- When the game ends, the SDK checks whether bSqsPassthrough is enabled and SQS connection info is available.
- If both are true, the SDK POSTs the EndGameV1 request to the API Gateway invoke URL using the API key, and API Gateway places it on the SQS queue.
- The SqsReaderService on the Pragma backend reads its queue URL and region from the SqsConnectionInfoConfig shared config and continuously polls the SQS queue.
- Each message is deserialized and forwarded internally to the correct gateway, which processes it as a normal EndGameV1 RPC.
- On successful processing, the message is deleted from SQS.
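The backend side of this flow can be sketched as a single read-loop iteration. This is a hypothetical, in-memory Python model for illustration only: `InMemoryQueue`, `Message`, `poll_once`, and `route_to_gateway` are made-up names, the queue ignores visibility timeouts, and the message body shape is assumed. It shows the key rule that a message is deleted only after the gateway reports success.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Message:
    receipt_handle: str  # SQS identifies a received message for deletion by its receipt handle
    body: str            # JSON-encoded EndGameV1 request (payload shape is hypothetical)


@dataclass
class InMemoryQueue:
    """Hypothetical stand-in for the SQS queue; it ignores visibility timeouts."""
    messages: List[Message] = field(default_factory=list)

    def receive(self, max_messages: int = 10) -> List[Message]:
        return list(self.messages[:max_messages])  # copy, so deleting while iterating is safe

    def delete(self, receipt_handle: str) -> None:
        self.messages = [m for m in self.messages if m.receipt_handle != receipt_handle]


def poll_once(queue: InMemoryQueue, route_to_gateway: Callable[[dict], bool]) -> int:
    """One iteration of the read loop. Returns the number of messages deleted."""
    deleted = 0
    for msg in queue.receive():
        request = json.loads(msg.body)
        # Forward to the gateway that owns this game instance; True means it was processed.
        if route_to_gateway(request):
            queue.delete(msg.receipt_handle)
            deleted += 1
        # On failure the message stays; real SQS would redeliver it after the visibility timeout.
    return deleted


q = InMemoryQueue([
    Message("r1", json.dumps({"gameInstanceId": "a"})),
    Message("r2", json.dumps({"gameInstanceId": "b"})),
])
deleted = poll_once(q, lambda req: req["gameInstanceId"] == "a")  # only "a" succeeds
print(deleted, [m.receipt_handle for m in q.messages])  # 1 ['r2']
```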
Fallback behavior
The SQS flow is opt-in on both the platform and SDK sides and degrades gracefully:
- If the SqsConnectionInfoConfig shared config is not set, the platform does not include SQS details in LinkV1 or CreateAndLinkV1 responses.
- If the SDK has bSqsPassthrough enabled but did not receive SQS connection info, it falls back to the normal direct RPC flow.
- This means you can enable the SDK flag before the platform-side config is ready, and the SDK will continue to work normally.
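The fallback rules reduce to one decision, sketched below in Python. `plan_end_game` and the `SqsConnectionInfo` dataclass are hypothetical (the real logic lives in the Pragma SDK); the dataclass fields mirror the four values listed in the architecture overview. The sketch only builds the POST and does not send it. Sending the key in an `x-api-key` header follows the usual API Gateway REST API convention and is an assumption here, as is the JSON body shape. The URLs in the usage lines are placeholders.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SqsConnectionInfo:
    # Mirrors the values the platform returns during LinkV1 / CreateAndLinkV1.
    invoke_url: str
    api_key: str
    queue_url: str
    region: str


def plan_end_game(
    sqs_passthrough: bool,
    info: Optional[SqsConnectionInfo],
    end_game_request: dict,
) -> Tuple[str, Optional[urllib.request.Request]]:
    """Pick the transport for an EndGameV1 request, following the fallback rules."""
    if not sqs_passthrough or info is None:
        # Flag off, or the platform sent no SQS details: use the direct RPC flow.
        return "direct_rpc", None
    # Build (but do not send) the POST to the API Gateway invoke URL.
    request = urllib.request.Request(
        info.invoke_url,
        data=json.dumps(end_game_request).encode("utf-8"),
        headers={"x-api-key": info.api_key, "Content-Type": "application/json"},
        method="POST",
    )
    return "sqs", request


info = SqsConnectionInfo("https://example.invalid/endgame", "test-key",
                         "https://example.invalid/queue", "us-east-1")
print(plan_end_game(True, None, {})[0])  # direct_rpc
transport, req = plan_end_game(True, info, {"gameInstanceId": "a"})
print(transport, req.get_method())  # sqs POST
```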
Idempotency
EndGame processing is idempotent based on GameInstanceId. The platform tracks which game instances have been processed in the end_game_processed database table. If the same EndGame request is processed a second time (for example, because SQS redelivered the message), the second invocation is a no-op.
This is critical because SQS provides at-least-once delivery, meaning messages may be delivered more than once.
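A minimal Python sketch of deduplicating by GameInstanceId is shown below. `EndGameProcessor` is a hypothetical name, and an in-memory set stands in for the end_game_processed table. The instance is recorded only after the end-game work succeeds, so a failed attempt can still be retried. In a real persistent store, the check and the record should be a single atomic step (for example, a unique key on GameInstanceId) so two concurrent deliveries cannot both pass the check. That is general advice for the sketch, not a description of Pragma's implementation.

```python
from typing import Dict, List


class EndGameProcessor:
    """Hypothetical sketch: dedupe EndGame requests by GameInstanceId."""

    def __init__(self) -> None:
        # In-memory stand-in for the end_game_processed table.
        self._processed: set = set()
        self.applied: List[Dict] = []

    def process(self, game_instance_id: str, payload: Dict) -> bool:
        """Apply the end game once; return False for a duplicate delivery."""
        if game_instance_id in self._processed:
            return False  # already processed: the redelivery is a no-op
        self.applied.append(payload)  # stands in for the real end-game work
        self._processed.add(game_instance_id)  # record only after the work succeeded
        return True


processor = EndGameProcessor()
first = processor.process("instance-1", {"result": "win"})
second = processor.process("instance-1", {"result": "win"})  # SQS redelivery
print(first, second, len(processor.applied))  # True False 1
```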
Error handling and retry behavior
When the SqsReaderService processes a message, there are three possible outcomes:
- Success: the message is deleted from SQS.
- Service error: the message is not deleted from SQS. After the visibility timeout (5 minutes) expires, SQS redelivers the message for retry. Service errors represent transient issues, such as the backend being temporarily unavailable, and are always retried.
- Application error: the behavior depends on whether the error code is in the applicationErrorsToRetry allowlist:
  - Allowlisted: the message is not deleted and is retried via SQS.
  - Not allowlisted: the message is deleted, because the error is treated as non-retryable.
By default, GameInstanceApplicationErrors.UnknownGameInstanceIdApplicationError is allowlisted for retry. This makes the system resilient to ordering issues where an EndGame message arrives before the game instance is fully initialized, or before the platform has started up after a restart.
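The three outcomes map to a small delete-or-keep decision, sketched in Python below. `Outcome`, `allowlist_from_config`, and `should_delete` are hypothetical names, not Pragma APIs. The usage lines also assume that the default allowlisted error stays in effect alongside any configured keys. The source does not say whether custom entries add to the default or replace it.

```python
from enum import Enum, auto
from typing import Mapping, Optional, Set

DEFAULT_RETRYABLE = "GameInstanceApplicationErrors.UnknownGameInstanceIdApplicationError"


class Outcome(Enum):
    SUCCESS = auto()
    SERVICE_ERROR = auto()
    APPLICATION_ERROR = auto()


def allowlist_from_config(application_errors_to_retry: Mapping[str, str]) -> Set[str]:
    """Only the keys of the config map matter; the values are ignored."""
    return set(application_errors_to_retry.keys())


def should_delete(outcome: Outcome, error_code: Optional[str], allowlist: Set[str]) -> bool:
    """Return True if the message should be deleted from SQS."""
    if outcome is Outcome.SUCCESS:
        return True  # processed successfully
    if outcome is Outcome.SERVICE_ERROR:
        return False  # transient: leave it for redelivery after the visibility timeout
    # Application error: keep (retry) only if the error code is allowlisted.
    return error_code not in allowlist


# Assumption: the default entry stays in effect alongside configured keys.
allowlist = {DEFAULT_RETRYABLE} | allowlist_from_config(
    {"YourCustomErrors.TheCustomApplicationError": ""}
)
print(should_delete(Outcome.APPLICATION_ERROR, DEFAULT_RETRYABLE, allowlist))  # False
print(should_delete(Outcome.APPLICATION_ERROR, "SomeOtherError", allowlist))   # True
```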
You can configure the allowlist in SqsReaderServiceConfig. applicationErrorsToRetry is a configMap<String, String>; only its keys determine which errors are allowlisted, and its values are ignored. Here is a sample config:
```yaml
game:
  serviceConfigs:
    SqsReaderServiceConfig:
      applicationErrorsToRetry:
        "YourCustomErrors.TheCustomApplicationError": ""
        "YourCustomErrors.AnotherCustomApplicationError": ""
```
Dead Letter Queue (DLQ)
After exceeding the maximum retry count (default: 100 receives), messages are automatically moved from the main SQS queue to the Dead Letter Queue.
| Setting | Value |
|---|---|
| Max receive count before DLQ | 100 |
| Main queue retention | 4 days |
| DLQ retention | 14 days |
The DLQ is a graveyard — nothing on the Pragma side reads from it automatically. Its purpose is to get bad messages out of the main queue so that processing is not blocked by messages that will never succeed.
You can inspect the DLQ to find failed requests and attempt to diagnose or fix them. For example, a message may have failed because it contained malformed data or referenced an invalid game instance configuration.
Processing unknown game instances
When the platform restarts, game instances that were active before the restart are no longer in memory. If an EndGame message arrives from SQS for one of these game instances, the platform needs a way to process it.
When processEndGameForUnknownInstance is enabled in GameInstanceServiceConfig:
- The platform creates a stub game instance for the unknown GameInstanceId.
- GameInstancePlugin.prepareUnknownInstanceForEndGameRequest is called, giving you an opportunity to reconstruct the game instance state.
- GameInstancePlugin.handleBackendEndRequest is called to process the end game as normal.
Without this flag, unknown game instance IDs result in UnknownGameInstanceIdApplicationError, which is allowlisted for retry by default. This means the message will keep retrying until the retry limit is reached, at which point it moves to the DLQ.
For setup details, see Step 4 and Step 5 in the how-to guide.
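The two branches above are sketched in Python below. Everything here is hypothetical: `process_end_game` and `RecordingPlugin` are illustrative names, the `process_unknown` parameter plays the role of processEndGameForUnknownInstance, and the snake_case methods only mirror GameInstancePlugin.prepareUnknownInstanceForEndGameRequest and GameInstancePlugin.handleBackendEndRequest. Their real signatures are part of Pragma's plugin API and are not shown on this page. The dict-based stub is also an assumption.

```python
from typing import Dict, List


class UnknownGameInstanceIdApplicationError(Exception):
    """Stand-in for GameInstanceApplicationErrors.UnknownGameInstanceIdApplicationError."""


class RecordingPlugin:
    """Hypothetical plugin; method names mirror the GameInstancePlugin hooks."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def prepare_unknown_instance_for_end_game_request(self, stub: Dict) -> None:
        # Your chance to rebuild state, for example from your own persistence.
        self.calls.append("prepare")
        stub["restored"] = True

    def handle_backend_end_request(self, instance: Dict, request: Dict) -> None:
        self.calls.append("handle")


def process_end_game(instances: Dict[str, Dict], plugin: RecordingPlugin,
                     request: Dict, process_unknown: bool) -> None:
    game_id = request["gameInstanceId"]
    instance = instances.get(game_id)
    if instance is None:
        if not process_unknown:
            # Allowlisted for retry by default, so SQS will redeliver the message.
            raise UnknownGameInstanceIdApplicationError(game_id)
        instance = {"id": game_id}  # stub game instance
        plugin.prepare_unknown_instance_for_end_game_request(instance)
    plugin.handle_backend_end_request(instance, request)


plugin = RecordingPlugin()
process_end_game({}, plugin, {"gameInstanceId": "g1"}, process_unknown=True)
print(plugin.calls)  # ['prepare', 'handle']
```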
Backpressure and concurrency
The SqsReaderService includes built-in backpressure to prevent the platform from being overwhelmed:
- maxConcurrentMessageProcessing (default: 30): limits the number of messages processed concurrently.
- maxPendingBeforePause (default: 100): when the total count of active and waiting messages exceeds this threshold, the reader pauses polling until the count drops.
- Exponential backoff: on poll failures, the reader backs off exponentially with jitter, capped at 60 seconds.
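The three controls can be sketched in Python as below, using the documented defaults. The function names and the 1-second base delay are hypothetical. The jitter formula is the common "full jitter" scheme (a random delay between zero and the capped exponential ceiling), substituted here because this page does not specify which jitter variant SqsReaderService uses.

```python
import random
from typing import Callable

MAX_CONCURRENT_MESSAGE_PROCESSING = 30  # default maxConcurrentMessageProcessing
MAX_PENDING_BEFORE_PAUSE = 100          # default maxPendingBeforePause
MAX_BACKOFF_SECONDS = 60.0              # documented backoff cap


def can_start_processing(active: int) -> bool:
    """True while fewer than maxConcurrentMessageProcessing messages are in flight."""
    return active < MAX_CONCURRENT_MESSAGE_PROCESSING


def should_pause_polling(active: int, waiting: int) -> bool:
    """Pause polling once active + waiting exceeds maxPendingBeforePause."""
    return active + waiting > MAX_PENDING_BEFORE_PAUSE


def backoff_delay(consecutive_failures: int, base_seconds: float = 1.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Exponential backoff with full jitter, capped at MAX_BACKOFF_SECONDS."""
    exponent = min(consecutive_failures, 16)  # keep numbers small; the cap dominates anyway
    ceiling = min(MAX_BACKOFF_SECONDS, base_seconds * 2 ** exponent)
    return rng() * ceiling  # random delay in [0, ceiling)


print(should_pause_polling(30, 70), should_pause_polling(30, 71))  # False True
print(backoff_delay(10, rng=lambda: 1.0))  # 60.0
```

Randomizing the whole delay spreads out retries from many readers that failed at the same moment, while the cap keeps a long outage from pushing the next poll attempt arbitrarily far out.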
Grafana dashboards for monitoring SQS processing are pre-configured on managed infrastructure.
Related pages
- Set up persistent end game via SQS — step-by-step setup guide
- End game instances — reference for all ways to end a game instance