High Throughput Recommendations

Recommendations for Business Solution Partners (BSPs) and governments to achieve the best performance with the WhatsApp Business API.

Performance

In this context, performance represents the number of messages that can be sent in any given second using the WhatsApp Business API client.

The maximum achievable performance depends on a variety of factors. The most important is whether a message is being sent to a new user or an existing user: setting up a WhatsApp encryption session takes a little longer when messaging a new user.

Under ideal circumstances (i.e., with the suggested setup below), we observed a sustained performance of 250 messages/second when messaging 100% new users.
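As a rough planning aid, the sustained rate above can be turned into a lower bound on how long a campaign takes to send. The sketch below is only arithmetic on the 250 messages/second figure; the function name is illustrative, and real throughput depends on the factors described in this document.

```python
# Observed sustained rate from the text above (100% new users, suggested setup).
SUSTAINED_RATE = 250  # messages per second


def estimated_send_seconds(message_count: int, rate: float = SUSTAINED_RATE) -> float:
    """Return a lower bound, in seconds, on the time to send message_count messages."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return message_count / rate


if __name__ == "__main__":
    seconds = estimated_send_seconds(1_000_000)
    print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")
```

At this rate, one million messages need at least 4,000 seconds, a little over an hour.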

Setup

WhatsApp Business API Client

To set up a WhatsApp Business API client, we strongly recommend using the AWS CloudFormation (CFN) templates provided by WhatsApp, for two reasons:

  • Failures are easier to debug because the orchestration in this CFN template is maintained by the WhatsApp team.
  • The WhatsApp team continues to optimize the AWS infrastructure components needed to run the WhatsApp Business API. When we make these changes to the template, you can absorb them simply by updating the CloudFormation stack.

A new CFN template will be published soon. Until then, please use this template to create your AWS stack. If the stack is already created, consider updating it to apply these changes.

Recommended core parameters*

Configuration Area       Parameter                                       Setting
-----------------------  ----------------------------------------------  -------------------
General Configuration    EnvType                                         Production
                         HAEnabled                                       Enabled
                         NumCoreappInstances                             32
Container Configuration  EC2 InstanceType                                c4.4xlarge
                         EBS Volume Size                                 128
                         KeyName                                         ****
                         WhatsApp Enterprise Container Registry          docker.whatsapp.biz
                         WhatsApp Enterprise Client (Container) Version  v2.27.9
Database Configuration   MultiAZ Enabled                                 Enabled
                         Storage Capacity                                1024 GB
                         Storage Type                                    io1
                         IOPS                                            10000
                         Instance Type                                   db.m4.4xlarge
                         Administrator Name                              ****
                         Administrator Password                          ****
                         Database Port                                   3306
                         Timeout to close idle DB connections            180000
Logging Configuration    Logging driver for container log                awslogs

* Not all parameters from the template are covered here

Database [optional]

The parameters table above covers the critical parameters. If you require additional monitoring or insights, modify the RDS instance to apply the following optional changes:

  • Storage Autoscaling
    • Select Enable storage autoscaling
  • Monitoring
    • Select Enable enhanced monitoring
  • Performance Insights
    • Select Enable Performance Insights

Webhooks

The callback queue holds up to 100,000 entries. If the callback (Webhooks) server latency is high, callbacks accumulate in this queue. When the queue is full, the Coreapp treats the system as being under heavy load and prevents you from sending messages. It still accepts callbacks from the server, such as sent/delivery alerts and incoming messages, and appends them to the callback queue. The Coreapp's send throughput stays at zero until the queue has drained to a certain point. This is why it's important that the callback server responds with low latency.

To mitigate this, please set up your Webhooks in the following way:

  1. Deploy the Webhooks endpoint as close as possible to the Coreapp.
  2. For each inbound notification to the Webhook server, first send 200 OK to the Coreapp before executing any business logic. If 200 OK is sent after the business logic is executed, it can increase the latency of the Webhook server.
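The second recommendation can be sketched as an "acknowledge first, process later" handler. The example below uses only the Python standard library; the names pending, handle_notification, and serve are illustrative, and the in-memory queue is an assumption (a production deployment would more likely use a durable queue and a production-grade HTTP server).

```python
import queue
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-process buffer between the HTTP handler and the business logic.
pending = queue.Queue()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        # Acknowledge first so the Coreapp is not kept waiting.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()
        # Only enqueue here; the business logic runs in the worker thread.
        pending.put(body)

    def log_message(self, format, *args):
        pass  # keep per-request logging off the hot path


def handle_notification(body: bytes) -> None:
    """Placeholder for business logic (persisting statuses, replying, and so on)."""


def worker() -> None:
    while True:
        body = pending.get()
        try:
            handle_notification(body)
        finally:
            pending.task_done()


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    threading.Thread(target=worker, daemon=True).start()
    ThreadingHTTPServer((host, port), WebhookHandler).serve_forever()


if __name__ == "__main__":
    serve()
```

Because the handler does nothing but read the body, reply, and enqueue, its latency stays roughly constant no matter how slow the downstream business logic is.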

We always recommend having Webhooks configured, even if your business use case is to only send notification messages, because it's important to listen to customers. The Webhook also receives error message notifications.

When a Webhook is configured, make sure it follows the recommendations above; otherwise, it can break the system under heavy load. So although Webhooks are important, if you do not need them, having no Webhook gives better performance than having an incorrectly configured one.

The sent_status parameter

Once the Webhook is configured, the Coreapp dispatches inbound notifications to it. One of those notifications is sent_status, which is sent when a message sent by your business is received by the server. If your business chooses to ignore this notification, you can set this parameter to false by updating the Application Settings. By default, sent_status is set to true.

Disabling sent_status does not directly impact the performance of the WhatsApp Business API client, but you may consider disabling it for the following reasons:

  1. If the Webhooks server is not configured efficiently enough to respond quickly, or
  2. If you are not concerned with sent status reports (i.e., if it's not going to be used for any reporting in future)

Our recommendation is to enable the sent_status parameter with correctly configured Webhooks and persist these alerts for report generation.

Mark messages as read

The Coreapp dispatches user messages to the Webhook. Once the messages are received, you can choose to mark messages as read (i.e., blue tick marks in the consumer version of WhatsApp). If the Webhooks are configured correctly and message volumes are within the limits, this can create a good user experience. However, if volumes are almost hitting thresholds on the Coreapp, you may not want to implement this.

If you decide to use this option, you must disable the pass_through parameter.
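For illustration, a request that marks one message as read could be prepared as below. The endpoint path (/v1/messages/{id}), the {"status": "read"} body, and the bearer-token header are assumptions to be checked against the Messages API reference for your client version; the function only builds the request and does not send it.

```python
import json
import urllib.request


def build_mark_read_request(base_url: str, token: str, message_id: str) -> urllib.request.Request:
    """Build (but do not send) a request marking message_id as read.

    The URL layout and body shape are assumptions, not confirmed by this document.
    """
    url = f"{base_url.rstrip('/')}/v1/messages/{message_id}"
    body = json.dumps({"status": "read"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Keeping this call behind a switch makes it easy to stop marking messages as read when volumes approach the Coreapp's thresholds.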

Media auto-download

When a message with media is received, the WhatsApp Business API client will download the media. Once the media is downloaded, you will receive a notification through your Webhook. You can enable automatic media download per media type by setting the auto_download parameter. If your business does not plan to process incoming media from users, don't enable the auto_download parameter.
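The three settings discussed in this section (sent_status, pass_through, and auto_download) could be gathered into one Application Settings payload along these lines. The parameter names come from this document; the nesting of auto_download under a "media" key and the helper name are assumptions, so verify the exact shape against the Application Settings reference before use.

```python
import json
from typing import Iterable


def build_application_settings(
    sent_status: bool = True,
    pass_through: bool = False,
    auto_download: Iterable[str] = (),
) -> dict:
    """Return a settings payload; the layout is an assumption, not a confirmed schema."""
    return {
        # Keep sent_status on when the Webhook is correctly configured.
        "sent_status": sent_status,
        # Must be disabled if you mark messages as read.
        "pass_through": pass_through,
        # Leave empty if incoming media will not be processed.
        "media": {"auto_download": list(auto_download)},
    }


if __name__ == "__main__":
    print(json.dumps(build_application_settings(), indent=2))
```

The defaults here follow the recommendations above: sent status reports on, pass-through off, and no media types downloaded automatically.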

Monitoring

It's important to configure a monitoring setup for the Coreapp as it helps measure the performance of the Coreapp and its interactions with other system components. While you can build your own monitoring setup by making metrics and stats API calls, we strongly recommend setting up the instance monitoring provided by WhatsApp.

Strategies

Media Messages

Media messages reduce performance significantly. Please avoid sending media messages when performance demands are high.

Sending a single media file to many users

You may consider media messages if you can send the same media file to many or all users (e.g., your company logo). There are two ways to send media: by media ID or by media URL. In this case, please send media only by media ID, because the media can be uploaded once to the Coreapp and reused many times. If you choose to send media by URL, the file has to be downloaded each time.

Please note that although the upload only happens once, there are additional costly computations that need to be done in the Coreapp to send media messages, so use this option only if necessary. It is also recommended you create a flag to turn this feature off when performance bottlenecks are hit with live traffic.
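A minimal sketch of the upload-once pattern with a kill switch follows. The upload callable stands in for whatever call uploads a file to the Coreapp and returns its media ID; that callable, the MediaCache and build_message names, and the message payload shapes are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional


class MediaCache:
    """Remember the media ID for each file so it is uploaded only once."""

    def __init__(self, upload: Callable[[str], str]):
        self._upload = upload  # hypothetical: uploads a file, returns its media ID
        self._ids: Dict[str, str] = {}

    def media_id(self, path: str) -> str:
        if path not in self._ids:
            self._ids[path] = self._upload(path)
        return self._ids[path]


def build_message(
    to: str,
    text: str,
    cache: MediaCache,
    image_path: Optional[str] = None,
    media_enabled: bool = True,
) -> dict:
    """Build an image message by media ID, or fall back to text when the flag is off."""
    if image_path and media_enabled:
        return {
            "to": to,
            "type": "image",
            "image": {"id": cache.media_id(image_path), "caption": text},
        }
    return {"to": to, "type": "text", "text": {"body": text}}
```

Setting media_enabled to False at runtime switches every message to plain text without redeploying, which is the kind of flag suggested above for live-traffic bottlenecks.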

Launching

It's strongly recommended you roll out your launch in a graduated manner. Always start with launching to a fraction of your users; learn the behaviors of traffic, then do a wider launch. Similarly, if you are launching globally, consider initially launching to a region with lower traffic, then gradually adding other regions.
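One common way to launch to a fraction of users is to hash each user identifier into a stable bucket and compare it with the current rollout percentage. This is a generic technique rather than anything specific to the WhatsApp Business API, and the function name is illustrative.

```python
import hashlib


def in_rollout(user_id: str, percent: float) -> bool:
    """Return True if user_id falls inside the first `percent` of users (0 to 100)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # stable bucket in 0..9999
    return bucket < percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    print(sum(in_rollout(u, 5) for u in users), "of", len(users), "users at 5%")
```

Because the bucket depends only on the user ID, raising the percentage adds new users without removing anyone who was already included.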

Multiple Phone Numbers

If throughput needs are much higher than a single phone number can handle, you can consider using multiple phone numbers.

  • Follow all of the above recommendations for a single phone number.
  • Do not use the same database instance to host multiple phone numbers.
  • Once a phone number is launched to send messages, it cannot be taken down. It should be managed forever. Be careful to only launch the required set of phone numbers.

Strategies to split traffic among multiple phone numbers

  • Choose a rule to split traffic that makes sense for your use case, for example:
    • Language-based — One phone number answers English questions while a second phone number answers in Spanish
    • Region-based — One phone number answers for users in the United States while a second phone number answers users in India

Customer care messages

  • If you are driving the incoming traffic through a www.wa.me/$phone URL, please add one more level of redirection via your HTTP server. The HTTP server's responsibility is to determine which phone numbers can serve the incoming request and to pick the correct one using your “split-traffic” strategy.
  • If a user is already messaging a phone number that is getting unexpected loads of traffic or is already hitting its load limits, redirect the user to a different phone number.
  • During launch, if there is a chance of unexpected loads of traffic, keep one additional phone number running a WhatsApp Business API client that is ready to serve the traffic.
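The redirection logic described above can be sketched as a small routing function that an HTTP server would call before issuing a redirect. The phone numbers, the language rule, the standby number, and the function names are placeholders; how you detect that a number is overloaded is left to your own monitoring.

```python
from typing import AbstractSet

# Hypothetical numbers; replace with your own split-traffic rule.
NUMBERS_BY_LANGUAGE = {"en": "15550000001", "es": "15550000002"}
DEFAULT_LANGUAGE = "en"
STANDBY_NUMBER = "15550000003"  # the additional number kept ready during launch


def pick_number(language: str, overloaded: AbstractSet[str] = frozenset()) -> str:
    """Apply the language rule, falling back to the standby number under load."""
    number = NUMBERS_BY_LANGUAGE.get(language, NUMBERS_BY_LANGUAGE[DEFAULT_LANGUAGE])
    if number in overloaded:
        return STANDBY_NUMBER
    return number


def redirect_url(language: str, overloaded: AbstractSet[str] = frozenset()) -> str:
    """Return the wa.me link the HTTP server should redirect to."""
    return f"https://wa.me/{pick_number(language, overloaded)}"
```

For example, redirect_url("es") points at the Spanish number, while passing the English number in overloaded sends English-speaking users to the standby number instead.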

Notifications

  • For notifications, you can estimate how many users each phone number can handle and divide the traffic accordingly. When the system is getting unexpected loads of traffic, you can choose to spin up a new WhatsApp Business API client with a different phone number.
  • To prevent negative feedback (i.e., spam/blocks), it’s important to map each recipient number to a single WhatsApp Business API client phone number. For example, if multiple phone numbers are hosted for ABC CupCakes and a user placed two orders for cupcakes, they probably expect to receive these two order notifications from a single WhatsApp Business API client number rather than a different number each time.
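A sticky recipient-to-sender mapping might look like the sketch below. It hashes the recipient to choose a sender the first time and then records the choice, so the mapping survives adding a new phone number later. The in-memory dict and the function name are assumptions; in practice the assignments would live in persistent storage.

```python
import hashlib
from typing import Dict, Sequence


def assign_sender(recipient: str, senders: Sequence[str], assignments: Dict[str, str]) -> str:
    """Return the business phone number that should always message this recipient."""
    if recipient not in assignments:
        digest = hashlib.sha256(recipient.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(senders)
        # Record the first choice so later changes to `senders` do not move the user.
        assignments[recipient] = senders[index]
    return assignments[recipient]
```

With this approach, both of a customer's order notifications come from the same number, even if another sending number was brought online between the two orders.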