A little post about the reliability, security, and performance of Expo services

Regarding the reliability, security, and performance of our services, here are some technical details about our servers that I thought people might find interesting. This is how part of Expo works as of March 2017:

The Expo server, which serves a small manifest for your project, is hosted on our cluster running in Google Cloud. It is a multi-zone cluster, which means that even if servers, disks, or networking hardware have issues in one zone, our services will continue to run in another zone. This is what Google says about their zones:

Google designs zones to be independent from each other: a zone usually has power, cooling, networking, and control planes that are isolated from other zones, and most single failure events will affect only a single zone. Thus, if a zone becomes unavailable, you can transfer traffic to another zone in the same region to keep your services running.

We use an open technology called Kubernetes to run the Expo cluster, and one of the future benefits of Kubernetes is the ability to manage clusters in different regions. So even if Google's entire Central United States data center went offline, we'd have servers in other parts of the world automatically handle traffic. We haven't set this up yet, but the option is available to us should it become important enough.

Kubernetes also monitors our server health so that if our server processes consume too much CPU or RAM, or if the underlying server machine has issues, it will automatically provision a new server machine if necessary, and start new server processes on it. We’re really interested in making decisions that mitigate infra problems and allow the cluster to self-heal when there are issues; this means a high-quality service for developers and low-effort maintenance for us.
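To give a feel for what that monitoring looks like in practice, a Kubernetes liveness probe is typically pointed at a health endpoint on each server process, and the process gets restarted if the endpoint stops answering. Here's a minimal TypeScript sketch of such an endpoint, assuming an Express-style Node server; the route name, port, and what the check actually inspects are illustrative, not necessarily our exact configuration:

```typescript
import express from "express";

const app = express();

// Liveness endpoint: a Kubernetes HTTP probe calls this on an interval.
// If it stops returning 200, Kubernetes restarts the process automatically.
app.get("/healthz", (_req, res) => {
  // A real check might also verify database connections, queue depth, etc.
  // Here we just confirm the event loop is responsive.
  res.status(200).send("ok");
});

app.listen(3000, () => {
  console.log("server listening on :3000");
});
```

The deployment spec then points a livenessProbe at that path, so an unhealthy process is replaced without anyone being paged.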

The servers all run behind a VPN and are not directly exposed to the public internet. One of the roles of our main server program is to serve project manifests, which are cryptographically signed with our private key. The Expo client apps and standalone apps (made with exp build, for example) verify the signature with our public key. This allows us to trust the URL of the JS bundle, which is one of the things the manifest specifies. The manifests (along with our website and API calls) are served over HTTPS, and we also use HTTP/2 for performance. This is what the Chrome dev tools say about our HTTPS configuration:

The connection to this site is encrypted and authenticated using a strong protocol (TLS 1.2), a strong key exchange (ECDHE_RSA with P-256), and a strong cipher (AES_128_GCM).
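To make the manifest-signing step above more concrete, here's a rough sketch of how a client could verify a signed manifest against a pinned public key using Node's built-in crypto module. The algorithm (RSA with SHA-256), the field names, and the base64 encoding are illustrative assumptions rather than our exact scheme:

```typescript
import { createVerify } from "crypto";

// Hypothetical shape: the manifest body plus a detached signature.
interface SignedManifest {
  manifestJson: string; // the exact bytes that were signed
  signature: string;    // base64-encoded signature of manifestJson
}

// PEM-encoded public key shipped with the client (pinned, never fetched).
const PUBLIC_KEY_PEM = `-----BEGIN PUBLIC KEY-----
...placeholder...
-----END PUBLIC KEY-----`;

function verifyManifest({ manifestJson, signature }: SignedManifest): boolean {
  // Assumes RSA + SHA-256 for illustration; the real scheme may differ.
  const verifier = createVerify("RSA-SHA256");
  verifier.update(manifestJson);
  return verifier.verify(PUBLIC_KEY_PEM, signature, "base64");
}

// Only after verification does the client trust fields like the bundle URL:
// if (verifyManifest(signed)) { download(JSON.parse(signed.manifestJson).bundleUrl); }
```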

The JS bundle is stored on AWS S3 and served through AWS CloudFront, also over HTTPS. Each JS bundle's filename contains the hash of the bundle (we get the benefits of content-addressable storage), so the URLs can be served with very long max-age values and the client doesn't have to send even a single TCP/IP packet if it has already downloaded a bundle before. And because they are served from AWS's global CloudFront CDN, users all across the world get to download these bundles efficiently.
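The content-addressable naming is simple to sketch: name each bundle after a hash of its contents and serve it with an effectively permanent cache lifetime, since any change to the contents produces a new URL. The hash choice, path layout, and header value below are illustrative rather than our exact setup:

```typescript
import { createHash } from "crypto";

// Derive a content-addressed key for a JS bundle.
function bundleKey(bundleContents: Buffer): string {
  const hash = createHash("sha256").update(bundleContents).digest("hex");
  return `bundles/${hash}.js`;
}

// Because the URL changes whenever the contents change, the object can be
// cached essentially forever by CloudFront and by clients.
const CACHE_CONTROL = "public, max-age=31536000, immutable";

// Example usage (the actual S3 upload call is omitted; any S3 client works).
const contents = Buffer.from("console.log('hello from the bundle');");
console.log(bundleKey(contents)); // e.g. bundles/<sha256-hex>.js
console.log(CACHE_CONTROL);
```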

We have several ideas for improvements in all of these areas and are actively working on some of them. One thing I've noticed working on Expo is that many Expo developers don't have much experience with, or time for, running servers and clusters. When we take care of this work, lots of other people don't have to, and to me that feels like a good thing for the developer world.
