Zanzibar is Google’s purpose-built authorization system. It’s a centralized authorization database built to take authorization queries from high-traffic apps and return authorization decisions. An instance of Zanzibar hosts a list of permissions and responds to queries from many apps. Google described the system in a paper presented at the 2019 USENIX Annual Technical Conference, and that paper has since become a popular resource for developers who are building authorization services.
A Zanzibar instance is made up of a cluster of servers, a distributed SQL database, an indexing system, a snapshotting system, and a periodic jobs system.
Figure: Google Zanzibar’s architecture
Why did Google develop Zanzibar for access control?
Google has many high-traffic apps, like Search, Docs, Sheets, and Gmail. Google accounts are shared between those systems, so authorization decisions (that is, what actions a Google account can take) need to be coordinated. These apps operate at huge scales, so constant inter-service communication isn’t practical. Their authorization system needs to handle billions of objects shared by billions of users and needs to return results with very low latency. Also, their system needs to handle filtering questions, like “what documents can this user see?”
In short, their authorization system needs to be:
- Error-free. An incorrect authorization decision might let someone see a document that wasn’t meant for their eyes.
- Fast. All other apps will be waiting on authorization decisions from Zanzibar. Google’s target was <10ms per query.
- Highly available. Authorization must be at least as available as the apps that depend on it.
- High-throughput. Google handles billions of queries per day.
To learn more about Google Zanzibar permissions, read our interview with Abhishek Parmar, co-creator of Google Zanzibar.
How does Google Zanzibar solve for authorization?
Correctness
Zanzibar limits both user errors and system errors. To quote one of the designers, Lea Kissner, “The semantics in Zanzibar are very carefully designed to try and make it very difficult for you to shoot yourself in the foot.” For a resource like a git repository, Zanzibar’s API exposes who can see (or edit/delete/act upon) that repository, why they can see it, and how to stop it from being seen.
Zanzibar also limits system errors. Zanzibar authorization is a distributed system, which means it takes time to propagate new permissions. To avoid data staleness, Zanzibar stores permissions in Google’s Spanner database. Spanner provides strong consistency guarantees, so Zanzibar never applies old permissions to new content.
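To make the staleness problem concrete, here is a minimal sketch of a versioned permission store. It is a toy model, not Zanzibar's implementation: the `AclStore` class, its method names, and the `replica_version` parameter are all hypothetical. It assumes every write produces a new snapshot and returns a version number that callers can hand back on later checks, which is roughly the role the Zanzibar paper gives to its consistency tokens (called "zookies").

```python
class AclStore:
    """Toy store that keeps one immutable snapshot per write (hypothetical)."""

    def __init__(self):
        # The list index doubles as the snapshot version number.
        self.snapshots = [frozenset()]

    def write(self, add=(), remove=()):
        """Apply a change and return the new version as a consistency token."""
        latest = (set(self.snapshots[-1]) - set(remove)) | set(add)
        self.snapshots.append(frozenset(latest))
        return len(self.snapshots) - 1

    def check(self, relation_tuple, replica_version, at_least=0):
        """Evaluate a check at the replica's snapshot, or a fresher one if
        the caller's token demands it. `replica_version` simulates a lagging
        replica; a real system would not take it as an argument."""
        version = max(replica_version, at_least)
        return relation_tuple in self.snapshots[version]


store = AclStore()
alice_views = ("doc:plan", "viewer", "user:alice")

v1 = store.write(add=[alice_views])     # Alice is granted access
v2 = store.write(remove=[alice_views])  # Alice's access is revoked

# A replica that has only seen v1 would wrongly allow Alice...
stale_answer = store.check(alice_views, replica_version=v1)
# ...unless the caller requires a snapshot at least as fresh as the revocation.
fresh_answer = store.check(alice_views, replica_version=v1, at_least=v2)
```

In this sketch the app would store `v2` alongside any content written after the revocation and pass it as `at_least` on every later check for that content, so old permissions are never applied to new content.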
Speed and availability
Zanzibar uses several tricks to reduce latency. First, it uses several layers of caching. The outermost cache layer is Leopard, an indexing system built to respond quickly to authorization checks. Then, read requests are cached across the servers that store permissions. Also, calls between services inside Zanzibar are cached.
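As a rough illustration of the read-caching idea, the sketch below puts a small cache in front of a slower permission lookup. The `CachingChecker` class and its `backend` callable are hypothetical, and including a snapshot version in the cache key is an assumption made here so that a cached answer is never reused after permissions change. It does not model Leopard or the inter-service caches.

```python
class CachingChecker:
    """Caches check results in front of a slower backend (hypothetical)."""

    def __init__(self, backend):
        self.backend = backend   # callable: (obj, relation, user) -> bool
        self.cache = {}
        self.backend_calls = 0   # counts cache misses, for illustration only

    def check(self, obj, relation, user, snapshot):
        # The snapshot version is part of the key, so results computed
        # against one version are never served for another.
        key = (obj, relation, user, snapshot)
        if key not in self.cache:
            self.backend_calls += 1
            self.cache[key] = self.backend(obj, relation, user)
        return self.cache[key]


acl = {("doc:plan", "viewer", "user:alice")}
checker = CachingChecker(lambda obj, rel, user: (obj, rel, user) in acl)

first = checker.check("doc:plan", "viewer", "user:alice", snapshot=1)
second = checker.check("doc:plan", "viewer", "user:alice", snapshot=1)
calls_after_repeat = checker.backend_calls   # the repeat was a cache hit

third = checker.check("doc:plan", "viewer", "user:alice", snapshot=2)
calls_after_new_snapshot = checker.backend_calls  # new snapshot, new lookup
```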
Second, Zanzibar replicates data to move it closer to its physical access point. This system works like a CDN: Google maintains many instances of Zanzibar throughout the world.
On top of that, Zanzibar relies on some hand-tuning. In any authorization policy, some common permissions are used far more often than others. Zanzibar’s team hand-tunes these hot spots, for instance by enabling cache prefetching.
Scale
Thanks to replication and caching, Zanzibar can store trillions of access control rules and handle millions of requests per second.
What does Google Zanzibar do well?
Zanzibar is a centralized source of authorization decisions. That can be a useful approach for two reasons. First, it is a single source of truth. Each of your services can call Zanzibar and get a “yes” or “no” answer in response, and those answers are consistent between services. Second, each of those services calls the same API, which makes it easier to use across many services.
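The "single source of truth" idea can be sketched in a few lines. This is an in-memory stand-in, not a real client: the `RelationTuple` type, the `check` function, and the two service functions are hypothetical names chosen for illustration. In practice `check` would be a network call to the central service.

```python
from typing import NamedTuple


class RelationTuple(NamedTuple):
    """One stored permission: `user` has `relation` on `obj`."""
    obj: str
    relation: str
    user: str


# The one central list of permissions that every service consults.
TUPLES = {
    RelationTuple("repo:acme/api", "viewer", "user:alice"),
    RelationTuple("repo:acme/api", "editor", "user:bob"),
}


def check(obj: str, relation: str, user: str) -> bool:
    """Answer yes or no: does `user` have `relation` on `obj`?"""
    return RelationTuple(obj, relation, user) in TUPLES


# Two different services ask the same question through the same API,
# so their answers cannot drift apart.
def web_ui_can_show_repo(user: str) -> bool:
    return check("repo:acme/api", "viewer", user)


def git_server_can_serve_clone(user: str) -> bool:
    return check("repo:acme/api", "viewer", user)


alice_in_ui = web_ui_can_show_repo("user:alice")
alice_in_git = git_server_can_serve_clone("user:alice")
carol_in_ui = web_ui_can_show_repo("user:carol")
```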
Zanzibar also supports reverse indexing (also known as data filtering). This means that after assigning a user many individual permissions, you can also ask, “what resources does this user have access to?” This is a common authorization request (e.g., for list endpoints). It’s also useful for maintaining and debugging access controls.
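A reverse index can be sketched as a second lookup table maintained alongside the permission list. The `IndexedAcl` class and its method names below are hypothetical, and this toy version only handles direct grants. Real systems also have to account for permissions inherited through groups or hierarchies.

```python
from collections import defaultdict


class IndexedAcl:
    """Permission list with a reverse index from user to objects (toy)."""

    def __init__(self):
        self.forward = set()              # (obj, relation, user) tuples
        self.by_user = defaultdict(set)   # (user, relation) -> set of objects

    def grant(self, obj, relation, user):
        # Keep both structures in sync on every write.
        self.forward.add((obj, relation, user))
        self.by_user[(user, relation)].add(obj)

    def check(self, obj, relation, user):
        """Forward question: can this user do this to this object?"""
        return (obj, relation, user) in self.forward

    def list_objects(self, user, relation):
        """Reverse question: which objects can this user do this to?"""
        return sorted(self.by_user.get((user, relation), ()))


acl = IndexedAcl()
acl.grant("doc:roadmap", "viewer", "user:alice")
acl.grant("doc:plan", "viewer", "user:alice")
acl.grant("doc:budget", "viewer", "user:bob")

alice_docs = acl.list_objects("user:alice", "viewer")
carol_docs = acl.list_objects("user:carol", "viewer")
```

A list endpoint could call `list_objects` once instead of running `check` against every document in the database.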
What doesn’t Google Zanzibar do?
A Zanzibar-like solution requires you to centralize all authorization data in it. This includes obvious things like roles, but it also encompasses org charts, file and folder hierarchies, and document creators: anything you might ever use in an authorization query. The problem is that you also need that data in your application, so you have to duplicate it between the two. Google has the culture to impose this requirement and the resources to support it, but most companies don’t. We talk about our own experiences with data centralization and how we relieve this tension in our post on Local Authorization.
Figure: The overlap between application data and authorization data
Zanzibar provides few abstractions to work with. Its authorization logic is a flat list of access controls. You can define relationships between users and resources, but you can’t use properties of resources (like public/private switches) to make authorization decisions. It’s up to you to work out how to represent whatever authorization model you may have as a set of relationships. Google’s engineers recommend that you use a policy engine alongside Zanzibar to close the gap.
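To show what "representing your model as relationships" can look like, here is a small sketch of group-based access expressed purely as tuples. The `object#relation` form for pointing at a set of users follows the notation in the Zanzibar paper (for example, `doc:readme#viewer@group:eng#member`); the `check` function, its cycle guard, and the sample data are a simplified illustration, not Zanzibar's evaluation algorithm.

```python
# Each tuple reads: (object, relation, user-or-userset).
# A userset such as "group:eng#member" means "everyone who is a
# member of group:eng".
TUPLES = {
    ("group:eng", "member", "user:alice"),
    ("doc:readme", "viewer", "group:eng#member"),
    ("doc:readme", "owner", "user:bob"),
}


def check(obj, relation, user, seen=None):
    """Return True if `user` has `relation` on `obj`, following usersets."""
    seen = set() if seen is None else seen
    if (obj, relation) in seen:   # guard against cyclic group definitions
        return False
    seen.add((obj, relation))

    for t_obj, t_relation, t_user in TUPLES:
        if (t_obj, t_relation) != (obj, relation):
            continue
        if t_user == user:        # direct grant
            return True
        if "#" in t_user:         # indirect grant through a userset
            sub_obj, sub_relation = t_user.split("#", 1)
            if check(sub_obj, sub_relation, user, seen):
                return True
    return False


alice_can_view = check("doc:readme", "viewer", "user:alice")
bob_can_view = check("doc:readme", "viewer", "user:bob")
```

Note that Bob owns the document but is not a viewer here: nothing in a flat list of relationships says that owners can view. Rules like that have to be modeled separately, which is the kind of gap the paragraph above describes.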
Finally, Zanzibar is a major technical investment. Building your own Zanzibar takes at least a year of effort from a dedicated team. Airbnb’s Himeji (a Zanzibar-alike), for example, took more than a year of engineering work. Using Zanzibar also takes engineering effort. At Google, the service is supported by a full-time team of engineers, plus several engineers from each service that uses Zanzibar. Most apps that use Zanzibar-like systems require hand-tuning to avoid hot spots.
Looking for an authorization service?
Engineering teams are increasingly adopting services for core infrastructure components, and this applies to authorization too. There are a number of authorization-as-a-service options available to those who want something like what Google made available to its internal engineers via Zanzibar.
Oso Cloud is a managed authorization service that provides the benefits of Zanzibar while filling in a number of Zanzibar’s gaps. You use Oso Cloud to provide fine-grained access to resources in your app, to define deep permission hierarchies, and to share access control logic between multiple services in your backend.
Oso is built for application authorization. It comes with built-in primitives for patterns like RBAC and ReBAC, and it is extensible for other use cases like attribute-based access control (ABAC). It is built using a best-practices data model that makes authorization requests fast and ensures that you don’t need to make schema changes to make authorization changes. It provides APIs for enforcement and data filtering. Oso Cloud is also deployed globally for high availability and low latency.
Fun fact: Abhishek Parmar, one of the co-creators of Google Zanzibar and Airbnb Himeji, is a technical advisor to the Oso engineering team.
Oso Cloud is free to get started – try it out. If you’d like to learn more about Oso Cloud or ask questions about authorization more broadly, come say hi to us on Slack.