In the previous article, we explained what strong (vs. eventual) consistency is. This article is the second part of a series where we explain how a lack of strong consistency makes it harder to deliver a good end-user experience, can bring serious engineering overhead, and opens you up to exploits. This part is longer since we will explain different database anomalies, go through several example scenarios, and briefly highlight which kind of database suffers from each anomaly.
User experience is the driving factor in the success of any app, and relying on an inconsistent backend can increase the challenge to deliver a good experience. More importantly, building application logic on top of inconsistent data can lead to exploits. One paper calls these kinds of attacks “ACIDrain.” they investigated 12 of the most popular self-hosted e-commerce applications and at least 22 possible critical attacks were identified. One website was a Bitcoin wallet service that had to shut down due to these attacks. When you choose a distributed database that is not 100% ACID, there will be dragons. As explained in one of our previous examples, due to misinterpretations, badly defined terminology, and aggressive marketing, it is very hard for an engineer to determine what guarantees a specific database delivers.
Which dragons? Your app might feature issues such as wrong account balances, unreceived user rewards, trade transactions that executed twice, messages that appear out of order, or application rules that are violated. For a quick introduction why distributed databases are necessary and difficult, please refer to our first article or this excellent video explanation. In short, a distributed database is a database that holds copies of your data in multiple locations for scale, latency, and availability reasons
We’ll go through four of these potential issues (there are more) and illustrate them with examples from game development. Game development is complex and those developers are faced with many problems that closely resemble serious real-life problems. A game has trading systems, messaging systems, awards that require conditions to be fulfilled, etc. Remember how angry (or happy 🤨) gamers can be if things go wrong or appear to go wrong. In games, user experience is everything, so game developers are often under huge pressure to make sure their systems are fault-tolerant.
Ready? Let’s dive into the first potential issue!
1. Stale reads
Stale reads are reads that return old data, or in other words, data that returns values which are not yet updated according to the latest writes. Many distributed databases, including traditional databases that scale up with replicas (read Part 1 to learn how these work), suffer from stale reads.
Impact on end users
First off, stale reads can affect end users. And it’s not a single impact.
Frustrating experiences and unfair advantages
Imagine a scenario where two users in a game encounter a chest with gold. The first user receives the data from one database server while the second is connected to a second database server. The order of events goes as follows:
- User 1 (via database server 1) sees and opens the chest, retrieves the gold.
- User 2 (via database server 2) sees a full chest, opens it, and fails.
- User 2 still sees a full chest and does not understand why it fails.
Although this seems like a minor problem, the result is a frustrating experience for the second player. Not only did he have a disadvantage, but he will also often see situations in the game where things appear to be there, yet they are not. Next, let’s look at an example where the player takes action on a stale read!
Stale reads leading to duplicated writes
Imagine a situation where a character in the game tries to buy a shield and a sword in a shop. If there are multiple locations that contain the data and there is no intelligent system in place to provide consistency, then one node will contain older data than another. In that case, the user might buy the items (which contacts the first node) and then check his inventory (which contacts the second node), only to see that they are not there. The user will probably be confused and might think that the transaction didn’t go through. What would most people do in that case? Well, they try to buy the item again. Once the second node has caught up, the user has already bought a duplicate, and once the replica catches up, he suddenly sees that he has no money left and two items of each. He is left with the perception that our game is broken.
In this case, the user has spent resources which he did not want to spend. If we write an email client on top of such a database, a user might try to send an email, then refresh the browser and not be able to retrieve the email he has just sent, and therefore send it again. Delivering a good user experience and implementing secure transactions such as bank transactions on top of such a system is notoriously hard.
Impact on developers
When coding, you always have to expect that something is not there (yet) and code accordingly. When reads are eventually consistent, writing fault-proof code becomes very challenging and chances are that users will encounter problems in your application. When reads are eventually consistent, these problems will be gone by the time you are able to investigate them. Basically, you end up chasing ghosts. Developers still often choose databases or distribution approaches that are eventually consistent since it often takes time to notice the problems. Then, once the problems in their application arise, they try to be creative and build solutions (1, 2) on top of their traditional database to fix the stale reads. The fact that there are many guides like this and that databases like Cassandra have implemented some consistency features shows that these problems are real and do cause issues in production systems more frequently than you might imagine. Custom solutions on top of a system that is not built for consistency are very complex and brittle. Why would someone go through such a hassle if there are databases that deliver strong consistency out-of-the-box?
Databases that exhibit this anomaly
Traditional databases (PostgreSQL, MySQL, SQL Server, etc..) that use master-read replication typically suffer from stale reads. Many newer distributed databases also started off as eventually consistent, or in other words, without protection against stale reads. This was due to a strong belief in the developer community that this was necessary to scale. The most famous database that started off like this is Cassandra, but Cassandra recognized how their users struggled to deal with this anomaly and have since provided extra measures to avoid this. Older databases or databases which are not designed to provide strong consistency in an efficient way such as Cassandra, CouchDB, and DynamoDB are by default eventually consistent. Other approaches such as Riak are also eventually consistent, but take a different path by implementing a conflict resolution system to reduce the odds of outdated values. However, this does not guarantee that your data is safe since conflict resolution is not fault-proof.
2. Lost writes
In the realm of distributed databases, there is an important choice to make when writes happen at the same time. One option (the safe one) is to make sure that all database nodes can agree on the order of these writes. This is far from trivial since it either requires synchronized clocks, for which specific hardware is necessary, or an intelligent algorithm like Calvin that doesn’t rely on clocks. The second, less safe option is to allow each node to write locally and then decide what to do with the conflicts later on. Databases that choose the second option can lose your writes.
Impact on end users
Consider two trade transactions in a game where we start with 11 gold pieces and buy two items. First, we buy a sword at 5 gold pieces and then buy a shield at five gold pieces, and both transactions are directed to different nodes of our distributed database. Each node reads the value, which in this case is still 11 for both nodes. Both nodes will decide to write 6 as the result (11- 5) since they are not aware of any replication. Since the second transaction could not see the value of the first write yet, the player ends up buying both the sword and shield for five gold pieces total instead of 10. Good for the user, but not so good for the system! To remedy such behavior, distributed databases have several strategies — some better than others.
Resolution strategies include “last write wins” (LWW) or “longest version history” (LVH) wins. LWW has for a long time been the strategy of Cassandra and is still the default behavior if you do not configure it differently.
If we apply LWW conflict resolution to our previous example, the player will still be left with 6 gold, but will only have bought one item. This is a bad user experience because the application confirmed his purchase of the second item, even though the database doesn’t recognize it as existing in his inventory.
Unpredictable security
As you might imagine, it is unsafe to write security rules on top of such a system. Many applications rely on complex security rules in the backend (or directly on the database where possible) to determine whether a user can or cannot access a resource. When these rules are based on stale data that’s updated unreliably, how can we be sure that there is never a breach? Imagine one user of a PaaS application calls his administrator and asks: “Could you make this public group private so that we can repurpose it for internal data?” The admin applies the action and tells him it’s done. However, because the admin and user might be on different nodes, the user might start adding sensitive data to a group that is technically still public.
Impact on developers
When writes are lost, debugging user issues will be a nightmare. Imagine that a user reports that he has lost data in your application, then one day goes by before you get time to respond. How will you try to find out whether the issue was caused by your database or by faulty application logic? In a database that allows tracking data history such as FaunaDB or Datomic, you would be able to travel back in time to see how the data had been manipulated. Neither of these is vulnerable to lost writes though, and databases that do suffer from this anomaly typically don’t have the time-travel feature.
Databases that suffer from lost writes
All databases that use conflict resolution instead of conflict avoidance will lose writes. Cassandra and DynamoDB use last write wins (LWW) as default; MongoDB used to use LWW but has since moved away from it. The master-master distribution approaches in traditional databases such as MySQL offer different conflict resolution strategies. Many distributed databases that were not built for consistency suffer from lost writes. Riak’s simplest conflict resolution is driven by LWW, but they also implement more intelligent systems. But even with intelligent systems, sometimes there’s just no obvious way to resolve a conflict. Riak and CouchDB place the responsibility to choose the correct write with the client or application, allowing them to manually choose which version to keep.
Since distribution is complex and most databases use imperfect algorithms, lost writes are common in many databases when nodes crash or when network partitions arise. Even MongoDB, which does not distribute writes (writes go to one node), can have write conflicts in the rare case that a node goes down immediately after a write.
3. Write skew
Write skew is something that can happen in a type of guarantee that database vendors call snapshot consistency. In snapshot consistency, the transaction reads from a snapshot that was taken at the time the transaction started. Snapshot consistency prevents many anomalies. In fact, many thought it was completely secure until papers (PDF) started to appear proving the opposite. Therefore, it’s not a surprise that developers struggle to understand why certain guarantees are just not good enough.
Before we discuss what doesn’t work in snapshot consistency, let’s first discuss what does. Imagine that we have a battle between a knight and a mage, whose respective life powers consist of four hearts.
When either character gets attacked, the transaction is a function that calculates how many hearts have been removed:
damageCharacter(character, damage) {
character.hearts = character.hearts - damage
character.dead = isCharacterDead(character)
}
And, after each attack, another isCharacterDead
function also runs to see if the character has any hearts left:
isCharacterDead(character) {
if ( character.hearts <= 0 ) { return true }
else { return false }
}
In a trivial situation, the knight’s strike removes three hearts from the mage, and then the mage’s spell removes four hearts from the knight, bringing his own life points back to four. These two transactions would behave correctly in most databases if one transaction runs after the other.
But what if we add a third transaction, an attack from the knight, which runs concurrently with the mage’s spell?
Is the knight dead, and is the mage alive?
To deal with this confusion, snapshot consistency systems typically implement a rule called “the first committer wins.” A transaction can only conclude if another transaction did not already write to the same row, else it will roll back. In this example, since both transactions tried to write to the same row (the mage’s health), only the Life Leech spell would work and the second strike from the knight would be rolled back. The end result would then be the same as in the previous example: a dead knight and a mage with full hearts.
However, some databases such as MySQL and InnoDB do not consider “the first committer wins” as part of a snapshot isolation. In such cases, we would have a lost write: the mage is now dead, although he should have received the health from the life leech before the strike of the knight took effect. (We did mention badly defined terminology and loose interpretations, right?)
Snapshot consistency that includes the “first committer wins” rule does handle some things well, not surprising since it was considered a good solution for a long time. This is still the approach of PostgreSQL, Oracle, and SQL Server, but they all have different names for it. PostgreSQL calls this guarantee “repeatable read,” Oracle calls it “serializable” (which is incorrect according to our definition), and SQL Server calls it “snapshot isolation.” No wonder people get lost in this forest of terminology. Let’s look at examples where it is not behaving as you would expect!
Impact on end users
The next fight will be between two armies, and an army is considered dead if all of the army characters are dead:
isArmyDead(army){
if (<all characters are dead>) { return true }
else { return false }
}
After every attack, the following function determines if a character has died, and then runs the above function to see if the army has died:
damageArmyCharacter(army, character, damage){
character.hearts = character.hearts - damage
character.dead = isCharacterDead(character)
armyDead = isArmyDead(army)
if (army.dead != armyDead){
army.dead = armyDead
}
}
First, the character’s hearts are diminished with the damage that was received. Then, we verify whether the army is dead by checking whether each character is out of hearts. Then, if the state of the army has changed, we update the ‘dead’ boolean of army.
There are three mages that each attack one time resulting in three ‘Life Leech’ transactions. Snapshots are taken at the beginning of the transactions, since all transactions start at the same time, the snapshots are identical. Each transaction has a copy of the data where all knights still have full health.
Let’s take a look at how the first ‘Life Leech’ transaction resolves. In this transaction, mage1 attacks knight1, and the knight loses 4 life points while the attacking mage regains full health. The transaction decides that the army of knights is not dead since it can only see a snapshot where two knights still have full health and one knight is dead. The other two transactions act on another mage and knight but proceed in a similar way. Each of those transactions initially had three live knights in their copy of the data and only saw one knight dying. Therefore, each transaction decides that the army of knights is still alive.
When all transactions are finished, none of the knights are still alive, yet our boolean that indicates whether the army is dead is still set to false. Why? Because at the time the snapshots were taken, none of the knights were dead. So each transaction saw his own knight dying, but had no idea about the other knights in the army. Although this is an anomaly in our system (which is called write skew), the writes went through since they each wrote to a different character and the write to the army never changed. Cool, we now have a ghost army!
Impact on developers
Data quality
What if we want to make sure users have unique names? Our transaction to create a user will check whether a name exists; if it does not, we will write a new user with that name. However, if two users try to sign up with the same name, the snapshot won’t notice anything since the users are written to different rows and therefore do not conflict. We now have two users with the same name in our system.
There are numerous other examples of anomalies that can occur due to write skew. If you are interested, Martin Kleppman’s book “Designing Data-Intensive Applications” describes more.
Code differently to avoid the rollbacks
Now, let’s consider a different approach where an attack is not directed towards a specific character in the army. In this case, the database is responsible for selecting which knight should be attacked first.
damageArmy(army, damage){
character = getFirstHealthyCharacter(knight)
character.hearts = character.hearts - damage
character.dead = isCharacterDead(character)
// ...
}
If we execute several attacks in parallel as in our previous example, the getFirstHealthyCharacter
will always target the same knight, which would result in multiple transactions that write to the same row. This would be blocked by the “first committer wins” rule, which will roll back the two other attacks. Although it prevents an anomaly, the developer is required to understand these issues and code around them creatively. But wouldn’t it be easier if the database just did this for you out-of-the-box?
Databases that suffer from write skew
Any database that provides snapshot isolation instead of serializability can suffer from write skew. For an overview of databases and their isolation levels, please refer to this article.
4. Out of order writes
To avoid lost writes and stale reads, distributed databases aim for something called “strong consistency.” We mentioned that databases can either choose to agree on a global order (the safe choice) or decide to resolve conflicts (the choice that leads to lost writes). If we decide on a global order, it would mean that although the sword and shield are bought in parallel, the end result should behave as if we bought the sword first and then bought the shield. This is also often called “linearizability” since you can linearize the database manipulations. Linearizability is the gold standard to make sure your data is safe.
Different vendors offer different isolation levels, which you can compare here. A term that comes back often is serializability which is a slightly less strict version of strong consistency (or linearizability). Serializability is already quite strong and covers most anomalies, but still leaves room for one very subtle anomaly due to writes that get reordered. In that case, the database is free to switch that order even after the transaction has been committed. Linearizability in simple terms is serializability plus a guaranteed order. When the database is missing this guaranteed order, your application is vulnerable to out of order writes.
Impact on end users
Reordering of conversations
Conversations can be ordered in a confusing way if someone sends a second message due to a mistake.
Reordering of user actions
If our player has 11 coins and simply buys items in the order of importance while not actively checking the amount of gold coins he has, then the database can reorder these buy orders. If he didn’t have enough money, he could have bought the item of least importance first.
In this case, there was a database check which verified whether we have enough gold. Imagine that we did not have enough money and it would cost us money to let the account go below zero, just like a bank charges you overdraft fees when you go below zero. You might sell an item quickly in order to make sure you have enough money to buy all three items. However, the sale that was meant to increase your balance might be reordered to the end of the transaction list, which would effectively push your balance below zero. If it were a bank, you would likely incur charges you definitely did not deserve.
Unpredictable security
After configuring security settings, a user will expect that these settings will apply to all forthcoming actions, but issues can arise when users talk to each other via different channels. Remember the example we discussed where an administrator is on the phone with a user who wants to make a group private and then adds sensitive data to it. Although the time window within which this can happen becomes smaller in databases that offer serializability, this situation can still occur since the administrator’s action might not be completed until after the user’s action. When users communicate through different channels and expect that the database is ordered in real-time, things go wrong.
This anomaly can also happen if a user is redirected to different nodes due to load balancing. In that case, two consecutive manipulations end up on different nodes and might be reordered. If a girl adds her parents to a facebook group with limited viewing rights, and then posts her spring break photos, the images might still end up in her parents’ feeds.
In another example, an automatic trading bot might have settings such as a maximum buy price, a spending limit, and a list of stocks to focus on. If a user changes the list of stocks that the bot should buy, and then the spending limit, he will not be happy if these transactions were reordered and the trading bot has spent the newly allocated budget on the old stocks.
Impact on developers
Exploits
Some exploits depend on the potential reversal of transactions. Imagine that a game player receives a trophy as soon as he owns 1,000 gold, and he really wants that trophy. The game calculates how much money a player has by adding together gold of multiple containers, for example his storage and what he’s carrying (his inventory). If the player quickly swaps money in between his storage and inventory, he can actually cheat the system.
In the illustration below, a second player acts as a partner in crime to make sure that the money transfer between the storage and the inventory happens in different transactions, increasing the chance that these transactions get routed to different nodes. A more serious real world example of this happens with banks that use a third account to transfer money; the bank might miscalculate whether or not someone is eligible for a loan because various transactions have been sent to different nodes and not had enough time to sort themselves out.
Databases that suffer from out of order writes
Any database that does not provide linearizability can suffer from write skew. For an overview of which databases do provide linearizability, please refer to this article. Spoiler: there are not that many.
All anomalies can return when consistency is bounded
One final relaxation of strong consistency to discuss is to only guarantee it within certain bounds. Typical bounds are a datacenter region, a partition, a node, a collection, or a row. If you program on top of a database that imposes these kinds of boundaries to strong consistency, then you need to keep those in mind to avoid accidentally opening Pandora’s Box again.
Below is an example of consistency, but only guaranteed within one collection. The example below contains three collections: one for the players, one for the smithies (i.e., blacksmiths repairing players’ items), and another for the items. Each player and each smithy has a list of ids that point to items in the items collection.
If you want to trade the shield between two players (e.g., from Brecht to Robert), then everything is fine since you remain in one collection and therefore your transaction remains within the boundaries where consistency is guaranteed. However, what if Robert’s sword is in the smithy for repairs and he wants to retrieve it? The transaction then spans two collections, the smithy’s collection and the player’s collection, and the guarantees are forfeited. Such limitations are often found in document databases such as MongoDB. You will then be required to change the way you program to find creative solutions around the limitations. For example, you could encode the location of the item on the item itself.
Of course, real games are complex. You might want to be able to drop items on the floor or place them in a market so that an item can be owned by a player but does not have to be in the player’s inventory. When things become more complex, these workarounds will significantly increase technical depth and change the way you code to stay within the guarantees of the database.
Conclusion
We have seen different examples of issues that can arise when your database does not behave as you would expect. Although some cases might seem insignificant at first, they all have a significant impact on developer productivity, especially as a system scales. More importantly, they open you up to unpredictable security exploits — which can cause irreparable damage to your application’s reputation.
We discussed a few degrees of consistency, but let’s put them together now that we have seen these examples:
Stale reads | Lost writes | Write skew | Out of order writes | |
---|---|---|---|---|
Linearizability | safe | safe | safe | safe |
Serializability | safe | safe | safe | unsafe |
Snapshot consistency | safe | safe | unsafe | unsafe |
Eventual consistency | unsafe | unsafe | unsafe | unsafe |
Also remember that each of these correctness guarantees can come with boundaries:
Row-level boundaries | The guarantees delivered by the database are only honored when the transaction reads/writes to one row. Manipulations such as moving items from one player to another can cause issues. HBase is an example database that limits guarantees to one row. |
Collection-level boundaries | The guarantees delivered by the database are only honored when the transaction reads/writes to one collection. E.g., trading items between two players stays within a “players” collection, but trading them between a player and an entity from another collection such as a market opens the door to anomalies again. Firebase is an example which limits correctness guarantees to collections. |
Shard/Replica/Partition/Session boundaries | As long as a transaction only affect data on one machine or shard, the guarantees hold. This is, of course, less practical in distributed databases. Cassandra has recently started offering serializability features if you configure them, but only within a partition. |
Regionboundaries | Some databases almost go all the way and provide guarantees across multiple nodes (shards/replicas), but their guarantees do not hold anymore if your database is distributed across multiple regions. Such an example is Cosmos. Cosmos is a great technology, but they have chosen an approach where consistency guarantees are limited to one region. |
Finally, realize that we have only mentioned a few anomalies and consistency guarantees while in fact there are more. For the interested reader, I fondly recommend Martin Kleppman’s Designing Data-Intensive Applications.
We live in a time when we no longer have to care, as long as we choose a strongly consistent database without limitations. Thanks to new approaches such as Calvin (FaunaDB) and Spanner (Google Spanner, FoundationDB), we now have multi-region distributed databases that deliver great latencies and behave as you expect in each scenario. So why would you still risk shooting yourself in the foot and choose a database that does not deliver these guarantees?
In the next article in this series, we will go through the effects on your developer experience. Why is it so hard to convince developers that consistency matters? Spoiler: most people need to experience it before they see the necessity. Think about this though: “If bugs appear, is your app wrong, or is it the data? How can you know?” Once the limitations of your database manifest themselves as bugs or bad user experiences, you need to work around the limitations of the database, which results in inefficient glue code that does not scale. Of course, at that point, you are deeply invested and the realization came too late.