Introduction:
Blockchain is the technology that underlies cryptocurrencies. In each of them, a set of validator nodes is responsible for validating all transactions. The validators are assumed to be rational and self-interested, i.e. they are only interested in making as much money as possible for themselves. Under these assumptions, it is generally expected that the required majority of validators will agree on the sequence of transactions that have ever happened on the blockchain.
However, such validator nodes are generally expensive in terms of the disk space they need. Bitcoin, the oldest and most popular cryptocurrency, for example, needs about 200 GB of disk space (at the time of writing) to store the entire transaction log. A new node therefore needs a high-speed connection and a lot of time before it can even get started on mining or validation. This problem prompted researchers to suggest sharding as a solution, i.e. each node stores only part of the log, while the nodes together still store the entire log. Sharding comes with its own challenges when it comes to validating transactions.
But does it make sense for a node to store the transaction log starting all the way from the genesis block? This is an important question that needs to be answered before such solutions are crafted.
To understand whether saving the entire transaction log is possible or necessary, we consider the following questions:
1. Is it necessary to store the entire transaction log to validate transactions?
2. Is it more secure to store the transaction log than not storing it?
3. Is it possible to incentivize the validator nodes to store the log?
We address these questions in turn in the rest of the article.
Is it necessary to store the entire transaction log to validate transactions?
To validate a transaction, a node only needs to know the state of the blockchain right before the transaction. How that state was reached is immaterial. So it is enough to store the state after each block. In fact, we can go even further: since blocks that were mined long ago are hardly ever going to be undone, it is safe to delete them.
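As a rough illustration of validating against state alone, here is a minimal sketch in Python. It assumes a simplified UTXO-style state, and the names `TxOutput`, `Transaction`, `is_valid` and `apply` are hypothetical, chosen only for this example. Signature checks are left out entirely, so this is not how any real client validates transactions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxOutput:
    owner: str
    amount: int


@dataclass(frozen=True)
class Transaction:
    inputs: tuple   # keys of the unspent outputs being spent, e.g. ("g:0",)
    outputs: tuple  # TxOutput objects created by this transaction
    tx_id: str


def is_valid(tx, state):
    """Check a transaction against the current state only; no history is read."""
    if len(set(tx.inputs)) != len(tx.inputs):
        return False  # the same output is referenced twice
    if any(key not in state for key in tx.inputs):
        return False  # unknown or already spent output
    spent = sum(state[key].amount for key in tx.inputs)
    created = sum(out.amount for out in tx.outputs)
    return all(out.amount > 0 for out in tx.outputs) and created <= spent


def apply(tx, state):
    """Return the state after the transaction, leaving the old state untouched."""
    if not is_valid(tx, state):
        raise ValueError("invalid transaction")
    new_state = {k: v for k, v in state.items() if k not in tx.inputs}
    for i, out in enumerate(tx.outputs):
        new_state[f"{tx.tx_id}:{i}"] = out
    return new_state
```

Note that `is_valid` takes only the transaction and the current state as arguments. Nothing in it depends on how that state came about, which is the point made above.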
The natural question is whether this would somehow compromise the safety of the blockchain, i.e. would it cause the blockchain to approve a transaction that is not valid given its current state? To answer this question, we move on to the next section.
Is it more secure to store the transaction log than not storing it?
To answer this, we need to think about the security of the whole transaction and not just the part on the blockchain. Most transactions have two parts: a payment on the blockchain and the receipt of something of value in exchange. The second part is not stored on the blockchain. This means that a seller who provides a product or service in exchange for cryptocurrency relies on the blockchain not to revert the payment after the product or service has been delivered. This, in turn, means that there has to be a reasonable time after which the transaction becomes completely immutable, which requires the block that includes it to be immutable too. In other words, we need some form of finality for the transaction and the block. Once a block is finalized in some form or other, it is okay to forget what happened before it and simply continue as if that block were the genesis block. In general, this means that the transaction history needs to be stored only for a short period of time. In the case of Bitcoin, people normally assume that a block is practically final after about an hour, so it makes sense to delete all the history before that. That means a Bitcoin validator only needs the list of unspent transaction outputs (UTXOs) at the end of each of the last few blocks.
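The idea of keeping only the state at the end of the last few blocks can be sketched as follows. This is a minimal illustration, not a description of any existing client. The class `PrunedNode` and its parameter `keep` are hypothetical names, and the default of 6 snapshots is an assumption based on Bitcoin's roughly ten-minute block interval (about six blocks per hour).

```python
from collections import deque


class PrunedNode:
    """Keeps only the most recent state snapshots; older ones count as finalized."""

    def __init__(self, genesis_state, keep=6):
        # (height, state) pairs, oldest first. A deque with maxlen silently
        # drops the oldest entry when a new one is appended beyond the limit.
        self.snapshots = deque([(0, dict(genesis_state))], maxlen=keep)

    def add_block(self, new_state):
        """Record the state at the end of the next block."""
        height = self.snapshots[-1][0] + 1
        self.snapshots.append((height, dict(new_state)))

    def rollback(self, height):
        """Undo all blocks above `height`; refuse if that point is already pruned."""
        oldest = self.snapshots[0][0]
        if height < oldest:
            raise ValueError("cannot revert past a finalized block")
        while self.snapshots[-1][0] > height:
            self.snapshots.pop()
        return self.snapshots[-1][1]
```

A short reorganization within the retained window is still possible through `rollback`, while anything older is treated as final and cannot be undone, which mirrors the role of the genesis block described above.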
We now turn towards our final point.
Is it possible to incentivize the validator nodes to store the log?
As stated earlier, the validators are assumed to be selfish and rational, which means they need to be paid or rewarded for anything they do. Since validators in public blockchains are paid in cryptocurrency only for mining blocks, that is the only thing they can be expected to do. We have also seen that storing the full history is not necessary for the job of validating. Therefore, no validator has an incentive to store the entire transaction log starting from the genesis block. If we do want rational validators to store the entire history, we must sufficiently incentivize them to do so. One option is to require, for a block to be considered valid, proof that its proposer stores all past transactions. Is this possible? Yes, if we change the proof-of-work consensus protocol as follows:
1. Let the block being proposed be B and the corresponding proof-of-work be p. This means that p is the nonce such that hash(B||p) < threshold, where || means concatenation.
2. Let the number of transactions before that block be N.
3. Let h = hash(B||hash(p)).
4. Compute the remainder r when h is divided by N. Since h is typically a 256-bit number, it is far greater than N, so r can be treated as an approximately uniform random sample from the integers 0 to N-1.
5. Let the transaction with sequence number r be tr, where transactions are indexed from 0 to N-1.
6. The block proposal must then be (B||p||tr) for it to be considered valid.
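The steps above can be sketched in Python as follows. This is only an illustration under several assumptions: SHA-256 stands in for the unspecified hash function, the nonce is an 8-byte counter, and the verifier is assumed to hold the full history so it can check tr (the steps above do not say how verification is done). The names `H`, `sample_index`, `propose` and `verify` are hypothetical.

```python
import hashlib


def H(data: bytes) -> int:
    """Hash bytes to a 256-bit integer (SHA-256 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def sample_index(block: bytes, nonce: bytes, n: int) -> int:
    """Steps 3 and 4: h = hash(B || hash(p)), r = h mod N."""
    inner = hashlib.sha256(nonce).digest()
    return H(block + inner) % n


def propose(block, store, n, threshold, max_tries=100_000):
    """Search for a nonce whose sampled transaction is available in `store`.

    `store` maps sequence numbers to transaction bytes and may be partial.
    Returns ((B, p, t_r), discarded), where `discarded` counts proofs-of-work
    that had to be thrown away because the sampled transaction was missing.
    """
    discarded = 0
    for i in range(max_tries):
        nonce = i.to_bytes(8, "big")
        if H(block + nonce) >= threshold:
            continue  # not a valid proof-of-work
        r = sample_index(block, nonce, n)
        if r in store:
            return (block, nonce, store[r]), discarded
        discarded += 1  # valid proof-of-work, but transaction r is not stored
    return None, discarded


def verify(proposal, history, threshold):
    """Check the proof-of-work and that the attached transaction is the sampled one."""
    block, nonce, tx = proposal
    if H(block + nonce) >= threshold:
        return False
    return history[sample_index(block, nonce, len(history))] == tx
```

Because r depends on the nonce p, a miner cannot know which transaction will be requested until the proof-of-work has already been found.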
It is easy to see that a validator that does not store the entire transaction log needs more effort to generate valid blocks, since it has to throw away its proof-of-work and start over whenever the sampled transaction is not among those it stores. More specifically, if a validator stores only a fraction f of the history, it has to find a proof-of-work 1/f times on average for every valid block, discarding all but the last. For the same amount of processing power, this scales its average mining reward down to a fraction f of what it would otherwise be. It is, therefore, most efficient for any validator to simply store all the transactions from the genesis block. However, no such system is currently in use, so validators that still store all the transactions are really doing a social service. Moreover, from our discussion on security, it is clear that such a modification is quite unnecessary.
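The 1/f figure can be checked with a short derivation. It assumes that the sampled index is uniform, that samples are independent across attempts, and that the validator cannot influence which index is sampled. The symbol K, the number of proofs-of-work found until one samples a stored transaction, is introduced here only for this illustration.

```latex
\Pr[K = k] = (1-f)^{k-1} f, \qquad k = 1, 2, \dots

\mathbb{E}[K] = \sum_{k \ge 1} k \,(1-f)^{k-1} f = \frac{1}{f}

\mathbb{E}[K - 1] = \frac{1}{f} - 1 = \frac{1-f}{f}
```

For example, a validator that stores half of the history (f = 0.5) needs two proofs-of-work per valid block on average and discards one of them, so its expected reward per unit of work is halved.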
Another issue is that when a new validator joins, it must download the entire transaction log from its peers. The peers, however, have no incentive at all to provide this data. By doing so, they simply help add competing mining power while also burdening their own bandwidth. The least we can do is to relieve them of having to send the entire transaction history.
Conclusion:
We have seen that it is possible to mandate the storage of the entire transaction history if we choose to design the blockchain that way. However, doing so seems quite unnecessary and cumbersome. Not requiring it may even give existing miners more motivation to provide the needed data to new joiners, since there is much less of it to send. We have also seen that storing the transaction history offers no advantage for the security of the blockchain, so we might as well not do so.