How Blockchains Work
A blockchain is a data structure that uses the hash function cryptographic primitive to create an immutable data structure. Together with a consensus algorithm, this allows the creation of a distributed data store.
Data Structure
The structure of a the simplest blockchain is as follows:
struct {
uint32_t hash_prev_block;
uint32_t random_nonce;
uint32_t hash_data_block;
}
First, there is a link to the previous block, then a random number, and then a hash to a data block. The first block in the chain has the previous block set to zero, and any block can have no data by specifying a zero for the data block hash. With a blockchain, as long as you have the hash of the end of the chain, and a way to lookup data from hashes, the entire data structure can be retrieved.
For illustration purposes, the diagrams are using CRC32 checksums instead of a cryptographic hash function. In practice, a hash algorithm like SHA256 or SHA512 should be used, but these required 256 and 512 bytes respectively. The shorter, trivially insecure CRC32 is used to allow for compact diagrams.
Linking cryptographic hashes together is useful for creating immutable data stores. If a single bit in any block is changed, the hash of that block changes and breaks the blockchain link,provided there blocks don't have a hash collision. With CRC32, this is much more likely to happen. There is a 50% chance of a collision after checking about 7,700 blocks, which can be done in less than a second on modern hardware.
By using SHA256, the number tries needed for a 50% probability of collision increases to 4.0e38, which is not going to happen in our lifetime on modern computers, barring a drastic innovation in either computational power or hash function mathematical analysis.
Consensus
Once you have the end of the chain, you can get the entire chain and verify it hasn't been changed, but how do you know that you have the correct end of the chain? For a distributed data store to be useful, all hosts that need access to the data need to have the same chain. This is where consensus algorithms come in. There are both centralized and decentralized consensus algorithms.
An example of a centralized consensus algorithm would be the following: choose the longest chain with a cryptographic signature from public key X on every block. It relies on a trusted party, in this case the system holding the private key for X. That party could arbitrarily change the chain data. When people are talking about private blockchains, this is the type of consensus algorithm used.
A decentralized consensus algorithm is like the following: chose the longest chain with the smallest sum of all the block CRCs. This is a kind of proof of work, so called because you can become the correct chain by spending more time checking all the possible values that the field nonce could be set to to get the lowest non-zero hash. With a cryptographic hash function, the only way to find a lower value is to try each nonce number and select the lowest value. This takes time.
Other decentralized consensus algorithms include proof-of-stake, which uses self-selected signatures with the possibility of losing a monetary instrument if the active chain contains a proof that you created an incorrect chain, and the random sample consensus algorithm Avalanche. Moving away from proof-of-work reduces the energy requirements of finding consensus, but may reduce the security of the system in exchange.
In addition to the base consensus algorithm, decentralized blockchain systems being used for cryptocurrency will also validate the contents of each data block, to make sure that transactions occur according a defined set of rules. Any chain that fails to validate every block gets rejected.
Resources
The source files for the images in this post can be found here