Efficient, Usable, And Cheap Storage Of IPFS Hashes In Solidity Smart Contracts

in #ethereum, 6 years ago


Recently, while working on a new project, I needed to store IPFS hashes, in particular CIDv1 hashes, on the Ethereum blockchain in a way that is both user-friendly and gas efficient. Why CIDv1? Well, CIDv0 is always a sha2-256 multihash, which doesn't fit neatly into bytes32 storage slots: in binary it is 34 bytes, and the familiar Qm... string is 46 characters. From what I've seen in the wild, whenever people need to store IPFS hashes in smart contracts, they almost always store either a hash of the IPFS CID in a bytes32 storage variable, or the IPFS hash itself in a string storage variable. CIDv1 lets us pick a different hash function, so it doesn't have to suffer from this :)

While there is nothing wrong with either of those two approaches, they are suboptimal. Hashing a hash just to fit it into a single storage slot makes it significantly harder to consume, and is only worthwhile if you need to do it for security reasons. Storing the IPFS hash in a string storage variable is expensive: any string of 32 bytes or more needs an extra storage slot just to hold its length.
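To make the cost of the string approach concrete, here is a minimal Go sketch of Solidity's storage layout rule for `string` and `bytes` state variables: values shorter than 32 bytes share a single slot with their length, while longer values take one slot for the length plus one slot per 32 bytes of data. The function name `storageSlots` is just for illustration.

```go
package main

import "fmt"

// storageSlots returns how many 32-byte storage slots a Solidity `string`
// or `bytes` state variable occupies when it holds n bytes.
func storageSlots(n int) int {
	if n < 32 {
		// Short values share one slot with their length.
		return 1
	}
	// Long values: one slot for the length, plus ceil(n/32) slots of data
	// stored starting at keccak256(slot).
	return 1 + (n+31)/32
}

func main() {
	// A CIDv0 string ("Qm...") is 46 characters long.
	fmt.Println("46-byte string:", storageSlots(46), "slots") // 3 slots
	// A 64-byte string also needs 3 slots; two bytes32 variables need only 2.
	fmt.Println("64-byte string:", storageSlots(64), "slots") // 3 slots
}
```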

The trick here is to fit the hash into as few storage slots as possible, with every slot fully occupied (efficient), while keeping it easy to consume (usable) and using minimal gas (cheap). In theory this sounds simple: all you need is a hash function whose output fills exactly one or two bytes32 storage variables. But because IPFS uses multiformats (or more precisely, multihash), it isn't as easy as it sounds. To figure this out, we need to talk a bit about multihash.

Multihash

Multihash is part of the multiformats specifications and lets us create self-describing hashes. The format for multihash looks like this (diagram from github.com/multiformats/multiformats):

<varint hash function code><varint digest size in bytes><hash function output>

In other words, even if you find a hash function whose output is exactly 32 bytes, multihash adds a prefix on top of it: 2 bytes for sha2-256 and 4 bytes for blake2b-256, for a total of 34 or 36 bytes. That no longer fits into a bytes32, so it has to go into a bytes storage variable, which costs an extra word of storage for the length.
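Here is a small Go sketch that builds those prefixes to show where the extra bytes come from. The hash function codes come from the multicodec table (blake2b-N is 0xb200 + N/8). I'm assuming that Go's `binary.PutUvarint` (unsigned LEB128) matches the multiformats unsigned-varint encoding, which holds for values like these. `multihashPrefix` is a hypothetical helper, not a library API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// multihashPrefix returns the <varint hash function code><varint digest size>
// header that multihash places in front of a digest. binary.PutUvarint emits
// unsigned LEB128, the same encoding multiformats uses for its varints.
func multihashPrefix(code, digestLen uint64) []byte {
	var tmp [binary.MaxVarintLen64]byte
	var out []byte
	n := binary.PutUvarint(tmp[:], code)
	out = append(out, tmp[:n]...)
	n = binary.PutUvarint(tmp[:], digestLen)
	out = append(out, tmp[:n]...)
	return out
}

func main() {
	// Codes from the multicodec table; blake2b-N uses 0xb200 + N/8.
	funcs := []struct {
		name      string
		code      uint64
		digestLen uint64
	}{
		{"sha2-256", 0x12, 32},
		{"blake2b-256", 0xb220, 32},
		{"blake2b-328", 0xb229, 41},
	}
	for _, f := range funcs {
		p := multihashPrefix(f.code, f.digestLen)
		fmt.Printf("%-12s prefix % x (%d bytes), multihash %d bytes\n",
			f.name, p, len(p), len(p)+int(f.digestLen))
	}
}
```

This prints a 2-byte prefix for sha2-256 and 4-byte prefixes for both blake2b variants. The raw binary blake2b-328 multihash therefore comes to 45 bytes.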

Solution

This led me to the question: what if we could find a hash function whose CID string takes up exactly one bytes32 storage slot? Then we would be good to go! Well, not quite. The only multihash that did this was blake2b-136, which go-ipfs nodes don't accept by default due to security risks... yikes!

After several hours of experimentation (aka blindly trying multihashes), I finally found a hash function whose CIDv1, written out as a string, is exactly 64 bytes. That means it fits into exactly two bytes32 storage slots, filling both completely and wasting nothing. The hash function is blake2b-328. So to use it, all we need to do is take the 64-byte CID string, split it in two, and we're good to go!
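The 64 here is the length of the CID as text, not of the raw multihash. Below is a rough Go sketch that checks the 32- and 64-character figures under my own assumptions: a CIDv1 with the dag-pb content type (0x70), encoded as base58btc with the `z` multibase prefix. The digests are zero-filled placeholders. For these inputs the string length doesn't depend on the digest contents. `base58Encode` and `cidV1Base58` are hypothetical helpers.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

const base58Alphabet = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

// base58Encode encodes b with the Bitcoin alphabet (multibase "base58btc").
func base58Encode(b []byte) string {
	x := new(big.Int).SetBytes(b)
	base := big.NewInt(58)
	mod := new(big.Int)
	var out []byte
	for x.Sign() > 0 {
		x.DivMod(x, base, mod)
		out = append(out, base58Alphabet[mod.Int64()])
	}
	// Each leading zero byte is encoded as a leading '1'.
	for _, c := range b {
		if c != 0 {
			break
		}
		out = append(out, '1')
	}
	// Digits were produced least significant first; reverse them.
	for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
		out[i], out[j] = out[j], out[i]
	}
	return string(out)
}

// cidV1Base58 wraps a digest in a multihash and a CIDv1 (version 1, dag-pb)
// and returns the base58btc string with its 'z' multibase prefix.
func cidV1Base58(hashCode uint64, digest []byte) string {
	var tmp [binary.MaxVarintLen64]byte
	cid := []byte{0x01, 0x70} // CID version 1, dag-pb content type
	n := binary.PutUvarint(tmp[:], hashCode)
	cid = append(cid, tmp[:n]...)
	n = binary.PutUvarint(tmp[:], uint64(len(digest)))
	cid = append(cid, tmp[:n]...)
	cid = append(cid, digest...)
	return "z" + base58Encode(cid)
}

func main() {
	// Placeholder digests: blake2b-136 is 17 bytes, blake2b-328 is 41 bytes.
	fmt.Println("blake2b-136:", len(cidV1Base58(0xb211, make([]byte, 17)))) // 32
	fmt.Println("blake2b-328:", len(cidV1Base58(0xb229, make([]byte, 41)))) // 64
}
```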

Example

To keep things short, I'll only show the best solution I found after trying a few different ways of storing two bytes32 values.

The first step is to define the two parts of the hash, hashPart1 and hashPart2. To store an IPFS hash, we take the 64-byte CIDv1 string of our blake2b-328 hash, split it in half, put each half into a 2-element bytes32 array, and pass that array to the function updateHash.

Now, whenever we want to consume this data, all we have to do is call getHash, which returns the complete hash as bytes. If you're consuming this in a mobile phone DApp, all you need to do is convert it to a string, which in golang is string(returnedBytes), and you have your IPFS hash in plaintext!
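On the client side, the same reassembly can also be done from the two public getters instead of getHash. Here's a small Go sketch. `cidFromParts` is a hypothetical helper. Treating an all-zero value as "no hash stored yet" is my own convention, not something the contract enforces.

```go
package main

import (
	"bytes"
	"fmt"
)

// cidFromParts rebuilds the CID string from the two bytes32 values, for
// example as read through the public hashPart1/hashPart2 getters. It returns
// "" when both slots are still zero, i.e. no hash has been stored yet.
func cidFromParts(p1, p2 [32]byte) string {
	joined := make([]byte, 0, 64)
	joined = append(joined, p1[:]...)
	joined = append(joined, p2[:]...)
	if bytes.Equal(joined, make([]byte, 64)) {
		return ""
	}
	// Same conversion as string(returnedBytes) on the bytes from getHash.
	return string(joined)
}

func main() {
	var p1, p2 [32]byte
	fmt.Printf("unset: %q\n", cidFromParts(p1, p2))
	// Hypothetical contents standing in for the two halves of a CID string.
	copy(p1[:], bytes.Repeat([]byte{'a'}, 32))
	copy(p2[:], bytes.Repeat([]byte{'b'}, 32))
	cid := cidFromParts(p1, p2)
	fmt.Println(len(cid), cid)
}
```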

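pragma solidity 0.5.7;

contract Hash {

    // The 64-byte CID string, split across two fully used storage slots.
    bytes32 public hashPart1;
    bytes32 public hashPart2;

    // _hash[0] holds the first 32 bytes of the CID string, _hash[1] the last 32.
    // No access control: any account can overwrite the stored hash.
    function updateHash(bytes32[2] memory _hash) public returns (bool) {
        hashPart1 = _hash[0];
        hashPart2 = _hash[1];
        return true;
    }

    // Rebuilds the original 64-byte CID string from the two slots.
    function getHash() public view returns (bytes memory) {
        bytes memory joined = new bytes(64);
        // Copy the storage values into locals so the assembly block can use them.
        bytes32 h1 = hashPart1;
        bytes32 h2 = hashPart2;
        assembly {
            // The first word of `joined` holds its length, so the data starts at +32.
            mstore(add(joined, 32), h1)
            mstore(add(joined, 64), h2)
        }
        return joined;
    }

}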

So does this actually save gas, or did I just waste your time?

Gas Consumption

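To test gas consumption, I wrote the following (fairly ugly) test contract. To get gas costs, I checked the CumulativeGasUsed field of each transaction receipt. Note that this field includes every earlier transaction in the same block, so it only equals a transaction's own cost (the GasUsed field) when that transaction is alone in its block. The tests were run on a local private PoA chain on my laptop with a 1-second block time, using geth 1.8.27-stable.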

The functions updateLink, updateLinkHash and updateLinkParts test the gas cost of different ways of storing data in two bytes32 storage slots (a bytes32[2] inside a struct, a bytes32[2] state variable, and two separate bytes32 state variables, respectively). The function setCID tests the gas cost of storing a hashed IPFS hash, and setCIDString tests the gas cost of storing the plaintext (aka string) version of the IPFS hash.

pragma solidity 0.5.7;

contract GasTest {

    string public hashString;    // plaintext IPFS hash (setCIDString)
    bytes32 public hash;         // hash of the IPFS hash (setCID)
    bytes32[2] public linkHash;  // fixed-size array state variable (updateLinkHash)
    bytes32 public linkPart1;    // two separate bytes32 variables (updateLinkParts)
    bytes32 public linkPart2;
    LinkObject private link;     // bytes32[2] inside a struct (updateLink)

    struct LinkObject {
        bytes32[2] currentHash;
    }

    function updateLink(bytes32[2] memory _newHash) public returns (bool) {
        link.currentHash = _newHash;
        return true;
    }

    function updateLinkHash(bytes32[2] memory _newHash) public returns (bool) {
        linkHash = _newHash;
        return true;
    }

    function updateLinkParts(bytes32[2] memory _newHash) public returns (bool) {
        linkPart1 = _newHash[0];
        linkPart2 = _newHash[1];
        return true;
    }

    function setCID(bytes32 _cid) public returns (bool) {
        hash = _cid;
        return true;
    }

    function setCIDString(string memory _cid) public returns (bool) {
        hashString = _cid;
        return true;
    }
}

And here are the cumulative gas usage results from the above contract:

Function           Cumulative Gas Used
updateLink         66360
updateLinkHash     66426
updateLinkParts    66071
setCID             43810
setCIDString       86213
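As a sanity check on these numbers, here's a rough Go sketch of where most of that gas goes. It assumes the pre-Istanbul gas rules that geth 1.8.27 implements: 21000 gas per transaction, 4 gas per zero byte and 68 per non-zero byte of calldata, and 20000 gas per SSTORE into an empty slot. It ignores all other opcode costs, so it's only a lower bound. The calldata is a hypothetical stand-in with every byte non-zero, and `lowerBound` is an illustrative name.

```go
package main

import "fmt"

// Pre-Istanbul gas constants (the rules geth 1.8.27 ran).
const (
	txBaseGas       = 21000 // intrinsic cost of any transaction
	calldataZero    = 4     // per zero byte of calldata
	calldataNonZero = 68    // per non-zero byte of calldata
	sstoreSetGas    = 20000 // SSTORE writing a non-zero value to an empty slot
)

// lowerBound returns intrinsic gas plus the fresh-slot SSTORE cost for a call.
// It ignores the remaining opcode costs (ABI decoding, jumps, memory).
func lowerBound(calldata []byte, newSlots int) int {
	gas := txBaseGas
	for _, b := range calldata {
		if b == 0 {
			gas += calldataZero
		} else {
			gas += calldataNonZero
		}
	}
	return gas + newSlots*sstoreSetGas
}

func main() {
	// Hypothetical updateLinkParts calldata: a 4-byte selector plus the 64
	// bytes of CID text (bytes32[2] is static, so it is encoded inline).
	data := make([]byte, 4+64)
	for i := range data {
		data[i] = 'z'
	}
	fmt.Println("two fresh bytes32 slots, lower bound:", lowerBound(data, 2)) // 65624
}
```

That lands a few hundred gas below the measured 66071 for updateLinkParts, and the gap is mostly the execution cost the sketch leaves out. Overwriting slots that already hold non-zero values costs 5000 per SSTORE instead of 20000.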

At first glance, you might look at the gas cost of setCID and think that I just wasted your precious time. However, keep in mind that this isn't actually the IPFS hash; it is a hash of the IPFS hash. So while it may be gas efficient, it is not easy to consume outside of smart contracts, and it is abysmal at best to consume within other smart contracts, because:

  1. We need to store a plaintext copy of the hash somewhere accessible by the smart contract (storage)
  2. We need to read the plaintext data from storage, hash it, then compare the two hashed hashes.

With that in mind, the gas cost of the two-slot storage methods discussed here (66071 to 66360, depending on the method) is roughly 20,000 gas less than storing the string. Combined with the fact that you can store and consume the hashes as-is, that seems pretty useful in my eyes.

Takeaways

  • Use blake2b-328
  • Cast to bytes
  • Store in 2 bytes32 storage slots
  • ???
  • profit!
