How To Calculate Ethereum Classic Root Hashes
The Ethereum Classic (ETC) blockchain contains "root hashes" that help maintain the integrity of various components of the ETC system. I will describe these root hashes including how to calculate them.
Introduction
Some important ETC data structures are sets of key value pairs that are stored as Merkle Patricia tries. Tries are trees of nodes. The top nodes correspond to the "roots" of the trees. Therefore, hashes associated with the top nodes of Merkle Patricia tries are referred to as root hashes. Specifically, root hashes are the Keccak 256 hashes of the Recursive Length Prefix (RLP) encodings of the top nodes.
Headers
ETC block headers contain root hashes for states, transaction lists and receipt lists. ETC block headers also implicitly specify storage root hashes in the state root hashes.
Code
Here is Python code that implements RLP encoding and decoding:
import math
BYTE_LEN = 8
def n_bytes(integer):
"""
Finds the numbers of bytes needed to represent integers.
"""
return math.ceil(integer.bit_length() / BYTE_LEN)
def get_len(input, extra):
"""
Finds the lengths of the longest inputs using the given extra values.
"""
n_bytes = input[0] - extra
return 1 + n_bytes + int.from_bytes(input[2:2 + n_bytes], "big")
def encode(input):
"""
Recursive Length Prefix encodes inputs.
"""
if isinstance(input, bytes):
body = input
if (len(body) == 1) and (body[0] < 128):
header = bytes([])
elif len(body) < 56:
header = bytes([len(body) + 128])
else:
len_ = len(body)
len_ = len_.to_bytes(n_bytes(len_), "big")
header = bytes([len(len_) + 183]) + len_
result = header + body
else:
body = bytes([])
for e in input:
body += encode(e)
if len(body) < 56:
header = bytes([len(body) + 192])
else:
len_ = len(body)
len_ = len_.to_bytes(n_bytes(len_), "big")
header = bytes([len(len_) + 247]) + len_
result = header + body
return result
def decode(input):
"""
Recursive Length Prefix decodes inputs.
"""
if input[0] < 128:
result = input
elif input[0] < 184:
result = input[1:]
elif input[0] < 192:
result = input[1 + (input[0] - 183):]
else:
result = []
if input[0] < 248:
input = input[1:]
else:
input = input[1 + (input[0] - 247):]
while input:
if input[0] < 128:
len_ = 1
elif input[0] < 184:
len_ = 1 + (input[0] - 128)
elif input[0] < 192:
len_ = get_len(input, 183)
elif input[0] < 248:
len_ = 1 + (input[0] - 192)
else:
len_ = get_len(input, 247)
result.append(decode(input[:len_]))
input = input[len_:]
return result
Here is Python code that calculates root hashes using the PySHA3 package. It requires the RLP code above to be saved to an accessible location with the file name rlp.py. Invoke the root_hash function on Python dictionaries representing sets of ETC key value pairs. All keys and key values must be Python byte strings:
import sha3
import rlp
HASH_LEN = 32
HEXADEC = 16
def remove(dict_, segment):
"""
Removes initial key segments from the keys of dictionaries.
"""
return {k[len(segment):] : v for k, v in dict_.items()}
def select(dict_, segment):
"""
Selects dictionary elements with given initial key segments.
"""
return {k : v for k, v in dict_.items() if k.startswith(segment)}
def find(dict_):
"""
Finds common initial segments in the keys of dictionaries.
"""
segment = ""
for i in range(min([len(e) for e in dict_.keys()])):
if len({e[i] for e in dict_.keys()}) > 1:
break
segment += list(dict_.keys())[0][i]
return segment
def patricia_r(dict_):
"""
Creates Patricia tries that begin with regular nodes.
"""
pt = (HEXADEC + 1) * [None]
if "" in dict_:
pt[-1] = dict_[""]
del(dict_[""])
for e in {e[0] for e in dict_.keys()}:
pt[int(e, HEXADEC)] = patricia(remove(select(dict_, e), e))
return pt
def patricia_s(dict_):
"""
Creates Patricia tries composed of one key ending special node.
"""
pt = list(dict_.items())[0]
if len(pt[0]) % 2 == 0:
pt = (bytes.fromhex("20" + pt[0]), pt[1])
else:
pt = (bytes.fromhex("3" + pt[0]), pt[1])
return pt
def patricia(dict_):
"""
Creates Patricia tries from dictionaries.
"""
segment = find(dict_)
if len(dict_) == 1:
pt = patricia_s(dict_)
elif segment:
dict_ = remove(dict_, segment)
if len(segment) % 2 == 0:
pt = [bytes.fromhex("00" + segment), patricia_r(dict_)]
else:
pt = [bytes.fromhex("1" + segment), patricia_r(dict_)]
else:
pt = patricia_r(dict_)
return pt
def merkle(element):
"""
Encodes Patricia trie elements using Keccak 256 hashes and RLP.
"""
if not element:
merkle_ = b""
elif isinstance(element, str):
merkle_ = bytes.fromhex(element)
elif isinstance(element, bytes):
merkle_ = element
else:
merkle_ = [merkle(e) for e in element]
rlp_ = rlp.encode(merkle_)
if len(rlp_) >= HASH_LEN:
merkle_ = sha3.keccak_256(rlp_).digest()
return merkle_
def merkle_patricia(dict_):
"""
Creates Merkle Patricia tries from dictionaries.
"""
return [merkle(e) for e in patricia(dict_)]
def root_hash(dict_):
"""
Calculates root hashes of Merkle Patricia tries from dictionaries.
"""
dict_ = {k.hex() : v for k, v in dict_.items()}
return sha3.keccak_256(rlp.encode(merkle_patricia(dict_))).hexdigest()
Examples
Here are sample calculations for all of the root hash types found in the ETC blockchain. They require the root hash code above to be saved to an accessible location with the file name root_hash.py. The RLP code above must be saved to an accessible location with the file name rlp.py. Lastly, the following code to convert integers to Python byte strings must be saved to an accessible location with the file name int_to_bytes.py:
def int_to_bytes(number):
if number:
hex_ = hex(number)[2:]
if len(hex_) % 2 != 0:
hex_ = "0" + hex_
result = bytes.fromhex(hex_)
else:
result = b""
return result
State Root Hashes
For state root hash calculations, the keys of the Python dictionaries must be the Keccak 256 hashes of the account addresses. The key values must be the RLP encodings of lists containing the corresponding account nonces, balances, storage root hashes, and, smart contract hashes. One way to obtain state information is with an ETC Geth node. For example, the following ETC Geth node command prints the state information for block 1,000,000:
geth dump 1000000
Here is the beginning of the voluminous output:
{
"root": "0e066f3c2297a5cb300593052617d1bca5946f0caa0635fdb1b85ac7e5236f34",
"accounts": {
"843fd22c88d59e57ae1856a871a5d95e95b0a656": {
"balance": "52500000000000",
"nonce": 1,
"root": "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
"codeHash": "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
"code": "",
"storage": {}
},
"dcd0b6fa4f0a26a7b12325b0d09b5b809c5aef84": {
"balance": "9375377890126000",
"nonce": 1,
"root": "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
"codeHash": "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
"code": "",
"storage": {}
},
"7d62878a7235e95d56f802f80835543cac711f90": {
"balance": "204544100000000000",
"nonce": 0,
"root": "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
"codeHash": "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
"code": "",
"storage": {}
},
"67db390312dc02a140c358add4f37966c7775096": {
"balance": "0",
"nonce": 2,
"root": "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421",
"codeHash": "c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470",
"code": "",
"storage": {}
},
...etc.
The following code prints the state root hash for block 1,000,000 which is 0x0e066f3c2297a5cb300593052617d1bca5946f0caa0635fdb1b85ac7e5236f34. It requires the aforementioned state information to be saved to an accessible location with the file name state_1000000:
import root_hash
import sha3
import rlp
import int_to_bytes
dict_ = {}
state = eval(open("state_1000000", "r").read())
for address in state["accounts"]:
account = state["accounts"][address]
account = [int_to_bytes.int_to_bytes(int(account["nonce"])),
int_to_bytes.int_to_bytes(int(account["balance"])),
bytes.fromhex(account["root"]),
bytes.fromhex(account["codeHash"])]
key = sha3.keccak_256(bytes.fromhex(address)).digest()
value = rlp.encode(account)
dict_[key] = value
print(root_hash.root_hash(dict_))
Transaction List Root Hashes
For transaction list root hash calculations, the keys of the Python dictionaries must be the RLP encodings of the transaction indices starting from zero. The key values must be the RLP encodings of lists containing the corresponding transaction nonces, gas prices, gas usage maxima, destination addresses, ether sent, data sent and digital signature components. The following code prints the transaction list root hash for the transactions in block 4,000,003 which is 0xad79d498b7e407d3a2b32c13a380ee93635da2b3e0696c39563cbd5c32d368b2:
import root_hash
import sha3
import rlp
import int_to_bytes
key_1 = rlp.encode(int_to_bytes.int_to_bytes(0))
nonce = int_to_bytes.int_to_bytes(1514565)
gas_price = int_to_bytes.int_to_bytes(20000000000)
gas_max = int_to_bytes.int_to_bytes(50000)
dest = 0x7b96a5006d5fc86d05f8799fe1fc6f7d23b24969
dest = int_to_bytes.int_to_bytes(dest)
ether = int_to_bytes.int_to_bytes(1001525814273650153)
data = b""
v = int_to_bytes.int_to_bytes(157)
r = 0x8815ebbcdb56717a30193db4629fa7565d2fb06c6fba2aaf0db06deaf932955d
r = int_to_bytes.int_to_bytes(r)
s = 0x4dbd4dcb648114859f57122d804b85c2dd60d0b502fb93d0ef770d50bfa3a59d
s = int_to_bytes.int_to_bytes(s)
trans = [nonce, gas_price, gas_max, dest, ether, data, v, r, s]
value_1 = rlp.encode(trans)
key_2 = rlp.encode(int_to_bytes.int_to_bytes(1))
nonce = int_to_bytes.int_to_bytes(43565)
gas_price = int_to_bytes.int_to_bytes(20000000000)
gas_max = int_to_bytes.int_to_bytes(21000)
dest = 0x7ccfb3028404225e4e9da860f85274e30ccc9275
dest = int_to_bytes.int_to_bytes(dest)
ether = int_to_bytes.int_to_bytes(109404508089999998976)
data = b""
v = int_to_bytes.int_to_bytes(28)
r = 0x8d6e2fcfe032d2612d2ea56da6d07b6a94004a4ec7cbe2c3f086db1a194aa679
r = int_to_bytes.int_to_bytes(r)
s = 0x6ed1333497c12b4549e55d117977bf60bb96872dfb05816fb7ce25c7396ef23a
s = int_to_bytes.int_to_bytes(s)
trans = [nonce, gas_price, gas_max, dest, ether, data, v, r, s]
value_2 = rlp.encode(trans)
print(root_hash.root_hash({key_1 : value_1, key_2 : value_2}))
Receipt List Root Hashes
For receipt list root hash calculations, the keys of the Python dictionaries must be the RLP encodings of the receipt indices starting from zero. The key values must be the RLP encodings of lists containing the corresponding receipt state root hashes, cumulative gas amounts, log Bloom filters and logs. The following code prints the receipt list root hash for the receipts in block 4,000,003 which is 0x4b3b43affc2927a152b9d6f18e378cf33671f8606e8549de292ae36b8a691584:
import root_hash
import sha3
import rlp
import int_to_bytes
key_1 = rlp.encode(int_to_bytes.int_to_bytes(0))
state = "abca6dd8fb332962c1c14c02d13b2082aee152496dc809d9642e2deca07fb7c2"
gas = 0x5208
bloom = 256 * "00"
logs = []
receipt = [bytes.fromhex(state),
int_to_bytes.int_to_bytes(gas),
bytes.fromhex(bloom),
logs]
value_1 = rlp.encode(receipt)
key_2 = rlp.encode(int_to_bytes.int_to_bytes(1))
state = "029b0eb2c76ff08a1cf47aba4be53ff1c20b01026206eca248b47e0657f97524"
gas = 0xa410
bloom = 256 * "00"
logs = []
receipt = [bytes.fromhex(state),
int_to_bytes.int_to_bytes(gas),
bytes.fromhex(bloom),
logs]
value_2 = rlp.encode(receipt)
print(root_hash.root_hash({key_1 : value_1, key_2 : value_2}))
Storage Root Hashes
For storage root hash calculations, the keys of the Python dictionaries must be the Keccak 256 hashes of the storage indices for all nonzero storage values. The key values must be the RLP encodings of the corresponding storage values. The following code prints the storage root hash for the account with the address 0xd4eae4ae8565f3ecf218191fb267941d98a2c77a which is 0x9f630ea9c8cc6e9f7ecbc08cb7f9e901c14b788cc8f2ae64e3134cf3cb089f55. Note that this result was correct as of block 5,874,861 but may possibly change afterwards:
import root_hash
import sha3
import rlp
import int_to_bytes
KEY_LEN = 32
ZERO = b"\x00"
dict_ = {}
storage = [(0, 0x51f24771a5a2720456076e7c81d59753dac20e1f),
(1, 0x4563918244f40000),
(3, 0x55c64da8),
(4, 0x6f05b59d3b20000),
(5, 0x4fb5acbe16ffdda225cb14c64aa84c7e253b08ae)]
for e in storage:
key = int_to_bytes.int_to_bytes(e[0])
key = (KEY_LEN - len(key)) * ZERO + key
key = sha3.keccak_256(key).digest()
value = rlp.encode(int_to_bytes.int_to_bytes(e[1]))
dict_[key] = value
print(root_hash.root_hash(dict_))
Conclusion
Root hashes are vital for the operation of the ETC world computer. The ETC system utilizes state, transaction list, receipt list and storage root hashes. These ETC root hashes can be found with a detailed recipe involving RLP encodings, Keccak 256 hashes and Merkle Patricia tries.
Feedback
Feel free to leave any comments or questions below. You can also contact me by clicking any of these icons:
Acknowledgements
I would like to thank Moe Aboulkheir and Tomasz Zdybał for their invaluable help debugging some of the code in this paper. I would also like to thank IOHK (Input Output Hong Kong) for funding this effort.
License
This work is licensed under the Creative Commons Attribution ShareAlike 4.0 International License.
This is a bit over my knowledge level but still i love to see content about ETC. Love the team and ideology behind this blockchain.
I'm also mining it and hodl.
Congratulations @cseberino! You have received a personal award!
2 Years on Steemit
Click on the badge to view your Board of Honor.
Do not miss the last post from @steemitboard: