Debug missing blocks
This page is a guide to debugging missing blocks. It gives SPOs clear steps to follow when a block is missing, and tells you what evidence to gather when reporting an issue.
How do you know that your pool did not produce a block when it was selected as a Slot Leader?
- cardano_node_metrics_Forge_forge_about_to_lead_int - cardano_node_metrics_Forge_node_not_leader_int != 0
- cardano_node_metrics_Forge_could_not_forge_int != 0
- You see a block scheduled for your pool (using a leader schedule based on your VRF key, for example from the cncli tool), but the pool does not create it and none of the above metrics change, as if the pool did not know it had a block assigned.
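The first two checks above can be scripted. The sketch below is a hypothetical helper (`forge_summary`, not part of cardano-node) that reads Prometheus-format metrics text on stdin, computes the same two expressions, and flags a non-zero result. It assumes the node's Prometheus endpoint is enabled; the host and port (for example `http://127.0.0.1:12798/metrics`) depend on your node configuration, such as the `hasPrometheus` setting.

```shell
# Hypothetical helper: summarise the forge counters listed above.
# Input: Prometheus text format on stdin, one "<metric> <value>" per line.
forge_summary() {
  awk '
    $1 == "cardano_node_metrics_Forge_forge_about_to_lead_int" { about = $2 }
    $1 == "cardano_node_metrics_Forge_node_not_leader_int"     { notlead = $2 }
    $1 == "cardano_node_metrics_Forge_could_not_forge_int"     { cannot = $2 }
    END {
      # Metrics that are absent are treated as 0 by awk.
      lead = about - notlead
      printf "about_to_lead-not_leader=%d could_not_forge=%d\n", lead, cannot
      if (lead != 0 || cannot != 0) print "CHECK: non-zero value, inspect the leader slots"
    }'
}

# Example (the endpoint is an assumption, adjust it to your config):
# curl -s http://127.0.0.1:12798/metrics | forge_summary
```

Only the first line of output is a summary; the `CHECK` line appears when either expression is non-zero, mirroring the list above.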
If the block was scheduled but none of the metrics changed:
1. Check the VRF keys submitted in the pool registration certificate.
2. Check that cold.counter was correctly increased relative to the previous one.
   - You can increase/generate the counter multiple times to make sure the current counter is higher than the one that created the last block.
If the submitted cold.counter is lower than the one that created the previous block, you should see an error like the following in the logs:
Invalid block aa98999f126e90cdcf441db2ac818fe14c196efbf3e74b2fd07f0f91ddb281ba at slot 21351198: ExtValidationErrorHeader (HeaderProtocolError (HardForkValidationErrFromEra S (S (S (Z (WrapValidationErr {unwrapValidationErr = ChainTransitionError [OverlayFailure (OcertFailure (CounterTooSmallOCERT 12 0))]}))))))
fromList [("val",Object (fromList [("kind",String "TraceForgedInvalidBlock"),("reason",Object (fromList [("error",Object (fromList [("error",Object (fromList [("failures",Array [Object (fromList [("currentKESCounter",String "0"),("error",String "The operational certificate's last KES counter is greater than the current KES counter."),("kind",String "CounterTooSmallOCert"),("lastKESCounter",String "12")])]),("kind",String "ChainTransitionError")])),("kind",String "HeaderProtocolError")])),("kind",String "ValidationError")])),("slot",Number 2.1351198e7)])),("credentials",String "Cardano")]
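To spot this error quickly in a large log, a small filter can pull out the two counters. The sketch below is a hypothetical helper (`counter_too_small`) that matches the `CounterTooSmallOCERT <last> <current>` text shown in the log above and labels the numbers with the field names from the JSON form of the same error. It assumes the plain-text form of the error appears in your log.

```shell
# Hypothetical helper: print the counter pair from CounterTooSmallOCERT errors.
# The first number is the last counter recorded for the pool (lastKESCounter);
# the second is the counter in the certificate that was used (currentKESCounter).
counter_too_small() {
  sed -n 's/.*CounterTooSmallOCERT \([0-9][0-9]*\) \([0-9][0-9]*\).*/lastKESCounter=\1 currentKESCounter=\2/p'
}

# Example (the log path is a placeholder):
# counter_too_small < /path/to/node.log
```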
What to look for:
- Check the producer's logs: they should show whether the node tried to create a block.
- Check the relays' logs: they should show whether the relays propagated the block.
- Check that the pledge is respected (you can check it on adapools.org).
   - If the pledge is not respected, the pool will still create blocks but will not receive rewards.
- Check that the KES keys are still valid.
- Check that the actual cold.counter value is higher than the counter used for the last block the pool created.
- Check the CPU and RAM levels.
- Check that the producer was in sync with the relays around the time of the slot/block.
- Check whether another pool created a block in the expected slot_number (on https://adapools.org/blocks). If a block was adopted by the blockchain in the expected slot_number but created by a different pool, there was a slot battle (more than one pool was elected leader for that slot) and the other pool won.
- Check the logs of the 3 nodes around the time of the expected slot_no.
- Check whether there are any mentions of CannotForge.
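The last two log checks can be narrowed with a filter. The hypothetical helper `scan_slot` below prints every log line that mentions `CannotForge`, `TraceForgedInvalidBlock` (the trace kind in the error shown earlier), or the given slot number as a whole number. Run it on the log of each of the 3 nodes. Log location and format depend on your setup, and JSON logs may render the slot differently (the sample above uses `2.1351198e7`), so treat it as a starting point only.

```shell
# Hypothetical helper: filter a node log (stdin) for forging problems and for
# lines that mention one slot number ($1) as a whole number.
scan_slot() {
  slot="$1"
  grep -E "CannotForge|TraceForgedInvalidBlock|(^|[^0-9])${slot}([^0-9]|\$)"
}

# Example (the log path is a placeholder):
# scan_slot 21351198 < /path/to/node.log
```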
Useful debug commands
- Check that you are using the correct cold.vkey file. This command should return the pool ID:
cardano-cli stake-pool id \
--cold-verification-key-file cold.vkey
- Check the KES and cold vkeys used inside pool_operational.cert:
   - The top byte string is kes.vkey.
   - The bottom byte string is cold.vkey.
   - The first int is the counter; the second int is the KES period.
cardano-cli text-view decode-cbor \
--in-file pool_operational.cert
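- Check the VRF keys. The first command prints the VRF key hash, which should match the one in the pool registration certificate; the second derives the verification key from vrf.skey so you can compare it with vrf.vkey (writing to /dev/stdout avoids overwriting vrf.vkey):
cardano-cli node key-hash-VRF \
--verification-key-file vrf.vkey
cardano-cli key verification-key \
--signing-key-file vrf.skey \
--verification-key-file /dev/stdout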
- Check the KES keys (compare the printed verification key with kes.vkey):
cardano-cli key verification-key \
--signing-key-file kes.skey \
--verification-key-file /dev/stdout
- Check the on-chain value of the cold.counter:
cardano-cli query protocol-state --mainnet | \
jq ".csProtocol[0].\"$(cardano-cli stake-pool id --cold-verification-key-file cold.vkey --output-format hex)\""
Or, if stakepoolid.txt already contains the pool ID in hex:
cardano-cli query protocol-state --mainnet | \
jq ".csProtocol[0].\"$(cat stakepoolid.txt)\""
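The local file can then be compared with the on-chain value. The hypothetical helper `check_cold_counter` below applies the rule from the checklist above (the local cold.counter should be higher than the counter used for the last block). It assumes cold.counter is the text-envelope file written by cardano-cli, whose description field reads `Next certificate issue number: N`; if your file differs, read the number another way. The on-chain value is passed as the second argument, for example the output of the protocol-state query above. If the pool has never produced a block, that query may return null and the comparison does not apply.

```shell
# Hypothetical helper: compare cold.counter ($1) with the on-chain counter ($2).
check_cold_counter() {
  counter_file="$1"
  on_chain="$2"
  # Assumption: the envelope description reads "Next certificate issue number: N".
  local_next=$(sed -n 's/.*Next certificate issue number: \([0-9][0-9]*\).*/\1/p' "$counter_file")
  if [ -z "$local_next" ]; then
    echo "could not read a counter from $counter_file"
    return 2
  fi
  # Reject empty or non-numeric on-chain values (e.g. "null").
  case "$on_chain" in
    ''|*[!0-9]*) echo "on-chain counter is not a number: $on_chain"; return 2 ;;
  esac
  if [ "$local_next" -gt "$on_chain" ]; then
    echo "OK: cold.counter=$local_next > on-chain=$on_chain"
  else
    echo "PROBLEM: cold.counter=$local_next <= on-chain=$on_chain"
    return 1
  fi
}

# Example:
# check_cold_counter cold.counter "$(cardano-cli query protocol-state --mainnet | jq ".csProtocol[0].\"$(cat stakepoolid.txt)\"")"
```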
Open questions:
- How can you find, at the node level, the slots the node was elected to lead and what the node did during those slots?
   → A per-epoch leadership schedule will be added to the node CLI.
- How can you find the keys used in the pool registration certificate?
   → The commands above might help.
   → We have requested that the key details be printed in the logs when the node starts.
- How can you find the actual cold.counter value (and make sure you increment it correctly when renewing the KES keys)?
   → See the on-chain cold.counter query above.
- What is the node's performance value, and can it be obtained directly from the node (CLI or logs)?
   → In theory, a node should be able to calculate its actual performance from the slots it was assigned, so that it can be compared with the performance the ledger reports for ranking.
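As a rough illustration of that idea, the hypothetical helper below computes the share of assigned leader slots for which the pool actually produced a block. Both inputs must come from your own records (for example a leader schedule and your block history). This simple ratio is an assumption for illustration and is not necessarily the same as the performance figure the ledger reports.

```shell
# Hypothetical helper: percentage of assigned slots that produced a block.
# $1 = leader slots assigned, $2 = blocks actually produced.
slot_performance() {
  awk -v assigned="$1" -v produced="$2" 'BEGIN {
    # Avoid dividing by zero when no slots were assigned.
    if (assigned + 0 == 0) { print "n/a"; exit }
    printf "%.1f%%\n", 100 * produced / assigned
  }'
}

# Example: slot_performance 4 3   (prints 75.0%)
```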
KES Renewal
1. Generate a new KES key pair.
cardano-cli node key-gen-KES \
--verification-key-file kes.vkey \
--signing-key-file kes.skey
2. Generate a new pool operational certificate using:
   - The existing cold.skey file
   - The newly created kes.vkey file
   - The previous cold.counter file
   - The current KES period (197 in the command below is only an example)
cardano-cli node issue-op-cert \
--kes-verification-key-file kes.vkey \
--cold-signing-key-file cold.skey \
--operational-certificate-issue-counter cold.counter \
--kes-period 197 \
--out-file pool_operational.cert
3. Restart the node with the new kes.skey and pool_operational.cert files.
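The `--kes-period` value in step 2 is the KES period that contains the current slot. A minimal sketch, assuming you take the current slot from `cardano-cli query tip` and the `slotsPerKESPeriod` and `maxKESEvolutions` values from your shelley-genesis.json (129600 and 62 on mainnet). The helpers are hypothetical and only do the integer arithmetic; the expiry helper reflects the usual rule that a certificate is usable for `maxKESEvolutions` periods starting at its `--kes-period`, which also helps with the "KES keys are still valid" check.

```shell
# Hypothetical helpers for the KES period arithmetic.
# kes_period SLOT SLOTS_PER_KES_PERIOD -> the KES period that contains SLOT
kes_period() {
  echo $(( $1 / $2 ))
}

# kes_expiry START_PERIOD MAX_KES_EVOLUTIONS -> first KES period in which a
# certificate issued with --kes-period START_PERIOD is no longer valid
kes_expiry() {
  echo $(( $1 + $2 ))
}

# Example with the slot from the log above and mainnet genesis values:
# kes_period 21351198 129600   -> 164
# kes_expiry 164 62            -> 226
```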
Please also check the following articles: