Incident Report: Some validators missing blocks due to version conflict

The issue where some validators appeared not to be producing blocks is now resolved. In summary, it was caused by conflicts between multiple client versions running on the network at the same time.

Key Insights:

  • Mainnet 3 (v0.16.3) introduced a new API version, v1beta2, an upgrade from v1beta1. For example, the akash.deployment.v1beta1.MsgCreateDeployment TX type is now akash.deployment.v1beta2.MsgCreateDeployment.

  • This upgrade means v0.16.3 no longer parses v1beta1 messages. The assumption was that this would be fine as long as no new v1beta1 transactions were being created. Unfortunately, not everyone upgraded to Mainnet 3 (v0.16.3), and we started noticing bug reports.

  • To enable newer clients to process older message types, we introduced backward compatibility with Patch #1605, which was released with v0.16.4.

  • Naturally, not everyone upgraded to v0.16.4, and the majority of validators were still running v0.16.3.

  • A client running v0.14.x sent an akash.cert.v1beta1.MsgCreateCertificate TX. An RPC server on v0.16.4 accepted it, and it ended up in validators' mempools. During block commit, validators on v0.16.3 rejected the transaction as invalid, whereas those on v0.16.4 included it in the block.

  • The conflict arose when the majority set on v0.16.3 rejected the block proposed by the minority set on v0.16.4, because the majority deemed akash.cert.v1beta1.MsgCreateCertificate an invalid message type.

  • The conflict was resolved when the v0.16.4 validators rolled back the “bad” TX and resynced with a good state approved by the majority set on v0.16.3. Validators are currently restoring version parity by upgrading to v0.16.4.
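
To make the failure mode concrete, here is a minimal Python sketch of the sequence described above. It is not the actual node code: the validator names and voting powers are hypothetical, and it assumes a Tendermint-style rule where a block commits only if validators holding more than two-thirds of the voting power accept it. The only facts taken from this report are which message API versions each release can parse.

```python
from dataclasses import dataclass

# Message API versions each release can parse, per this report:
# v0.16.3 dropped v1beta1, and v0.16.4 restored it via Patch #1605.
SUPPORTED_API_VERSIONS = {
    "v0.16.3": {"v1beta2"},
    "v0.16.4": {"v1beta1", "v1beta2"},
}


@dataclass(frozen=True)
class Validator:
    name: str          # hypothetical label
    node_version: str  # release the validator runs
    voting_power: int  # hypothetical stake weight


def api_version(msg_type):
    """Return the API version segment of a message type,
    e.g. 'akash.cert.v1beta1.MsgCreateCertificate' -> 'v1beta1'."""
    return msg_type.split(".")[2]


def accepts(validator, block_txs):
    """A validator accepts a block only if it can parse every TX in it."""
    supported = SUPPORTED_API_VERSIONS[validator.node_version]
    return all(api_version(tx) in supported for tx in block_txs)


def block_commits(validators, block_txs):
    """Assumed rule: strictly more than 2/3 of voting power must accept."""
    total = sum(v.voting_power for v in validators)
    in_favor = sum(v.voting_power for v in validators if accepts(v, block_txs))
    return 3 * in_favor > 2 * total


# Hypothetical validator set: the majority of voting power is on v0.16.3.
validators = [
    Validator("val-a", "v0.16.3", 40),
    Validator("val-b", "v0.16.3", 30),
    Validator("val-c", "v0.16.4", 30),
]

old_tx_block = ["akash.cert.v1beta1.MsgCreateCertificate"]
new_tx_block = ["akash.deployment.v1beta2.MsgCreateDeployment"]

print(block_commits(validators, old_tx_block))  # False: only val-c accepts
print(block_commits(validators, new_tx_block))  # True: every version parses v1beta2

# Once every validator runs v0.16.4, the old message type no longer splits the set.
upgraded = [Validator(v.name, "v0.16.4", v.voting_power) for v in validators]
print(block_commits(upgraded, old_tx_block))    # True
```

The sketch shows why version parity matters more than which version is "right": a block containing a v1beta1 TX can only commit once enough voting power runs a release that parses it.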
