Large Node Operator

Blueprint provided by Kiln

Stakeholder Overview

| Group | Stakeholder | Level of Engagement | Comms. Channels |
|---|---|---|---|
| Institutional Stakers | Enterprise customers (e.g., Bitpanda) | Monthly performance reporting; notification of any major outage that might affect the SLAs | Slack, fortnightly meeting; status page: https://status.kiln.fi/ |
| Service Partners | Liquid staking protocol customers (e.g., Lido) | Communication on any outage; sharing postmortems (with the LST protocol and, where relevant, with other validators); running tests on their behalf (e.g., the Holesky Web3Signer scale test) | Telegram |
| Software Providers | Third-party software teams (e.g., Web3Signer and the Teku and Prysm clients) | Sharing bugs and issues | Telegram, GitHub issues |
| Auditors | Auditors (e.g., Quantstamp) | Sharing any outage and postmortem; sharing architectural designs and changes | Slack |
| Communities | Ethereum Foundation | Organizing talks; sharing feedback on current hot topics (upgrades and, currently, timing games) | Telegram |

Best Practices

  1. Try not to have more than one active channel per stakeholder.

  2. Define incident response plans in advance so the team is ready when an incident occurs.

  3. Make declaring an incident fast and easy, focus first on stopping the bleeding, and avoid over-communicating.

Hot Topics

  1. Continuously training staff in incident management

  2. Maintaining a good on-call rotation

  3. Reducing over-communication

Tools in Use

  1. incident.io is used for incident communications. It streamlines internal resolution, helps draft the postmortem, and publishes updates to the status page (https://status.kiln.fi/).

  2. Worknet, a Slack bot, is used for bulk messaging across groups of customer Slack channels.

  3. Standard tools such as Grafana and PagerDuty.
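The bulk-messaging pattern described for Worknet, with one message fanned out to every Slack channel in one or more customer groups, can be sketched as follows. This is a minimal illustration, not Worknet's implementation. The group names and channel IDs in `CHANNEL_GROUPS` are hypothetical. The sender calls Slack's Web API method `chat.postMessage` with a bot token. `broadcast` takes the sender as a parameter, so the fan-out logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical customer groups; names and channel IDs are illustrative only.
CHANNEL_GROUPS: Dict[str, List[str]] = {
    "institutional-stakers": ["C0000000001", "C0000000002"],
    "lst-partners": ["C0000000003"],
}

SLACK_POST_URL = "https://slack.com/api/chat.postMessage"


def slack_sender(token: str) -> Callable[[str, str], bool]:
    """Return a function that posts `text` to `channel` via chat.postMessage."""
    def send(channel: str, text: str) -> bool:
        body = json.dumps({"channel": channel, "text": text}).encode("utf-8")
        req = urllib.request.Request(
            SLACK_POST_URL,
            data=body,
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/json; charset=utf-8",
            },
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            # Slack reports success in the "ok" field of the JSON response
            return bool(json.load(resp).get("ok", False))
    return send


def broadcast(groups: List[str], text: str, send: Callable[[str, str], bool],
              channel_groups: Dict[str, List[str]] = CHANNEL_GROUPS) -> Dict[str, bool]:
    """Send `text` once to every channel in the selected groups; return per-channel success."""
    results: Dict[str, bool] = {}
    for group in groups:
        # An unknown group raises KeyError rather than silently skipping customers
        for channel in channel_groups[group]:
            if channel in results:  # a channel listed in several groups gets one message
                continue
            try:
                results[channel] = send(channel, text)
            except (OSError, ValueError):  # network or JSON errors: record failure, keep going
                results[channel] = False
    return results
```

Deduplicating channels prevents duplicate notices when a customer channel belongs to several groups. The per-channel results make it easy to retry only the channels that failed.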

Effectiveness Metrics

  1. Are SLAs being met? (The ETH target is 99% uptime, in line with Coinbase, which typically leaves enough buffer.)

  2. Are the customers happy with the level of communication?
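To make the buffer behind a 99% uptime target concrete, it helps to convert the target into a downtime budget. The sketch below assumes uptime is measured as plain wall-clock availability over the SLA period. The actual SLA may be defined differently (for example per epoch or per validator), so treat the results as approximations. The function names are hypothetical.

```python
from datetime import timedelta


def downtime_budget(target_uptime: float, period: timedelta) -> timedelta:
    """Maximum downtime over `period` that still meets `target_uptime` (a fraction 0..1)."""
    if not 0.0 <= target_uptime <= 1.0:
        raise ValueError("target_uptime must be a fraction between 0 and 1")
    # timedelta * float is rounded to the nearest microsecond
    return period * (1.0 - target_uptime)


def remaining_budget(target_uptime: float, period: timedelta,
                     downtime_so_far: timedelta) -> timedelta:
    """Downtime still available in the period; negative means the target is already missed."""
    return downtime_budget(target_uptime, period) - downtime_so_far


if __name__ == "__main__":
    month = timedelta(days=30)
    print("30-day budget at 99%:", downtime_budget(0.99, month))
    print("left after a 2 h outage:", remaining_budget(0.99, month, timedelta(hours=2)))
```

Over a 30-day month, 99% uptime allows 1% of 720 hours, or 7.2 hours (7 h 12 min), of downtime. Over a 365-day year the allowance is about 3.65 days. Tracking the remaining budget during an outage indicates when an SLA-related notification to institutional stakers may become necessary.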
