Large Node Operator

Blueprint provided by Kiln

Stakeholder Overview

| Group | Stakeholder | Level of Engagement | Comms. Channels |
| --- | --- | --- | --- |
| Institutional Stakers | Enterprise customers (e.g. Bitpanda) | Communicate monthly performance; communicate any major outage that might affect the SLAs | Slack, fortnightly meeting; status page: https://status.kiln.fi/ |
| Service Partners | Liquid staking protocol customers (e.g. Lido) | Communicate any outage and share postmortems (with the LST protocol and, where relevant, with other validators); perform tests for them (e.g. Holesky Web3Signer scale test) | |
| Software Providers | 3rd party software teams (e.g. Web3Signer, clients such as Teku and Prysm) | Share bugs and issues | Telegram, GitHub issues |
| Auditors | Auditors (e.g. Quantstamp) | Share any outage and postmortem; share architectural designs and changes | Slack |
| Communities | Ethereum Foundation | Organize talks; share feedback on current hot topics (upgrades, currently timing games) | |

Best Practices

Try not to have more than one active channel per stakeholder.
Define incident response plans to be prepared for potential incidents.
Aim to follow the Google SRE incident management practices.
Focus on declaring incidents quickly and easily, stopping the bleeding, and reducing over-communication.

Hot Topics

Continuous training of people on incident management
Ensuring you have a good on-call rotation
Reduce over-communication

Tools in Use

incident.io is used for incident communications. It streamlines internal resolution, helps write the postmortem, and publishes to https://status.kiln.fi/
Worknet is used as a Slack bot for bulk messaging to groups of customer Slack channels.
Standard tools such as Grafana and PagerDuty

Effectiveness Metrics

Are SLAs being met? (The ETH target is 99% uptime, similar to Coinbase, so there is typically enough buffer.)
Are the customers happy with the level of communication?
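
To make the 99% target concrete, the sketch below converts an uptime target into a downtime budget for a reporting period. It is only an illustration: it assumes uptime is measured as a simple fraction of wall-clock time over a 30-day month, which may differ from how an actual SLA defines it (for example, by attestation performance). The function names `downtime_budget` and `sla_met` are hypothetical.

```python
from datetime import timedelta


def downtime_budget(uptime_target: float, period: timedelta) -> timedelta:
    """Return the maximum downtime allowed in `period` for a given uptime target.

    Assumption: uptime is a plain time fraction (0.99 means 99% of the period).
    """
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0 and 1")
    return period * (1.0 - uptime_target)


def sla_met(observed_downtime: timedelta, uptime_target: float, period: timedelta) -> bool:
    """Return True if the observed downtime stays within the budget."""
    return observed_downtime <= downtime_budget(uptime_target, period)


if __name__ == "__main__":
    month = timedelta(days=30)  # assumed reporting period
    print("Downtime budget:", downtime_budget(0.99, month))
    print("SLA met with 5h of downtime:", sla_met(timedelta(hours=5), 0.99, month))
```

Under these assumptions, a 99% target over 30 days leaves a budget of roughly 7.2 hours of downtime, which is one way to read the "enough buffer" remark above.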
