Regain Keys to Kingdom
Worked with executives to perform a grey-box system inventory and produced a report cataloguing every component of their system that we were able to find. The report allowed them to transition away from their current developer and hire an IT team providing staff augmentation services.
Reporting
Identity Management
Auditing
Observability Consultation
As an SRE consultant, I worked closely with multiple teams within customer organizations, advising on their observability posture across alerting, dashboards, logging, metrics, and tracing to strengthen their incident response. I collaborated with teams to develop Service Level Indicators (SLIs) that reflect the core issues behind incidents, which led to actionable alerts. I also laid the groundwork for Service Level Objectives (SLOs) using Prometheus, Sloth, and Grafana, so that teams could set and track meaningful performance targets.

One team presented a particularly challenging incident response scenario: their system was deployed in several different ways with inconsistent versioning. That engagement led to a synthetic testing framework that other teams could use to build and execute tests via a webhook. With it, developers could write synthetic tests, expose a webhook for execution, and create scheduled tasks for ongoing testing to catch common error scenarios.

To find the path of least resistance toward consistency and visibility, we created a matrix of the systems and their versions, then determined the most efficient route to upgrade them. We also added custom metrics and logging and developed synthetics for known failure scenarios to improve visibility and detect issues proactively.
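The version matrix described above can be illustrated with a small sketch. This is not the tooling used on the engagement; the system names, environments, and version numbers are hypothetical, and the rule of targeting the newest version already deployed is an assumption chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical inventory rows: (system, environment, deployed version).
DEPLOYMENTS = [
    ("billing", "prod-eu", "2.3.1"),
    ("billing", "prod-us", "2.1.0"),
    ("billing", "staging", "2.3.1"),
    ("gateway", "prod-eu", "1.8.0"),
    ("gateway", "prod-us", "1.8.0"),
]


def parse_version(version):
    """Turn '2.3.1' into (2, 3, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def build_matrix(deployments):
    """Build a system -> {environment: version} matrix."""
    matrix = defaultdict(dict)
    for system, env, version in deployments:
        matrix[system][env] = version
    return dict(matrix)


def upgrade_plan(matrix):
    """For each system, pick a target version and list lagging environments.

    Assumption: the target is the newest version already running somewhere,
    since it has at least been deployed once.
    """
    plan = {}
    for system, envs in matrix.items():
        target = max(envs.values(), key=parse_version)
        behind = sorted(env for env, v in envs.items() if v != target)
        plan[system] = {"target": target, "behind": behind}
    return plan


if __name__ == "__main__":
    for system, info in upgrade_plan(build_matrix(DEPLOYMENTS)).items():
        print(system, info)
```

Systems with an empty `behind` list are already consistent, so the remaining entries give a rough ordering of where upgrade effort is needed.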
Kubernetes
Grafana
Prometheus