Chaos Engineering Lead

Skills

Automation scripting, Chaos Engineering, Distributed systems, Engineer, iOS Development, Observability Tools, Strategy, Swift, Testing

Goodnotes is seeking a visionary Senior Chaos Engineer to pioneer our reliability program. You will architect and execute chaos engineering strategies, designing creative fault scenarios to uncover system weaknesses and championing a culture of resilience. This remote-friendly role offers the unique opportunity to define, build, and scale chaos practices across our world-class digital productivity platform used by millions globally.

Job Overview
  • Lead the development of a chaos engineering program from the ground up.
  • Drive innovation, experimentation, and strategic reliability improvements.
  • Collaborate across teams to foster resilience and continuous learning.
  • Build tools, processes, and practices to scale chaos engineering at Goodnotes.
  • Enjoy flexibility, meaningful equity, and a culture of growth and empowerment.

Key Responsibilities
  • Define chaos engineering strategy, tools, safety practices, and long-term roadmap.
  • Design and execute fault injection experiments across mobile and backend systems.
  • Simulate real-world failures (latency, outages, resource exhaustion) to expose hidden risks.
  • Automate experiments, track outcomes, and enhance system observability.
  • Establish guardrails for safe, measured, and reversible testing.
  • Facilitate resilience drills, chaos game days, and cross-team engagement.
  • Document findings, communicate insights, and drive actionable engineering improvements.
  • Mentor, hire, and shape the future of the chaos engineering function.

Required Skills & Qualifications
  • Proven experience in chaos engineering or fault injection in distributed, production-scale environments.
  • Strong expertise in Swift programming and iOS platforms.
  • Deep understanding of mobile networking and client-backend interactions.
  • Knowledge of resilience patterns (circuit breakers, bulkheads, timeouts, retries).
  • Experience in incident postmortems, war games, or reliability reviews.
  • Ability to build scripts/tools for automating chaos experiments and analyzing system behavior.
  • Scientific mindset: hypothesis-driven testing and iterative learning.
  • Excellent collaboration and communication skills.
  • Passion for building programs from scratch.
  • Growth mindset: enthusiastic about continuous learning and development.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months
