Senior SRE - AI Infra Lead

New

Skills

AWS, Distributed systems, GCP, Kubernetes, LLM Systems, Python, React, Terraform, TypeScript

Job Overview

As a Senior Site Reliability Engineer, you will own fleet reliability for Portal's SaaS infra, including LLM workflows. You will define SLOs and capacity plans to scale the product, and architect infra on GCP and AWS using Terraform for AI workloads. The role involves driving incident management, on-call duties, and postmortems, and enabling self-healing mechanisms. You will also lead fullstack reliability across TypeScript, React, and Python, mentor engineers, and shape the infra roadmap with AI features.

Responsibilities
  • Own fleet reliability and define SLOs
  • Architect infra on GCP and AWS using Terraform
  • Lead incident management, on-call, and postmortems
  • Mentor engineers and shape infra roadmap with AI features
  • Build automation to prevent operational issues

Requirements & Qualifications
  • 5+ years of experience operating cloud infra (GCP/AWS)
  • Experience with Terraform and Kubernetes
  • Proficient in TypeScript, Java, Go, or Python
  • Knowledge of distributed systems concepts
  • Ability to communicate effectively and write impactful postmortems

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months

Similar Jobs

ML Data Service Lead

New

Define roadmap and strategy for ML Data Service platform

Architect scalable data systems

AWS, Azure, Cloud Platforms, GCP

Data Architect Engineer

New

Design end-to-end data/analytics architecture

Architect scalable data platforms

AWS, Azure, GCP, Java

Engineering Manager

New

Oversee planning and execution across multiple components

Mentor and grow engineers for career development

AWS, CI/CD, Engineering management, Git

ML Data Service Lead

New

Define and drive ML Data Service roadmap and strategy

Architect scalable data ingestion and transformation processes

AWS, Azure, GCP, Spark

CDP Architect

New

Promote and enforce information security practices

Identify and report security risks

AWS, Azure, Google Cloud, PowerBI

Lead Software Engineer, AI Platform

New

Lead design and development of Java backend services

Optimize platform components for high-throughput workflows

AWS, Elasticsearch, Java, Kubernetes

Technical Analyst

New

Design and implement features with cross-functional teams

Build scalable frontend/backend code using specified technologies

Angular, AWS, C#, GCP

Senior Backend Engineer

New

Take ownership of quarterly goals and lead the engineering team

Collaborate with product, design, and analytics teams on ideation and decisions

AWS, Code reviews, Debugging, Distributed systems

Senior Java Engineer

New

Advance capabilities in various industries

Gain hands-on experience with key technologies

Architecture, AWS, CI/CD, Data Modeling

Software Engineer - Social

New

Build end-to-end social-channel experiences

Capture profiles, consent, and engagement data

Agile Development, AWS, Code reviews, Front-end Development

LLM Software Engineer

New

Implementing and deploying LLM models for conversational agents

Collaborating with cross-functional teams for deployment and scalability

AWS, LLMs, Machine Learning, NumPy

Engineering Manager, Storage

New

Lead team in building automated database operations

Orchestrate node lifecycles and cluster operations for DB systems

AWS, Azure, GCP, Kubernetes

Sr. Solutions Architect - Public Sector

New

Showcasing Databricks' solutions to address business problems

Providing technical leadership for customer evaluation and adoption

AWS, Azure, GCP, Java

Senior Cloud Architect - Video

New

Lead end-to-end migration of media workloads to AWS

Collaborate with stakeholders to set cloud requirements

AWS, CI/CD Automation, Cloud Architecture, Docker

Senior Backend Engineer - EU

New

Lead and manage a team of engineers effectively

Collaborate with cross-functional teams for ideation and constraints

AWS, Backend Systems, Code Review, Distributed systems

Sr. Director Customer Security

New

Serve as the primary security representative in customer engagements.

Articulate security architecture, controls, and risk management across different layers.

APIs, AWS, Azure, Cloud Security

CDP Architect Position

New

Promote and enforce information security practices

Understand and evaluate data storage and processing

AWS, Azure, Google Cloud, Snowflake

Senior Pen Tester

New

Lead and conduct advanced penetration testing

Provide expert guidance on security

APIs, AWS, Azure, Cloud infrastructure

Solutions Engineer

New

Articulate business and technical value of Neo4j to customers

Lead technology evaluation stages in sales cycles

AWS, Azure, C#, Go

Staff AI Engineer

New

Architect and scale AI quality infrastructure

Design end-to-end AI evaluation framework with metrics

AI/ML, AWS, CI/CD, LLMs

Software Engineer II, Databases

New

Develop reliable and optimized code

Collaborate to reduce technical debt and resolve customer issues

AWS, Azure, CI/CD, Docker

Senior SRE AWS Kubernetes Java DevOps

New

Lead design and implementation of scalable systems on AWS and Kubernetes

Support Java-based fintech apps powering core advisor tools

AI technologies, Automation, AWS, Computer science

Senior Patching Engineer

New

Implement and maintain standardized patching across various operating systems

Deploy patches using Tanium and troubleshoot deployment issues

Active Directory, Ansible, AWS, Puppet

Data Infrastructure Engineer

New

Take ownership of DB and data warehousing infra

Optimize query performance

AWS, Data Modeling, Distributed systems, MySQL

Senior Eng Manager - Platform Architecture

New

Lead and grow a team of engineers

Define and evolve platform architecture for scale

AI, Automation, AWS, Django

Quantitative Researcher - Investments

New

Build reproducible backtests for models and algorithms

Contribute to differentiated investment services

AWS, Java, Python, R

Advanced Penetration Tester

New

Lead and conduct advanced penetration testing

Perform security assessments on cloud environments

APIs, AWS, Azure, Kubernetes

Senior AI Engineer

New

Develop state-of-the-art voice cloning solutions.

Ensure low latency and cost-effective text to speech capabilities.

AWS, Azure, Backend Development, Docker

Senior Security Manager

New

Define and execute a product security strategy with AI capabilities

Recruit, develop, and retain top security talent

AI, AWS, Azure, Cloud Security

Applied AI Manager - Partnerships

New

Lead and mentor team of Partner Solutions Architects

Serve as senior tech partner to international GTM teams

AI technologies, AWS, Google Cloud Platform, Partnership Building

Senior Software Engineer, Inference Project

New

Build and maintain large-scale distributed inference systems.

Develop routing and fleet orchestration across accelerators.

AWS, Cloud infrastructure, Distributed systems, GCP

Platform Engineer Job

New

Evaluate cloud infrastructure strategy

Design and implement infrastructure-as-code

AI, AWS, CI/CD, Datadog

Sr. Director Customer Security

New

Serve as primary security representative in customer engagements

Drive technical security discussions

AWS, Azure, Cloud Security, Compliance

CDP Implementation Architect

New

Promote information security practices

Ensure compliance with data privacy standards

AWS, Azure, Google Cloud, Snowflake

Principal AI Engineer

New

Lead design and development of Java-based backend services

Optimize platform components for high throughput and low latency

AWS, Distributed systems, Elasticsearch, Java

Senior Penetration Tester

New

Lead and conduct advanced penetration testing

Define offensive security roadmap

AWS, Azure, Cloud infrastructure, Kubernetes

CX Operations Lead

New

Drive financial fluency and scalability within Customer Experience operations

Enhance high-touch customer experiences through AI and data utilization

AI, AWS, Data Analysis, Financial Management

Strategic Partnerships & AI Ecosystem Lead

New

Define and lead ISV ecosystem strategy

Expand AWS co-sell partnerships with key players

AI, Anthropic, AWS, Google

Principal Software Engineer - FGA Core

New

Design and develop scalable authorization solutions

Lead optimization of data models and query paths

Analytical Skills, AWS, Debugging, Problem-solving Skills

Senior Software Engineer

New

Drive design and implementation of new features

Break down complex problems into elegant designs

AWS, Azure, C#, GCP

AI Product Manager

New

Lead product management for cloud cost visibility product

Define vision, roadmap, and strategy for cloud cost features

Analytics, APIs, AWS, Azure

Security Architect Remote Role

New

Serve as primary security representative in customer engagements

Articulate security architecture and controls

AWS, Azure, Cloud Security, Compliance

AI Model Serving Engineer

New

Efficient implementation of AI models

Prioritizing tasks based on urgency and impact

AWS, Azure, Backend Development, Docker

Senior Software Engineer, Infrastructure

New

Shape technical roadmap

Architect and design deployment pipelines

AWS, Docker, GCP, Go

Director, Strategic Alliances

New

Lead and execute BD & strategic alliances strategy in the Americas region.

Build and manage a cross-functional partnerships team.

AWS, Communication, E-commerce, Fraud Prevention

Senior Consultant, Back Office

New

Design and deploy workflows for investment lifecycle

Optimize platform including data migration

APIs, AWS, Confluence, Jira

Senior Software Engineer

New

Develop and deliver full-stack code

Ensure testing, documentation, and observability standards

AWS, Docker, Full-stack Development, GitHub

Senior ML Engineer

New

Lead ML projects and client discussions

Optimize ML models for performance and scalability

AWS, Azure, CI/CD, Distributed systems

Backend Engineer

New

Develop and scale Python scripts for data analysis

Ensure accurate financial and regulatory reporting

Automated Testing, AWS, CI/CD, Data Analysis

Data Engineer, Growth Platforms

New

Build and maintain reliable and scalable data pipelines for Growth use cases

Improve data models and schemas to meet evolving needs

Airflow, AWS, Data Warehousing, ETL Processes