Site Reliability Engineer Job at Infinity Quest UK, Remote

  • Infinity Quest UK
  • Remote

Job Description

Primary Responsibilities:

  • Work closely with the Product Engineering team to implement strategies for modernizing IT operations, enhancing observability, and reducing toil.
  • Architect and deploy observability platforms to monitor system health, performance, and reliability effectively.
  • Propose & drive strategies for AI-driven alerting and proactive anomaly detection to reduce MTTD & MTTR.
  • Develop and enforce SRE best practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
  • Establish and own an AIOps roadmap to improve operational efficiency.
  • Lead efforts to automate repetitive tasks (toil) using scripting, orchestration tools, and AI/ML-based solutions.
  • Drive toil-automation initiatives, such as automated incident response and self-healing automation, to achieve autonomous operations.
  • Collaborate with cross-functional teams to ensure systems are scalable, resilient, and maintainable.
  • Drive incident management and root cause analysis processes through automation, ensuring continuous improvement to enable autonomous operations.
  • Partner with engineering, architecture, and product teams to enable shift-left engineering practices that build reliability in from the start.
  • Mentor and guide teams on adopting SRE principles and tools.
  • Advocate for a culture of reliability, automation, and continuous improvement across the organization.

Key Skills:

  • Strong expertise in implementing Site Reliability Engineering (SRE) principles.
  • Advanced knowledge of establishing observability using Dynatrace and Datadog (primary skills).
  • Proficiency in automation & scripting using Python & Ansible (primary skills).
  • Strong experience with the AWS and Azure cloud platforms (primary skills).
  • Solid understanding of containerization and orchestration tools such as Docker and Kubernetes.
  • Proficiency in cloud-native distributed systems and microservices architecture.
  • Exposure to AI/ML techniques for predictive analytics and automated problem resolution.
  • Familiarity with CI/CD pipelines and with enabling automated release and deployment engineering solutions.
  • Experience with chaos engineering tools such as Gremlin or Chaos Monkey, and with implementing automation frameworks for resilience tracking, is a plus.
  • Ability to manage and prioritize multiple projects in a fast-paced environment.
  • Strong interpersonal and communication skills to work effectively across teams.
  • Excellent problem-solving, analytical thinking, and adaptability.
  • Strategic mindset balancing engineering excellence with business priorities.

Preferred Qualifications:

  • 12+ years of experience in IT operations, SRE, or DevOps roles.
  • Proven SRE track record of implementing observability and automation solutions in large-scale environments.
  • Certifications in cloud platforms, observability tools, and other SRE-related areas.

Job Tags

Shift work
