Principal Site Reliability Engineer, Infrastructure Platform

3 weeks ago


Mountain View, California, United States Groq Full time
Job Description

At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Site Reliability Engineer, you'll play a crucial role in ensuring the reliability, scalability, and performance of our tools and services.

Responsibilities:
  • Design and implement scalable and reliable architectures for the platform infrastructure.
  • Establish comprehensive monitoring systems to track key performance indicators (KPIs) and identify potential issues.
  • Develop and implement automated testing frameworks to ensure software quality and reliability.
  • Lead the investigation and resolution of production incidents.
  • Work collaboratively with engineering teams to identify and mitigate potential risks.
Requirements:
  • 6+ years of experience in site reliability engineering or a related field.
  • Deep understanding of cloud-native technologies and infrastructure as a service (IaaS).
  • Expertise in monitoring and alerting systems, incident management processes, and disaster recovery planning.
What We Offer:
  • A competitive base salary and comprehensive compensation package.
  • Equity and benefits.
  • A geo-agnostic company, meaning you work where you are.


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Job Title: Platform Site Reliability Engineer. Samsung Ads is seeking a highly skilled Platform Site Reliability Engineer to join our Global Ads Product & Engineering team. As a key member of our team, you will play a crucial role in ensuring the reliability, scalability, and performance of our advertising technology platform. Key Responsibilities: Design,...


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Job Title: Platform Site Reliability Engineer. Samsung Ads is a thriving business poised for even greater success, and we're looking for a passionate leader to join our Global Ads Product & Engineering team. About the Role: We're the innovators behind the products, tech, and tools driving ad-based monetization. As a Site Reliability Engineer specializing in...


  • Mountain View, California, United States Groq Full time

    About the Role: Groq is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for the reliability and performance of our APIs, ensuring seamless performance and exceptional service delivery. Key Responsibilities: Enhance system reliability by refining operational...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure. Key Responsibilities: Design, build, and maintain large-scale data systems that support core products and...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness, empowering AI applications to reach new heights. Job Summary: We're seeking a seasoned...


  • Mountain View, California, United States Groq Full time

    About Groq: Groq is a company that believes in an AI economy powered by human agency. We envision a world where AI is accessible to all, and we're working towards making that a reality. Job Description: We're looking for a Principal Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Principal Site Reliability Engineer, you'll play a crucial role in ensuring the...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking an experienced Site Reliability Engineer to join our global infrastructure team. As a Site Reliability Engineer, you will be responsible for building and operating large-scale, massively distributed infrastructures to ensure the reliability, fault-tolerance, and efficiency of our edge services. Responsibilities: Design, build, and...


  • Mountain View, California, United States Samsung Electronics America North America Full time

    Site Reliability Engineer - DevOps Infrastructure. At Samsung Ads, we're transforming the advertising landscape with cutting-edge technology. As a Site Reliability Engineer - DevOps Infrastructure, you'll play a crucial role in ensuring the reliability, scalability, and performance of our advertising technology platform. Key Responsibilities: Design and...


  • Mountain View, California, United States TikTok Full time

    Site Reliability Engineer, Data Platform USDS. At TikTok, we're passionate about creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. As a Site Reliability Engineer in the Data Platform area, you'll have the...


  • Mountain View, California, United States TikTok Full time

    About TikTok U.S. Data Security: TikTok is the leading destination for short-form mobile video, inspiring creativity and bringing joy to millions of users worldwide. Our mission is to empower creators and communities to thrive on our platform. U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the...


  • Mountain View, California, United States Moveworks Full time

    About Moveworks: Moveworks is a leading AI-powered automation platform that helps businesses streamline their operations and improve employee productivity. Our innovative technology enables employees to find information and get support in one place, reducing costs and increasing efficiency. Job Description: We are seeking a highly skilled Site Reliability...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Responsibilities: Design, implement, and maintain scalable and reliable cloud infrastructure. Collaborate with...


  • Mountain View, California, United States Moveworks Full time

    About Moveworks: Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge. Job Description: We are seeking a highly skilled Site Reliability...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Ads Data Platform team. As a key member of our team, you will be responsible for designing, building, and operating large-scale, massively distributed services and infrastructures that support the TikTok Ads ecosystem. Responsibilities: Design and implement reliable, scalable,...

