Principal Site Reliability Engineer, Infrastructure Platform
3 weeks ago
At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Site Reliability Engineer, you'll play a crucial role in ensuring the reliability, scalability, and performance of our tools and services.
Responsibilities:
- Design and implement scalable and reliable architectures for the platform infrastructure.
- Establish comprehensive monitoring systems to track key performance indicators (KPIs) and identify potential issues.
- Develop and implement automated testing frameworks to ensure software quality and reliability.
- Lead the investigation and resolution of production incidents.
- Work collaboratively with engineering teams to identify and mitigate potential risks.
Requirements:
- 6+ years of experience in site reliability engineering or a related field.
- Deep understanding of cloud-native technologies and infrastructure as a service (IaaS).
- Expertise in monitoring and alerting systems, incident management processes, and disaster recovery planning.
What we offer:
- A competitive base salary and comprehensive compensation package.
- Equity and benefits.
- A geo-agnostic company, meaning you work where you are.
-
Principal Site Reliability Engineer
4 weeks ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...
-
Platform Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Samsung Electronics America North America Full time
Job Title: Platform Site Reliability Engineer
Samsung Ads is seeking a highly skilled Platform Site Reliability Engineer to join our Global Ads Product & Engineering team. As a key member of our team, you will play a crucial role in ensuring the reliability, scalability, and performance of our advertising technology platform.
Key Responsibilities:
Design,...
-
Platform Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Samsung Electronics America North America Full time
Job Title: Platform Site Reliability Engineer
Samsung Ads is a thriving business poised for even greater success, and we're looking for a passionate leader to join our Global Ads Product & Engineering team.
About the Role
We're the innovators behind the products, tech, and tools driving ad-based monetization. As a Site Reliability Engineer specializing in...
-
Principal Site Reliability Engineer
3 weeks ago
Mountain View, California, United States Groq Full time
About the Role
Groq is seeking a highly skilled Principal Site Reliability Engineer to join our team. As a Principal Site Reliability Engineer, you will be responsible for the reliability and performance of our APIs and for exceptional service delivery.
Key Responsibilities
Enhance system reliability by refining operational...
-
Site Reliability Engineer, Data Platform
4 weeks ago
Mountain View, California, United States TikTok Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure.
Key Responsibilities
Design, build, and maintain large-scale data systems that support core products and...
-
Principal Site Reliability Engineer
9 hours ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness, empowering AI applications to reach new heights.
Job Summary:
We're seeking a seasoned...
-
Principal Site Reliability Engineer
11 hours ago
Mountain View, California, United States Groq Full time
About Groq
Groq is a company that believes in an AI economy powered by human agency. We envision a world where AI is accessible to all, and we're working towards making that a reality.
Job Description
We're looking for a Principal Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the...
-
Principal Site Reliability Engineer
2 days ago
Mountain View, California, United States Groq Full time
Job Title: Principal Site Reliability Engineer
At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Principal Site Reliability Engineer, you'll play a crucial role in ensuring the...
-
Mountain View, California, United States TikTok Full time
About the Role
We are seeking an experienced Site Reliability Engineer to join our global infrastructure team. As a Site Reliability Engineer, you will be responsible for building and operating large-scale, massively distributed infrastructures to ensure the reliability, fault-tolerance, and efficiency of our edge services.
Responsibilities
Design, build, and...
-
Site Reliability Engineer
5 hours ago
Mountain View, California, United States Samsung Electronics America North America Full time
Site Reliability Engineer - DevOps Infrastructure
At Samsung Ads, we're transforming the advertising landscape with cutting-edge technology. As a Site Reliability Engineer - DevOps Infrastructure, you'll play a crucial role in ensuring the reliability, scalability, and performance of our advertising technology platform.
Key Responsibilities:
Design and...
-
Site Reliability Engineer, Data Platform USDS
24 hours ago
Mountain View, California, United States TikTok Full time
Site Reliability Engineer, Data Platform USDS
At TikTok, we're passionate about creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe, and so does our workplace. As a Site Reliability Engineer in the Data Platform area, you'll have the...
-
Mountain View, California, United States TikTok Full time
About TikTok U.S. Data Security
TikTok is the leading destination for short-form mobile video, inspiring creativity and bringing joy to millions of users worldwide. Our mission is to empower creators and communities to thrive on our platform.
U.S. Data Security (USDS) is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the...
-
Site Reliability Engineer
1 day ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI-powered automation platform that helps businesses streamline their operations and improve employee productivity. Our innovative technology enables employees to find information and get support in one place, reducing costs and increasing efficiency.
Job Description
We are seeking a highly skilled Site Reliability...
-
Site Reliability Engineer
21 hours ago
Mountain View, California, United States Atlassian Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services.
Responsibilities
- Design, implement, and maintain scalable and reliable cloud infrastructure
- Collaborate with...
-
Site Reliability Engineer
2 days ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge.
Job Description
We are seeking a highly skilled Site Reliability...
-
Site Reliability Engineer, Ads Data Platform
1 week ago
Mountain View, California, United States TikTok Full time
About the Role
We are seeking a highly skilled Site Reliability Engineer to join our Ads Data Platform team. As a key member of our team, you will be responsible for designing, building, and operating large-scale, massively distributed services and infrastructures that support the TikTok Ads ecosystem.
Responsibilities
Design and implement reliable, scalable,...
-
Site Reliability Engineer
1 week ago
Mountain View, California, United States Moveworks Full time
About Moveworks
Moveworks is a leading AI startup that provides a universal AI copilot for search and automation across all business applications. Our mission is to empower employees to work faster and more efficiently by eliminating repetitive support issues and delivering instant knowledge.
Job Description
We are seeking a highly skilled Site Reliability...