Senior Site Reliability Engineer
2 months ago
We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Groq. As a key member of our infrastructure team, you will be responsible for the reliability, scalability, and performance of the tools and services that provision and manage the full lifecycle of Groq hardware and related support systems.
Key Responsibilities
- Reliability Architecture: Design and implement scalable, reliable architectures for the platform infrastructure; define and enforce operational standards and best practices for site reliability; develop and implement disaster recovery and business continuity plans.
- Monitoring & Alerting: Establish comprehensive monitoring systems to track key performance indicators (KPIs) and identify potential issues; implement robust alerting and notification workflows to ensure timely response to incidents; analyze data to identify opportunities for platform optimization.
- Incident Management: Lead the investigation and resolution of production incidents; develop and maintain incident response playbooks and escalation procedures; work with engineering teams to identify and mitigate potential risks.
- Automation & Continuous Improvement: Develop and implement automated testing frameworks to ensure software quality and reliability; drive continuous improvement by identifying and implementing process and tool enhancements.
Qualifications
- 6/10+ years of experience in site reliability engineering or a related field.
- Deep understanding of cloud-native technologies and infrastructure as a service (IaaS).
- Expertise in monitoring and alerting systems, incident management processes, and disaster recovery planning.
- Strong analytical and problem-solving skills with a focus on root cause analysis and mitigation.
- Excellent communication and teamwork skills with the ability to collaborate effectively across engineering teams.
Groq is a geo-agnostic company, meaning you work where you are. We value and celebrate diversity in thought, beliefs, talent, expression, and backgrounds. We know that our individual differences make us better. Groq is an Equal Opportunity Employer that is committed to inclusion and diversity.