Site Reliability Engineer

1 week ago


Mountain View, California, United States Saxon Global Full time

Saxon Global is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and performance of our data center infrastructure.

Key Responsibilities:
  • Data Monitoring and Alerting: Design and implement monitoring systems to detect and alert on potential issues before they impact our services.
  • Data Quality Assurance: Develop and maintain data quality assurance processes to ensure data accuracy and integrity.
  • Anomaly Detection: Analyze system logs and metrics to identify and resolve anomalies that could impact our services.
  • Documentation: Create and maintain documentation on team processes and policies, including methods of engagement and Service Level Objectives (SLOs).
  • System Design and Implementation: Collaborate with cross-functional teams to design and implement solutions to improve system performance and reliability.
  • Monitoring and Alerting: Implement monitoring and alerting systems to improve issue detection and response.
  • Linux and Kubernetes Expertise: Maintain and operate a Linux and Kubernetes environment, ensuring high availability and performance.

Qualifications:
  • Education: Bachelor's degree or above in Computer Science or related field.
  • Experience: At least 5 years of experience in a related field, with a strong background in Unix/Linux systems, system libraries, file systems, and client-server protocols.
  • Skills: Experience with Python scripting, networking technologies (TCP/IP, BGP, DNS), and developing and operating systems such as OpenStack, Kubernetes, Nginx, and the ELK stack.


  • Mountain View, California, United States Optomi Full time

    Optomi's Site Reliability Engineer Opportunity: We are seeking a skilled Site Reliability Engineer to join our team at Optomi, in partnership with a large consulting firm. This role requires a versatile and highly motivated individual who can provide frontline technical and operational support to our Site Reliability teams. Key Responsibilities: Collaborate with...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a Principal Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and...


  • Mountain View, California, United States Groq Full time

    Job Title: Principal Site Reliability Engineer. At Groq, we're revolutionizing the AI economy by making processing power more accessible, faster, and more affordable. Our Language Processing Unit (LPU) outpaces the GPU in speed, power, efficiency, and cost-effectiveness. As a Principal Site Reliability Engineer, you'll play a crucial role in ensuring the...


  • Mountain View, California, United States Muon Space Full time

    About the Role: Muon Space is seeking a skilled Site Reliability Engineer to join our Platform Software team. Our team provides cloud infrastructure for Muon's Satellite Operations systems and Data Platform, as well as development and test systems for engineers across the company. Key Responsibilities: Develop and maintain infrastructure-as-code components for...


  • Mountain View, California, United States Groq Full time

    Reliability Engineer at Groq: We are seeking a highly skilled Reliability Engineer to join our team at Groq. As a Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our tools and services. Key Responsibilities: Design and implement scalable and reliable architectures for our platform...


  • Mountain View, California, United States Samsung Electronics Full time

    Position Overview: Samsung has evolved from a hardware leader into a vibrant ecosystem of innovative services across devices. At the forefront of this transformation is Samsung Ads, a flourishing division poised for significant growth. Our Global Ads Product & Engineering team, with a robust presence across multiple countries, is integral to this advancement....


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a key member of our engineering organization, you'll play a critical role in ensuring the reliability and performance of our cloud-based services. Responsibilities: Design and implement scalable solutions to improve service reliability and...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our Data Platform team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, fault-tolerance, and scalability of our data infrastructure. Key Responsibilities: Design, build, and maintain large-scale data systems that support core products and...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud infrastructure. Collaborate with...


  • Mountain View, California, United States TikTok Full time

    About the Role: We are seeking an experienced Site Reliability Engineer to join our global infrastructure team. As a Site Reliability Engineer, you will be responsible for building and operating large-scale, massively distributed infrastructures to ensure the reliability, fault-tolerance, and efficiency of our edge services. Responsibilities: Design, build, and...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Cloud Infrastructure Engineer to join our Site Reliability team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Design and Implement Cloud Infrastructure: Collaborate with...


  • Mountain View, California, United States Insight Global Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team in the Bay Area. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability and scalability of our cloud infrastructure. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud infrastructure. Develop and maintain...


  • Mountain View, California, United States Insight Global Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team in the Bay Area. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and highly available cloud...


  • Mountain View, California, United States Groq Full time

    About the Role: We are seeking a highly skilled Principal Site Reliability Engineer to join our team at Groq. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our tools and services for provisioning and managing the full lifecycle of Groq hardware and related support systems. Key...


  • Mountain View, California, United States TikTok Full time

    About TikTok U.S. Data Security: TikTok U.S. Data Security is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the security of the TikTok platform. Responsibilities: Collaborate with infrastructure, product, and platform engineering teams to design, deploy, and maintain scalable and secure software platforms. Develop and implement...


  • Mountain View, California, United States Motion Recruitment Full time

    About the Role: Motion Recruitment is seeking a highly skilled Linux Systems Engineer to join our team. As a Site Reliability Engineer, you will be responsible for managing and maintaining large-scale Linux environments, implementing automation, and ensuring the reliability and scalability of our systems. Key Responsibilities: Design, implement, and maintain...


  • Mountain View, California, United States Atlassian Full time

    About the Role: We're seeking a highly skilled Site Reliability Engineer to join our team at Atlassian. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, reliability, and scalability of our cloud-based services. Key Responsibilities: Improve Service Reliability: Actively work to improve the performance and reliability of...


  • Mountain View, California, United States eTek IT Services, Inc. Full time

    Job Description: We are seeking a highly skilled Site Reliability Engineer - Cloud Infrastructure to join our team at eTek IT Services, Inc. Role: As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud infrastructure. Responsibilities: Data Monitoring and Alerting: Design and implement...