Current jobs related to Senior SRE - San Jose, California - Selby Jennings


  • San Jose, California, United States ZEDEDA Full time

    Job Description: Zededa is a cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. We are looking for an experienced Senior Site Reliability Engineer (SRE) who is seeking new challenges and wants to make their mark by contributing to the design and upkeep of an exciting...


  • San Jose, California, United States Selby Jennings Full time

    About the Role: Selby Jennings is partnering with a global tech company to build out their SRE teams. The company is looking for a Senior SRE to lead their Recommendation Infrastructure team, working closely with engineers in the US and Asia. Key Responsibilities: Design and implement large-scale distributed systems for high reliability and...


  • San Jose, California, United States TikTok Full time

    Responsibilities: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. As a key member of our global e-commerce team, you will play a critical role in ensuring the reliability and scalability of our e-commerce platform. Why Join Us: At TikTok, we believe that every challenge is an opportunity to learn,...


  • San Jose, California, United States SysMind Tech Full time

    Role Overview: As a Senior Server Administrator at SysMind Tech, you will be responsible for developing and implementing Waratek agent upgrade automation and rollout. This involves performing POV and upgrading existing Waratek interfaces, as well as supporting the SRE team in onboarding ~20 applications to the Waratek app security framework. Key...


  • San Jose, California, United States Triune Infomatics Inc Full time

    Role: Senior Site Reliability Manager. Triune Infomatics Inc is seeking an experienced Senior Site Reliability Manager to join our team and contribute to the design and upkeep of our cloud-based IoT edge orchestration solution. Job Summary: The Senior Site Reliability Manager will be responsible for ensuring the availability of our SaaS platform and meeting the...


  • San Jose, California, United States TikTok Full time

    Senior Site Reliability Engineer, Global E-Commerce: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy to our users. As a Senior Site Reliability Engineer on our Global E-Commerce team, you will play a critical role in ensuring the reliability and scalability of our platform. Responsibilities: Be...


  • San Jose, California, United States TikTok Full time

    Job Title: Senior Site Reliability Engineer, Global E-Commerce. TikTok is a leading destination for short-form mobile video, and our mission is to inspire creativity and bring joy. As a Senior Site Reliability Engineer on our Global E-Commerce team, you will play a critical role in ensuring the reliability and scalability of our e-commerce...


  • San Jose, California, United States F5 Full time

    Job Summary: F5 is seeking a highly skilled Senior Site Reliability Engineer to join our team. As a key member of our SRE team, you will play a pivotal role in ensuring the reliability and scalability of our distributed cloud product. Key Responsibilities: Design and implement automation solutions to reduce toil and improve operational efficiency. Participate in...


  • San Jose, California, United States TikTok Full time

    Senior Site Reliability Engineer, Global E-commerce: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy to our users. As a Senior Site Reliability Engineer on our Global E-commerce team, you will play a critical role in ensuring the reliability and scalability of our e-commerce platform. Key...


  • San Jose, California, United States F5 Full time

    About F5: F5 is a leading provider of cloud and security solutions, empowering organizations to create, secure, and run applications that enhance the digital experience. Job Summary: We are seeking an exceptional Senior Site Reliability Engineer to join our SRE team for the F5 Distributed Cloud Product. As a key member of our team, you will play a pivotal role in...


  • San Jose, California, United States HireIO Inc Full time

    Job Title: Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our team at HireIO Inc. As a key member of our Site Reliability Engineering (SRE) team, you will be responsible for designing, implementing, and operating large-scale, massively distributed systems. Responsibilities: Design and implement software...


  • San Jose, California, United States TikTok Full time

    Job Title: Senior Site Reliability Engineer, Global E-Commerce. We are seeking a highly skilled Senior Site Reliability Engineer to join our Global E-Commerce team. As a key member of our team, you will be responsible for ensuring the reliability and scalability of our e-commerce platform. Responsibilities: Be part of our global on-call rotation and be...


  • San Jose, California, United States TikTok Full time

    Job Title: Senior Software Engineer - Generative AI Infrastructure. Job Summary: We are seeking a highly skilled Senior Software Engineer to join our Generative AI team at TikTok. As a key member of our infrastructure team, you will be responsible for designing, developing, and deploying scalable and reliable software infrastructure to support our Generative AI...

  • Senior Technical Lead

    4 weeks ago


    San Jose, California, United States Glocomms Full time

    Job Title: Technical Lead. Glocomms, a leading hypergrowth technology and entertainment platform, is seeking a seasoned Technical Lead to join their recommendations and infrastructure team. Key Responsibilities: Lead and mentor a team of 7-9 engineers, providing guidance and support to ensure the success of the team. Take ownership of large-scale distributed...


  • San Jose, California, United States F5 Full time

    About the Role: We are seeking an exceptional Senior Site Reliability Engineer to join our SRE team for the groundbreaking F5 Distributed Cloud Product. As a key member of our team, you will play a pivotal role in ensuring the reliability, scalability, and security of our cloud-based infrastructure. Key Responsibilities: Design and implement automation solutions...


  • San Jose, California, United States TikTok Full time

    Senior Site Reliability Engineer: Unlock Your Potential at TikTok. TikTok is the leading destination for short-form mobile video, and we're seeking Site Reliability Engineers (SREs) to join our monetization technology team. Our platform is built to help imaginations thrive, and we're committed to creating an inclusive space where...


  • San Jose, California, United States F5 Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at F5. As a key member of our SRE team, you will play a pivotal role in ensuring the reliability and scalability of our Distributed Cloud Product. Key Responsibilities: Design and implement automation solutions to reduce toil and improve operational...


  • San Jose, California, United States NetApp Full time

    Job Title: Senior Head of Product. NetApp is a leader in intelligent data infrastructure, empowering customers to turn disruption into opportunity. We're seeking a seasoned Senior Product Manager to join our team and drive innovation in cloud storage solutions. About the Role: This pivotal position will play a key role in building the next-generation Azure &...

Senior SRE

2 months ago


San Jose, California, United States Selby Jennings Full time
About Selby Jennings

We are partnering with a global tech company in San Jose, CA to build out their SRE teams across various organizations. One of their top priorities is bringing on additional SREs to their Applied Machine Learning team.

Key Qualifications
  • Expertise in analyzing and troubleshooting Linux-based distributed systems.
  • Bachelor's/Master's degree in Computer Science, Computer Engineering, or equivalent experience in SRE or software engineering.
  • Proficiency in at least one commonly used programming language (C, C++, Python, Go).
  • Strong understanding of data structures and algorithms, as well as relational database systems.
  • Ability to design and maintain large-scale systems, with a focus on code optimization and routine task automation.
  • Proficiency in at least one machine learning framework: TensorFlow, PyTorch, MXNet, or PaddlePaddle.
Responsibilities
  • Designing, building, and maintaining highly available, scalable, and fault-tolerant systems.
  • Monitoring and analyzing system performance, identifying and resolving issues before they cause user impact.
  • Developing and maintaining automated monitoring, alerting, and incident response systems.
  • Collaborating closely with software engineering teams to ensure applications are designed with reliability, scalability, and performance in mind.
  • Implementing and maintaining security best practices and ensuring compliance with regulatory requirements.
  • Participating in on-call rotations and responding to issues and incidents within and outside of normal business hours.
  • Conducting root cause analysis of incidents, holding post-mortem reviews with stakeholders, and implementing preventative measures to minimize the risk of similar incidents occurring in the future.