Senior SRE Engineer, NIM Factory

6 days ago


Santa Clara, California, United States | Sage Lake Senior Living | Full time
About the Role

We are seeking a seasoned Senior SRE Engineer to join our team at Sage Lake Senior Living, where you will play a critical role in ensuring the high availability and performance of our AI-powered applications.

Key Responsibilities
  • Operate and improve the observability and maintainability of our distributed microservice cloud applications and services.
  • Collaborate with cross-functional teams, including development, security, and architecture, to design and deliver rapid iterations on our technical strategies and roadmaps.
  • Partner with internal and external SRE teams to provide an exceptional experience for the developers and users of our services.
  • Ensure the security and integrity of our infrastructure, including containers, databases, and networking, by following and improving standard processes for security, scalability, and cost optimization.
  • Define and track key metrics, use them together with user feedback to drive improvements, and collaborate with teams to grow and develop their skills.
Requirements
  • Demonstrated advanced systems engineering skills in operating and improving the observability and maintainability of distributed microservice cloud applications and services.
  • Experience working effectively with multi-functional teams, principals, and architects, and across organizational boundaries.
  • Experience mentoring and growing teams and team members, and the flexibility to adjust your direction and expectations to the needs of our customers.
  • Experience operating distributed containerized applications using technologies such as Docker, K8s, Cloud Endpoints, Helm, and Prometheus.
  • Experience identifying the root cause of failures and performance bottlenecks in distributed microservices or cloud systems.
  • BS or MS in Computer Science, Computer Engineering, or equivalent experience.
  • 7+ years of demonstrated experience as an SRE or Developer working on high-performance microservices and cloud software.
Preferred Qualifications
  • Excellent communication and interpersonal skills and the ability to engage a multi-functional team.
  • Experience with event-driven applications built on services such as Temporal, Kafka, or Redis.
  • A history of building and deploying containers for microservices, cloud, and on-prem deployments, along with their associated CI/CD pipelines.
About Us

We are a leading provider of senior living solutions, committed to fostering a diverse and inclusive work environment. We are an equal opportunity employer and value diversity at our company.


