Service Reliability Engineer

2 days ago


Redwood City, California, United States | Stanford University | Full time
Job Summary

Stanford University is seeking an experienced Service Reliability Engineer to join its Enterprise Technology team. The successful candidate will deploy and manage highly available hybrid systems, both on-premises and in the cloud, with a focus on Infrastructure-as-a-Service (IaaS) and Platform-as-a-Service (PaaS) offerings.

Key Responsibilities
  • Deploy and manage hybrid systems on-premises and in the cloud, ensuring high availability and scalability.
  • Implement Infrastructure as Code practices using tools like Terraform to automate cloud infrastructure provisioning and management.
  • Design, implement, and maintain CI/CD pipelines to streamline application deployment processes.
  • Deploy and manage containerized applications using Docker and orchestrate them with Docker Compose or Kubernetes for scalability and resilience.
  • Lead efforts to modernize existing infrastructure and applications by integrating new technologies and cloud-native solutions.
  • Harden application servers to improve their security against potential threats.
  • Provide technical support for complex issues, collaborating with all stakeholders to assess current systems and recommend improvements to performance and scalability.
  • Ensure effective communication regarding system status and operations.
  • Create and maintain comprehensive documentation for system configurations, procedures, and best practices to ensure knowledge transfer and compliance.
Requirements
  • Bachelor's degree and eight years of relevant experience, or an equivalent combination of education and relevant experience.
  • Experience with diverse middleware technologies, both on bare-metal servers and in Docker containers.
  • Experience with Infrastructure-as-Code tools such as Terraform, and with container orchestration tools.
  • Demonstrated cloud infrastructure experience, including building full-stack infrastructure for enterprise-ready applications.
  • Proficiency in programming and scripting languages, especially Python and shell scripting.
  • Strong working knowledge of Linux-based systems.
Additional Information

Stanford University provides pay ranges representing its good faith estimate of what the university reasonably expects to pay for a position. The expected pay range for this position is $150,922-$155,000 per annum. The pay offered to a selected candidate will be determined based on factors such as (but not limited to) the scope and responsibilities of the position, the qualifications of the selected candidate, departmental budget availability, internal equity, geographic location, and external market pay for comparable jobs.

At Stanford University, base pay represents only one aspect of the comprehensive rewards package. The Cardinal at Work website provides detailed information on Stanford's extensive range of benefits and rewards offered to employees. Specifics about the rewards package for this position may be discussed during the hiring process.


