Senior Cloud Reliability Engineer

3 days ago


Santa Clara, California, United States Geospatial And Cloud Analytics Inc Full time
About the Role

We are seeking a highly skilled Senior Cloud Reliability Engineer to join our team at Geospatial And Cloud Analytics Inc. As a key member of our engineering team, you will be responsible for designing, implementing, and supporting operational and reliability aspects of large-scale cloud infrastructure.

Key Responsibilities
  • Design and implement operational and reliability aspects of large-scale Kubernetes clusters, focusing on performance at scale, real-time monitoring, logging, and alerting.
  • Engage in and improve the entire lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Support services before they go live through activities such as system design consulting, developing software tools, platforms, and frameworks, capacity management, and launch reviews.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Scale systems sustainably through mechanisms such as automation, and evolve systems by driving changes that improve reliability and velocity.
  • Practice sustainable incident response and blameless postmortems.
  • Participate in an on-call rotation to support production systems.
Requirements
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience.
  • 5+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production.
  • Experience in one or more of the following: Python, Go, Perl, or Ruby.
  • In-depth knowledge of Linux, networking, and containers.
What We Offer
  • A competitive base salary range of $132,000 to $310,500 USD, determined by location, experience, and the pay of employees in similar positions.
  • Eligibility for equity and benefits.
  • A diverse and inclusive work environment.


  • Santa Clara, California, United States Veear Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Veear. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, scalability, and security of our cloud-based systems. Key Responsibilities: Collaboration and Partnership. Partner with cross-functional teams to ensure security...


  • Santa Clara, California, United States NVIDIA Full time

    About the Role: NVIDIA is seeking a seasoned Cloud Engineer to join its fast-paced Infrastructure, Planning and Processes organization. As a Senior Cloud Engineer, you will be part of a dynamic team that develops and maintains NVIDIA's internal cloud provisioning product for GPUs and Tegra systems. Key Responsibilities: Design and implement scalable, resilient...


  • Santa Clara, California, United States ServiceNow Full time

    Company Overview: At ServiceNow, we harness technology to create a better world for everyone, driven by our talented workforce. We prioritize speed and innovation to meet the demands of our customers and communities. Joining ServiceNow means becoming part of a dynamic team of innovators who possess a relentless curiosity and a commitment to creativity. We...


  • Santa Clara, California, United States Centrify Corporation Full time

    About Centrify Corporation: Centrify Corporation is a leading provider of cloud-based identity and access management solutions. Our software runs on public clouds with 99.9% or better uptime and is mission-critical for our customers. Job Summary: We are seeking a highly skilled Cloud Site Reliability Engineer to join our Cloud DevOps team. As a Cloud Site...


  • Santa Clara, California, United States ServiceNow Full time

    Job Description. Overview: The ServiceNow SRE team is a group of highly technical engineers who are tasked with maintaining and developing the reliability, scalability, and performance of the ServiceNow cloud infrastructure. Our SREs are empowered to drive technical resolutions across the technology stack from hardware through to application and all stops in...


  • Santa Clara, California, United States ServiceNow Full time

    Company Overview: At ServiceNow, we harness technology to enhance global operations, and our dedicated workforce makes it all possible. We operate swiftly because the world demands it, innovating uniquely for our clients and communities. By becoming part of ServiceNow, you join a dynamic team of innovators who possess a relentless curiosity and a passion for...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Company Overview: Our Vision. At Palo Alto Networks, our journey begins and ends with our core mission: to be the premier cybersecurity partner, safeguarding our digital existence. We envision a future where each day is more secure than the last. Our foundation is built on challenging the status quo and redefining norms, and we seek innovators dedicated to...


  • Santa Clara, California, United States Diverse Lynx Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based applications and infrastructure. Key Responsibilities: Design, implement, and maintain cloud infrastructure on...


  • Santa Clara, California, United States Nvidia Full time

    NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High-Performance Computing, and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables unique creativity and discovery, and powers what were...


  • Santa Clara, California, United States Astera Labs Full time

    Astera Labs stands at the forefront of innovative connectivity solutions, enabling the full potential of AI and cloud infrastructure. Our Intelligent Connectivity Platform seamlessly integrates PCIe, CXL, and Ethernet semiconductor-based solutions alongside the COSMOS software suite, delivering a software-defined architecture that is both scalable and...


  • Santa Clara, California, United States NVIDIA Full time

    Job Summary: NVIDIA is seeking a highly skilled Senior SRE Engineer to join its fast-paced Infrastructure, Planning and Processes organization. As a key member of the team, you will be responsible for designing and implementing scalable, resilient cloud infrastructure platforms for NVIDIA's internal cloud provisioning product. Key Responsibilities: Design and...

  • Senior IT Engineer

    2 weeks ago


    Santa Clara, California, United States OmniVision Technologies Full time

    About OmniVision Technologies: We are a leading developer of advanced digital imaging solutions, providing a diverse culture that works together on the development of cutting-edge imaging technology, products, and solutions. Job Summary: We are seeking a highly skilled Senior IT Engineer to lead our cloud infrastructure team. The successful candidate will be...


  • Santa Clara, California, United States Amazon Full time

    About the Role: We are seeking a Cloud Software Engineer to join our innovative team focused on enhancing the Developer Experience. Our mission is to leverage GenAI to empower developers in creating applications that are faster, more cost-effective, secure, and reliable. GenAI will enable a diverse range of builders to harness the capabilities of AWS,...


  • Santa Clara, California, United States Trillium Staffing Full time

    Job Description: Trillium Staffing is seeking a seasoned Senior Cloud Operations Engineer to join its fast-paced Infrastructure, Planning and Processes organization. The ideal candidate will have a strong background in cloud infrastructure and highly available production environments. Key Responsibilities: Design, implement, and maintain sophisticated cloud...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Company Overview: Palo Alto Networks is driven by a mission to be the cybersecurity partner of choice, safeguarding our digital lifestyle. Our vision encompasses a world where each day is more secure than the last. We are built on the principle of challenging the status quo and are in search of innovators dedicated to shaping the future of cybersecurity. Work...


  • Santa Clara, California, United States Palo Alto Networks Full time

    Company Overview: Palo Alto Networks is driven by a singular mission: to be the cybersecurity partner of choice, safeguarding our digital existence. Our vision encompasses a world where each day is more secure than the last. We are built on the principles of challenging norms and innovating in the cybersecurity landscape, seeking individuals who are equally...


  • Santa Clara, California, United States Sage Lake Senior Living Full time

    About the Role: We are seeking a seasoned Senior SRE Engineer to join our team at Sage Lake Senior Living, where you will play a critical role in ensuring the high availability and performance of our AI-powered applications. Key Responsibilities: Operate and improve the observability and maintainability of our distributed microservice cloud applications and...


  • Santa Clara, California, United States XPENG Motors Full time

    About XPeng Motors: XPeng Motors is a leading innovator in the electric vehicle industry, dedicated to designing, developing, and manufacturing cutting-edge smart electric vehicles that seamlessly integrate advanced Internet, AI, and autonomous driving technologies. Job Summary: We are seeking a highly skilled Senior Staff AI Infrastructure Site Reliability...


  • Santa Clara, California, United States Omnivision Technologies Full time

    Qualifications: Bachelor's degree in Physics, Electrical Engineering, Materials Science, or a related engineering field, with coursework focused on semiconductor physics and electronics. Familiarity with electronic component reliability standards such as JEDEC and AEC-Q100 is advantageous. Experience in wafer-level reliability testing is also beneficial. Key...


  • Santa Clara, California, United States Anello Full time

    About Anello Photonics: ANELLO Photonics is a leading-edge technology company based in Santa Clara, CA. The company has developed integrated photonic system-on-chip technology for next generation navigation. ANELLO's SiPhOG™ gyroscope is based on its patented photonic integrated circuit technology. The result is a product that is higher performance, much...