Cloud Infrastructure SRE-Sunnyvale
3 weeks ago
on Alibaba Cloud observability platforms (SLS/CMS/ARMS) in multinational cloud environments. Enhance system reliability and engineering delivery efficiency in these environments by implementing infrastructure automation, constructing SLO/SLI management systems, and optimizing scalable operations capabilities to ensure business continuity.
Build Automated Operations Systems: Design a reliability engineering framework that includes change management, capacity planning, and self-healing mechanisms to enhance the stability and resilience of infrastructure (compute/storage/network) through Infrastructure as Code (IaC).
Lead Standardized Observability Platform Delivery Framework Design: Establish risk assessment models and error budget mechanisms, and achieve quality control and efficiency optimization in the delivery process through automated toolchains.
Develop SRE-Based Metrics System: Continuously optimize service health assessment models, achieve automated tracking of SLOs/SLIs, and drive decision-making with observability
Requirement
Minimum qualification:
Experience: Over 3 years of experience in distributed systems reliability engineering, familiar with high-availability architecture design, and proficient in at least one of Python/Go/Java.
Automation: Ability to convert operations experience into automated solutions, and familiar with various observability software and systems.
Preferred qualification:
SRE Practices: Familiar with core SRE practices (incident review/error budgeting/chaos engineering) and experienced in building automated risk control systems.
The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.
If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.
-
Cloud Infrastructure SRE-Sunnyvale
3 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Alibaba Cloud Native Observability Team: Responsible for observability products including Alibaba Cloud Log Service (SLS), Application Real-Time Monitoring Service (ARMS), and Cloud Monitoring Service (CMS). We are committed to creating a real-time, intelligent, and large-scale observation and analysis platform for the future. This platform...
-
API-Platform SRE-Sunnyvale
2 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Alibaba Cloud Open Platform team is responsible for cloud enterprise-level capabilities such as API Platform, and enterprise solutions like Landing Zone/Well Architected Framework.
Description
Maintaining system reliability and ensuring core system availability is critical for Open Platform. The goal of this role is to establish a system...
-
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
The Alibaba Cloud Cloud-Native Middleware team is responsible for the research and development of distributed software infrastructure and is committed to delivering outstanding API Gateway and microservices solutions to tens of thousands of enterprise customers on Alibaba Cloud, accelerating their cloud migration processes and innovation...
-
Cloud Infrastructure
3 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Alibaba Cloud Native Observability Team: Responsible for observability products including Alibaba Cloud Log Service (SLS), Application Real-Time Monitoring Service (ARMS), and Cloud Monitoring Service (CMS). We are committed to creating a real-time, intelligent, and large-scale observation and analysis platform for the future. This platform...
-
Network SRE-Sunnyvale
3 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Team introduction:
The network operation team is responsible for the full life cycle operation of Alibaba Cloud's network infrastructure, providing stable and efficient network services for businesses. We need to continuously ensure the sustainability of cloud computing services and highly stable and efficient operations, promote the...
-
Site Reliability Engineer-Sunnyvale
4 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Cloud Intelligence Group undertakes Alibaba Group's core technologies and business innovations in the high-tech sector, dedicated to building an enterprise-level cloud computing service platform for the digital economy era. It provides leading technology solutions and services globally, characterized by massive business scale and complex...
-
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
Mission of the Cloud Intelligence Group SRE Team
The mission of the Cloud Intelligence Group SRE (Site Reliability Engineering) Team is to ensure the stability of production environments, enterprise-grade cloud data reliability, and service continuity for the Cloud Intelligence Group. Our greatest challenge lies in guaranteeing uninterrupted...
-
Cloud Security SRE-Sunnyvale
3 weeks ago
Sunnyvale, California, United States Alibaba Cloud Full time
Job Description
We are the Alibaba Cloud security product development team, responsible for the development and maintenance of basic components of cloud security products, focusing on the development, operation and management of host security, container security, SIEM and other products. The team is young and energetic, with strong R&D capabilities, providing...