Cloud Infrastructure

3 weeks ago


Sunnyvale, California, United States Alibaba Cloud Full time

Job Description

Alibaba Cloud Native Observability Team: Responsible for observability products including Alibaba Cloud Log Service (SLS), Application Real-Time Monitoring Service (ARMS), and Cloud Monitoring Service (CMS). We are committed to creating a real-time, intelligent, and large-scale observation and analysis platform for the future. This platform aims to build intelligent operations (AIOps), big data security, business monitoring and analysis services to accelerate digital innovation.

Focus on Alibaba Cloud observability platforms (SLS/CMS/ARMS) in multinational cloud environments. Enhance system reliability and engineering delivery efficiency in these environments by implementing infrastructure automation, constructing SLO/SLI management systems, and optimizing scalable operations capabilities to ensure business continuity.

  • Build Automated Operations Systems: Design a reliability engineering framework that includes change management, capacity planning, and self-healing mechanisms to enhance the stability and resilience of infrastructure (compute/storage/network) through Infrastructure as Code (IaC).

  • Lead Standardized Observability Platform Delivery Framework Design: Establish risk assessment models and error budget mechanisms, and achieve quality control and efficiency optimization in the delivery process through automated toolchains.

  • Develop SRE-Based Metrics System: Continuously optimize service health assessment models, achieve automated tracking of SLOs/SLIs, and drive decision-making with observability data.

Position Requirement

Minimum qualification:

  • Experience: Over 2 years of experience in distributed systems reliability engineering, familiar with high-availability architecture design, and proficient in at least one of Python/Go/Java.

  • Automation: Ability to convert operations experience into automated solutions, and familiar with various observability software and systems.

Preferred qualification:

  • SRE Practices: Familiar with core SRE practices (incident review/error budgeting/chaos engineering) and experienced in building automated risk control systems.

The pay range for this position at commencement of employment is expected to be between $104,400 and $171,000/year. However, base pay offered may vary depending on multiple individualized factors, including market location, job-related knowledge, skills, and experience.

If hired, employee will be in an "at-will position" and the Company reserves the right to modify base salary (as well as any other discretionary payment or compensation program) at any time, including for reasons related to individual performance, Company or individual department/team performance, and market factors.


  • Sunnyvale, California, United States Alibaba Cloud Full time

    Job Description: Alibaba Cloud Native Observability Team: Responsible for observability products including Alibaba Cloud Log Service (SLS), Application Real-Time Monitoring Service (ARMS), and Cloud Monitoring Service (CMS). We are committed to creating a real-time, intelligent, and large-scale observation and analysis platform for the future. This platform...


  • Sunnyvale, California, United States Alibaba Cloud Full time

    Job Description: Alibaba Cloud Native Message Middleware Team is responsible for message products, including RocketMQ and other messaging products. We are committed to creating a more stable, user-friendly, streaming, and large-scale messaging platform for the future. Cloud Product Operations & Reliability: Oversee stability maintenance, performance tuning,...


  • Sunnyvale, California, United States beBee Careers Full time

    About the Job: This is an exciting opportunity for a Cloud Infrastructure Developer to join our team. As a Cloud Infrastructure Developer, you will be responsible for designing and developing cloud-based infrastructure that supports our mission-critical applications. Your expertise in cloud technologies, such as Microsoft Azure and Google GCP, will be...


  • Sunnyvale, California, United States beBee Careers Full time

    Cloud Infrastructure Architect Job Description: We are looking for a highly experienced Cloud Infrastructure Architect to join our team. The successful candidate will have expertise in designing and implementing scalable and secure cloud-based systems. Key Responsibilities: Designing and implementing cloud-based infrastructure systems using GCP, AWS, and...


  • Sunnyvale, California, United States beBee Careers Full time

    Job Description: We are seeking an experienced Cloud Infrastructure Specialist to join our team. As a key member of our infrastructure team, you will be responsible for designing, building, and maintaining large-scale cloud-based systems. Your expertise in cloud computing platforms like AWS will be instrumental in driving the success of our projects. Key...


  • Sunnyvale, California, United States beBee Careers Full time

    AWS Python: This role involves working as a cloud infrastructure specialist to design and deploy scalable cloud-based systems using AWS services. The ideal candidate will have hands-on experience with AWS Lambda, Glue, Athena, and Step Functions. Key Responsibilities: Designing and deploying cloud-based applications on AWS. Maintaining full application...


  • Sunnyvale, California, United States Alibaba Cloud Full time

    Join to apply for the Cloud Native/Middleware Reliability Engineer (SRE)-Middleware role at Alibaba Cloud. Job Description: The Alibaba Cloud Cloud-Native Middleware team is responsible for the research and development of distributed software infrastructure, delivering API Gateway and microservices solutions to enterprise customers, accelerating cloud...


  • Sunnyvale, California, United States beBee Careers Full time

    About the Role: We are seeking a highly skilled Cloud Infrastructure Manager to oversee the administration of our Microsoft Azure environment. As a key member of our IT team, you will be responsible for ensuring the security, reliability, and performance of our cloud infrastructure. The ideal candidate will have at least 8 years of experience in managing cloud...