Senior Site Reliability Engineer
2 weeks ago
Job Description
This is a unique opportunity to join a rapidly growing, world-class team engineering the cutting-edge storage infrastructure that makes up a major cloud provider. As part of the SRE team, you will solve interesting technical challenges by defining, designing, deploying, and troubleshooting the Object Storage system. The Object Storage system is a highly durable and available regional service spanning the data plane, control plane, and virtualization of object storage. You will play a critical role in ensuring the service is reliable, scalable, resilient, secure, and performant.
Responsibilities:
- Develop object storage systems, code, and automation for scaling deployments and mission-critical operations across the cloud provider's data centers worldwide.
- Engineer resilient storage systems by eliminating single points of failure, and develop test automation to promote reliability, security, scale, and performance.
- Perform engineering activities to bootstrap new storage systems and work with cross-functional teams to build regional services.
- Understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of production services in order to provide incident response and on-call support for production systems.
- Use a deep understanding of the service topology and its dependencies to troubleshoot issues and define mitigations.
- Perform capacity planning for object storage systems that meets performance and efficiency targets.
- Collect system data to drive decisions and achieve monitoring and availability targets.
- Adhere to change management and deployment procedures for multiple storage systems and data centers worldwide.
- Support new product introduction activities and decommission legacy storage systems to promote fleet health.
- Articulate the technical characteristics of object storage systems and guide development teams to engineer and add premier capabilities to the object storage service portfolio.
Qualifications:
- Senior-level software development proficiency to develop systems and automation and to debug production and mission-critical systems.
- Strong proficiency in Java, Python, and shell scripting.
- Expertise in Linux system internals, advanced system administration, and performance tuning.
- Willingness to participate in a 24/7 on-call schedule with customer notifications and escalations.
- Deep understanding of networking protocols.
- Real-world experience with production architectures, scalability, and system design using cloud computing and storage design patterns.
- Strong methodical approach to troubleshooting large, complex, interconnected systems.
- Familiarity with best practices in change management and continuous integration and deployment.
- Bachelor's or Master's degree in Computer Science or a related field.
-
Senior Site Reliability Engineer
1 week ago
Seattle, WA, United States | Dat Services Inc | Full time
About DAT
DAT is an award-winning employer of choice and a next-generation SaaS technology company that has been at the leading edge of innovation in transportation supply chain logistics for 45 years. We continue to transform the industry year over year by deploying a suite of software solutions to millions of customers every day - customers who depend on...
-
Site Reliability Engineer
3 days ago
Seattle, WA, United States | Apple | Full time
Role Number: 200635067-3337
Summary: The Apple Service Engineering - SRE team is looking for Site Reliability Engineers with experience in developing processes, tools, and automation for managing distributed systems in production environments. Our SRE team combines software and systems engineering and system administration practices to build and run...
-
Site Reliability Engineer
2 days ago
Seattle, WA, United States | Kaav Inc. | Full time
Who we are
We are a yoga-inspired technical apparel company up to big things. The practice and philosophy of yoga informs our overall purpose to elevate the world through the power of practice. We are proud to be a growing global company with locations all around the world, from Vancouver to Shanghai, and places in between. We owe our success to our...
-
Senior Site Reliability Engineer
4 days ago
Seattle, WA, United States | Apple | Full time
Role Number: 200604053-3337
Summary: The Apple Services Engineering (ASE) team is one of the most exciting examples of Apple’s long-held passion for combining art and technology. These are the people who power the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. And they do it on a massive scale, meeting Apple’s high expectations with...
-
Site Reliability Engineer, Python
3 days ago
Seattle, WA, United States | Next Step Systems LTD | Full time
Site Reliability Engineer, Python, Seattle, WA
There are 5 openings available for the Site Reliability Engineer position. These will be onsite opportunities in Los Angeles, CA; New York City, NY; or Seattle, WA.
Responsibilities:
- Manage cloud infrastructure, including resource allocation, system upgrades, and user access control.
- Perform deep...