Site Reliability Engineer
3 days ago
At Georgia IT Inc, we are seeking a skilled Site Reliability Engineer to join our team. This role will drive cross-team initiatives that improve Delta engineering practices and increase uptime and performance for the business.
Job Description:
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
- Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate
- Improve end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
- Partner with business and technical product owners to set SLOs / SLIs / error budgets to manage reliability of infrastructure and applications
- Partner with other SREs to share best practices and learnings from across the organization
- Scale and optimize existing infrastructure and services sustainably through mechanisms, including automation, and evolve them by improving reliability and efficiency
- Maintain infrastructure (infrastructure as code) and services by measuring and monitoring system metrics to proactively identify operational efficiencies, potential outages, and security threats in Development, UAT, Staging, and Production environments
- Practice sustainable incident response and blameless postmortems
- Build infrastructure and drive projects that deliberately break things with the aim of improving the robustness of production systems
- Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
- Step back to observe patterns and develop innovative tools and automation to eliminate or minimize menial tasks; use those learnings to drive operational best practices
- Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE
- Preserve operational visibility and response capabilities by fixing and improving our dashboards, alerts, and automation
- Maintain operational uptime and reliability by participating in triage and issue support calls for mission critical systems
We are looking for someone with strong experience setting SLOs / SLIs / error budgets and managing the reliability of infrastructure and applications using Kubernetes, AWS-native components, CloudWatch, and Dynatrace.
Requirements:
- 10+ years of total software engineering experience using Kubernetes, AWS-native components, CloudWatch, and Dynatrace
- 5+ years of supporting production systems on a DevOps team
- 2+ years of experience architecting solutions on AWS Cloud
- Strong debugging, troubleshooting, and problem-solving skills
- Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties
- Experience liaising with developers, operations staff and third-party resources
- Experience with API integration projects
Salary: $120,000 - $180,000 per year, depending on experience
-
Senior Site Reliability Engineer
3 days ago
Seattle, Washington, United States | Oracle | Full time
Job Overview: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of our products and services.
Responsibilities: Work with our Site Reliability Engineering team on the shared full stack ownership of a...
-
Data Engineering Reliability Expert
3 weeks ago
Seattle, Washington, United States | TikTok | Full time
About Us: TikTok is a world-leading video platform providing multimedia storage, delivery, and transcoding services. Our US Tech Service department focuses on building the next-generation video processing platform, offering excellent experiences for billions of users worldwide. We follow a hybrid work schedule requiring employees to work in the office 3 days a...
-
Site Reliability Engineer III
3 days ago
Seattle, Washington, United States | F5 Networks | Full time
Job Description: F5 Networks is a leader in delivering solutions that bring a better digital world to life. Our mission is to empower organizations globally to create, secure, and run applications that enhance the user experience. We prioritize diversity and inclusivity, fostering an environment where every individual can thrive. This approach drives our...
-
Reliability and Security Expert
4 days ago
Seattle, Washington, United States | Apple | Full time
Your Responsibilities: As a Security Site Reliability Engineer, you will work closely with our ASE Security dev team to bring up and mature new services as part of our infrastructure investments. You will ensure the scalability, availability, and performance of our systems, while also maintaining their security and integrity. You will be expected to collaborate...
-
Infrastructure Reliability Engineer
3 weeks ago
Seattle, Washington, United States | F5 Networks | Full time
At F5 Networks, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world.
Job Description: We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation....
-
Data Infrastructure Reliability Engineer
3 weeks ago
Seattle, Washington, United States | Apple | Full time
About Apple Services Engineering: Apple's Services Engineering team is a prime example of the company's commitment to combining art and technology. This team powers various services, including the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. They achieve this at an extensive scale, meeting high expectations while delivering...
-
Cloud Reliability Engineering Specialist
3 days ago
Seattle, Washington, United States | DAT Solutions | Full time
About DAT Solutions: DAT Solutions is an award-winning technology company that has revolutionized the transportation supply chain logistics industry for 45 years. We continue to push the boundaries of innovation by deploying cutting-edge software solutions to millions of customers daily, empowering them to make informed business decisions and drive...
-
Senior Cloud Reliability Engineer
3 weeks ago
Seattle, Washington, United States | CloudBC Labs | Full time
Job Summary: CloudBC Labs is seeking a highly experienced Senior Cloud Reliability Engineer to join our team in Seattle, WA. This is a 12+ month contract position with a salary of $150,000-$180,000 per year.
About the Role: The Senior Cloud Reliability Engineer will be responsible for ensuring the health and stability of our production systems, developing...
-
Site Development Engineering Manager
3 days ago
Seattle, Washington, United States | KPFF Consulting Engineers | Full time
About the Role: The Special Projects Division of KPFF Consulting Engineers is growing and looking for a skilled Civil Engineer to join our dynamic team in Seattle, WA. As a key member, you'll work on a diverse range of heavy civil and industrial infrastructure projects, collaborating with teams to devise innovative solutions and drive successful outcomes. You...
-
DevOps and Cloud Engineering Lead
4 days ago
Seattle, Washington, United States | Georgia IT Inc | Full time
At Georgia IT Inc, we are seeking a talented DevOps and Cloud Engineering Lead to join our team. The successful candidate will have extensive experience in Site Reliability / DevOps Engineering, with expertise in PowerShell Scripting, Azure, Monitoring and Observability, and more. The estimated salary for this position is around $150,000 - $220,000 per year,...
-
Engineering Excellence Specialist
3 weeks ago
Seattle, Washington, United States | Coupang | Full time
Coupang is revolutionizing e-commerce with cutting-edge technology and innovative thinking. As a Principal Engineer, Site Reliability Engineering, you will play a critical role in ensuring the health, performance, and scalability of our customer-facing services. With a strong background in software and system engineering, you will be responsible for building,...
-
Seattle, Washington, United States | Hulu | Full time
About the Role: We are seeking an experienced Global Engineering Manager to lead our Platform team in the Commerce, Growth & Identity Business Unit. This team is responsible for planning, monitoring, and controlling the day-to-day operations and delivery aspects of Site Reliability, directly impacting subscription numbers and revenue. The successful candidate...
-
Site Development Project Manager
3 weeks ago
Seattle, Washington, United States | LPD Engineering | Full time
LPD Engineering, a woman-owned civil engineering firm, is seeking a seasoned Civil Engineer, PE, with 10+ years of experience to contribute to our team of experts. We're looking for a talented professional who can work on a variety of exciting projects, including educational campuses, civic facilities, parks, residential, mixed-use, and commercial...
-
Seattle, Washington, United States | HITT Contracting | Full time
About Us: HITT Contracting is a top national general contractor with over 85 years of experience in commercial construction. Our company was founded in 1937 and has since grown to become one of the leading construction companies in the country.
Job Summary: We are seeking an experienced Construction Project Engineer to join our team. The successful candidate will...
-
Cloud Security Engineer
3 days ago
Seattle, Washington, United States | Zscaler | Full time
Staff Site Reliability Engineer Job Description: Zscaler is a cloud security leader, protecting thousands of enterprise customers from cyber threats and data breaches. Our Engineering team has built the world's largest cloud security platform from scratch.
About Zscaler: We drive digital transformation to empower enterprises to be more agile, efficient,...
-
Reliable Cloud Systems Architect
4 days ago
Seattle, Washington, United States | Saxon Global | Full time
About Us: Saxon Global is a leading provider of innovative solutions to the global market. We pride ourselves on our commitment to quality, reliability, and customer satisfaction. Our team of experts works tirelessly to deliver cutting-edge products and services that meet the evolving needs of our customers. With a focus on scalability, security, and ease of...
-
Cloud Reliability Associate
3 weeks ago
Seattle, Washington, United States | Amazon | Full time
Job Description: This role is part of the Amazon Web Services (AWS) Region Reliability team, where you will play a crucial part in ensuring the smooth operation of our cloud infrastructure.
About the Job: As a Cloud Reliability Associate, your primary responsibility will be to execute defined operational tasks on schedule and identify any ineffective processes or...
-
Cloud Engineer Position
1 week ago
Seattle, Washington, United States | Scion Staffing | Full time
Job Overview: We are seeking a Cloud Engineer to join our team at Scion Staffing in Seattle, WA. As a Cloud Engineer, you will be responsible for monitoring cloud systems, troubleshooting complex issues, and improving/automating processes.
Key Responsibilities:
- Collaborate with a team of Cloud Engineers to maintain client commitments.
- Analyze and resolve...
-
Seattle, Washington, United States | Apple Inc. | Full time
Sr Machine Learning Engineer, Siri Performance and Reliability
The AIML Performance & Reliability team is looking for a seasoned Senior Machine Learning engineer with a proven track record of building scalable statistical systems for business applications in a fast-paced environment. As the lead developer and architect on the Tools team, you will have...
-
Seattle, Washington, United States | TikTok | Full time
About the Opportunity: The TikTok Backend Infrastructure team is responsible for data access control for all online TikTok data, managing data schemas in code for attribution and governance, and laying the foundation for modernized data tracking, deletion, retention, and linkage. The team is also building the massive, horizontally scalable streaming and ingestion services...