Site Reliability Engineer

3 days ago


Seattle, Washington, United States Georgia IT Inc Full time

At Georgia IT Inc, we are seeking a skilled Site Reliability Engineer to join our team. This role will drive cross-team initiatives that improve Delta engineering practices and increase uptime and performance for the business.

Job Description:
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement
  • Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate
  • Improve end-to-end availability and performance of mission-critical services, and build automation to prevent problem recurrence
  • Partner with business and technical product owners to set SLOs / SLIs / error budgets to manage reliability of infrastructure and applications
  • Partner with other SREs to share best practices and learnings from across the organization
  • Scale and optimize existing infrastructure and services sustainably through mechanisms, including automation, and evolve them by improving reliability and efficiency
  • Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
  • Maintain infrastructure (infrastructure as code) and services by measuring and monitoring system metrics to proactively identify operational efficiencies, potential outages, and security threats in Development, UAT, Staging, and Production environments
  • Practice sustainable incident response and blameless postmortems
  • Build infrastructure and drive projects that deliberately break things, with the aim of improving the robustness of production systems
  • Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
  • Step back to observe patterns and develop innovative tools and automation to eliminate or minimize menial tasks. Use those learnings to drive the best operational practices
  • Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE
  • Preserve operational visibility and response capabilities by fixing and improving our dashboards, alerts, and automation
  • Maintain operational uptime and reliability by participating in triage and issue support calls for mission critical systems

We are looking for someone with strong experience setting SLOs / SLIs / error budgets and managing the reliability of infrastructure and applications using Kubernetes, AWS native components, CloudWatch, and Dynatrace.

Requirements:
  • 10+ years of total software engineering experience using Kubernetes, AWS native components, CloudWatch, and Dynatrace
  • 5+ years of experience supporting a production system on a DevOps team
  • 2+ years of experience architecting solutions on AWS Cloud
  • Strong debugging, troubleshooting, and problem-solving skills
  • Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties
  • Experience liaising with developers, operations staff and third-party resources
  • Experience with API integration projects

Salary: $120,000 - $180,000 per year, depending on experience



  • Seattle, Washington, United States Oracle Full time

    Job Overview: Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of our products and services. Responsibilities: Work with our Site Reliability Engineering team on the shared full stack ownership of a...


  • Seattle, Washington, United States TikTok Full time

    About Us: TikTok is a world-leading video platform providing multimedia storage, delivery, and transcoding services. Our US Tech Service department focuses on building the next-generation video processing platform, offering excellent experiences for billions of users worldwide. We follow a hybrid work schedule requiring employees to work in the office 3 days a...


  • Seattle, Washington, United States F5 Networks Full time

    Job Description: F5 Networks is a leader in delivering solutions that bring a better digital world to life. Our mission is to empower organizations globally to create, secure, and run applications that enhance the user experience. We prioritize diversity and inclusivity, fostering an environment where every individual can thrive. This approach drives our...


  • Seattle, Washington, United States Apple Full time

    Your Responsibilities: As a Security Site Reliability Engineer, you will work closely with our ASE Security dev team to bring up and mature new services as part of our infrastructure investments. You will ensure the scalability, availability, and performance of our systems, while also maintaining their security and integrity. You will be expected to collaborate...


  • Seattle, Washington, United States F5 Networks Full time

    At F5 Networks, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. Job Description: We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation....


  • Seattle, Washington, United States Apple Full time

    About Apple Services Engineering: Apple's Services Engineering team is a prime example of the company's commitment to combining art and technology. This team powers various services, including the App Store, Apple TV, Apple Music, Apple Podcasts, and Apple Books. They achieve this at an extensive scale, meeting high expectations while delivering...


  • Seattle, Washington, United States DAT Solutions Full time

    About DAT Solutions: DAT Solutions is an award-winning technology company that has revolutionized the transportation supply chain logistics industry for 45 years. We continue to push the boundaries of innovation by deploying cutting-edge software solutions to millions of customers daily, empowering them to make informed business decisions and drive...


  • Seattle, Washington, United States CloudBC Labs Full time

    Job Summary: CloudBC Labs is seeking a highly experienced Senior Cloud Reliability Engineer to join our team in Seattle, WA. This is a 12+ month contract position with a salary of $150,000-$180,000 per year. About the Role: The Senior Cloud Reliability Engineer will be responsible for ensuring the health and stability of our production systems, developing...


  • Seattle, Washington, United States KPFF Consulting Engineers Full time

    About the Role: The Special Projects Division of KPFF Consulting Engineers is growing and looking for a skilled Civil Engineer to join our dynamic team in Seattle, WA. As a key member, you'll work on a diverse range of heavy civil and industrial infrastructure projects, collaborating with teams to devise innovative solutions and drive successful outcomes. You...


  • Seattle, Washington, United States Georgia IT Inc Full time

    At Georgia IT Inc, we are seeking a talented DevOps and Cloud Engineering Lead to join our team. The successful candidate will have extensive experience in Site Reliability / DevOps Engineering, with expertise in PowerShell Scripting, Azure, Monitoring and Observability, and more. The estimated salary for this position is around $150,000 - $220,000 per year,...


  • Seattle, Washington, United States Coupang Full time

    Coupang is revolutionizing e-commerce with cutting-edge technology and innovative thinking. As a Principal Engineer, Site Reliability Engineering, you will play a critical role in ensuring the health, performance, and scalability of our customer-facing services. With a strong background in software and system engineering, you will be responsible for building,...


  • Seattle, Washington, United States Hulu Full time

    About the Role: We are seeking an experienced Global Engineering Manager to lead our Platform team in the Commerce, Growth & Identity Business Unit. This team is responsible for planning, monitoring, and controlling the day-to-day operations and delivery aspects of Site Reliability, directly impacting subscription numbers and revenue. The successful candidate...


  • Seattle, Washington, United States LPD Engineering Full time

    LPD Engineering - A Woman-Owned Civil Engineering Firm is seeking a seasoned Civil Engineer PE with 10+ years of experience to contribute to our team of experts. We're looking for a talented professional who can work on a variety of exciting projects, including educational campuses, civic facilities, parks, residential, mixed-use, and commercial...


  • Seattle, Washington, United States HITT Contracting Full time

    About Us: HITT Contracting is a top national general contractor with over 85 years of experience in commercial construction. Our company was founded in 1937 and has since grown to become one of the leading construction companies in the country. Job Summary: We are seeking an experienced Construction Project Engineer to join our team. The successful candidate will...


  • Seattle, Washington, United States Zscaler Full time

    Staff Site Reliability Engineer. Job Description: Zscaler is a cloud security leader, protecting thousands of enterprise customers from cyber threats and data breaches. Our Engineering team has built the world's largest cloud security platform from scratch. About Zscaler: We drive digital transformation to empower enterprises to be more agile, efficient,...


  • Seattle, Washington, United States Saxon Global Full time

    About Us: Saxon Global is a leading provider of innovative solutions to the global market. We pride ourselves on our commitment to quality, reliability, and customer satisfaction. Our team of experts works tirelessly to deliver cutting-edge products and services that meet the evolving needs of our customers. With a focus on scalability, security, and ease of...


  • Seattle, Washington, United States Amazon Full time

    Job Description: This role is part of the Amazon Web Services (AWS) Region Reliability team, where you will play a crucial part in ensuring the smooth operation of our cloud infrastructure. About the Job: As a Cloud Reliability Associate, your primary responsibility will be to execute defined operational tasks on schedule and identify any ineffective processes or...


  • Seattle, Washington, United States Scion Staffing Full time

    Job Overview: We are seeking a Cloud Engineer to join our team at Scion Staffing in Seattle, WA. As a Cloud Engineer, you will be responsible for monitoring cloud systems, troubleshooting complex issues, and improving/automating processes. Key Responsibilities: Collaborate with a team of Cloud Engineers to maintain client commitments. Analyze and resolve...


  • Seattle, Washington, United States Apple Inc. Full time

    Sr Machine Learning Engineer, Siri Performance and Reliability: The AIML Performance & Reliability team is looking for a seasoned Senior Machine Learning engineer with a proven track record of building scalable statistical systems for business applications in a fast-paced environment. As the lead developer and architect on the Tools team, you will have...


  • Seattle, Washington, United States TikTok Full time

    About the Opportunity: TikTok Backend Infrastructure team is responsible for data access control to all online TikTok data, managing data schema in code for attribution and governing, layout foundation for modernized data tracking, deletion, retention, and linkage. The team is also building the massive horizontally scalable streaming and ingestion services...