Senior Site Reliability Engineer

3 weeks ago


Seattle, Washington, United States Saxon Global Full time
Job Summary

This position contributes to Saxon Global's Data Platform Services team, maintaining and improving the data platform that many services depend on. The successful candidate will be responsible for the health of production systems, developing monitoring dashboards, and configuring alerts and automation for system recovery.

Key Responsibilities
  • Responsible for the health of production systems
  • Develop monitoring dashboards
  • Configure alerts and automate processes for system recovery
  • Monitor alerts and take proactive steps to resolve system issues
  • Troubleshoot production issues
  • Lead production troubleshooting calls
  • Responsible for patches and updates on production systems
  • Design and build cutting-edge, multi-microservice solutions to support Saxon Global's growth worldwide
  • Work with cross-functional teams on ongoing design efforts and systems support
  • Automate password and certificate rotations on application and DB servers
  • Help the CI/CD team roll out applications and infrastructure globally
  • Collaborate with the development team and with developer leads from other Information Technology (IT) teams
  • Initiate process improvements for new and existing systems
  • Coach and mentor other team members
  • Perform cross-training and facilitate information sharing among team members
  • Participate in a production support rotation that includes pager responsibilities
Requirements
  • Senior Site Reliability Engineering experience: 5+ years as an SRE and 7+ years total in the IT industry
  • Expert in Azure Cloud
  • Expert in Kubernetes
  • Strong skills with SQL
  • Strong skills with Kafka, Event Hub, or another message broker
  • Must be local to the Seattle area and able to commute to Saxon Global headquarters in Seattle 3 times a week
Additional Requirements
  • Requires 7+ years of experience in the IT industry
  • Requires 5+ years of software development and DevOps engineering
  • Experience working in cloud environments, Azure preferred
  • Experience using Kafka, Event Hub, or another message broker
  • Experience with Cassandra, PostgreSQL, Cosmos DB
  • Experience with Jenkins / Python / Terraform / Ansible
  • Experience with Datadog, Splunk, or other logging and APM tools
  • Experience working in Linux environments
  • In-depth understanding of Computer Science fundamentals in object-oriented design, data structures, algorithms, and problem solving
  • Experience building complex, scalable, high-performance software systems that have been successfully delivered to customers
  • Demonstrated knowledge of best practices for the design and implementation of large-scale systems as well as experience in taking such systems from design to production
  • Experience building and operating mission critical, highly available (24x7) systems
  • Ability to work well with a team in a fast-paced agile development environment
  • Bachelor's degree in Computer Science or equivalent work experience
  • Excellent communication, analytical, and problem-solving skills
  • Extensive understanding of SDLC and Scrum methodologies


  • Seattle, Washington, United States SingleStore Full time

    Senior Site Reliability Engineer. At SingleStore, we're seeking a seasoned Senior Site Reliability Engineer to drive our Kubernetes product strategy and help shape the future of our managed service. Key Responsibilities: Design and build elastic Kubernetes clusters across on-prem, AWS, Azure, and Google Cloud environments. Develop and maintain production container...


  • Seattle, Washington, United States Saxon Global Full time

    Job Summary: Starbucks is seeking a highly skilled Senior Site Reliability Engineer to join their Data Platform Services team. This team is responsible for maintaining and improving the data platform that many Starbucks services rely on. Key Responsibilities: Ensure the health and stability of production systems. Develop and implement monitoring dashboards and...


  • Seattle, Washington, United States F5 Networks Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at F5 Networks. As a key member of our engineering team, you will be responsible for ensuring the reliability and performance of our systems. Key Responsibilities: Design and implement scalable and efficient system architectures. Develop and maintain monitoring and...


  • Seattle, Washington, United States Moloco Full time

    About Moloco: Moloco is a cutting-edge machine learning company that empowers organizations to grow and unlock the full value of their unique first-party data. Our mission is to elevate the traditional path to performance advertising, making it accessible to businesses of all sizes. Our Technology: We operate at massive scale, ingesting 10 petabytes of training...


  • Seattle, Washington, United States Diverse Lynx Full time

    Job Title: Sr. Site Reliability Engineer. Location: Remote. Duration: 12+ Months contract. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the availability, reliability, and performance of our applications and services. You will work...


  • Seattle, Washington, United States Sogeti Full time

    Job Title: Site Reliability Engineer. About the Role: We are seeking an experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using Azure or...


  • Seattle, Washington, United States HireIO Inc Full time

    Job Title: Site Reliability Engineer. HireIO Inc is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and performance of our distributed systems. Key Responsibilities: Design and implement scalable and reliable systems. Collaborate with cross-functional...


  • Seattle, Washington, United States Oracle Full time

    About the Role: Oracle is seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design, develop, and deploy software to improve the availability, scalability, and efficiency of...


  • Seattle, Washington, United States Sogeti Full time

    Site Reliability Engineer. Job Summary: We are seeking an experienced Site Reliability Engineer to join our team. As a key member of our operations team, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design, implement, and maintain scalable and reliable cloud...


  • Seattle, Washington, United States Apple Full time

    Job Title: Site Reliability Engineer. At Apple, we're looking for a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will play a critical role in ensuring the security, reliability, and scalability of our systems and infrastructure. About the Role: We are seeking a talented and motivated individual to join our dynamic...


  • Seattle, Washington, United States Oracle Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Oracle. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our cloud infrastructure. You will work closely with our development teams to design, implement, and operate large-scale distributed...


  • Seattle, Washington, United States Elit IT Inc. Full time

    Job Title: Senior Site Reliability Engineer. We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Elit IT Inc. in Seattle, WA. As a key member of our cloud operations team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Key Responsibilities: Design, implement, and...


  • Seattle, Washington, United States F5 Networks Full time

    About the Role: We are seeking a highly skilled Sr SRE to join our team at F5 Networks. As a key member of our engineering community, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Apply observability and data skills to proactively measure system performance, diagnose services/needs, and quickly...


  • Seattle, Washington, United States TikTok Full time

    About TikTok U.S. Data Security: TikTok U.S. Data Security is a subsidiary of TikTok in the U.S., dedicated to protecting user data and ensuring the security of our platform. Responsibilities: We are seeking a highly motivated and experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the...


  • Seattle, Washington, United States Saxon Global Full time

    Job Summary: This position contributes to Starbucks on their Data Platform Services team, maintaining and improving the data platform that many Starbucks services are dependent on. The successful candidate will be responsible for the health of the production system, developing monitoring dashboards, configuring alerts, and automating process for system...


  • Seattle, Washington, United States Apple Full time

    Job Title: Site Reliability Engineer. At Apple, we're looking for a skilled Site Reliability Engineer to join our Object Storage SRE team. As a Site Reliability Engineer, you'll play a critical role in ensuring the reliability, scalability, and performance of our cloud storage systems. About the Role: We're seeking a seasoned software and systems engineer with a...


  • Seattle, Washington, United States Sogeti Full time

    Site Reliability Engineer. We are seeking an experienced Site Reliability Engineer to join our team at Sogeti. As a key member of our operations team, you will be responsible for ensuring the reliability and scalability of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and reliable cloud infrastructure using Azure or...


  • Seattle, Washington, United States HireIO Inc Full time

    Job Summary: At HireIO Inc, we are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the availability, scalability, and reliability of our Ads systems. This includes designing, analyzing, and troubleshooting large-scale distributed systems, as well as developing tools and...


  • Seattle, Washington, United States Capgemini Full time

    Job Title: Site Reliability Engineer. We are seeking a highly skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and performance of our software systems and infrastructure. Key Responsibilities: Develop, maintain, and configure cloud observability systems (e.g.,...


  • Seattle, Washington, United States MongoDB Full time

    About the Role: MongoDB is seeking a highly skilled Senior Cloud Site Reliability Engineer to join our Cloud Team. As a key member of our team, you will be responsible for designing and building the global infrastructure on which we deploy our services. Responsibilities: Design and build the infrastructure for a global cloud service that comprises hundreds of...