Reliability and Performance Expert

1 week ago


Seattle, Washington, United States Qumulo Careers Full time
Job Overview

Qumulo Careers is seeking a talented Site Reliability Engineer (SRE) to join our team. As an SRE, you will be responsible for developing and maintaining critical systems that support our platform, focusing on build and test infrastructure, automation, and monitoring. This role requires a strong background in Linux, Python, and system orchestration tools, along with experience working with cloud providers and container management.

Main Responsibilities
  • Design and implement build and test infrastructure to ensure seamless execution of multiple builds and hundreds of thousands of tests in both on-prem and cloud environments.
  • Leverage automation to improve build and test processes, enhancing efficiency and scalability.
  • Troubleshoot complex issues, from build-time errors to integration test failures involving virtual machines and Qumulo-qualified hardware.
  • Design and implement monitoring systems to guarantee optimal system performance and notify engineers of any issues.
Requirements
  • Expertise in Linux (Ubuntu)
  • Proficiency in Python or similar programming languages
  • System orchestration tool experience (Ansible, Terraform, cloud-specific implementations)
  • Familiarity with one or more major cloud providers (AWS, GCP, Azure)
  • Working knowledge of Kubernetes and container management
  • Monitoring tools expertise (homegrown solutions and tools like Grafana, InfluxDB, Prometheus)
  • Strong troubleshooting skills
  • Build automation and test framework knowledge
Salary Range: Estimated annual salary of $140,000-$170,000, plus a comprehensive benefits package.

  • Seattle, Washington, United States Diverse Lynx Full time

    Site Reliability and Monitoring Expert Wanted: Diverse Lynx LLC is seeking a Site Reliability and Monitoring Expert to join our team in Seattle/Bellevue, WA. This is a contract position that requires onsite work from day one. The successful candidate will have 6-8 years of experience in building monitoring dashboards, automating tasks, and troubleshooting...


  • Seattle, Washington, United States Georgia IT Inc Full time

    Job Overview: Georgia IT Inc is seeking an experienced Cloud Reliability Expert to join our team. The ideal candidate will have a strong background in Site Reliability / DevOps Engineering and extensive experience with Azure, PowerShell scripting, monitoring and observability tools, Infrastructure as Code platforms, and Chef and Ansible automation...


  • Seattle, Washington, United States Amazon Full time

    About the Role: We're looking for a High-Performance Computing Expert to join our team as a Machine Learning Optimization Specialist. In this role, you'll work on designing and implementing efficient compilation pipelines for complex transformer architectures, collaborating closely with scientists to influence model architectures for optimal hardware...


  • Seattle, Washington, United States Apple Full time

    Your Responsibilities: As a Security Site Reliability Engineer, you will work closely with our ASE Security dev team to bring up and mature new services as part of our infrastructure investments. You will ensure the scalability, availability, and performance of our systems, while also maintaining their security and integrity. You will be expected to collaborate...


  • Seattle, Washington, United States Robbins-Gioia LLC Full time

    Job Title: Business Performance Optimization Expert. About the Opportunity: We are looking for an experienced Business Performance Optimization Expert to join our team at Robbins-Gioia LLC and support our Air Force customer in optimizing business operations. This role is critical to driving mission success, and the ideal candidate will have the opportunity to...


  • Seattle, Washington, United States TikTok Full time

    About Us: TikTok is a world-leading video platform providing multimedia storage, delivery, and transcoding services. Our US Tech Service department focuses on building the next-generation video processing platform, offering excellent experiences for billions of users worldwide. We follow a hybrid work schedule requiring employees to work in the office 3 days a...


  • Seattle, Washington, United States MacDonald-Miller Full time

    About Us: MacDonald-Miller Facility Solutions is a leading mechanical contracting firm in the Northwest, renowned for its innovative approach to building solutions. With over 1000 employees across 11 offices, we offer a diverse and engaging work environment that fosters growth and excellence. Salary Range: The salary for this role ranges from $95K to $124K...


  • Seattle, Washington, United States SingleStore Full time

    About SingleStore: SingleStore is seeking a Senior Site Reliability Engineer to drive its Kubernetes product strategy surrounding its managed service. This role will be instrumental in crafting the design, building out the collaborated vision, and sustaining the envisioned product strategy. The ideal candidate will have a strong background in Kubernetes and...


  • Seattle, Washington, United States Apple Inc. Full time

    Sr Machine Learning Engineer, Siri Performance and Reliability: The AIML Performance & Reliability team is looking for a seasoned Senior Machine Learning engineer with a proven track record of building scalable statistical systems for business applications in a fast-paced environment. As the lead developer and architect on the Tools team, you will have...


  • Seattle, Washington, United States TikTok Full time

    About Our Team: Our Backend Infrastructure team is a high-impact and fast-paced environment that requires innovative thinking and collaboration. We are looking for motivated individuals who can work closely with multidisciplinary teams to drive impact. Job Summary: We are seeking an experienced Senior Software Engineer to join our team, focusing on complex...


  • Seattle, Washington, United States EOS USA Full time

    Job Summary: This is a unique opportunity to join EOS USA as a leading expert in Collaboration Reliability Engineering. The successful candidate will have extensive experience in IT, operations, networking, scripting, and collaboration technologies. As a senior leader, you will manage a team of reliability engineers and oversee the development of...


  • Seattle, Washington, United States Saxon Global Full time

    Job Opportunity: We are seeking a highly skilled Senior Site Reliability Engineer to join our Data Platform Services team. In this role, you will contribute to the design, development, and operation of large-scale data platforms that support various Starbucks services. Your primary responsibility will be to ensure the health and performance of production...


  • Seattle, Washington, United States Slate Full time

    About the Job: We are seeking a Principal Quality Assurance Engineer to join our team at Slate. As a key member of our engineering organization, you will be responsible for ensuring the quality and reliability of our products. Key Responsibilities: Develop, implement, and maintain comprehensive test plans and test cases. Collaborate with cross-functional teams...


  • Seattle, Washington, United States Apple Full time

    About Apple: Apple is a technology company that designs, manufactures, and markets consumer electronics, computer software, and online services. Our mission is to bring the best user experience to our customers through innovative products and services. We are committed to making a positive impact on the world by reducing our environmental footprint,...


  • Seattle, Washington, United States SingleStore Full time

    Job Summary: MemoirLogics is seeking a seasoned professional to lead our Kubernetes product strategy as a Senior Site Reliability Engineer. This role requires a deep understanding of Kubernetes, container ecosystems, and Unix/Linux operating systems. The ideal candidate will have experience designing and implementing production-ready container orchestration...


  • Seattle, Washington, United States Honda and Toyota of Seattle Full time

    About the Role: As a highly skilled Sales Professional at Honda and Toyota of Seattle, you will be responsible for delivering exceptional customer experiences while meeting or exceeding sales targets. This is an excellent opportunity to join a dynamic team in the Pacific Northwest, where you can leverage your expertise to drive business growth and...


  • Seattle, Washington, United States TikTok Full time

    About This Opportunity: TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs, or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please let us know. We're looking for a skilled CDN Site Reliability...


  • Seattle, Washington, United States Amazon Full time

    The Amazon Web Services (AWS) Global Demand and Operations team is seeking a seasoned technical expert to spearhead the development of high-quality, scalable systems that meet business needs. As an AWS Software Engineer, you will serve as a technical leader on cross-functional projects, ensuring the quality of architecture and design of systems. Your...


  • Seattle, Washington, United States Georgia IT Inc Full time

    Georgia IT Inc is seeking an expert Azure DevOps Engineer to lead our efforts in designing, deploying, and maintaining scalable cloud-based systems on Azure. As a senior team member, you will drive the development and implementation of new technologies to enhance our cloud infrastructure, ensuring high availability and performance. The estimated salary range...


  • Seattle, Washington, United States Saxon Global Full time

    About Us: Saxon Global is a leading provider of innovative solutions to the global market. We pride ourselves on our commitment to quality, reliability, and customer satisfaction. Our team of experts works tirelessly to deliver cutting-edge products and services that meet the evolving needs of our customers. With a focus on scalability, security, and ease of...