Senior Site Reliability Engineer/DevOps

4 weeks ago


San Francisco CA, United States Fractal Full time

Site Reliability Engineer
Fractal Analytics is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets. We believe that a true Fractalite empowers imagination with intelligence. You will need to work onsite Monday through Friday. We offer paid relocation.
As a Site Reliability Engineer with Fractal, you will be dedicated to ensuring the highest system availability and performance levels. This role involves comprehensive monitoring, addressing complex technical issues, automating solutions to recurring problems, and contributing to developing resilient system architectures and deployment strategies. You will work closely with our Services and Engineering teams, playing a crucial role in optimizing our platforms and infrastructures.
Ensure maximum uptime and system availability to meet or exceed functional and performance SLAs.
Implement thorough end-to-end monitoring and alerting on all critical components to ensure quick detection and response.
Drive the development of innovative designs, architectures, standards, and methodologies to support and enhance our platform.
Design and configure essential infrastructure, tools, and frameworks to enhance the deployment lifecycle.
Collaborate effectively with cross-functional teams within Services and Engineering.
Interest in, and the ability to become certified on, the end client's AI platform (we will provide all the necessary training and support).
Bachelor’s or master’s degree in computer science, a related field, or equivalent professional experience.
Proven experience in deploying, managing, and optimizing scalable, fault-tolerant Linux/Kubernetes/JVM infrastructure across various cloud platforms like AWS, GCP, and Azure.
Deep expertise in Linux Operating Systems, Networking principles, and Database management.
Proficiency with major cloud services providers, notably AWS, Azure, and GCP.
Familiarity with configuration management tools such as Ansible or Terraform.
Proficiency in programming languages like Ruby or Python, particularly for system automation and monitoring.
Prior experience in a DevOps or system administration role, ideally supporting commercial SaaS solutions.
The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions, including but not limited to skill sets, experience, and training. In addition, you may be eligible for a discretionary bonus for the current performance period.
As a full-time employee of the company, or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. The Company provides 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, giving you the flexibility to take the time you need for either sick time or vacation.
Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.
Employment type: Full-time
Job function: Information Technology, Consulting, and Engineering
Industries: Technology, Information and Media; IT Services and IT Consulting; Business Consulting and Services


