Current jobs related to Site Reliability Engineer - San Francisco - Focal Systems


  • San Francisco, United States Apollo Solutions Full time

    Site Reliability Engineer: Apollo Solutions have partnered with a groundbreaking artificial intelligence business that is making major developments in how we use AI/ML for gaming and security. They are working closely on government contracts as well as with gaming console companies, and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Francisco, United States Bun Full time

    Bun is an open-source JavaScript tooling company focused on making programming simpler. We've raised $26 million from top investors in Silicon Valley, are among the top GitHub repositories and have a growing community of 33,000 Discord members. We're hiring an experienced Site Reliability Engineer to scale and maintain the infrastructure that builds and tests...


  • San Francisco, United States Ellation, Inc. Full time

    Who We Are: We're a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Francisco, United States Unreal Gigs Full time

    Are you passionate about building and maintaining resilient systems that ensure high availability and performance? Do you excel at automating processes, troubleshooting complex issues, and creating systems that scale smoothly? If you're ready to take on the challenge of ensuring reliable, efficient, and secure system operations, our client has the perfect...


  • San Francisco, United States New York Technology Partners Full time

    Must-haves, in order of preference: typical Java/J2EE experience of 6 to 10 years; application production support (SRE, Site Reliability Engineering) for 3+ years, preferably in the e-commerce domain; hands-on experience with any UI framework (AngularJS, VueJS, etc.) for 1+ years.


  • San Francisco Bay Area, United States Bun Full time

    Bun is an open-source JavaScript tooling company focused on making programming simpler. We've raised $26 million from top investors in Silicon Valley, are among the top GitHub repositories and have a growing community of 33,000 Discord members. We're hiring an experienced Site Reliability Engineer to scale and maintain the infrastructure that builds and tests...


  • San Francisco, United States Asystem Full time

    Particle is a startup based in the San Francisco Bay Area. We are seeking candidates who are self-starters, adaptable, and flexible in a startup environment. As a team of veteran technologists from Twitter, Tesla, Periscope, and more, we are developing a next-generation news platform to redefine your daily intake of news. We value active engagement in...


  • San Francisco, United States Perplexity AI Full time

    Perplexity is seeking a Site Reliability Engineer (SRE) to join our small team in revolutionizing the way people search and interact with the internet. You will be responsible for leading the design, implementation, and scaling of the infrastructure and systems that support our web and mobile products. The ideal candidate should have experience in designing...


  • San Francisco, California, United States Springshot Full time

    Springshot lives at the intersection between technology and humanity. We assimilate and simplify the complex, striving to provide users with easy-to-use web and mobile interfaces that present the right information at the right time so they can make the right decision or take the right physical action, including through robotics and autonomous machines. This...


  • San Francisco, United States Focal Systems Full time

    Location: San Francisco, hybrid (1-2 days per week). Salary: $170-190k + stock. Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley-based startup that has more than doubled in size every year since inception. We are a Deep Learning-first company. Our mission is to automate and optimize brick-and-mortar retail...


  • San Jose, United States Avance Consulting Full time

    Role: Site Reliability Engineer. Location: San Jose, CA (onsite). Duration: Full Time (Permanent Role). We are seeking a skilled Site Reliability Engineer (SRE) with expertise in GitHub Actions, AWS DevOps, Helm Charts, and YAML configuration. The ideal candidate will be responsible for ...-based applications. You will work closely with development teams to implement and manage...


  • San Jose, United States EVONA Full time

    Site Reliability Engineer (SRE). Location: San Francisco Bay Area. Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...


  • San Francisco, United States WEX, Inc. Full time

    About the Role: The WEX Site Reliability Engineering (SRE) team is seeking a Senior Staff SRE who is passionate about developing software and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits Reliability organization which supports our internal...


  • San Jose, California, United States Syntricate Technologies Full time

    Job Title: Site Reliability Engineer. About the Job: Syntricate Technologies is seeking an experienced Site Reliability Engineer to join our team in San Jose, CA. As a key member of our infrastructure team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based systems. Responsibilities: Design, implement, and...


  • San Francisco, United States Indotronix International Corporation Full time

    Pay Rate: W2 rate $61.75. Looking for candidates in the PST time zone; preferably local to SF and willing to go into the office occasionally, but remote is okay (must have a high work ethic!). Lead DevOps/Site Reliability Engineer: looking for a resource more senior in the DevOps space, with a leaning toward site reliability engineering. Docker containers,...


  • San Francisco, CA, United States Earnest Full time

    The Site Reliability Engineer II position will report to the Lead Cloud Engineer. As an SRE II, you will: Set up and maintain comprehensive monitoring, create and refine playbooks, build dashboards, and adopt industry-standard practices to enhance the reliability and resilience of our site and systems. Develop and manage IaC to ensure reliable,...


  • San Francisco, United States Federal Reserve Bank of San Francisco Full time

    Company: Federal Reserve Bank of San Francisco. Job Description: While the SF Fed is a Reserve Bank, we're not what you might expect. We're unreserved here. That means we seek new and diverse perspectives. We spark conversations and encourage debate. We build opportunity. We pursue careers that are true to ourselves. We are looking for people who want to help...

Site Reliability Engineer

2 months ago


San Francisco, United States Focal Systems Full time

Location: San Francisco, hybrid (1-2 days per week)
Salary: $165-175k + stock

Company Description

Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley-based startup that has more than doubled in size every year since inception. We are a Deep Learning-first company. Our mission is to automate and optimize brick-and-mortar retail using deep-learning computer vision. Focal Systems has been deployed at scale with the top retailers in the world. We are looking for smart, creative, and passionate people who want to help build a great and enduring company and deploy Deep Learning to the world.

Mission of the role: To enable us to scale from 200k to 1 million cameras.

Job Summary

As a Sr. DevOps/Site Reliability Engineer (SRE) at our company, you will play a pivotal role in ensuring the smooth operation and continuous improvement of our infrastructure, deployment processes, and overall system reliability.

Responsibilities

  • Set up and manage blue/green and canary deployments to ensure smooth launches without downtime.
  • Operate multiple large GCP Kubernetes clusters and fine-tune them to balance reliability against cost.
  • Manage the company's distributed services, ensuring graceful updates, comprehensive test coverage, log tracking, and 99.9% uptime.
  • Work with the Backend, Frontend, and Deep Learning teams to write infrastructure automation code that meets their needs.
  • Identify scalability bottlenecks through load testing and plan infrastructure architecture.
  • Create tools that make the company's rich datasets, stored across varying geographic locations and data formats, transparent and easy to access.
  • Design, build, and manage a robust Continuous Integration and Continuous Deployment (CI/CD) pipeline.
  • Lead uptime-improvement processes, including postmortem reviews and on-call setup.

Requirements

  • Solid experience in an infrastructure or Site Reliability Engineer (SRE) role.
  • Hands-on experience with containerization (Docker) and orchestration platforms (Kubernetes) required.
  • Experience in cloud cost management.
  • Strong understanding of SQL, networking, distributed systems, operating systems (Debian), and software engineering practices.
  • Experience with messaging systems.
  • Experience with Terraform or another Infrastructure as Code (IaC) automation solution.
  • Experience operating relational SQL databases and Redis at terabyte scale.
  • Proven experience with setting up monitoring/alerting and reliability engineering.
  • Scripting skills in Python.

Nice to have experience:

  • GitOps.
  • Setting up automation for complex load testing scenarios.
  • Tuning Deep Learning pipelines with Python, PyTorch, and multiprocessing.
  • Backend programming with Python.

Why Focal Systems

Strong Values and Mission - We are a tightly-knit team with an ambitious mission and a strong set of core values, which define our approach to business and have successfully guided us since inception.

Exceptional Team - We are a team of hard-working, fun-loving professionals from some of the most eminent universities, research labs, and tech companies of our time. We pride ourselves on recruiting exceptional individuals to help us redefine the state-of-the-art.

Outstanding Partners - We work with 10+ of the largest retailers in the world and have a world-class roster of investors, advisors, and partners to support & advise us in our endeavors.

Benefits

We care deeply about the health, happiness, and wellbeing of all of our employees. We offer:

  • Competitive Salary & Attractive Stock.
  • Paid Time Off.
  • Quarterly Team Retreats.
  • Education grants.