Current jobs related to Site Reliability Engineer - San Francisco - Talkdesk


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index...


  • San Francisco, California, United States Diverse Lynx Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Diverse Lynx LLC. As a key member of our organization, you will play a critical role in ensuring the reliability and efficiency of our digital infrastructure. Key Responsibilities: Design and implement reliable digital infrastructure solutions. Collaborate with...


  • San Francisco, California, United States Diverse Lynx Full time

    Role Overview: We are seeking a highly skilled Reliability Engineer to join our team at Diverse Lynx LLC. As a key member of our organization, you will be responsible for ensuring the reliability and resilience of our digital systems. Key Responsibilities: Design and implement reliable digital systems and processes. Collaborate with cross-functional teams to...


  • San Francisco, California, United States Diverse Lynx Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer with 7+ years of experience in Java SRE to join our team at Diverse Lynx LLC. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our systems. Key Responsibilities: Design and implement monitoring and alerting systems to ensure prompt...


  • San Francisco, California, United States Instabase Full time

    About Instabase: At Instabase, we're passionate about harnessing the power of AI to democratize access to cutting-edge innovation and empower organizations to solve complex unstructured data problems. With a global presence and a customer-centric approach, we're committed to delivering top-tier solutions that drive business success. Job Summary: We're seeking a...


  • San Francisco, California, United States Xero Full time

    About the Role: Xero is a leading cloud-based accounting platform that empowers small businesses and their advisors to thrive. As a Site Reliability Engineer on our Reliability Enablement team, you'll play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in...


  • San Francisco, United States PicnicHealth Full time

    [Full Time] Site Reliability Engineer at PicnicHealth (United States) Site Reliability Engineer PicnicHealth United States Date Posted: 10 Aug, 2023 Work Location: San Francisco, United States Salary Offered: $160 — $190 yearly Job Type: Full Time Experience Required: 6+ years Remote Work: Yes Stock Options: No Vacancies: 1 available Healthcare needs good...


  • San Francisco, United States Patreon Full time

    Patreon is the best place for creators to build exclusive content and community for their fans. We enable creators (podcasters, writers, musicians, illustrators, etc) to connect with their fans directly and make money from their creative work. Creators can sell one-off items from their own shops or offer recurring monthly memberships with exclusive access to...


  • San Francisco, United States Instabase Full time

    At Instabase, we're passionate about democratizing access to cutting-edge AI innovation to enable any organization to solve previously unsolvable unstructured data problems in their industry. With customers representing some of the largest and most complex organizations in the world, and investors like Greylock, Andreessen Horowitz, and Index Ventures, our...


  • San Francisco, United States Autodesk Full time

    Job Requisition ID # 24WD81384 Position Overview At Autodesk, we're not just a world leader in 3D design, engineering, and entertainment software; we're a hub of innovation committed to solving complex design and real-world problems. Our extensive software suite empowers users across industries to bring their ideas to life and shape a sustainable future....


  • San Francisco, California, United States Autodesk Full time

    Responsibilities: As a Senior Site Reliability Engineer at Autodesk, you will be responsible for leading the development and maintenance of robust cloud infrastructure to support millions of daily users. You will automate processes to improve system reliability and introduce best practices in continuous integration and deployment. You will also lead...


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer SRE Apollo Solutions have proudly partnered with a Series E SaaS organization based in San Francisco. They have recently employed a highly respected CEO who has spent his career successfully scaling multiple start-ups with large exit events including a $1 billion+ IPO. We are looking for a Principal SRE based in San...


  • San Francisco, United States PostHog Full time

    [Full Time] Site Reliability Engineer at PostHog (United States) Site Reliability Engineer PostHog United States Date Posted: 31 Oct, 2022 Work Location: San Francisco, United States Salary Offered: Not Specified Job Type: Full Time Experience Required: 3+ years Remote Work: Yes Stock Options: No Vacancies: 1 available About PostHog PostHog is an open-source...


  • San Francisco, California, United States Pager Full time

    About the Role: PagerDuty is seeking a highly skilled Senior Site Reliability Engineer to join our SRE-Platform team. As a key contributor, you will play a crucial role in building, maintaining, and scaling our Kubernetes platform. Key Responsibilities: Maintain the overall health of the platform, including triaging and troubleshooting production issues,...


  • San Francisco, United States Autodesk Full time

    Senior Site Reliability Engineer Apply Location: San Francisco, CA, USA Time Type: Full time Posted On: Posted 3 Days Ago Job Requisition ID: 24WD81384 Position Overview At Autodesk, we're not just a world leader in 3D design, engineering, and entertainment software; we're a hub of innovation committed to solving complex design and real-world problems. Our...


  • San Francisco, United States Resource Informatics Group Full time

    Job Title: Site Reliability Engineer. Work Location: San Francisco, CA (Hybrid after showing successful engagement). Duration: 18+ months. Most important skills: 10 years of Oracle database administration experience in a large production environment; hands-on database skills, especially around database and system troubleshooting and administration; GoldenGate setup,...


  • San Francisco, United States Fieldguide.ai Full time

    [Full Time] Senior Site Reliability Engineer at Fieldguide (United States) | BEAMSTART Jobs Senior Site Reliability Engineer Fieldguide United States Date Posted: 31 Oct, 2022 Work Location: San Francisco, United States Salary Offered: Not Specified Job Type: Full Time Experience Required: 3+ years Remote Work: Yes Stock Options: No Vacancies: 1...


  • San Francisco, California, United States AutoRABIT Holding Inc. Full time

    Job Overview. About AutoRABIT: AutoRABIT is a rapidly expanding SaaS company recognized as the premier provider of Salesforce DevSecOps solutions tailored for regulated sectors such as finance, insurance, and healthcare. Our platform empowers developers to streamline their workflows, enhancing productivity and accelerating release cycles while adhering to...


  • San Francisco, California, United States DataRobot Full time

    Job Title: Director of Site Reliability Engineering. DataRobot is the leader in Value-Driven AI, a unique and collaborative approach to generative and predictive AI that combines an open platform, deep expertise, and broad use-case experience to improve how organizations run, grow, and optimize their business. The DataRobot AI Platform is the only complete AI...


  • San Francisco, California, United States PicnicHealth Full time

    About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at PicnicHealth. As a key member of our engineering team, you will be responsible for ensuring the reliability, efficiency, and architecture of our cloud, developer, and security operations. As a Senior SRE, you will take the lead in identifying and resolving...

Site Reliability Engineer

4 months ago


San Francisco, United States Talkdesk Full time

At Talkdesk, we are courageous innovators focused on helping organizations around the world create better customer experiences. Our AI-powered cloud contact center solutions optimize our customers’ most critical customer service processes. We are recognized as a Contact Center as a Service (CCaaS) leader by influential research organizations including Gartner. With $498 million in total funding, a valuation of more than $10 billion, and a ranking of #8 on the Forbes Cloud 100 list, now is the time to be part of the Talkdesk legacy to help accelerate our success in a new decade of transformational growth.

We champion an inclusive and diverse culture representative of the communities in which we live and serve. We also give back to our community by volunteering our time, supporting non-profits, and minimizing our global footprint.

Our Engineering team follows a microservice architecture approach to build the next generation of Talkdesk, with vertical teams responsible for all the decisions under their services. Through our Agile Coaches, we promote agile and collaborative practices: we are huge fans of Scrum and pair programming, and we won’t let a single line of code reach production without peer code review. We strongly believe that the only true authority stems from knowledge, not from position, and we always treat others with respect, deference, and patience.

We are looking for Site Reliability Engineers (SREs) who can help us design, build, and maintain high-performance, scalable, and reliable services. Because Talkdesk provides a contact center service, we play a critical role in our customers’ business operations and therefore need to provide a highly available and fault-tolerant service.

As an SRE at Talkdesk, you will build, run, and maintain components that serve as the infrastructure foundation for the rest of Talkdesk, with the objective of requiring as little manual intervention as possible while ensuring high availability and reliability of those components. You will also partner with other product engineering teams to help make their services more performant, scalable, observable, and reliable. We believe in a DevOps philosophy in which every engineering team at Talkdesk is responsible for the software it builds and deploys, and SREs play a critical role in ensuring that teams have the tools, practices, and expertise to make that happen in a blame-free culture.

Responsibilities:

  • Design, build, harden, and maintain the core infrastructure used by all of Talkdesk’s engineering teams
  • Automate every aspect of our infrastructure to remove human intervention as far as possible
  • Help keep existing base infrastructure running smoothly
  • Develop effective tooling, alerts, and responses to both identify and address reliability risks
  • Drive and promote protocols on production readiness and operational excellence
  • Participate in on-call rotation alongside other engineering teams (opt-in)
  • Partner with product engineering teams to debug production outages and carry out action items to improve the reliability of those systems
  • Participate in design reviews and production reviews for new features, products, or pieces of infrastructure
  • Plan for growth of Talkdesk’s infrastructure

Requirements:

  • An understanding of the importance of observability, and good intuitions about what to measure and how
  • Know your way around a Linux/Unix system
  • Experience with Terraform and Packer
  • Ability to identify time-consuming and error-prone manual tasks and then build tooling to automate them
  • Understand large-scale complex systems from a reliability perspective
  • Ability to identify root causes of instability in a large-scale distributed system, across stacks
  • Hold yourself and others around you to higher standards when working with production
  • Bring a developer mindset and apply it to infrastructure
  • You value simplicity

Nice to haves / Pluses:

  • Experience with cloud-based solutions such as Amazon AWS, Google Cloud, or Microsoft Azure
  • Experience with technologies such as Docker, Consul, Vault, Jenkins, Concourse, Prometheus, Nexus
  • Experience with PaaS-like solutions such as Heroku, Kubernetes, Docker Swarm, Mesos, or OpenStack
  • Experience with messaging systems such as RabbitMQ or Kafka
  • Operational knowledge of various data stores such as MongoDB, Postgres, Redis, Cassandra, Elasticsearch
  • Experience with configuration management software such as Ansible or Chef
  • Experience with a programming language such as Ruby, Elixir/Erlang, Go, or any JVM-based language
  • Experience with designing and operating IP networks

The Talkdesk story hinges on empathy and acceptance. It is the shared goal among all Talkdeskers to empower a new kind of customer hero through our innovative software solution, and we firmly believe that the best path to success for our mission is inclusivity, diversity, and genuine acceptance. To that end, we will hire, promote, work alongside, cheer for, bond with, and warmly welcome into the Talkdesk family all persons without regard to ethnic and racial identity, indigenous heritage, national origin, religion, gender, gender identity, gender expression, sexual orientation, age, disability, marital status, veteran status, genetic information, or any other legally protected status.