Site Reliability Engineer, Security

3 weeks ago


San Francisco, United States Okta Full time
Okta: Okta's Workforce and Customer Identity Clouds enable secure access, authentication, and automation, putting identity at the heart of business security and growth.

Get to know Okta

Okta is The World’s Identity Company. We free everyone to safely use any technology—anywhere, on any device or app. Our Workforce and Customer Identity Clouds enable secure yet flexible access, authentication, and automation that transforms how people move through the digital world, putting Identity at the heart of business security and growth.

At Okta, we celebrate a variety of perspectives and experiences. We are not looking for someone who checks every single box; we’re looking for lifelong learners and people who can make us better with their unique experiences.

The Security Engineering Team

Our Infrastructure Security team has a niche skill set that balances Security domain expertise with the ability to design, implement, and roll out infrastructure across multiple cloud environments without adding friction to product functionality or performance. We are responsible for meeting the ever-growing need to improve customer safety and privacy by providing security services that are coupled with the core Okta product.

We embrace innovation and pave the way to transform bright ideas into excellent security solutions that help run large-scale, critical infrastructure. We encourage you to prescribe defense-in-depth measures and industry security standards, and to enforce the principle of least privilege, to help take our security posture to the next level.

The SRE Opportunity

Okta’s Workforce Identity Cloud Security Engineering group is looking for an experienced and passionate Site Reliability Engineer to join a team focused on designing and developing Security solutions to harden our cloud infrastructure.

This is a high-impact role in a security-centric, fast-paced organization that is poised for massive growth and success. You will act as a liaison between the Security org and the Engineering org to build technical leverage and influence the security roadmap. You will focus on engineering security aspects of the systems used across our services. Join us and be part of a company that is about to change the cloud computing landscape forever.

What you’ll be doing

  • Building, running, and monitoring Okta's production infrastructure
  • Evangelizing security best practices and leading initiatives and projects to strengthen the security posture of our critical infrastructure
  • Responding to production incidents and determining how we can prevent them in the future
  • Triaging and troubleshooting complex production issues to ensure reliability and performance
  • Identifying and automating manual processes
  • Continuously evolving our monitoring tools and platform
  • Promoting and applying best practices for building scalable and reliable services across engineering
  • Developing and maintaining technical documentation, runbooks, and procedures
  • Supporting a 24x7 online environment as part of an on-call rotation

What we are looking for

  • Always willing to go the extra mile: see a problem, fix the problem.
  • 2+ years of experience automating, securing, and running large-scale production IAM and containerized services in AWS (EC2, ECS, KMS, Kinesis, RDS), GCP (GKE, GCE), or other cloud providers.
  • 2+ years of experience with configuration management and infrastructure-as-code tools such as Chef and Terraform.
  • Experience with operational tooling languages such as Ruby, Python, Go, and shell, and with source control.
  • Strong knowledge of CI/CD principles, Linux fundamentals, OS hardening, networking concepts, and IP protocols.
  • Experience with industry-standard security tools such as Nessus, Qualys, osquery, and Splunk.
  • Experience with Public Key Infrastructure (PKI) and secrets management.
  • Unflappable troubleshooting skills.
  • Security background and knowledge.
  • BS in computer science (or equivalent experience).

Bonus Experience

  • Experience conducting threat assessments and vulnerability assessments in a high-availability setting.
  • Understanding of MySQL, including replication and clustering strategies, and familiarity with data stores such as DynamoDB, Redis, and Elasticsearch.

Additional Requirements

  • This position requires the ability to access federal environments and/or protected federal data. As a condition of employment for this position, the successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. citizen, national, lawful permanent resident, refugee, or asylee; see 22 CFR 120.15) upon hire.

  • San Francisco, United States WEX Full time

    The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...


  • San Francisco, California, United States WEX Full time

    Job Summary: The WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Actively participate in training and...


  • San Francisco, United States Ellation, Inc. Full time

    Who We Are: We’re a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities; conduct in-depth...


  • San Francisco, California, United States Swish Analytics Full time

    Site Reliability Engineer at Swish Analytics: Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...


  • San Jose, United States EVONA Full time

    Site Reliability Engineer (SRE). Location: San Francisco Bay Area. Role Overview: We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation...


  • San Francisco, California, United States Hinge Health Full time

    About the Role: Hinge Health is seeking a skilled Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability and scalability of our platform, including automation, logging, monitoring, and alerting. You will thrive in a collaborative environment, have excellent communication skills, and be...


  • San Francisco, United States WEX, Inc. Full time

    About the Role: The WEX Site Reliability Engineering (SRE) team is seeking a Senior Staff SRE who is passionate about developing software and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits Reliability organization which supports our internal...


  • San Francisco, United States New York Technology Partners Full time

    Must-haves, in order of preference: typical Java/J2EE experience between 6 and 10 years; application production support (SRE, Site Reliability Engineering) with 3+ years, preferably in the e-commerce domain; hands-on experience with any UI framework (AngularJS, VueJS, etc.), 1+ years.


  • San Francisco, United States Focal Systems Full time

    Location: San Francisco, hybrid (1-2 days per week). Salary: $165-175k + stock. Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. We are a Deep Learning first company. Our mission is to automate and optimize brick and mortar...


  • San Francisco, California, United States Arbitrum Inc Full time

    Reliability Engineer: At Arbitrum Inc, we're on a mission to bring blockchain to a billion people. Our developer platform is designed to make building on the blockchain easy, and we're looking for a skilled Reliability Engineer to join our Infrastructure team. As a Reliability Engineer, you'll collaborate with our engineering team to design, deploy, and...


  • San Francisco, California, United States TBWA\Chiat\Day Full time

    Job Title: Senior Site Reliability Engineer with Perplexity AI. Job Summary: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Perplexity AI. As a key member of our infrastructure team, you will be responsible for designing, implementing, and scaling our cloud infrastructure to support our AI-powered search...


  • San Francisco, United States Federal Reserve Bank of San Francisco Full time

    Company: Federal Reserve Bank of San Francisco. Job Description: While the SF Fed is a Reserve Bank, we're not what you might expect. We're unreserved here. That means we seek new and diverse perspectives. We spark conversations and encourage debate. We build opportunity. We pursue careers that are true to ourselves. We are looking for people who want to help...


  • San Francisco, California, United States Aitopics Full time

    About the Role: We are seeking a highly skilled Staff Site Reliability Engineer to join our Data Engineering team. As a key member of our team, you will be responsible for maintaining and enhancing the reliability of our data infrastructure. Your work will directly impact the availability and performance of our data services, enabling the organization to make...


  • San Francisco, United States CV Library Full time

    JOB TITLE: Staff SRE. TOP 3 SKILLS: GoLang, Kubernetes, Ruby. LOCATION: Remote. DURATION: Direct Hire. RATE RANGE: $160-180K. SUMMARY: We're looking for a driven software engineer who cares deeply about their craft, and who wants to use their skills to bring about positive change in the world while working in a high performing organization using modern software development...


  • San Francisco, United States OpenAI Full time

    Site Reliability Engineer, Enterprise IAM: OpenAI's IT organization supports the mission of deploying artificial general intelligence (AGI) for the benefit of all. Our team is committed to providing seamless technological support and solutions to ensure that all OpenAI employees are well-equipped and connected. This enables them to contribute effectively...