Sr. SRE
1 week ago
Job Title: Sr. SRE / Kubernetes Engineer
Location: San Francisco, CA (Hybrid – 2 days in-office). Must currently be local to the SF Bay Area.
About the Role:
StratITech is seeking a Sr. Site Reliability Engineer / Kubernetes Engineer for our client based in San Francisco, CA. This is a full-time position offering competitive pay and stock options. We are only accepting applicants who are US Citizens or US Permanent Residents/Green Card Holders. No C2C or third-party applications will be considered. Candidates must be local to the SF Bay Area; we do not offer relocation.
In this hybrid role, you will work in-office two days a week as part of a dynamic team responsible for deploying, managing, optimizing, and upgrading the systems that support innovative software solutions.
This person must be excited about working in an interrupt-driven startup environment. The ideal candidate will be passionate about learning new technologies, solving complex problems, and embracing Infrastructure as Code (IaC) to automate infrastructure processes. Your role will involve collaborating closely with team members to address architectural challenges and ensure the reliability and efficiency of the client’s cloud infrastructure.
Implementation is key in this role, as you’ll be directly responsible for turning ideas into reliable and scalable solutions.
Key Responsibilities:
- Cloud Operations: Apply DevOps principles to provide technical operational support, including production support, for cloud infrastructure serving internal and external customers.
- Tool Development & CI/CD: Write CI/CD pipelines from scratch and build tools that support internal platforms, improving stability, reliability, and efficiency.
- Feature Flags & Modifications: Implement and manage feature flags, enabling or modifying features as necessary to support platform flexibility and customer requirements.
- Troubleshooting: Diagnose and resolve complex system problems across the entire technology stack, including CI/CD pipelines, container-based systems, networking, operating systems, cloud resources, and databases. Must have very strong troubleshooting skills.
- Monitoring & Alerting: Implement and manage monitoring and alerting infrastructure for critical services, ensuring stability and performance across all platform components.
- Automation & Runbooks: Create, revise, and test operational runbooks and automation scripts to maintain infrastructure efficiently and securely.
- Operational Innovation: Proactively seek opportunities for innovation to enhance operational processes, increasing reliability, availability, and performance while promoting a security-first culture.
- On-Call Support: Participate in an on-call rotation (7am-7pm, 7 days a week, rotating every three weeks) to support 24/7 operations and ensure system availability.
- Documentation: Author technical documentation for designs, workflows, processes, and best practices.
- External Customer Focus: Provide direct support for external customer requirements, ensuring that solutions align with customer needs and expectations.
- Quality & Security: Embody a quality-first and security-first culture in all that you do.
Must-Have Requirements:
- 5+ years of experience with Azure (or AWS/GCP) for cloud infrastructure.
- Strong experience with Terraform for infrastructure automation.
- Strong experience with Kubernetes in production.
- Proficiency in Helm for managing Kubernetes applications.
- 5+ years of coding experience in Python.
- Experience using Infrastructure as Code (IaC) and CI/CD tools like FluxCD, Jenkins, Terraform, or GitHub.
- Strong experience with Linux operating systems.
- Solid working knowledge of networking (TCP/IP, DNS) and cloud infrastructure performance.
- Operational experience with monitoring/alerting systems such as Sentry, Opsgenie, or Prometheus.
- Must have production operations and client-facing experience.
- Willingness to mentor junior team members and contribute to technical documentation for workflows and best practices.
- Hands-on problem-solver with the ability to balance risk and impact to customers.
These skills are a plus:
- Experience with elements of the current tech stack: FluxCD, Prometheus, Elasticsearch, Java, Kafka, Postgres, and Jenkins.
- Previous experience or a keen interest in industrial IoT, analytics, or manufacturing.
-
Sr. SRE
2 days ago
San Francisco, United States Walter Bacon, LLC Full time Sr. SRE (Site Reliability Engineer), (Contract or Contract-to-hire) IDEAL CANDIDATE: 10+ years of SRE experience. Supporting very high-traffic, mission-critical fintech. Hybrid: work in San Francisco on Tuesdays and Fridays. Customer-facing skills; you might interact with some clients. AWS, Splunk, APM tools, monitoring tools, automation, scripting, Python, Bash.
-
Sr. SRE
3 days ago
San Francisco, United States Walter Bacon, LLC Full time Sr. SRE (Site Reliability Engineer), (Contract or Contract-to-hire) IDEAL CANDIDATE: 10+ years of SRE experience. Supporting very high-traffic, mission-critical fintech. Hybrid: work in San Francisco on Tuesdays and Fridays. Customer-facing skills; you might interact with some clients. AWS, Splunk, APM tools, monitoring tools, automation, scripting, Python, Bash.
-
Sr. SRE
1 week ago
San Francisco, United States Stratitech Services LLC Full time Job Title: Sr. SRE / Kubernetes Engineer Location: San Francisco, CA (Hybrid – 2 days in-office). Must be currently local to SF Bay Area. About the Role: StratITech is seeking a Sr. Site Reliability Engineer / Kubernetes Engineer for our client based in San Francisco, CA. This is a full-time position offering competitive pay and stock options. We are only...
-
SRE with Linux
4 weeks ago
San Jose, United States Diverse Lynx Llc Full time Position: SRE with Linux Location: RTP, NC / San Jose, CA Type: Contract Job Description SKILL: Docker, Kubernetes, Ansible, Python, shell scripting, Linux. Extensive experience working with Linux flavors like RHEL/CentOS OS, shells, filesystems, and utilities. Knowledge of distributed computing and experience working with container orchestration frameworks...
-
Sr. Data Reliability Engineer
3 weeks ago
San Antonio, United States Solugenix Full time Sr. Data Reliability Engineer San Antonio, TX or Irvine, CA (Hybrid - 3 days onsite) Contract with potential to convert to FTE Job ID 24-09149 Solugenix is assisting a client, a prestigious and large investment management company, in their search for a Sr. Data Reliability Engineer. We are currently seeking a highly skilled Data Reliability Engineer to join our...
-
Sr. Data Reliability Engineer
2 months ago
San Antonio, United States Solugenix Full time Sr. Data Reliability Engineer San Antonio, TX or Irvine, CA (Hybrid - 3 days onsite) Contract with potential to convert to FTE Job ID 24-09149 Solugenix is assisting a client, a prestigious and large investment management company, in their search for a Sr. Data Reliability Engineer. We are currently seeking a highly skilled Data Reliability Engineer to join...
-
Senior Site Reliability Manager
4 weeks ago
San Jose, United States Triune Infomatics Inc Full time Role: Senior Site Reliability Manager Full-Time - Hybrid Local to San Jose, CA The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...
-
Senior Site Reliability Manager
1 month ago
San Jose, United States Triune Infomatics Inc Full time Role: Senior Site Reliability Manager Full-Time - Hybrid Local to San Jose, CA The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...
-
AWS Cloud Architect
5 days ago
San Jose, California, United States McAfee Full time Job Title: Sr. AWS Cloud & DevOps Architect - Remote Role Overview: We are seeking a highly skilled and experienced Sr. AWS Cloud & DevOps Architect to join our dynamic team at McAfee. As a key member of our team, you will design, implement, and manage scalable, secure, and cost-effective cloud solutions using Amazon Web Services (AWS). About the Role: Cloud...
-
Sr. Software Engineer
1 week ago
San Francisco, United States OpenGov Full time OpenGov is home to an exceptional team - passionate about our mission to power more effective and accountable government. By bringing the OpenGov Cloud to our nation's state and local government, we're transforming communities so they can thrive! While professional experience and qualifications are key for this role, make sure to check you have the...
-
Cloud Infrastructure Engineer
1 week ago
San Francisco, California, United States Stratitech Services LLC Full time Job Title: Sr. SRE/Kubernetes Engineer Location: San Francisco, CA (Hybrid – 2 days in-office) About the Role: StratITech Services LLC is seeking a highly skilled Sr. Site Reliability Engineer/Kubernetes Engineer to join our team in San Francisco, CA. This is a full-time position offering competitive pay and stock options. We are only accepting applicants who...
-
Senior Network Infrastructure Specialist
2 weeks ago
San Antonio, Texas, United States Élan Partners Full time Job Summary: We are seeking a highly skilled Sr. Network Engineer to join our team at Élan Partners. The ideal candidate will have advanced knowledge of firewall management, network switch configuration, and network architecture design. Key Responsibilities: Maintain and support network infrastructures in data center and corporate environments. Manage network...
-
Data Reliability Engineer
1 month ago
San Antonio, Texas, United States Solugenix Full time Sr. Data Reliability Engineer We are seeking a highly skilled Data Reliability Engineer to join our team at Solugenix. Our client, a prestigious investment management company, requires a blend of expertise in data pipeline technologies and Site Reliability Engineering (SRE) principles to ensure the highest standards of data reliability and system...
-
Sr. Product Manager
2 weeks ago
San Jose, United States NetApp Full time About NetApp: NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the...
-
AWS Cloud Architect
1 week ago
San Jose, California, United States McAfee Full time Job Title: Sr. AWS Cloud & DevOps Architect - Remote Role Overview: We are seeking a highly skilled and experienced Sr. AWS Cloud & DevOps Architect to join our dynamic team at McAfee. The ideal candidate will design, implement, and manage scalable, secure, and cost-effective cloud solutions using Amazon Web Services (AWS). Key Responsibilities: Design and...
-
Senior Data Reliability Engineer
3 weeks ago
San Antonio, Texas, United States Solugenix Corporation Full time Sr. Data Reliability Engineer We are seeking a highly skilled Data Reliability Engineer to join our team at Solugenix Corporation. This role requires a blend of expertise in data pipeline technologies and Site Reliability Engineering (SRE) principles to ensure the highest standards of data reliability and system performance. Key Responsibilities: Design, build,...
-
Sr. Product Manager
3 weeks ago
San Jose, CA, United States NetApp Full time About NetApp: NetApp is the intelligent data infrastructure company, turning a world of disruption into opportunity for every customer. No matter the data type, workload or environment, we help our customers identify and realize new business possibilities. And it all starts with our people. If this sounds like something you want to be part of, NetApp is the...