Site Reliability Engineer

2 months ago


San Jose, United States TCWGlobal Full time

Site Reliability Engineer (Kubernetes)

W-2 Contract (*US citizenship, Green Card holder, or EAD; authorized to work in the US)

San Jose, CA 95134 (Hybrid; **local candidates)

$80-110/hr (weekly pay + benefits)

6-month contract (excellent potential for extension)

Full-time: M-F 8am-5pm (onsite 2 days a week)




***Please note: This role is only accepting candidates who currently live in San Jose, CA.



Our client is a cloud security company. They offer enterprise cloud security services for the world's most established companies. Named a Best Workplace in Technology by Fortune and others, they foster an inclusive and supportive culture that is home to some of the brightest minds in the industry. The team is looking for someone who can thrive in a fast-paced, collaborative environment and who is passionate about building and innovating for the greater good.



About the Role:


We are seeking a skilled and experienced Site Reliability Engineer (SRE) to join our team. The primary focus of this role is to develop and maintain a comprehensive observability solution for our Kubernetes-based applications. The ideal candidate will be proficient in using various monitoring and logging tools to ensure the reliability and scalability of our services.


Key Responsibilities:


● Design and Implementation: Develop and implement observability solutions for Kubernetes-based applications using Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy Health Probes, OpenTelemetry, and Argo CD.

● Monitoring and Logging: Configure and maintain logging pipelines using Fluent Bit to collect, process, and route logs for storage and analysis.

● Metrics and Tracing: Set up Prometheus for metrics collection and Grafana Tempo for distributed tracing. Integrate these with Grafana for real-time monitoring and alerting via OpenTelemetry.

● Telemetry: Utilize OpenTelemetry to instrument applications for better traceability and observability.

● CI/CD: Use Argo CD for continuous deployment, and ensure the observability suite is itself deployed through the CI/CD pipeline.

● Observability Optimization: Analyze and optimize the performance of the observability stack to ensure minimal overhead and maximum efficiency.

● Troubleshooting: Proactively identify and resolve issues related to the observability infrastructure. Collaborate with development and operations teams to troubleshoot and resolve incidents.

● Documentation and Training: Document observability processes and best practices. Provide training and support to other team members on the observability tools and techniques.




Required Qualifications:


  • 4+ years of experience as a Site Reliability Engineer, Product Reliability Engineer, or similar role in a Kubernetes environment
  • W-2 Contract (*US citizenship, Green Card holder, or EAD; authorized to work in the US)
  • ***Please note: This role is only accepting candidates that currently live in San Jose, CA.
  • Experience with observability stacks in Kubernetes across multiple workloads
  • Experience implementing stacks in Kubernetes
  • Understanding of API gateway files
  • Experience running multiple app stacks
  • Understanding of SLIs (service level indicators) in product teams
  • Telemetry: Experience using OpenTelemetry to instrument applications for better traceability and observability.
  • Experience with strong focus on observability in Kubernetes environments supporting applications in EKS in AWS.
  • Kubernetes: In-depth knowledge of Kubernetes and container orchestration.
  • Experience developing and maintaining a comprehensive observability solution for Kubernetes-based applications.
  • Technologies: Hands-on experience with Fluent Bit, CloudWatch, Stackdriver, Grafana Loki, Grafana Tempo, Prometheus, Envoy Health Probes, OpenTelemetry, and Argo CD.
  • Scripting and Automation: Proficiency in scripting languages such as Python, Bash, or similar for automation tasks.
  • Monitoring and Logging: Strong understanding of monitoring, logging, and tracing concepts and best practices.
  • Collaboration: Strong communication skills and the ability to work effectively in a team environment.


Bonus Qualifications:


  • Certifications: Relevant certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD)
  • Cloud Platforms: Experience with cloud platforms such as AWS and EKS.
  • DevOps Practices: Familiarity with DevOps practices and tools.



Please send your resume. Thank you.