Site Reliability Engineer

3 weeks ago


San Jose, United States HCLTech Full time

About HCLTech:

HCLTech is a global technology company, home to 221,000+ people across 60 countries, delivering industry-leading capabilities centered around digital, engineering and cloud, powered by a broad portfolio of technology services and products. We work with clients across all major verticals, providing industry solutions for Engineering Services, Manufacturing, Life Sciences and Healthcare, Technology and Services, Telecom and Media, Retail and CPG, and Public Services. To learn how we can supercharge progress for you, visit



Job Description:

  • 12+ years of proven experience in compute platform engineering with a focus on automation.
  • Experience with design and deployment of virtualization architectures, including VMware, OpenShift, or KubeVirt platforms.
  • Proven experience evaluating existing application architectures and identifying opportunities for containerization to improve scalability, reliability, and efficiency.
  • Strong analytical skills with the ability to define and track key performance metrics.
  • Experience developing tools for data analysis and performance profiling, and developing with Terraform and configuration management tools.
  • Proficiency in programming languages such as Go and/or Python.
  • Experience running large environments consisting of bare metal and large-scale virtualized environments, with a mix of tens of thousands of VMs and cloud infrastructure.

Ways to stand out from the crowd:

  • Deep understanding of other infrastructure components such as storage, DNS, AD, and security tools.
  • Hands-on experience with cloud platforms such as AWS, Azure, or Google Cloud Platform.
  • Solid understanding of microservices architecture, infrastructure as code (IaC) and configuration management tools.
  • Understanding of AIOps and how to leverage LLMs to automate various optimization initiatives.



  • San Jose, United States Myriad Consulting Inc Full time

    This role is also open to junior (3+ YOE) candidates and SRE leads (7+ YOE). The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, and fault-tolerant systems. In our team, you'll have the opportunity to manage the complex challenges of scale, while using expertise in coding,...


  • San Jose, United States TEKsystems Full time

    Description: Adobe is looking for an experienced Site Reliability Engineer to join the internal tooling team to support, configure, integrate, upgrade, and automate the use of enterprise tools used across its large Engineering organization. The role will focus on user interaction, troubleshooting tickets, and maintaining servers. Skills: Linux,...


  • San Jose, United States OKX Full time

    Who We Are: OKX is revolutionising world systems through our cutting-edge digital asset exchange, Web3 portal and blockchain ecosystems. We are deeply committed to shaping a fairer, more transparent and accessible society through blockchain technology and to date, we have 50+ million users, 3000+ employees and 180+ countries believing in the same vision as us....


  • San Diego, United States ObjectWin Technology Full time

    Job Title: Site Reliability Engineer Location: San Diego, CA or Remote in CA Duration: 6 Months Description: It is an exciting time to be part of SIE’s CICD and Cloud Site Reliability Engineering (SRE) team. SREs operate right at the intersection of Software Engineering and Infrastructure Engineering. The SRE team strives to make PlayStation highly...


  • San Jose, United States The Dignify Solutions LLC Full time

    AWS Infra SRE/DevOps engineer with proven work experience ensuring reliability, availability and performance of cloud infra and platform. - Specialist on Cisco Cloud run-on for infrastructure management, who can install, run, and maintain software like Docker and containers. - Responsible for upkeep, improvements, and configurations & migration of cloud...


  • San Jose, United States Hireio, Inc. Full time

    Position Description: Location: USA/California/SF Bay Area, Seattle. Base Salary: 187K - 280K. Sponsor Visa? Yes. Language Requirements: English, Mandarin (Preferred). Our Team: The Site Reliability Engineering (SRE) team combines software and systems engineering to build and run large-scale, massively distributed, and...


  • San Francisco, CA, United States Apollo Solutions Full time

    Site Reliability Engineer: Apollo Solutions have partnered with a groundbreaking artificial intelligence business who are making major developments in how we use AI/ML for gaming/security. They are working closely with government contracts as well as gaming console companies and are now searching for an SRE to join their growing team. The Site Reliability...


  • San Francisco, United States Apollo Solutions Full time

    Principal Site Reliability Engineer Apollo Solutions have partnered with a groundbreaking Fintech start-up backed by top tier venture capital. They are looking to significantly disrupt how we view, store and invest our personal finance and have already made significant waves in the industry. The Principal Site Reliability Engineer will be working closely...


  • San Francisco, United States Patreon Full time

    Patreon is the best place for creators to build exclusive content and community for their fans. We enable creators (podcasters, writers, musicians, illustrators, etc) to connect with their fans directly and make money from their creative work. Creators can sell one-off items from their own shops or offer recurring monthly memberships with exclusive access to...


  • San Francisco, United States Pelago Full time

    Role Overview: At Pelago, we run a serverless architecture on AWS, with infrastructure managed using Terraform. Our system has been built to deliver our virtual clinic for Substance Use Management, and we are looking for a talented Site Reliability Engineer to join the engineering team supporting Pelago. As a HIPAA-compliant, HITRUST-certified organization it...


  • San Jose, United States HireIO Inc Full time

    About the company It is the leading destination for short-form mobile video. It is the largest Unicorn startup. It's the leader in short-form video hosting service now. It surpassed 1.3 billion mobile downloads in United States and 2 billion worldwide. With 1.5 billion monthly active users worldwide, it ranked one of the most popular social entertainment...


  • San Jose, United States Hireio, Inc. Full time

    About the company: It is the leading destination for short-form mobile video. It is the largest Unicorn startup. It's the leader in short-form video hosting service now. It surpassed 1.3 billion mobile downloads in United States and 2 billion worldwide. With 1.5 billion monthly active users worldwide, it ranked one of the most...


  • San Diego, United States TalentBurst Full time

    SENIOR SITE RELIABILITY ENGINEER Location: San Diego, CA 92127 - 100% onsite (San Diego site preferred, open to other sites located in San Francisco 94107, San Mateo 94404, Los Angeles 90045 or Aliso Viejo 92656) Duration: 6 months W2 Acceptable It is an exciting time to be part of Continuous Integration/Continuous Deployment (CI/CD) and Cloud Site...

