Site Reliability Engineer

4 days ago


San Jose, United States PDSSOFT INC. Full time

Site Reliability Engineer (SRE) / AWS DevOps Engineer


Location: San Jose, CA




Duration: Long Term


Job Description:


We are seeking a highly skilled Site Reliability Engineer (SRE) with expertise in GitHub Actions, AWS DevOps, Helm Charts, and YAML configuration. The ideal candidate will be responsible for ensuring the reliability, scalability, and efficiency of our cloud-based applications. You will work closely with development teams to implement and manage automation processes, infrastructure, and deployment strategies.

Key Responsibilities:

  • Develop and maintain CI/CD pipelines using GitHub Actions to streamline the software development lifecycle.
  • Design, deploy, and manage AWS infrastructure, ensuring high availability and security.
  • Implement and manage Helm Charts for Kubernetes to automate the deployment of applications.
  • Use YAML configuration files to define and manage infrastructure and application settings.
  • Apply SRE principles to enhance system reliability, performance, and capacity through automation and monitoring.
  • Collaborate with development teams to integrate reliability and scalability into the software development process.
  • Monitor application and infrastructure performance, troubleshoot issues, and implement solutions to improve system reliability.
  • Implement infrastructure as code (IaC) using tools like Terraform or CloudFormation for efficient resource management.

Required Skills and Qualifications:

  • Proven experience in Site Reliability Engineering (SRE) practices.
  • Strong expertise in GitHub Actions for CI/CD pipeline development.
  • Working experience with AWS services, including EC2, S3, Lambda, RDS, and VPC.
  • Proficiency in Helm Charts for Kubernetes deployment and management.
  • Strong knowledge of YAML for configuration management.
  • Experience with infrastructure as code tools such as Terraform or CloudFormation.
  • Some experience in scripting and automation using languages such as Python, Bash, or PowerShell.
  • Understanding of containerization technologies like Docker and orchestration with Kubernetes.
  • Excellent problem-solving skills and ability to work collaboratively in a fast-paced environment.
  • Strong communication and collaboration skills.

Preferred Qualifications:

  • AWS Certified Solutions Architect or AWS Certified DevOps Engineer certification.



  • San Jose, California, United States HireIO Inc Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at HireIO Inc. As a Site Reliability Engineer, you will be responsible for designing and developing solutions to automate the technical operations of large-scale systems, working closely with teams to improve stability from a Software Development Lifecycle...


  • San Jose, United States NInfo Systems, Inc. Full time

    Company Description: NInfo Systems Inc. is a certified minority-owned national IT recruiting and solutions provider with two decades of experience. It works with Fortune 500 corporations, mid-sized companies, boutique consulting companies, startups, SME-level organizations, federal and state agencies, and tier-one vendors. Role: Senior Reliability Engineer, Hybrid...


  • San Jose, California, United States Tik Tok Full time

    About Team Site Reliability Engineering at TikTok. TikTok's mission is to inspire creativity and bring joy. Our platform is built to help imaginations thrive, and our Site Reliability Engineering team plays a crucial role in making this happen. Responsibilities: Design and implement software platforms and monitoring frameworks for efficient, automated, and...


  • San Jose, United States Altimetrik Full time

    We are looking to hire a Site Reliability Engineer. Educational Background: Holds a bachelor’s or master’s degree in computer science, information technology, or a related technical field. Alternatively, significant work experience in DevOps or cloud infrastructure management could offset a formal degree requirement. Cloud Infrastructure Expertise: Has at...


  • San Jose, California, United States Adobe Full time

    Job Title: Site Reliability Engineer, AI Platform Training. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our team at Adobe. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, scalability, and security of our AI Platform. About the Role: Identify and implement methodologies and solutions to...


  • San Leandro, California, United States Omni Inclusive Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Omni Inclusive. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and availability of our Digital Sales & Marketing platforms. Key Responsibilities: Design, implement, and maintain scalable and efficient systems to...


  • San Francisco, California, United States Withorb Full time

    About Us: Orb is a cutting-edge technology company on a mission to revolutionize the way businesses approach revenue growth. Our team is passionate about building a robust infrastructure that enables our customers to unlock their full potential. Job Description: We are seeking a highly skilled Site Reliability Engineer to join our team. As a key member of our...


  • San Francisco, United States WEX Full time

    The WEX Site Reliability Engineering (SRE) team is seeking an entry-level Site Reliability Engineer Level 1 who is passionate about learning and growing in the field of software development and solutions focused on observability, incident response, reliability and performance, operational excellence, and compliance. The team will be part of the Benefits...


  • San Francisco, California, United States Outdefine Full time

    About the Job: We are seeking a highly skilled Site Reliability Engineer to join our team at Outdefine. As a key member of our engineering team, you will be responsible for ensuring the reliability, scalability, and performance of our ecommerce platform. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure using Kubernetes...


  • San Jose, California, United States Tik Tok Full time

    Job Summary: TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. As a Site Reliability Engineer on our Compute Platform team, you will play a critical role in ensuring the reliability of all Big Data services and products across the company. Key Responsibilities: Responsible for the reliability of...


  • San Francisco, California, United States Roman Health Pharmacy LLC Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Xero. As a key member of our Reliability Enablement team, you will play a critical role in ensuring the reliability and performance of our systems. Key Responsibilities: Investigate operational surprises and support teams in post-incident activities. Conduct in-depth...


  • San Jose, California, United States Adobe Full time

    Job Title: Site Reliability Engineering Manager, AI Platform. About the Role: We are seeking an experienced Site Reliability Engineering Manager to lead our AI Inference Platform team at Adobe. As a key member of our Engineering organization, you will be responsible for developing and implementing strategies to ensure the reliability, scalability, and security...


  • San Francisco, California, United States Swish Analytics Full time

    Site Reliability Engineer at Swish Analytics. Swish Analytics is a sports analytics and betting startup that's revolutionizing the industry with cutting-edge predictive data products. We're on a mission to make oddsmaking a challenge rooted in engineering, mathematics, and sports betting expertise, not intuition. We're looking for a team-oriented...


  • San Francisco, United States Ellation, Inc. Full time

    Who We Are: We’re a cast of characters working to shine a spotlight on anime. Crunchyroll is an international business focused on creating both online and offline experiences for fans through content (licensed, co-produced, originals, distribution), merchandise, events, gaming, news, and more. Visit our About Us pages for more information about our...


  • San Jose, United States Triune Infomatics Inc Full time

    Role: Senior Site Reliability Manager. Full-Time - Hybrid. Local to San Jose, CA. The Client is a simple and scalable cloud-based IoT edge orchestration solution that delivers visibility, control, and security for the distributed edge. Their platform allows customers to seamlessly manage and deploy any compute node, unlocking the value of IoT data, enabling...


  • San Francisco, California, United States WEX Full time

    Job Summary: The WEX Site Reliability Engineering team is seeking a highly motivated and quick-learning individual to join our team as a Site Reliability Engineer Level 1. As a key member of our team, you will be responsible for ensuring the reliability, performance, and security of our systems. Key Responsibilities: Actively participate in training and...