Site Reliability Engineer

2 months ago


Austin, United States Computer Futures Full time
Position Summary: We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our client in Austin. The ideal candidate will have a strong background in infrastructure as code (IaC), automation, container orchestration, and monitoring solutions. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our applications and infrastructure.

Key Responsibilities:
  • Infrastructure as Code (IaC):
    • Design, implement, and manage IaC templates using AWS CloudFormation to provision and configure AWS resources (EC2 instances, VPCs, RDS databases, IAM policies).
  • Configuration Management:
    • Configure Ansible to manage AWS environments and automate the build process for core AMIs used by all application deployments.
  • CI/CD Pipelines:
    • Design and implement scalable and automated CI/CD pipelines using Jenkins/AWS CodePipeline and Bitbucket/AWS CodeCommit.
  • Container Orchestration:
    • Orchestrate containerized workloads using Docker, improving resource utilization and application scalability.
  • Monitoring and Alerting:
    • Implement robust monitoring and alerting solutions using Sumo Logic, New Relic, and AWS CloudWatch to proactively identify and resolve system issues.
  • Collaboration and Performance Tuning:
    • Collaborate with development teams to optimize application performance and reliability through regular performance tuning and troubleshooting sessions.
  • Standardization and Mentorship:
    • Lead initiatives to standardize DevOps practices across multiple teams, facilitating knowledge sharing and ensuring consistency in deployment processes.
    • Mentor junior team members on best practices, tools, and methodologies in the DevOps domain.
  • On-Call and Incident Management:
    • Participate in on-call rotations, respond to incidents promptly, and implement preventive measures to mitigate future occurrences.
  • Scripting and Automation:
    • Utilize Bash/Python for scripting and automation to monitor, patch, and resolve issues related to the environments.
    • Develop monitoring tool deployment automation.
  • Complex Systems Diagnosis:
    • Diagnose complex systems issues with multiple influencing factors.
  • Feature Development and Support:
    • Work closely with the development team on upcoming features and assist the support team with escalated customer issues.
  • System Security and Administration:
    • Manage system security and administer admin credentials.
  • Documentation:
    • Create and review documentation and processes related to recurring issues, new procedures, and knowledge transfer.
  • Agile Methodology:
    • Scope, plan, and execute utilizing Agile methodology and JIRA/Confluence tools.
  • AWS Management:
    • Configure AWS EC2 Instances using AMIs and launch instances with specific application requirements.
    • Create and manage REST APIs via AWS API Gateway.
    • Manage SSL Certificates.
    • Build, deploy, and manage AWS services (EBS, EC2, S3, ELB, ECS, VPC, RDS, etc.).
  • Security Solutions:
    • Implement security solutions by installing Qualys, CrowdStrike, and OSSEC.
    • Follow best practices to reduce costs.
    • Ensure compliance with PCI-DSS standards.
Qualifications for Site Reliability Engineer:
  • Proven experience in designing, implementing, and managing infrastructure as code (IaC) using AWS CloudFormation.
  • Proficiency in Ansible for configuration management and automation.
  • Expertise in designing and implementing CI/CD pipelines using Jenkins, AWS CodePipeline, Bitbucket, and AWS CodeCommit.
  • Strong experience with Docker and container orchestration.
  • In-depth knowledge of monitoring and alerting solutions such as Sumo Logic, New Relic, and AWS CloudWatch.
  • Experience in collaborating with development teams and optimizing application performance.
  • Leadership experience in standardizing DevOps practices and mentoring junior team members.
  • Ability to participate in on-call rotations and manage incident responses.
  • Proficiency in scripting/programming languages like Bash and Python.
  • Experience with AWS services, including EC2, VPC, RDS, and others.
  • Knowledge of system security and compliance standards, especially PCI-DSS.
  • Familiarity with Agile methodology and tools like JIRA and Confluence.
Preferred Qualifications:
  • AWS certifications (e.g., AWS Certified DevOps Engineer, AWS Certified Solutions Architect).
  • Experience with additional configuration management tools (e.g., Puppet, Chef).
  • Knowledge of additional monitoring tools and solutions.
  • Experience with additional cloud platforms (e.g., Google Cloud Platform, Microsoft Azure).

EOE Statement: Specialist Staffing Group is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or veteran status.

To find out more about Computer Futures, please visit

  • Austin, Texas, United States Apex Systems Full time

    Job Description. Position: Site Reliability Engineer. Location: Remote. Duration: 1 year. Rate: $67/hr W-2. We are seeking a highly skilled Site Reliability Engineer to join our team at Apex Systems. As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key...


  • Austin, Texas, United States Cape Henry Associates, Acquired by JANUS Research Group Full time

    Janus is looking for a seasoned Site Reliability Engineer / DevSecOps Developer to help grow our capability with our DoD clients. Develop Infrastructure as Code (IaC): designing, implementing, and maintaining infrastructure using IaC technologies (e.g. Terraform or similar), ensuring scalable, reliable, and efficient platforms. Collaborate with data and other...


  • Austin, United States JobRialto Full time

    Skills: 6+ years of experience in systems and platform operations and technology; experience with on-prem and public cloud (AWS, EKS); scripting languages like Python; Linux administration and cloud; DevOps experience would be a plus. Team: As a member of the Site Reliability Engineering & Production Services team, you will work with other technology...


  • Austin, Texas, United States NinjaOne Full time

    About the Role: At NinjaOne we are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end-users. We are looking for individuals with a passion for...


  • Austin, Texas, United States Thales Full time

    About the Role: Thales is seeking an experienced Site Reliability Engineer to join our team. As a Site Reliability Engineer, you will be responsible for ensuring the reliability, performance, and security of our cloud-based services. Key Responsibilities: Collaborate with project managers and service delivery managers to analyze traffic trends and capacity...


  • Austin, Texas, United States Expedia Group Full time

    Principal Site Reliability Engineer. We are looking for a highly qualified and seasoned Principal Site Reliability Engineer (SRE) to enhance our operations. The successful candidate will play a crucial role in guaranteeing the stability, scalability, and efficiency of our systems and services. You will collaborate closely with both development and operational...


  • Austin, Texas, United States Iodine Software Full time

    Director of Site Reliability Engineering. Join us. Let's make a direct impact in healthcare. Being an Iodine employee means becoming part of something bigger: using clinical AI technology to drive smarter healthcare processes and positively impact patient care. Who we are: Iodine is an enterprise AI company that is championing a radical rethink of how to...


  • Austin, United States Visa Full time

    Company Description: Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...


  • Austin, Texas, United States Expedia Group Full time

    Principal Software Development Engineer - Site Reliability. We are looking for a highly proficient and seasoned Principal Software Development Engineer (SRE) to enhance our team. The successful candidate will be accountable for maintaining the reliability, scalability, and performance of our systems and services. You will collaborate closely with both...


  • Austin, United States Thales USA, Inc. Full time

    Location: Austin, United States of America. Thales people architect identity management and data protection solutions at the heart of digital security. Businesses and governments rely on us to bring trust to the billions of digital interactions they hav...


  • Austin, Texas, United States Apple Full time

    About the Role: We are seeking a highly skilled Site Reliability Engineering Manager to join our Apple Service Engineering team. As a key member of our team, you will be responsible for establishing and maintaining the reliability and scalability of our cloud services. Key Responsibilities: Lead a team of engineers in providing a platform for mission-critical...


  • Austin, Texas, United States NinjaOne Full time

    About the Role: At NinjaOne we are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Site Reliability Engineering Manager to join our Platform Engineering team and help us scale our products to millions of end-users. You will have the opportunity to build the SRE team from the ground up...


  • Austin, United States Cape Henry Associates, Acquired by JANUS Research Group Full time

    Janus is looking for a seasoned Site Reliability Engineer / DevSecOps Developer to help grow our capability with our DoD clients. Develop Infrastructure as Code (IaC): designing, implementing, and maintaining infrastructure using IaC technologies (e.g. Terraform or similar), ensuring scalable, reliable, and efficient platforms. Collaborate with data and other...


  • Austin, United States Terminal Industries Full time

    About Us: Terminal builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers and personnel. These are the fundamental operating assets of commerce - and represent the last...


  • Austin, Texas, United States Expedia Group Full time

    Principal Software Development Engineer - Site Reliability. We are in search of a highly qualified and seasoned Principal Software Development Engineer (SRE) to enhance our operations. The ideal candidate will be tasked with ensuring the dependability, scalability, and efficiency of our services and systems. You will collaborate closely with both development...


  • Austin, TX, United States Visa Full time

    Company Description: Visa is a world leader in payments and technology, with over 259 billion payments transactions flowing safely between consumers, merchants, financial institutions, and government entities in more than 200 countries and territories each year. Our mission is to connect the world through the most innovative, convenient, reliable, and secure...


  • Austin, United States Expedia Group Full time

    Senior Software Development Engineer - Site Reliability. We are seeking a highly skilled and experienced Senior Software Development Engineer (SRE) to join our team. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our services and systems. You will work closely with development and operations teams to...