Site Reliability Engineer

2 weeks ago


Houston, United States INSPYR Solutions Full time
Title: Site Reliability Engineer (SRE)

Location: Houston, TX or Bartlesville, OK
Duration: 12+ month contract
Work Requirements: U.S. citizens, green card (GC) holders, or those otherwise authorized to work in the U.S.

Our Client is seeking a Site Reliability Engineer (SRE) to become a part of their growing Digital IT team focused on building an OpenShift/Kubernetes capability. The SRE will support the reliability of Digital IT/OT critical applications. This transformative role involves automating IT infrastructure tasks and driving SRE best practices, tools, and processes. The ideal candidate should exhibit a growth mindset and proactively monitor and work with application developers to respond to incidents for optimal user experience.

The candidate must have senior-level experience deploying OpenShift on premises and supporting applications in Kubernetes. The ideal candidate will have experience with both on-prem OpenShift and Azure Kubernetes container platforms.

The successful candidate will possess a strong infrastructure and development background, as well as the interpersonal skills needed to communicate design requirements and objectives while providing thought leadership to peers and leadership.

Candidates should be self-motivated and collaborative IT professionals with a strong background in software development, systems administration and IT automation.

Responsibilities:

Maintain survivability and reliability of IT/OT critical resources.
Write and build CI/CD pipelines and build/release processes for IT/OT workflow applications.
Mentor the IT/OT DevOps team in best practices for CI/CD deployments using ADO and Git.
Perform periodic load and scalability testing to establish baselines, detect drift, and inform capacity planning.
Conduct weekly operational state reviews covering performance trends, anomalies, errors, and other availability events with SREs, product owners, and development teams.
Participate in quarterly business and operational reviews aligning on roadmaps, development velocity, efficiency, growth trends, patching, etc.
Plan and execute periodic Disaster Recovery exercises including both tabletop and simulated failures (fault injection).

Required Qualifications:

Candidates must have a bachelor's degree and 7 years of IT experience.
Senior-level experience with OpenShift and Kubernetes.
Familiarity with continuous integration/deployment processes and tools such as IDEs (Eclipse), source code management (Git/Stash), ADO Pipelines, Maven, Nexus artifacts, etc.
Strong understanding of SRE practices: incident response, change/release management, capacity planning, infrastructure automation, elastic environments, chaos engineering and blameless postmortems.
Expertise in application performance monitoring, observability, and proactive alert correlation, including monitoring containers and failure-based alerting.
Scripting experience in languages such as Python and Bash.
Experience deploying applications in OpenShift in both public and private clouds.
Excellent written and oral communication skills.
Demonstrated ability to communicate technical issues to a nontechnical audience.
Demonstrated ability to communicate on a technical level to a technical audience.
Strong interpersonal skills, adaptable and able to learn quickly.
Requires limited supervision and has excellent time management skills.
Self-motivated self-starter.
Ability to work and interact with others in a structured/team environment.

Technology Stack:

Experience with at least one technology in each of the tech stack categories below:
Monitoring and Logging Tools: AppDynamics, Splunk, ELK Stack, DataDog, Prometheus, AWS CloudWatch/X-Ray, Grafana
Programming: C# .NET, PowerShell, Python, YAML
Containers: Docker, Helm charts
OS: Linux – RHEL, Ubuntu, CentOS
Code Repos: Azure Repos, GitHub, GitLab
Infrastructure as code: Terraform, Ansible
Automation Tools: Ansible, Jenkins, Chef, Puppet
Agile: JIRA, SAFe

Desired Qualifications:

Experience in cloud/virtual technologies and management – OpenShift, VMware, AWS, Azure, etc.
Familiarity with security best practices for containerized applications.
Knowledge of DevOps practices and tools.
Knowledge, skills, and abilities to automate the creation of Platform as a Service (PaaS) infrastructure using industry-standard tools such as Ansible and Chef.
Familiarity with Industrial Control System (ICS) security architecture – Purdue model.

Work Location: On-site in Houston or Bartlesville

Our benefits package includes:
Comprehensive medical benefits
Competitive pay
401(k) retirement plan
…and much more

About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.

INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.
