Site Reliability Engineer

2 weeks ago


Houston, United States | Imubit | Full time
TL;DR:

Imubit is looking for a Site Reliability Engineer to help disrupt the refining and chemical industries with breakthrough machine learning technologies.

About us:

Imubit directly controls and optimizes refineries and chemical plants with AI, adding millions of dollars to the plant bottom line while respecting safe operating limits and meeting energy-efficiency and sustainability objectives. Imubit's Closed Loop Neural Network platform allows customers to leverage an advanced form of AI called Reinforcement Learning (RL). Through our patented approach to applying RL to industrial processes, industry leaders have fundamentally changed the way they optimize their plants and improve profitability in real time. Imubit's solution is currently optimizing the manufacturing facilities of Fortune 500 companies. Imubit combines industry expertise from companies like Exxon and Shell with award-winning data scientists endorsed by Google. Imubit is backed by tier-1 venture capital firms such as Insight Partners.

We are looking for:

You: a top-notch Site Reliability Engineer who will design and support Imubit's cloud infrastructure. As part of this, you will optimize deployment processes and keep systems running. You will work with a variety of cloud technologies, automation, and infrastructure-as-code. Our SREs also keep an ever-watchful eye on our systems' capacity and performance. Much of our time is spent optimizing existing systems, building infrastructure, and reducing repetitive work through automation.

You will also play a critical role in incident management, swiftly identifying and resolving issues to minimize downtime and ensure seamless operations. Collaboration is key in this role, as you will work closely with software developers, DevOps engineers, and other stakeholders to implement robust solutions and drive continuous improvement. As a proactive member of our team, you will stay updated with the latest industry trends and best practices, applying this knowledge to enhance our infrastructure's resilience and scalability. Your contributions will directly impact the reliability and efficiency of our services, making you an integral part of our success.

In this position, you will:
  • Design, deploy and maintain Imubit's cloud infrastructure to provide high uptime, scalability and security.
  • Leverage public cloud services and tools to improve efficiency and reliability of our services and workflows.
  • Architect and manage cross-cloud network infrastructure (e.g. subnets, routing tables, IPSec VPNs, Transit Gateways, firewall rules).
  • Engage in and improve the whole lifecycle of services, from inception and design, through deployment, operation and refinement.
  • Participate in infrastructure on-call rotation and respond in a timely manner.
  • Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.

Minimum Qualifications:
  • 5 years of experience maintaining production-level cloud infrastructure, including public cloud services (e.g. AWS, GCP).
  • Preferred: BA/B.Sc. in Computer Science or equivalent.
  • Experience with a programming language such as Python or Go.
  • Experience deploying and supporting services in Kubernetes, including GitOps management tools such as ArgoCD.
  • Familiarity with software development principles and concepts (e.g. version control with Git, the software development lifecycle).
  • Experience implementing and using monitoring tools (e.g. New Relic, Splunk, Grafana, Prometheus).
  • Experience managing production databases (e.g. PostgreSQL), including managed services (e.g. AWS RDS).
  • Experience with infrastructure-as-code concepts and tools (e.g. Terraform, Ansible).
  • Experience with secrets management tools (e.g. HashiCorp Vault, AWS Secrets Manager).
  • Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Ability to debug and optimize code and automate routine tasks.
  • Systematic problem-solving approach, coupled with effective communication skills and a sense of ownership and drive.


Imubit provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability or genetics. In addition to federal law requirements, Imubit complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation, and training.

Imubit does not accept, retain, or respond to unsolicited CVs or phone calls, including those from any third party representing job seekers.

No visa sponsorship is available for this position.

careers@imubit.com
