Senior Engineer, Site Reliability Engineering

1 week ago


Schiller Park, United States Balyasny Asset Management Full time

We are looking for a Senior Site Reliability Engineer who can cultivate our SRE philosophy, processes, and technologies from the ground up. As a Senior Site Reliability Engineer within the Platform group, you will lay the groundwork for our SRE infrastructure. Your role will entail driving standards and fostering adoption across our technology teams while partnering closely with our DevOps and Cloud teams. With a hands-on approach, you'll work across both cloud and on-premises hosting platforms, ensuring the reliability and scalability of our trading systems and production environments. This is a chance to play a pivotal role in transforming our operational capabilities and enhancing performance across a wide array of environments and platforms.

As a Site Reliability Engineer at BAM, you will:

  • Develop and promote our SRE philosophy, establishing best practices and processes that will be instrumental in scaling our infrastructure.
  • Create and maintain thorough documentation for SRE processes, systems design, and incident post-mortems to foster a culture of learning and improvement.
  • Drive adoption of SRE principles across various technology teams, acting as a mentor and advisor to embedded SREs.
  • Implement end-to-end observability and monitoring solutions using Prometheus, Grafana, Loki, and AWS CloudWatch, ensuring high visibility into application performance and infrastructure health.
  • Utilize and build standards around Sentry for application monitoring and error tracking to proactively identify and address reliability issues.
  • Review and define standards for application reliability requirements within our Kubernetes environment, ensuring application configuration is optimized for performance, cost, and reliability.
  • Develop automation and tooling to improve the efficiency and reliability of deployment pipelines, system health checks, and recovery procedures.
  • Collaborate with development teams to enhance service stability, scalability, and fault tolerance through SRE best practices like blameless post-mortems and service level objectives (SLOs).
  • Conduct regular reviews of infrastructure and application metrics, logs, and traces to proactively spot and address potential issues before they affect customers.
  • Introduce a reliability-by-default approach to software delivery.

Core Tech Stack:

  • Languages: Python, Java, NodeJS, C#, Shell
  • Public cloud: AWS
  • CI/CD: TeamCity, Octopus, Jenkins
  • Configuration Management: Puppet, Ansible
  • Infrastructure Code: Terraform, CloudFormation
  • Application Management: Kubernetes, Docker, Helm
  • OS: Linux and Windows
  • Observability: Prometheus, Amazon CloudWatch, Sentry, Grafana, Loki

To be considered a good cultural fit, you must be:

  • An ambitious self-starter
  • Hungry to learn
  • Driven towards success
  • A very strong and efficient communicator
  • Able to multi-task and excel in a fast-paced trading environment
  • A problem solver, able to develop quick and sound solutions to complex problems

To be considered a good fit, you must have:

  • 5 years of experience in SRE or similar roles within complex, distributed systems environments.
  • A Bachelor's degree in engineering, computer science, information systems, or equivalent experience.
  • Proficiency with key SRE technologies such as Prometheus, Grafana, Loki, AWS CloudWatch, and Sentry.
  • Extensive knowledge of container orchestration using Kubernetes and containerization with Docker.
  • Hands-on experience with both cloud (AWS preferred) and on-premises hosting platforms.
  • Proven ability to script in languages like Python, Bash, or Go to automate routine tasks and deployment pipelines.
  • Strong understanding of CI/CD principles, agile methodologies, and DevOps culture.
  • Excellent troubleshooting and problem-solving skills, with a systematic approach to handling unexpected situations.
  • High level of initiative, passion for reliability engineering, detail orientation, and follow-through capabilities.
  • Exceptional interpersonal and communication skills, with the ability to explain complex technical concepts to a diverse audience.
  • Experience with immutable infrastructure and with infrastructure automation and provisioning tools such as AWS CloudFormation or Terraform.
  • Strong knowledge of Linux administration, particularly RHEL and CentOS.
  • Strong knowledge of distributed systems concepts, including best practices and troubleshooting.
  • Knowledge of Windows Server administration and automation with PowerShell.
  • Operational understanding of networking concepts, architecture, and best practices, especially as they relate to hybrid cloud integration.
  • Analytical skills: the ability to troubleshoot, logically assess problems, and determine solutions.
  • Detailed documentation skills: the ability to represent ideas, requirements, reference architecture, and problems in clear, concise, and business-friendly documents.

Bonus points for:

  • Experience in a high throughput/low latency environment
  • Experience with successful SRE team build-outs
  • Experience with security patterns and distributed authentication
  • Experience managing high-pressure incident response
  • Experience with Chaos Engineering technologies
  • Contributions to open source libraries, projects, or communities
  • Any AWS, Azure, or GCP resource specializations or certifications
  • Any Kubernetes resource specializations or certifications

Don't have all the skills listed above? Have extra skills you think are important that we haven't thought of? Please let us know by applying and telling us a bit more about yourself and why you think you're qualified.



  • Schiller Park, United States Stardom Employment Consultants Full time

    Job Description: As a Site Reliability Engineer, you will be responsible for maintaining and improving the reliability, availability, and performance of our systems. You will collaborate closely with development, operations, and security teams to build and automate scalable infrastructure, monitor system health, and address issues before they impact users....


  • Schiller Park, United States Fetch Full time

    What We're Building And Why We're Building It. There's a reason Fetch is ranked top 10 in Shopping in the App Store. Every day, millions of people earn Fetch Points buying brands they love. From the grocery aisle to the drive-through, Fetch makes saving money fun. We're more than just a build-first tech unicorn. We're a revolutionary shopping platform where...


  • Schiller Park, United States Northern Trust Full time

    About Northern Trust: Northern Trust, a Fortune 500 company, is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. Northern Trust is proud to provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring...


  • Schiller Park, United States Remotely Full time

    This is a remote position. Site Reliability Engineer (1 year experience, remote). Be part of our future: this job posting builds our talent pool for potential future openings. We'll compare your skills and experience against both current and future needs. If there's a match, we'll contact you directly. No guarantee of immediate placement, and we only consider...


  • Schiller Park, Illinois, United States Northern Trust Full time

    About the Role: Northern Trust is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. We provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring principles of service, expertise, and integrity. Key...


  • Menlo Park, California, United States Character Technologies Full time

    About the Position: As a pivotal member of our Site Reliability Engineering team at Character Technologies, you will play a crucial role in managing our extensive infrastructure, which encompasses thousands of nodes, vast amounts of data, and millions of active users daily. Your primary focus will be to ensure the reliability, scalability, and performance of...


  • Menlo Park, California, United States Robinhood Full time

    About the Role: We're a dynamic team serving a highly ambitious financial services organization. The Reliability Engineering team provides a specialization within engineering focused on designing, engineering, evolving, and safely making changes to large-scale distributed systems; these systems are often composed of disparate components which are each...

  • Senior Instrument

    2 weeks ago


    Schiller Park, United States Burns & McDonnell Full time

    Description: Burns & McDonnell is a full-service, global AEC firm that excels at every step of project delivery. Our diverse service range gives our clients access to a uniquely broad talent pool of 10,000 professionals and enables us to help solve some of their most complex challenges. We're able to seamlessly plan, design, permit, construct and manage...


  • Menlo Park, United States Character Full time

    About the role. Responsibilities: As a Multimodal Site Reliability Engineer (SRE) at Character, you will be responsible for ensuring the reliability, scalability, and performance of our app and AI multimodal services (e.g., voice interfacing services). You will work closely with our development team to design and implement processes and systems that ensure the...


  • Baldwin Park, California, United States Caelux Corporation Full time

    Job Overview. Company Background: Caelux Corporation is at the forefront of innovation in perovskite solar cell technology, dedicated to transforming the renewable energy landscape through advanced solutions. We are seeking a skilled Reliability Engineer IV to enhance our manufacturing processes and contribute to our mission of delivering high-efficiency solar...


  • Schiller Park, United States SkyWater Search Partners Full time

    "Choose a job you love, and you will never have to work a day in your life." (Confucius) The Project Engineer will blend their engineering background in this multi-faceted role and will be directly involved in optimizing manufacturing process performance, interfacing with customers, and identifying opportunities for quality improvements and cost savings. To be...


  • Valley Park, Missouri, United States HDR Engineering, Inc. Full time

    Primary Responsibilities. Duties of HDR's Senior Electrical Engineer include: responsibility for engineering assignments related to high voltage transmission and substation projects; preparing scopes, schedules, and budgets, and ensuring that schedules and budgets are met; providing input on company/client design and engineering protocols and guidelines and...


  • Forest Park, Georgia, United States Novo Nordisk Full time

    About the Department: You will be joining Fill & Finish Expansions (FFEx), which is responsible for all major expansion activities within aseptic production, solid dosage forms, finished products, fill & finish warehousing, and QC across all production areas in Product Supply. The area is anchored in Product Supply, Quality & IT, which globally employ approx....

  • Project Engineer

    3 weeks ago


    Schiller Park, United States Sterling Engineering Full time

    Title: Project Engineer Overview: Sterling has helped build careers for thousands of professionals like yourself. Our expert recruiters support you at every step in the process and as a Best of Staffing company, Sterling provides exciting work with exceptional employers across the U.S. Job Duties: Review and analyze customer specifications and drawings to...

  • Senior IAM Engineer

    2 weeks ago


    Schiller Park, United States Complete Staffing Inc Full time

    Senior IAM Engineer. Summary: The Engineer for the Identity & Access Management (IAM) function is responsible for identifying, delivering and supporting the technology used to deliver Sidley’s overall Identity & Access Management program, which is designed to ensure the Firm’s user identities, accounts, credentials and system access are fully and...


  • Schiller Park, Illinois, United States MJ Celco Full time

    Fabrication Engineer Role: We are seeking a talented Fabrication Engineer to contribute to our operations at MJ Celco. In this role, you will oversee the planning, organization, and execution of engineering initiatives aimed at achieving corporate goals, improving product offerings, and fostering innovation. Key Responsibilities: Establish strategic guidance for...


  • Schiller Park, United States Northern Impact Full time

    Overview: A national award-winning engineering firm is seeking a highly motivated, qualified, and client-service-oriented Senior Bridge Engineer. This position will be in the Illinois office. The company has developed a mentor-mentee culture that allows them to share their experiences with one...


  • New Hyde Park, New York, United States M&J Engineering Full time

    Position Overview: M&J Engineering, P.C. is a distinguished provider of comprehensive consulting services, employing over 300 professionals. Established in 2004, M&J has evolved into a versatile provider of engineering, construction management, inspection, technology, and environmental services, catering to a diverse clientele that includes federal, state,...