Senior Engineer, Site Reliability Engineering
1 week ago
We are looking for a Senior Site Reliability Engineer who can cultivate our SRE philosophy, processes, and technologies from the ground up. As a Senior Site Reliability Engineer within the Platform group, you will lay the groundwork for our SRE infrastructure. Your role will entail driving standards and fostering adoption across our technology teams, whilst closely partnering with our DevOps and Cloud teams. With a hands-on approach, you'll work across both cloud and on-premises hosting platforms, ensuring the reliability and scalability of our trading systems and production environments. This is a chance to play a pivotal role in transforming our operational capabilities and enhancing performance across a wide array of environments and platforms.

As a Site Reliability Engineer at BAM, you will:
- Develop and promote our SRE philosophy, establishing best practices and processes that will be instrumental in scaling our infrastructure.
- Create and maintain thorough documentation for SRE processes, systems design, and incident post-mortems to foster a culture of learning and improvement.
- Drive adoption of SRE principles across various technology teams, acting as a mentor and advisor to embedded SREs.
- Implement end-to-end observability and monitoring solutions using Prometheus, Grafana, Loki, and AWS CloudWatch, ensuring high visibility into application performance and infrastructure health.
- Utilize and build standards around Sentry for application monitoring and error tracking to proactively identify and address reliability issues.
- Review and define standards for application reliability requirements within our Kubernetes environment, ensuring application configuration is optimized for performance, cost, and reliability.
- Develop automation and tooling to improve the efficiency and reliability of deployment pipelines, system health checks, and recovery procedures.
- Collaborate with development teams to enhance service stability, scalability, and fault tolerance through SRE best practices like blameless post-mortems and service level objectives (SLOs).
- Conduct regular reviews of infrastructure and application metrics, logs, and traces to proactively spot and address potential issues before they affect customers.
- Introduce a reliability-by-default approach to software delivery.

Core Tech Stack:
- Languages: Python, Java, NodeJS, C#, Shell
- Public cloud: AWS
- CI/CD: TeamCity, Octopus, Jenkins
- Configuration Management: Puppet, Ansible
- Infrastructure Code: Terraform, CloudFormation
- Application Management: Kubernetes, Docker, Helm
- OS: Linux and Windows
- Observability: Prometheus, Amazon CloudWatch, Sentry, Grafana, Loki

To be considered a good cultural fit, you must be:
- An ambitious self-starter
- Hungry to learn
- Driven towards success
- A very strong and efficient communicator
- Able to multi-task and excel in a fast-paced trading environment
- A problem solver, able to develop quick and sound solutions to complex problems

To be considered a good fit, you must have:
- 5 years of experience in SRE or similar roles within complex, distributed systems environments.
- A Bachelor's degree in engineering, computer science, information systems, or equivalent experience.
- Proficiency with key SRE technologies such as Prometheus, Grafana, Loki, AWS CloudWatch, and Sentry.
- Extensive knowledge of container orchestration using Kubernetes and containerization with Docker.
- Hands-on experience with both cloud (AWS preferred) and on-premises hosting platforms.
- Proven ability to script in languages like Python, Bash, or Go to automate routine tasks and deployment pipelines.
- Strong understanding of CI/CD principles, agile methodologies, and DevOps culture.
- Excellent troubleshooting and problem-solving skills, with a systematic approach to handling unexpected situations.
- High level of initiative, passion for reliability engineering, detail orientation, and follow-through capabilities.
- Exceptional interpersonal and communication skills, with the ability to explain complex technical concepts to a diverse audience.
- Experience with immutable infrastructure and with infrastructure automation and provisioning tools such as AWS CloudFormation or Terraform.
- Strong knowledge of Linux administration, particularly RHEL and CentOS.
- Strong knowledge of distributed systems concepts, including best practices and troubleshooting.
- Knowledge of Windows Server administration and automation with PowerShell.
- Operational understanding of networking concepts, architecture, and best practices, especially as they relate to hybrid cloud integration.
- Analytical skills.
- Ability to troubleshoot, logically assess problems, and determine solutions.
- Detailed documentation skills: the ability to represent ideas, requirements, reference architecture, and problems in clear, concise, and business-friendly documents.

Bonus points for:
- Experience in a high throughput/low latency environment
- Experience with successful SRE team build outs
- Experience with security patterns and distributed authentication
- Experience managing high-pressure incident response
- Experience with Chaos Engineering technologies
- Contributions to open source libraries, projects, or communities
- Any AWS, Azure, or GCP resource specializations or certifications
- Any Kubernetes resource specializations or certifications

Don't have all the skills listed above? Have extra skills you think are important that we haven't thought of? Please let us know by applying and telling us a bit more about yourself and why you think you're qualified.
-
Site Reliability Engineer
2 weeks ago
Schiller Park, United States | Stardom Employment Consultants | Full time
Job Description: As a Site Reliability Engineer, you will be responsible for maintaining and improving the reliability, availability, and performance of our systems. You will collaborate closely with development, operations, and security teams to build and automate scalable infrastructure, monitor system health, and address issues before they impact users....
-
Site Reliability Engineer
1 week ago
Schiller Park, United States | Fetch | Full time
What We're Building And Why We're Building It. There's a reason Fetch is ranked top 10 in Shopping in the App Store. Every day, millions of people earn Fetch Points buying brands they love. From the grocery aisle to the drive-through, Fetch makes saving money fun. We're more than just a build-first tech unicorn. We're a revolutionary shopping platform where...
-
Site Reliability Engineer
1 week ago
Schiller Park, United States | Northern Trust | Full time
About Northern Trust: Northern Trust, a Fortune 500 company, is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. Northern Trust is proud to provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring...
-
Site Reliability Engineer
1 week ago
Schiller Park, United States | Remotely | Full time
This is a remote position. Site Reliability Engineer (1 year experience, remote). Be part of our future. This job posting builds our talent pool for potential future openings. We'll compare your skills and experience against both current and future needs. If there's a match, we'll contact you directly. No guarantee of immediate placement, and we only consider...
-
Site Reliability Engineer
1 week ago
Schiller Park, Illinois, United States | Northern Trust | Full time
About the Role: Northern Trust is a globally recognized, award-winning financial institution that has been in continuous operation since 1889. We provide innovative financial services and guidance to the world's most successful individuals, families, and institutions by remaining true to our enduring principles of service, expertise, and integrity. Key...
-
Senior Site Reliability Engineer
1 week ago
Menlo Park, California, United States | Character Technologies | Full time
About the Position: As a pivotal member of our Site Reliability Engineering team at Character Technologies, you will play a crucial role in managing our extensive infrastructure, which encompasses thousands of nodes, vast amounts of data, and millions of active users daily. Your primary focus will be to ensure the reliability, scalability, and performance of...
-
Menlo Park, California, United States | Robinhood | Full time
About the Role: We're a dynamic team serving a highly ambitious financial services organization. The Reliability Engineering team provides a specialization within engineering focused on designing, engineering, evolving, and safely making changes to large-scale distributed systems; these systems are often composed of disparate components which are each...
-
Senior Instrument
2 weeks ago
Schiller Park, United States | Burns & McDonnell | Full time
Description: Burns & McDonnell is a full-service, global AEC firm that excels at every step of project delivery. Our diverse service range gives our clients access to a uniquely broad talent pool of 10,000 professionals and enables us to help solve some of their most complex challenges. We're able to seamlessly plan, design, permit, construct and manage...
-
Senior Site Reliability Engineer
3 weeks ago
Menlo Park, United States | Character | Full time
About the role. Responsibilities: As a Multimodal Site Reliability Engineer (SRE) at Character, you will be responsible for ensuring the reliability, scalability, and performance of our app and AI multimodal services (e.g., voice interfacing services). You will work closely with our development team to design and implement processes and systems that ensure the...
-
Senior Reliability Engineer
2 weeks ago
Baldwin Park, California, United States | Caelux Corporation | Full time
Job Overview. Company Background: Caelux Corporation is at the forefront of innovation in perovskite solar cell technology, dedicated to transforming the renewable energy landscape through advanced solutions. We are seeking a skilled Reliability Engineer IV to enhance our manufacturing processes and contribute to our mission of delivering high-efficiency solar...
-
Senior Project Engineer
1 week ago
Schiller Park, United States | SkyWater Search Partners | Full time
"Choose a job you love, and you will never have to work a day in your life." (Confucius) The Project Engineer will blend their engineering background in this multi-faceted role and will be directly involved in optimizing manufacturing process performance, interfacing with customers, and identifying opportunities for quality improvements and cost savings. To be...
-
Senior Electrical Engineer
2 months ago
Valley Park, Missouri, United States | HDR Engineering, Inc. | Full time
Primary Responsibilities: Duties of HDR's Senior Electrical Engineer include: responsibility for engineering assignments related to high voltage transmission and substation projects; preparing scopes, schedules, and budgets, and ensuring that schedules and budgets are met; providing input on company/client design and engineering protocols and guidelines and...
-
Reliability Engineer III
2 months ago
Forest Park, Georgia, United States | Novo Nordisk | Full time
About the Department: You will be joining Fill & Finish Expansions (FFEx), which is responsible for all major expansion activities within aseptic production, solid dosage forms, finished products, fill & finish warehousing, and QC across all production areas in Product Supply. The area is anchored in Product Supply, Quality & IT, which globally employ approx....
-
Project Engineer
3 weeks ago
Schiller Park, United States | Sterling Engineering | Full time
Title: Project Engineer. Overview: Sterling has helped build careers for thousands of professionals like yourself. Our expert recruiters support you at every step in the process and as a Best of Staffing company, Sterling provides exciting work with exceptional employers across the U.S. Job Duties: Review and analyze customer specifications and drawings to...
-
Senior IAM Engineer
2 weeks ago
Schiller Park, United States | Complete Staffing Inc | Full time
Senior IAM Engineer Summary: The Engineer for the Identity & Access Management (IAM) function is responsible for identifying, delivering and supporting the technology used to deliver Sidley’s overall Identity & Access Management program, which is designed to ensure the Firm’s user identities, accounts, credentials and system access are fully and...
-
Senior Fabrication Engineer
2 weeks ago
Schiller Park, Illinois, United States | MJ Celco | Full time
Fabrication Engineer Role: We are seeking a talented Fabrication Engineer to contribute to our operations at MJ Celco. In this role, you will oversee the planning, organization, and execution of engineering initiatives aimed at achieving corporate goals, improving product offerings, and fostering innovation. Key Responsibilities: Establish strategic guidance for...
-
Senior Bridge Engineer
2 weeks ago
Schiller Park, United States | Northern Impact | Full time
Overview: National Award-Winning Engineering Firm is seeking a highly motivated, qualified, and client-service-oriented Senior Bridge Engineer specializing in Bridge Engineering. This position will be in the Illinois office. The company has developed a mentor-mentee culture that allows them to share their experiences with one...
-
Site Engineering Manager
2 weeks ago
New Hyde Park, New York, United States | M&J Engineering | Full time
Position Overview: M&J Engineering, P.C. is a distinguished provider of comprehensive consulting services, employing over 300 professionals. Established in 2004, M&J has evolved into a versatile provider of engineering, construction management, inspection, technology, and environmental services, catering to a diverse clientele that includes federal, state,...