Lead Site Reliability Engineer

3 weeks ago


Baltimore, United States T. Rowe Price Group, Inc. Full time
There is a place for you at T. Rowe Price to grow, contribute, learn, and make a difference. We are a premier asset manager focused on delivering global investment management excellence and retirement services that investors can rely on today and in the future. The work we do matters. We invite you to explore the opportunity to join us and grow your career.

Department: CDO Technology Group

Summary:

We are seeking a highly motivated and experienced Lead Site Reliability Engineer (SRE) to join our CDO Technology Group. As an SRE, you will play a crucial role in ensuring the availability, latency, performance, efficiency, and stability of our critical infrastructure, which supports a range of data platforms, applications, and services. You will collaborate closely with development teams to implement and maintain reliable and scalable systems while adhering to industry best practices and security standards.

Responsibilities:

Availability:
  • Proactively monitor our systems and identify potential issues that could impact their availability.
  • Implement and maintain automated alerting mechanisms to notify the appropriate parties of potential outages or performance degradation.
  • Collaborate with development teams to design and implement solutions that enhance system resilience and reduce downtime.
Latency:
  • Analyze performance metrics to identify and resolve latency bottlenecks in our infrastructure.
  • Implement performance optimization techniques and tools to improve the overall responsiveness of our systems.
  • Work with development teams to ensure that new features and code changes do not introduce performance regressions.
Performance:
  • Develop and maintain metrics dashboards to track key performance indicators (KPIs) for our critical systems.
  • Identify performance trends and anomalies that may indicate potential issues or areas for improvement.
  • Recommend and implement performance optimization strategies to enhance the overall efficiency of our systems.
Efficiency:
  • Optimize resource utilization and minimize unnecessary expenditure on IT infrastructure.
  • Collaborate with development teams to optimize resource allocation for new applications and services.
Release Management:
  • Participate in the release planning process to ensure that software releases are conducted smoothly and without disruptions.
  • Develop and implement automated deployment and rollback procedures to mitigate risks associated with software updates.
  • Monitor the performance of new releases and promptly address any issues that arise.
Monitoring:
  • Design, implement, and maintain a comprehensive monitoring infrastructure to track the health and performance of our systems.
  • Analyze monitoring data to identify potential issues and proactively troubleshoot problems before they impact users.
  • Develop and implement alerts and notifications for critical events to ensure timely intervention.
Emergency Response:
  • Respond promptly to incidents and work collaboratively to resolve them in a timely manner.
  • Analyze root causes of incidents to identify and implement preventive measures to minimize their recurrence.
  • Document incident responses and lessons learned to enhance our incident handling processes.
  • Participate in capacity planning exercises to anticipate future workloads and make proactive recommendations to expand or optimize infrastructure resources.
  • Stay abreast of emerging technologies, trends, and industry best practices in the field of site reliability engineering and contribute to the continuous improvement of our practices and tools.
  • Work with development teams to review architecture designs and ensure high availability and a sound disaster recovery strategy.
  • Collaborate with the reliability and infrastructure engineering team at T. Rowe Price to build synergy in the tooling used to implement observability, tracing, and alerting.

Qualifications:
  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 8+ years of experience as a Site Reliability Engineer or in an equivalent role.
  • Proven experience in monitoring, analyzing, and optimizing the performance of large-scale distributed systems.
  • Expertise in Linux systems administration, including managing servers, operating systems, and network configurations.
  • Strong scripting and automation skills, preferably with experience in Bash, Python, or similar languages.
  • Familiarity with AWS.
  • Experience with DevOps tools and practices, such as GitLab CI/CD and Docker.
  • Excellent troubleshooting and problem-solving skills with a knack for identifying and resolving complex technical issues.
  • Ability to work independently and as part of a collaborative team, effectively communicating technical concepts to both technical and non-technical stakeholders.
  • A passion for maintaining high availability, performance, and reliability of critical systems in a fast-paced financial environment.


Benefits:
  • Competitive salary and comprehensive benefits package.
  • Opportunity to work with cutting-edge technologies and contribute to the development of innovative solutions.
  • Collaborative and supportive work environment with a focus on continuous learning and professional development.

FINRA Requirements

FINRA licenses are not required and will not be supported for this role.

Work Flexibility

This role is eligible for remote work up to three days a week.

Commitment to Diversity, Equity, and Inclusion:

We strive for equity, equality, and opportunity for all associates. When we embrace the power of diversity and create an environment where people can bring their authentic and best selves to work, our firm is stronger, and we create greater value for our clients. Our commitment and inclusive programming aim to lift the experience for each associate and build allies for our global associate community. We know that a sense of belonging is key not only to your success at the firm, but also to your ability to bring your best each day.

Benefits: We invest in our people through a wide range of programs and benefits, including:
  • Competitive pay and bonuses as well as a generous retirement plan and employee stock purchase plan with matching contributions
  • Flexible and remote work opportunities
  • Health care benefits (medical, dental, vision)
  • Tuition assistance
  • Wellness programs (fitness reimbursement, Employee Assistance Program)


Our policies may change as our working lives evolve. Yet, our commitment to supporting our associates' well-being and addressing the needs of our clients, business, and communities is unwavering.

T. Rowe Price is an equal opportunity employer and values diversity of thought, gender, and race. We believe our continued success depends upon the equal treatment of all associates and applicants for employment without discrimination on the basis of race, religion, creed, color, national origin, sex, gender, age, mental or physical disability, marital status, sexual orientation, gender identity or expression, citizenship status, military or veteran status, pregnancy, or any other classification protected by country, federal, state, or local law.

  • Baltimore, United States SITEC Consulting LLC Full time

    About SITEC: SITEC is an employee and customer focused Information Technology and Professional Services Firm specializing in design, development, and delivery of state-of-the-art technology solutions, as well as cybersecurity, software and systems engineering services. Summary: The Site Reliability Engineer provides support in software development/engineering,...


  • Baltimore, Maryland, United States Salesforce Full time

    Inc's Candidate Privacy Notice contains more details about the handling and use of the personal data of job applicants. For more information about our website privacy practices, please see our Privacy Statement. DevOps/Site Reliability Engineer (SRE) with TS/SCI (on site Northern Virginia)...


  • Baltimore, United States Enterprize Software LLC Full time

    We are seeking a dedicated Site Reliability Engineer for our Operations & Sustainment Team. Our ideal candidate would be someone with a comprehensive knowledge of IT environments, software applications, and technical frameworks. They should have a proven record of providing Tier 1 and Tier 2 support and exhibit a constant zest for learning and enhancing...


  • Baltimore, United States Fearless Full time

    Fearless is looking for a Site Reliability Engineer II to add to our diverse team of 250+ employees (and counting!). What You’ll Be Doing We’re looking to change the world by building software with a soul, and we want your help. The Site Reliability Engineer II implements reliable infrastructure solutions according to the strategic direction of the team....


  • Baltimore, United States Booz Allen Hamilton Full time

    Job Number: R0194855 Site Reliability Engineer The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development, if you have a passion for making systems better, we need you! Your combination...


  • Baltimore, United States Booz Allen Hamilton Full time

    Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development, if you have a passion for making systems better, we need you! Your combination of people skills and technical expertise makes you the team...


  • Baltimore, United States Akina Full time

    TS/SCI w/Polygraph required Approved for 60% telework 06-11-SRE Description: DevOps refers to a software development concept that unites and brings together developers and IT staff. The DevOps approach involves consistent, small edits to software coding. This means frequent updates and testing of software that results in very quick releases. DevOps is a...


  • Baltimore, United States Akina Full time

    TS/SCI w/Polygraph required Approved for 60% telework 06-10-SRE Description: DevOps refers to a software development concept that unites and brings together developers and IT staff. The DevOps approach involves consistent, small edits to software coding. This means frequent updates and testing of software that results in very quick releases. DevOps is a...


  • Baltimore, United States Salesforce Full time

    Salesforce.com Inc's Candidate Privacy Notice contains more details about the handling and use of the personal data of job applicants. For more information about our website privacy practices, please see our Privacy Statement. DevOps/Site Reliability Engineer (SRE) with TS/SCI (on site Northern Virginia)...


  • Baltimore, United States NiSUS Technologies Corporation Full time

    Seeking a Site Reliability Engineer that has both development and system administration experience with large systems who can use their experience to formulate and implement automation solutions to support our monitoring and system administration teams in tasks that either are risky to the system, prone to mistakes, labor intensive, time consuming and/or...


  • Baltimore, United States Booz Allen Hamilton Full time

    Job Number: R0199951 Site Reliability Administrator The Opportunity: Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you come from a background in network engineering, systems administration, or software development, if you have a passion for making systems better, we need you! As a site...

  • Reliability Engineer

    2 weeks ago


    Baltimore, Maryland, United States W. R. Grace Full time

    Requisition ID: 22915 Built on talent, technology, and trust, Grace is a leading global supplier of catalysts and engineered materials. The company's two industry-leading business segments (Catalysts Technologies and Materials Technologies) provide innovative products, technologies, and services that enhance the products and processes of our customers around...


  • Baltimore, United States Wyetech LLC Full time

    This is a position within an open source Accumulo product development team. The candidate will have a primary focus of supporting all aspects of agile software development/engineering, including requirements analysis, software development, installation, integration, evaluation, enhancement, maintenance, testing and problem diagnosis/resolution for the open...


  • Baltimore, United States Horizon Controls Group Full time

    Summary: We are seeking a highly skilled and experienced Lead DeltaV Engineer to join our dynamic engineering team. The ideal candidate will be responsible for overseeing the design, implementation, and support of Emerson DeltaV Distributed Control Systems (DCS) in various projects. This role requires strong technical expertise,...


  • Baltimore, United States Vicinity Energy Full time

    The Lead Stationary Engineer – DPSCS is responsible for safely, reliably, and efficiently operating the steam distribution system, ancillary equipment, and chilled water plants within a prison site. The ideal candidate must be familiar with steam pressure reducing stations, hot water converters, chillers in the 500-ton range,...


  • Baltimore, United States Tential Full time

    SUMMARY: Our client, the nation's largest non-bank lender is seeking a strong engineering leader to be a technical subject matter expert within our AUTO platform focusing on process orchestration using Camunda and end-to-end integration of the platform. This Lead Engineer works with product, engineering, security, and operations teams to design, develop, and...


  • Baltimore, Maryland, United States Middle River Aerostructure Systems Full time

    Position Title: Lead Materials and Process Engineer - Sustaining Location: Baltimore, MD, US, 21220 Date: Mon, 27 May 24 06:04:52 CDT Company Name: STENAHCM20 Description: About Us: ST Engineering MRAS is a world-leading manufacturer of complex aerostructures including nacelle systems and specialized structural components of the airframe. It supplies and...


  • Baltimore, United States Energy Philadelphia Env_SiteCivil Full time

    Job Description Company Overview is a full-service professional services consulting firm providing creative design, planning, and environmental solutions to the challenges of everyday life. Since 1946, we’ve focused on our relationships with people—our employees, our clients, and the communities we serve. Our matters because our people...


  • Baltimore, Maryland, United States Middle River Aerostructure Systems Full time

    Position Title: Lead Materials and Process Engineer - Sustaining Location: Baltimore, MD, US, 21220 Date: Sat, 25 May 24 06:04:57 CDT Company Name: STENAHCM20 Description: About Us: ST Engineering MRAS is a world-leading manufacturer of complex aerostructures including nacelle systems and specialized structural components of the airframe. It...