Sr Edge Platforms SRE

1 week ago


Evanston, IL, United States Northwestern University Full time
Job ID 52172
Location Evanston, Illinois

Department: NAISE - NU ANL Inst Sci Eng
Salary/Grade: ITS/82

Job Summary:

This is an SRE role focused on maintaining and improving operations of the edge fleet, cloud infrastructure, and data pipeline associated with multiple NSF- and DOE-funded projects. At this time, the projects collectively operate nearly 200 remote edge devices, each running Linux and a local Kubernetes cluster to host user applications. We expect this number to grow by around 300 devices over the next 5 years as part of Sage Grande, our latest NSF-funded project, for a total fleet of nearly 500 devices. (See NSF award for more information.)

The incumbent will work closely with the software team to understand the existing design, requirements, and prior issues, and will use that understanding to decide which monitoring tooling to select or build. The incumbent will also work with key collaborators (various universities, national labs, industry partners, tribal partners, and other non-profit organizations) to ensure that their expectations for nodes and data are being met.

Finally, we expect that this role will provide good opportunities for career growth. First, our fleet will continue to grow, so we expect multiple iterations on designing and implementing ideas and new technologies as they become available. Second, we anticipate additional cloud infrastructure and backends as we support more projects. This will provide plenty of time to understand cloud infrastructure and work with the software team to learn useful patterns for instrumentation and monitoring. Last, the unique nature of our fleet deployment means the incumbent will likely develop software engineering and data analysis skills through implementing novel tooling for addressing issues at scale.

This is a one-year term position. Opportunity for renewal will be based on performance and available funding.

The primary work location is Argonne National Laboratory. This position is primarily on-site, with the possibility of occasional remote work depending on job responsibilities and with management approval. Some travel to other sites is required.

*Note: Not all aspects of the job are covered by this job description.

Specific Responsibilities:

  • Addressing software and minor hardware issues in the edge fleet in a timely manner and escalating issues which need attention from the deployment team and/or on-site staff.
  • Selecting, developing, and managing tooling and infrastructure for monitoring and alerting. Due to the unique aspects of our edge deployment, it is expected that you will develop substantial software tooling to address gaps that existing tools do not cover.
  • Developing relevant dashboards for the software team to understand how well services are performing.
  • Performing routine maintenance such as software upgrades and minor tasks such as renewing domain certificates annually.
  • Setting up and managing support ticket systems for platform and device issues.
  • Leading a small team (1-2 people) of junior SREs as we grow the SRE team.

Miscellaneous

Perform other duties as assigned.

MINIMUM QUALIFICATIONS (EDUCATION, EXPERIENCE, CERTIFICATIONS, SKILLS)

  • Successful completion of a full 4-year course of study in an accredited college or university leading to a bachelor's or higher degree in a major such as computer science, information technology, or related; OR appropriate combination of education and experience.
  • 4-5 years of direct experience supporting code, services, and deployments in production.
  • Demonstrated experience in Linux, including fundamentals of scripting, user management, networking, package management, SSH, and debugging.
  • Experience in software engineering and Python.
  • Familiarity with Kubernetes, particularly using it for deployments and deploying and administering Kubernetes clusters.
  • Familiarity with monitoring and data collection tooling such as Prometheus, Grafana, Fluent Bit, and Loki.
  • Familiarity with basic cybersecurity best practices such as how to securely deploy a web service.
  • Strong willingness to learn new tools and technologies on the job.
  • Strong communication skills.

PREFERRED QUALIFICATIONS (EDUCATION, EXPERIENCE, CERTIFICATIONS, SKILLS)

  • Familiarity with embedded Linux devices such as Raspberry Pi or the NVIDIA Jetson and Orin family.
  • Familiarity with basic cloud infrastructure concepts such as time series databases (e.g., InfluxDB), S3 storage, message brokers (e.g., RabbitMQ), caching (e.g., Redis), and web services.
  • Familiarity with Infrastructure as Code and config management tooling such as Ansible.
  • Familiarity with basic data analysis and visualization in Python, with a strong ability to communicate issues using these tools.
  • A B.S. or M.S. degree in CS or a related field.
  • Linux Operating System
  • Puppet/Chef/Ansible
  • SQL/MySQL/Postgres
  • Python
  • Shell Scripting

Target hiring range for this position will be between $115,000 and $132,750 per year. Offered salary will be determined by the applicant's education, experience, knowledge, skills and abilities, as well as internal equity and alignment with market data.

Benefits:
At Northwestern, we are proud to provide meaningful, competitive, high-quality health care plans, retirement benefits, tuition discounts and more. Visit us at https://www.northwestern.edu/hr/benefits/index.html to learn more.

Work-Life and Wellness:
Northwestern offers comprehensive programs and services to help you and your family navigate life's challenges and opportunities, and adopt and maintain healthy lifestyles.
We support flexible work arrangements where possible and programs to help you locate and pay for quality, affordable childcare and senior/adult care. Visit us at https://www.northwestern.edu/hr/benefits/work-life/index.html to learn more.

Professional Growth & Development:
Northwestern supports employee career development in all circumstances whether your workspace is on campus or at home. If you're interested in developing your professional potential or continuing your formal education, we offer a variety of tools and resources. Visit us at https://www.northwestern.edu/hr/learning/index.html to learn more.



Northwestern University is an Equal Opportunity Employer and does not discriminate on the basis of protected characteristics, including disability and veteran status. View Northwestern's non-discrimination statement. Job applicants who wish to request an accommodation in the application or hiring process should contact the Office of Civil Rights and Title IX Compliance. View additional information on the accommodations process.

