Site Reliability Engineer

2 weeks ago


Richmond, United States NVIDIA Full time

NVIDIA is looking for a Site Reliability Engineer (SRE) to join its Networking Support team. As an SRE at NVIDIA, you will ensure the reliability and uptime of our customers' production environments. We are seeking an SRE with the mindset and methodology to maintain, monitor, and troubleshoot data center (DC) networking equipment.

SRE's culture of diversity, intellectual curiosity, problem solving and openness is important to our success. Our organization brings together people with a wide variety of backgrounds, experiences and perspectives. We encourage them to collaborate, think big and take risks in a blame-free environment. We promote self-direction to work on meaningful projects, while we also strive to build an environment that provides the support and mentorship needed to learn and grow.

What you will be doing:

Supervise equipment, applications, and processes through various tools, applications, and consoles

Rapidly debug and triage incidents and user-reported issues

Work with Tier 2 and Tier 3 support as required

Contribute to the overall health, performance, and reliability of the networking equipment and Infrastructure Services

Develop documentation for Operations processes

Work rotating shifts, including weekends and holidays, and overtime as required

What we need to see:

BS degree or diploma in the Information Technology field, or equivalent experience

4+ years of site reliability engineering experience working on large-scale distributed microservices in a production environment, with a real passion for automation and tooling

Must be able to operate network devices and pull cables over the racks in a data center environment

Able to perform the physical labor of racking and unracking network equipment in a data center

Expertise in incident management, organizational change, and problem management processes. Able to detect all service-impacting issues and to carry out accurate triage, partner communication, impact containment, service restoration, and post-incident follow-up

Proven strengths in problem-solving and root-cause analysis, while continuously seeking ways to drive optimization, efficiency, and the bottom line

Experience performing operational activities, including batch processing, system backups, and maintenance; providing Level 1 network and server support; monitoring and responding to data center environmental alarms; and monitoring various application systems

Experience handling special requests for network configuration changes, server and network switch reboots, file restores, web updates, and terminal messaging

Knowledge of TCP/IP networks and troubleshooting tools; knowledge of the Linux operating system and associated tools

Able to work a rotating shift schedule that includes days, nights, weekends and holidays as necessary

Ways to stand out from the crowd:

A strong networking background and familiarity with major routing and switching protocols and equipment

Background in InfiniBand technology

Knowledge of water-cooled networking systems

NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you.

The base salary range is 58,400 USD - 126,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits.

NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.


