Incident Manager
3 weeks ago
In this incident management function, manage incidents to resolution in a 24/7/365 environment using the incident management processes, effectively guide incident and triage calls from a technical perspective, share technical details obtained from monitoring tools and dashboards to aid troubleshooting, outline details of resolution activities, recommend and implement improved processes, provide timely status updates to stakeholders, assist with postmortem-related activities, and support various efforts related to operational improvements. Manage efforts to maintain applications in production, including troubleshooting stoppages, repairing bugs, documenting application performance, and coordinating with technology infrastructure management.
KEY JOB FUNCTIONS
Manage IT production incidents to resolution in a 24/7/365 environment using the incident management processes, and communicate incident status, impact, and resolution actions.
Hands-on experience managing and monitoring applications deployed on Amazon Web Services (AWS).
Troubleshooting and resolving incidents on the AWS cloud infrastructure.
Experience building tools for monitoring and troubleshooting system resources in an AWS environment. Ability to triage AWS-related incidents using monitoring tools on AWS Cloud.
Experience with performance engineering of AWS Cloud applications.
Hands-on experience working with AWS tools like EC2, ELB, RDS, Redshift, DynamoDB, Aurora, Route 53, ECS, Lambda, S3, Batch, CloudWatch, CloudTrail, WAF, etc.
Hands-on experience with transaction-level monitoring using Dynatrace and Splunk.
Ability to perform transaction level monitoring and troubleshooting in AWS cloud platform.
Eyes-on-glass monitoring of the health of applications as well as the underlying infrastructure.
Monitoring experience with tools like Extrahop, SolarWinds, Netcool suite, Catchpoint, MoogSoft.
Ability to analyze dashboards and reporting/monitoring tools to look at trends and patterns in application health and performance.
Proactively looking for hardware, software, and environmental alerts or malfunctions.
Effectively lead and guide Incident triage calls from a technical perspective analyzing different components of the infrastructure and application environment via the use of a variety of monitoring tools and processes.
Troubleshoot the incidents and identify root cause quickly using operations, wire data analytics, application performance management and event correlation monitoring tools.
Perform analysis of data, evaluating multiple application protocols including web, database, storage, and supporting infrastructure such as AWS, UNIX, DNS, LDAP, SSL, SMTP, and FTP.
Influence other technical teams on the calls and articulate troubleshooting steps effectively.
Lead required technical follow-up calls for critical incidents.
Assist with documentation of Root Cause Analysis (RCA) or Correction of Errors (COE) and data quality for all ECC communicated incidents.
Ensure appropriate functional and management escalation takes place as per the standards and procedures.
Follow up on items that could potentially negatively impact production operations, assist with postmortem related activities and support various efforts related to operational improvements.
Based on recommendations from management, implement new and improved processes, change processes, perform new tasks, create reports and address ad-hoc requests.
Participate in on-call rotation. Ability to work any shift as needed, including weekends and nights.
Ability to report incident details and metrics to senior leadership.
EDUCATION
Bachelor's Degree or equivalent required.
MINIMUM EXPERIENCE
6+ years of related experience
SPECIALIZED KNOWLEDGE & SKILLS
6+ years of working experience with different IT infrastructure components such as Unix/Linux servers, Wintel servers, AWS, networks, firewalls, routers, load balancers, VPN, Apache, WebLogic, LDAP, Active Directory, Exchange, Oracle/MS SQL databases, SAN, virtualization, email systems, enterprise monitoring, and access management solutions for single sign-on. Subject matter expertise is not required, but experience with at least eight of the above is preferred.
Senior level hands-on working experience with Amazon Web Services (AWS).
Proven methodical approach to problem identification, monitoring, problem solving and resolution.
Ability to analyze different components of the infrastructure and application environments during Incident triage calls.
Aptitude to influence other technical teams on the incident calls and articulate troubleshooting steps effectively.
Experience and confidence working with all levels of management; excellent written and verbal skills.
Able to quickly and concisely communicate with senior management on technical issues in non-technical terms and to run large conference calls during Incident calls with a wide range of personnel and management levels.
Strong relationship management skills and aptitude to multi-task and work well in a high stress environment, both within teams and independently.
AWS Solutions Architect Associate or higher certification.
Monitoring and observability experience.
Experience with monitoring dashboards for incident detection and alerting.
Perform end-to-end analysis of transactions under an observability environment.
Troubleshoot incidents and identify root cause quickly using wire data analytics, application performance management and event correlation monitoring tools.
Diagnose and resolve incidents by providing factual data from the various monitoring and instrumentation systems.
Monitor applications and infrastructure using tools like Splunk, Dynatrace, OpenTel, Catchpoint, MoogSoft, xMatters, SignalFx, SolarWinds, Extrahop, etc.
Preferred Qualifications:
Understanding of tools like CloudFormation or Terraform
Management and troubleshooting of Middleware products on UNIX and Linux environments. Knowledge of Service Oriented Architecture (SOA), Java etc.
Understanding of Azure or Google Cloud.
Experience with OpenTel