Site Reliability Engineer
2 weeks ago
* Provide a holistic IT Service Delivery view of dynamic provisioning, capacity planning, scheduled maintenance, system/platform performance metrics, change management, and high quality of service, promoting first-class automation and selection of rationally priced technical options
* Develop and configuration-manage automation for deploying, operating, monitoring, and remediating failures and performance issues in systems deployed on-premises and in private cloud, commercial cloud, and hybrid environments for government customers in all phases of integration and operations.
* Identify opportunities for RCA and develop processes to address gaps and promote quality system usage and management
* Select, implement, support, and migrate among providers of infrastructure; manage layered infrastructure at the physical and logical levels; provide expertise, automation, documentation, training and support to consumers of infrastructure as integrated with the application stack.
* Serve as a subject matter resource in automation of cloud cyber risk mitigation in AWS, AWS GovCloud, and classified offerings.
* Document and automate processes discovered through your back-office engagement with software engineers and front office engagement with users, operators, security engineers, and other customers of SGSS and NRL software.
* Engage with and selectively manage internal and customer change control boards, policy mandates, and compliance frameworks.
What You'll Be Doing:
* With a software engineering approach to IT operations, design, implement, manage, and automate Application and Infrastructure security tools (containerized Scanning and VM tools) along with integrations to CI/CD pipelines, automated workflows, script-based integrations, etc.
* Identify appropriate cloud-based (i.e., AWS GovCloud) infrastructure to meet mission requirements, including the specification, acquisition, configuration, dynamic provisioning and maintenance of servers
* Bring structured engineering judgement and software engineering expertise to the tension between standardization and specialization - reliably deploying SGSS and NRL standardized software and infrastructure in specialized mission environments, monitoring and supporting its performance, and aiding the team in improving the overall quality of the delivered capability through monitoring, automation, and process improvement
* Specify and configure physical and virtual machines with Red Hat Enterprise Linux with a heavy focus on stable and supported operating systems. This includes proactive and consistent maintenance to ensure systems/platforms (e.g., OpenStack, Kubernetes) are up and available 98% of the time
* Perform current state analysis of an organization's system security controls and measures against DISA STIG standards, and provide recommendations for enhancement
* As the system stability advocate, implement configuration management automation (e.g., Ansible) to maintain configuration
* As the entrusted reliability liaison, assist the development team with requirements verification. The SRE's holistic view should include but is not limited to capacity planning, system/platform performance metrics, change management, high quality of service, promoting first-class automation, and reasonable-cost implementation options
* As a technical change agent, assist the organization and technical lead(s) in identifying technical problems, perform root cause analysis and corrective actions follow-up, develop managerial summaries and technical steps for implementing software updates, 'fixes' and/or replacements
* Conduct post-incident reviews. Identify what's working and what's not. Develop new/revised response plans that improve the software development lifecycle, revise documentation, and implement engineering processes that positively impact IT service delivery and build customer confidence after system maintenance and provisioning
* Fix Support Escalation Issues; serve on the tier 3 support team for integrated product support and proactive response to complex support challenges
* Develop and maintain Infrastructure-as-Code (IaC) with security embedded using such technologies as Terraform
* Document, train, and operate a software assurance capability at multiple security levels
* Document tribal knowledge and integrate it into practical use - documentation, automation, monitoring and remediation.
* Support feedback from practical experience to software development, support, IT operations and on-call process improvement
* Develop workflow in Python or similar scripting languages if or as needed
* Build Software for Support Team; build and implement services to improve the quality of support team delivery; improve monitoring and alerting internally and at customer sites in integration, test, and ops
What Required Skills You'll Bring:
* Must have a minimum active DoD Secret security clearance with ability to obtain a TS/SCI
* Bachelor's Degree in a relevant field (e.g., Computer Science, Software Engineering, Information Technology)
* Minimum of fifteen (15) years' experience in a relevant professional field
* 15+ years of experience with core infrastructure capabilities: operating systems, networking, identity management and access control
* 12-15 years of experience with all design aspects of the data center support systems, to include AC/DC power, UPS, HVAC, carrier infrastructures, internal/external cable plant, and overall data center layout
* 12+ years of experience demonstrating the ability to communicate clearly, verbally and in writing, to supported staff, management, and government customers
* Understanding of all layers of software engineering and system architecture
* Strong understanding of RCA Methodologies
* Demonstrated history of teamwork and service skills
* Proficiency in securing systems on the application, network, and infrastructure layers
* 12+ years of experience with designing solutions in cloud-optimized, private cloud and hybrid environments
* 12+ years of experience supporting secure, scalable, and elastic applications on distributed architectures
* Expert understanding of infrastructure management processes and tools like Terraform and AWS CloudFormation
* Experience with server configuration processes and tools such as Chef, Ansible, or Puppet
* Expert in creating, deploying, maintaining, and troubleshooting Docker or Podman images and orchestration with Kubernetes
* Proficiency with Linux, especially RHEL families, and Bash scripting
* Proficiency in implementing greenfield cloud infrastructure on AWS/Azure/GCP
* Understanding of CI/CD and related concepts
* Expert ability to execute advanced git actions like rebasing and squashing
* Ability to assist other engineers with source code management in git
* Basic understanding of software development and web application development concepts
* Ability to discuss technical tasks and team process topics with team members
* Ability to operate and manage work, strategically reason, and build relationships and influence others
What Desired Skills You Might Bring:
* Experience serving in security management in a classified IT development program
* Familiarity with specific tools: GitLab, Ansible, Terraform, OpenStack, AWS GovCloud, Red Hat Enterprise Virtualization
* Familiarity with CI/CD and tooling for CM, build, deployment, and code quality around C++ and Python
* AWS Certifications
* DoD 8570 certification at IAT Level II (CompTIA Security+, Cloud+, CASP+); will be required to attain and maintain as part of the job
Space Ground Systems Solutions (SGSS), a wholly owned subsidiary under the Parsons Corporation, is passionate about making our nation the undisputed leader in Space because we understand that ensuring our security for future generations depends on it. We have emerged as a leader in the development of cutting-edge solutions for the Department of Defense and Intelligence Community. Our tremendous success can be attributed to our people and our priorities. Do you want to be part of a team that i
-
Site Reliability Engineer
5 days ago
Alexandria, United States Innovative Computer Solutions Group, Inc Full time
Job Description: Site Reliability Engineer (SRE) mandatory skills/qualifications: Must be a US Citizen • Must possess minimum 3+ years of actual experience in the industry in an SRE role • Must possess minimum 10+ years of software engineer experience with skills in Angular, Node, Java, Python. SRE Required Skills: Competencies and...
-
Site Reliability Engineer
4 days ago
Alexandria, United States Innovative Computer Solutions Group, Inc Full time
Site Reliability Engineer (SRE) mandatory skills/qualifications: Must be a US Citizen • Must possess minimum 3+ years of actual experience in the industry in an SRE role • Must possess minimum 10+ years of software engineer experience with skills in Angular, Node, Java, Python. SRE Required Skills: Competencies and demonstrated experience in the following...
-
Site Reliability Engineer
1 week ago
Alexandria, United States Innovative Computer Solutions Group, Inc Full time
Job Description: Site Reliability Engineer (SRE) mandatory skills/qualifications: Must be a US Citizen • Must possess minimum 3+ years of actual experience in the industry in an SRE role • Must possess minimum 10+ years of software engineer experience with skills in Angular, Node, Java, Python. SRE Required Skills: Competencies and demonstrated...
-
Site Reliability Engineer
2 weeks ago
Alexandria, United States Innovative Computer Solutions Group, Inc Full time
Job Description: Site Reliability Engineer (SRE) mandatory skills/qualifications: Must be a US Citizen • Must possess minimum 3+ years of actual experience in the industry in an SRE role • Must possess minimum 10+ years of software engineer experience with skills in Angular, Node, Java, Python. SRE Required Skills: Competencies and...
-
Site Reliability Engineer-Cloud
6 days ago
Alexandria, United States SpaceGround System Solutions Inc Full time
Job Description: Space Ground System Solutions (SGSS) has an immediate full-time opening for a Site Reliability Engineer (SRE) on its IT Support team located in Alexandria, VA. In this role, you will help continue expansion of satellite ground system software to hybrid and private cloud infrastructure. You will help manage, support and...
-
Site Reliability Engineer
4 days ago
Alexandria, United States Sterling 5 Full time
Role: SRE. Location: Alexandria, VA (Hybrid). Visa: USC or GC only. SRE mandatory skills/qualifications: Must possess a minimum 3+ years of experience in the industry in an SRE role. Must possess a minimum of 10+ years of software engineer experience with skills in Angular, Node, Java, Python, etc. SRE additional mandatory skill areas: Linux and Unix; Cloud...
-
Site Reliability Engineer-Cloud
6 days ago
Alexandria, United States SpaceGround System Solutions Inc Full time
Space Ground System Solutions (SGSS) has an immediate full-time opening for a Site Reliability Engineer (SRE) on its IT Support team located in Alexandria, VA. In this role, you will help continue expansion of satellite ground system software to hybrid and private cloud infrastructure. You will help manage, support and facilitate infrastructure operation for...
-
Site Reliability Engineer-Cloud
5 days ago
Alexandria, United States Space Ground System Solutions Full time
Job Description: Space Ground System Solutions (SGSS) has an immediate full-time opening for a Site Reliability Engineer (SRE) on its IT Support team located in Alexandria, VA. In this role, you will help continue expansion of satellite ground system software to hybrid and private cloud infrastructure. You will help manage, support and...
-
Site Reliability Engineer
4 days ago
Alexandria, United States Sterling 5 Full time
Role: SRE. Location: Alexandria, VA (Hybrid). Visa: USC or GC only. SRE mandatory skills/qualifications: Must possess a minimum 3+ years of experience in the industry in an SRE role. Must possess a minimum of 10+ years of software engineer experience with skills in Angular, Node, Java, Python, etc. SRE additional mandatory skill areas: Linux and Unix...
-
Site Reliability Engineer
7 days ago
Alexandria, United States Sterling 5, Inc. Full time
Role: SRE. Location: Alexandria, VA (Hybrid). Visa: USC or GC only. SRE mandatory skills/qualifications: Must possess a minimum 3+ years of experience in the industry in an SRE role. Must possess a minimum of 10+ years of software engineer experience with skills in Angular, Node, Java, Python, etc. SRE additional mandatory skill areas: Linux and Unix; Cloud...
-
Site Reliability Engineer
6 days ago
Alexandria, United States Sterling 5, Inc. Full time
Role: SRE. Location: Alexandria, VA (Hybrid). Visa: USC or GC only. SRE mandatory skills/qualifications: Must possess a minimum 3+ years of experience in the industry in an SRE role. Must possess a minimum of 10+ years of software engineer experience with skills in Angular, Node, Java, Python, etc. SRE additional mandatory skill areas: Linux and Unix; Cloud...
-
Site Reliability Systems Administrator
1 week ago
Alexandria, United States Booz Allen Hamilton Full time
Do you love finding ways to make systems more efficient? Do you find it impossible to simply maintain when you could improve? Engineering to make a system more resilient and efficient frees up time and money to build more capabilities. Whether you co
Systems Administrator, Systems, Liability, Reliability Engineer, Reliability, Network Engineer, Technology
-
Senior Machine Learning Software Engineer
2 weeks ago
Alexandria, Virginia, United States SAIC Career Site Full time
Description: We are seeking a passionate and skilled Senior Machine Learning Software Engineer to join our high-performing development team. The ideal candidate will have a strong background in data analysis and machine learning techniques. You will be responsible for extracting insights from large datasets, documenting images, building predictive...
-
Senior Principal Automation Engineer
2 months ago
Alexandria, Virginia, United States SAIC Career Site Full time
Description: SAIC is looking for an Automation Specialist who is a pro with scripting and configuration management tools and languages like Ansible, Terraform, Python, PowerShell. This person should have an automation mindset with the aptitude to automate anything and everything - from day-to-day operations, to building automation pipelines, to host services...
-
DevOps Engineer
2 weeks ago
Alexandria, United States Clarivate Full time
We are looking for a DevOps Engineer to join our global team. This role provides a challenging and interesting opportunity to work on innovative products that serve both external and internal customers. We would love to speak with you if you have experience in and would love to work with tools such as Kubernetes, Terraform, Backstage, Jenkins, Spinnaker,...
-
Junior Systems Engineer
2 weeks ago
Alexandria, United States SAIC Full time
Description: SAIC is seeking a Junior Systems Engineer to support the Office of the Under Secretary of Research and Engineering (OUSD(R&E)), specifically within the Digital Engineering, Modeling & Simulation (DEM&S) directorate. The chosen candidate will play a pivotal role in advancing DEM&S goals in alignment with the National Defense Strategy under the...
-
Junior Systems Engineer
7 days ago
Alexandria, United States SAIC Full time
Description: SAIC is seeking a Junior Systems Engineer to support the Office of the Under Secretary of Research and Engineering (OUSD(R&E)), specifically within the Digital Engineering, Modeling & Simulation (DEM&S) directorate. The chosen candidate will play a pivotal role in advancing DEM&S goals in alignment with the National Defense Strategy under the...
-
Senior Product Support Analyst
3 weeks ago
Alexandria, United States Systems Planning and Analysis, Inc. Full time
Overview: Systems Planning and Analysis, Inc. (SPA) delivers high-impact, technical solutions to complex national security issues. With over 50 years of business expertise and consistent growth, we are known for continuous innovation for our government customers, in both the US and abroad. Our exceptionally talented team is highly collaborative in spirit and...
-
Systems Engineer
1 week ago
Alexandria, United States Black Bear Technology Solutions, LLC Full time
Job Title: Systems Engineer. Location: Alexandria, VA (onsite). Job Type: Full-Time. Experience Level: Mid-Senior (5+ Years). Relevant certifications (e.g., Microsoft Certified: Azure Administrator Associate, Microsoft 365 Certified: Modern Desktop Administrator Associate, Security+, CASP+) are a plus. About Us: Kwaan Bear Technology is a dynamic and innovative...
-
Site Buyer
8 hours ago
Alexandria, United States Marmon Holdings, Inc. Full time
UTLX Manufacturing LLC. Come join a team where People make the difference! As a part of Marmon Holdings, Inc., a highly decentralized organization, we rely heavily on people with the aptitude, attitude, and entrepreneurial spirit to drive our success, and we're committed to attracting and retaining top talent. UTLX MANUFACTURING LLC. Come join the team where...