Site Reliability Engineer
3 days ago
Role: Site Reliability Engineer
Location: Plano, TX
Job Type: Full-time
* Must be solid in software development: fluent in Python, with solid experience in Docker and Kubernetes.
What you will be doing
We are looking for a Sr. Site Reliability Engineer with expertise in AWS Cloud Engineering, 5G RAN Engineering, Network Design and Engineering, and 5G Core Engineering. As an integral part of the Site Reliability and Observability Engineering team, you will be responsible for understanding how the network, applications, tools, and processes fit together so that the Network Operations Center (NOC) can quickly resolve network events. You will focus on increasing network and service availability through automation, tools, and processes in your area of expertise. You are just as comfortable designing, implementing, and troubleshooting technical issues escalated from Tier 1 and Tier 2 in the NOC as you are designing automation and orchestration solutions. As new tools, products, and services, or enhancements to existing ones, are introduced into the 5G network, you will work closely with product owners to quickly identify their benefits, drawbacks, and use cases for the NOC, then integrate and operationalize each product or tool into the NOC support teams.
In this role, you will:
- Drive solid system architecture, and guide and mentor well-disciplined code development practices (e.g., repository procedures for proper code check-out and check-in).
- Manage safe feature-branching strategies and version control.
- Develop a proper workflow for team code review and deliver well-vetted, tested products.
- Oversee and author application testing procedures; software deployment packaging and release coordination with customers; and monitoring of infrastructure, inbound/outbound processes, web services, and application health.
- Implement feature tracking and bug fixes.
- Define standards that produce enterprise quality software that is robust, scalable, and maintainable for the entire lifecycle of the project and business.
- Develop and maintain a catalog of reliability scripts, tools, and libraries that can be leveraged for common instrumentation, automation, and operational needs
- Monitor and analyze network performance, providing automation and orchestration insight for identifying or mitigating network and service-related events
- Analyze data to diagnose and identify root causes for network-specific events within our Domain-of-Responsibility
- Act as a Tier 3 escalation for issues from Tier 1 or Tier 2 related to our observability platform
- Collaborate with vendors and internal technical teams to understand and incorporate technical solutions.
- Define and implement strategies for network automation to improve operational efficiencies
- Manage a CI/CD pipeline for network development and testing
- Participate in the documentation of application/network flows for various support needs
- Provide technical guidance, training and mentorship to members of the NOC & engineering teams we support with our platforms
- Develop and improve instrumentation for monitoring and logging the health and availability of services
- Participate in Major Incident bridges that involve multiple teams/participants and the resulting formal RCA reports.
The Skills and Experience You Bring to Dish Wireless:
A Sr. Site Reliability Engineer leads the solution to any problem or issue with an automation-first mindset, using a crawl/walk/run approach to implementation.
Requirements for the position (Must Haves)
- Bachelor’s Degree in Computer Science, IT-related field, or equivalent experience
- At least 3 years of scripting experience in languages such as Python and JavaScript
- 3+ years of event-driven engineering with a strong preference for candidates with experience in AIOps using AI/ML platforms/tools
- 3+ years of experience with source code management, CI/CD, and automation tools such as Git/GitLab, Terraform, Ansible, Chef, Puppet, and Jenkins
- 3+ years of experience building CI/CD pipelines, version control, and system testing with GitLab and Jenkins
- 3+ years of experience with OS-level containerization and virtualization techniques using Docker, Wind River, VMware, Kubernetes, and Rancher for microservices deployment
- 3+ years of experience with cloud platforms such as AWS, Azure, and GCP
- 5+ years of technical, hands-on experience in one or more of the following areas: AWS Cloud Engineering, 5G ORAN, 5G Core, and/or Data and Transport Engineering
- A passion for taking ownership of your work and delivering results
- Habitual code branching, versioning, feature lifecycle management, testing, packaging and deployments
- Voracious need to document code and catalog data transformations
- Willingness to learn and teach complex technologies
- Excellent communication skills and a team-player attitude
Preferred complementary skills for the Job
- 5+ years of experience using one or more platforms such as Datadog, Grafana, ServiceNow, SolarWinds, Cisco Vitria/Matrix, Innoeye, and the Atlassian stack (Crucible, Bitbucket, Jira, Confluence)
- Experience gaining insight from logs and metrics with Loki, Elasticsearch, Prometheus, and Grafana
- Experience implementing systems tracing with services such as Tempo, Jaeger, or OpenTracing
- Intermediate understanding of REST APIs, Apache Spark, and Kafka