Site Reliability Engineer

23 hours ago

New York, New York, United States Kanak Elite Services Full time $140,000 - $170,000 per year

Title: Site Reliability Engineer (SRE) (Automation & Scheduling)
Location: Fully Remote (CST hours) - open to tier 2/3 markets (e.g., Omaha, Kansas)
Duration: 6-Month Contract to Hire
Interview Process: 3 Rounds Total
Hiring Manager
Director of Back Office Systems
Team Member

Seeking a Site Reliability Engineer (Automation & Scheduling) to lead efforts in automation, observability, and platform reliability across our enterprise job orchestration and data systems. This hands-on role is ideal for a technically skilled individual with strong scripting abilities, a drive for continuous improvement, and direct experience with modern automation technologies.

You will take ownership of complex scheduling workflows, improve system resiliency, reduce operational overhead through scripting and intelligent automation, and support the reliability of critical platforms like AppWorx and Power BI.

Key Responsibilities
Job Scheduling & Automation

Build and maintain scripts (PowerShell, Python, or Bash) to manage over 2,000 scheduled jobs, improve efficiency, and reduce manual intervention
Enhance monitoring, alerting, and observability to detect issues early and maintain high system availability
Lead root cause analysis and implement preventative measures for job failures and outages

Operational Innovation

Prototype and test automation and orchestration tools in isolated or lab environments, including the use of agent-based systems or orchestration frameworks
Apply Agentic AI or RPA solutions to operational use cases to drive down toil and increase responsiveness
Collaborate with IT teams to optimize system capacity, improve resiliency, and modernize legacy scheduling patterns

Data Platform Reliability (Power BI / Microsoft Fabric)

Support and administer data platform services such as Power BI Gateway and automated data refresh pipelines, with an emphasis on platform reliability and operational efficiency
Troubleshoot and resolve data refresh failures, optimize refresh cycles, and contribute to system observability
Identify and implement automation opportunities across evolving Microsoft Fabric components (e.g., Data Pipelines, Lakehouse, Real-Time Analytics), adapting responsibilities as platform capabilities expand
Contribute to monitoring and deployment improvements across the data ecosystem using scripting and automation tools

Documentation & Collaboration

Document job dependencies, workflows, and operational runbooks with clarity and rigor
Use tools like ServiceNow, LeanIX, Jira, and Asana to ensure job metadata and support documentation remain current
Partner with application owners and infrastructure teams to align job execution with business needs

Qualifications

Proficiency in PowerShell, Python, and Bash, with a focus on systems automation and scripting best practices
Experience managing enterprise job scheduling systems (AppWorx or similar) with attention to reliability and maintainability
Hands-on experience experimenting with Agentic AI or RPA platforms to automate operational workflows is strongly preferred. Candidates should be able to describe proof-of-concept efforts or prototype use cases they've built, even if in non-production environments
Familiarity with system monitoring and alerting tools; ability to build custom checks and observability dashboards
Strong documentation habits and ability to model complex dependencies and recovery steps
Power BI administration experience, including gateway and refresh management
Bachelor's degree in IT, Computer Science, or related field
3-5 years of experience in systems engineering, site reliability, or automation operations
ITSM/ITIL process understanding preferred; ServiceNow experience a plus

Key Attributes For Success

Strong ownership mindset with a proactive approach to solving reliability issues
Eagerness to learn and experiment with new tools, frameworks, and techniques
Ability to thrive in a fast-paced environment with shifting priorities
Effective communicator who can translate technical findings into actionable plans
Focused on outcomes, not effort; continuously looking to simplify and improve

Technical Must-Haves

PowerShell, Python, Bash (scripting experience)
AppWorx (Broadcom) strongly preferred - niche skill, especially in education verticals for scheduling & automation
SQL (working knowledge)
Power BI
Automation & scheduling background
Exposure to Microsoft-heavy environments, agentic AI / Power Automate a plus
Soft Skills:
Strong communication skills (must be very clear)
Curiosity, drive, adaptability
Able to thrive in a fast-paced, growing environment

Thanks & Regards
Kartik Sharma
Recruitment Lead
Email :

LinkedIn : Karthik Sharma | LinkedIn

Kanak IT is an equal opportunity employer. We consider all applicants for employment without regard to citizenship, immigration status, race, gender, disability, or any other protected category.

