Databricks Architect
LatentView Analytics is a leading global analytics and decision sciences provider, delivering solutions that help companies drive digital transformation and use data to gain a competitive advantage. With analytics solutions that provide a 360-degree view of the digital consumer, fuel machine learning capabilities, and support artificial intelligence initiatives, LatentView Analytics enables leading global brands to predict new revenue streams, anticipate product trends and popularity, improve customer retention rates, optimize investment decisions, and turn unstructured data into valuable business assets.
Description:
We are seeking a talented and passionate Databricks Architect to join our growing data engineering team. The candidate will act as a bridge between the offshore team and the stakeholders. In this role, they will help architect, implement, and manage cloud-based data solutions using Databricks and associated technologies on Azure and AWS. The ideal candidate will have experience developing, optimizing, and maintaining data pipelines, analytics workflows, and data lakes, leveraging Databricks to drive business insights.
Responsibilities:
Data Pipeline Development: Build and maintain scalable and optimized data pipelines using Apache Spark, Databricks, and other cloud-based tools to ensure efficient data flow across systems.
Cloud Infrastructure: Work with cloud providers (Azure, AWS) to design and implement cloud-native solutions for data storage, processing, and analytics. Experience with Databricks in a cloud-based environment is essential.
Collaboration with Data Scientists/Analysts: Collaborate with data scientists, analysts, and business stakeholders to transform business requirements into data solutions and deliver meaningful insights.
Optimization: Continuously optimize Spark and Databricks workflows for performance and cost efficiency.
ETL & Data Integration: Implement ETL processes to integrate data from various sources (SQL, NoSQL, REST APIs) into Databricks environments and data lakes.
Data Governance: Ensure data security, privacy, and compliance policies are followed, including managing data access, monitoring usage, and implementing data lineage.
Monitoring & Troubleshooting: Proactively monitor and troubleshoot data pipelines, performance bottlenecks, and integration issues.
Documentation & Best Practices: Document solutions, architectures, and workflows while enforcing coding best practices within the team.
Skills:
- Hands-on experience with Databricks (Apache Spark) for large-scale data processing and analytics.
- Strong experience working with cloud platforms such as Azure or AWS (Azure Databricks is highly preferred).
- Proficiency in SQL, Python, Scala, or Java for data processing and automation.
- Familiarity with data storage solutions such as Delta Lake, data lakes, Azure Data Lake Storage, or AWS S3.
- Proficient in building data pipelines and using tools such as Apache Spark, Databricks, and Airflow.
- Experience with DevOps and CI/CD processes for data workflows. Familiarity with containerization technologies (e.g., Docker, Kubernetes) is a plus.
- Familiarity with machine learning pipelines and frameworks (e.g., MLflow, TensorFlow, scikit-learn) is a plus.
- Strong problem-solving skills and the ability to work independently as well as collaboratively.
- Excellent communication skills and the ability to articulate complex technical solutions to non-technical stakeholders.