Senior Data Engineer
Job Description
We're seeking a Senior Data Engineer to design, build, and maintain the data infrastructure that powers our AI and analytics efforts. You'll build the data foundation for LLM applications, RAG systems, and AI products, alongside traditional data pipelines, owning the entire data lifecycle with a focus on modern AI data patterns. Your work will enable AI engineers and data scientists to build AI solutions with confidence.

Key responsibilities:
- Build and operate vector databases and RAG infrastructure
- Design scalable data pipelines for batch and real-time workloads
- Implement ELT patterns using dbt, Spark, or Dataflow
- Develop data ingestion pipelines from varied sources
- Design comprehensive data quality frameworks and uphold data integrity
- Design data platform architecture on cloud services, optimizing for performance and cost
- Build feature stores and data pipelines for ML workflows
- Foster self-service data infrastructure
Qualifications
1. 6+ years of data engineering experience, including 2+ years in AI/ML data infrastructure.
2. Expert Python skills and strong SQL proficiency.
3. Experience with the modern data stack: dbt, Spark, Airflow, and cloud data warehouses.
4. Hands-on experience with vector databases and RAG data pipelines.
5. Experience building data pipelines on AWS, Azure, or GCP.
6. Strong understanding of data modeling and analytical patterns.
7. Experience with data quality frameworks and tools.
8. Solid understanding of data governance.
9. Proficiency with version control (Git), code review, and CI/CD for data pipelines.
10. Fluent English, written and spoken.
11. Proven experience on international projects.
12. Experience mentoring engineers is preferred.
13. Strong communication, stakeholder management, and problem-solving skills.
14. Experience building feature stores for ML.
15. Familiarity with data lakehouse architectures.
16. Experience with streaming data infrastructure.
17. Knowledge of embedding models and vector search optimization.
18. Experience with insurance, financial services, or healthcare data.
19. Familiarity with data observability platforms.
20. Experience with graph databases for AI.
21. Knowledge of document processing pipelines.
22. Familiarity with LLM-specific data patterns.
23. DevOps experience: CI/CD pipelines, containerization, and deployment automation.
24. Expertise with infrastructure-as-code tools (Terraform, Pulumi, or CloudFormation).
Benefits
- 100% Remote
- Flexible working hours
Apply Now
