We are seeking a seasoned Data Engineer who isn't just a builder of pipelines, but an architect of intelligence. With 3 years of experience, you have moved beyond "getting things to work" to "making things scale." You will be the backbone of our AI initiatives, ensuring that our models are fed high-quality, high-velocity data.
As a proactive member of a fast-paced team, you will thrive in a startup-like environment where ownership is the default and ambiguity is seen as an opportunity to build structure.
Key Responsibilities
- Design and scale data architectures specifically tailored for Machine Learning (ML) lifecycles, including feature stores, vector databases, and model training pipelines.
- Architect, build, and maintain robust ETL/ELT pipelines that handle structured and unstructured data with a focus on low latency and high reliability.
- Identify bottlenecks in the data lifecycle before they impact the team. You don't wait for a ticket; you see a manual process and automate it.
- Work closely with Data Scientists and AI Researchers to understand model requirements and translate them into technical data specifications.
- Implement automated testing and monitoring for data integrity, ensuring that "garbage in, garbage out" is never a reality for our AI models.
- Build and maintain data lakes and feature stores on Google Cloud Storage.
- Implement real-time and batch processing architectures for AI-driven applications.
Requirements
- Proficiency in Python and SQL. Experience with Java or Scala is a plus.
- Hands-on experience with frameworks like Apache Spark, Flink, or Kafka.
- Experience supporting AI projects (e.g., handling embeddings, managing datasets for LLM fine-tuning, or working with tools like LangChain or LlamaIndex).
- Comfortable using Terraform or Docker/Kubernetes to manage data environments.
- English at a communicative level.
Nice to Have
- BigQuery: Optimization and partitioning strategies.
- Vertex AI: Experience building and deploying data pipelines within the Vertex ecosystem.
- Dataflow/Dataproc: Experience with managed Apache Beam or Spark services.
- Google Pub/Sub: Building real-time event-driven architectures.
- Understanding of REST APIs and microservices architecture.
Benefits
- Hybrid model based in Kraków, Poland (3 days in office, 2 days remote)
- Professional training programs, including Udemy and other development plans
- Work with a team recognized for excellence: we’ve been featured in the Deloitte Technology Fast 50 and FT 1000 rankings, and we’ve held the Great Place To Work® certification for five years in a row