Senior Software Engineer/Senior Architect
We are seeking a talented and experienced Data Engineer to join our team as the Technical Lead. The ideal candidate will have experience designing, developing and maintaining data pipelines combining data infrastructure and AI training infrastructure, to create an end-to-end product.
About the role
Architect and design a Python package that helps users create scalable pipelines from ‘raw data’ to a trained AI model, including data filtering, data cleaning, data visualization, synthetic data creation, and integration into ML training (including RL)
Work with large datasets to develop both generic and fine-tuned AI models, especially LLMs, using the package
Continually improve the package by incorporating state-of-the-art techniques and frameworks
Experience
10+ years of experience in data engineering or similar roles, with strong knowledge of designing and implementing complex AI and ML solutions, as a Senior Data Scientist, Machine Learning Engineer, or AI Engineer
Strong proficiency in building large-scale data processing pipelines with AI training, and familiarity with distributed workloads (e.g., multiprocessing, MPI, Ray, Dask, Spark)
Experience developing end-to-end pipelines for model training, from handling structured and unstructured data sources, through cleaning and creating synthetic data, to actual training
Experience with AI technologies across the training journey, and intimate familiarity with PyTorch, Horovod, and TensorFlow
Ability to take extreme ownership of your work
Excellent problem-solving and communication skills
Active GitHub contributions are a big plus
Has built data pipelines for ML training (required; ideally with Ray)