AI - Intern
We are looking for a 3rd-year student to join as an AI Intern and work on cutting-edge, AI-driven solutions for monitoring, optimizing, and securing our data platform. This internship offers the opportunity to apply AI/ML techniques to real-world Big Data and cloud challenges. You'll work at the intersection of artificial intelligence, data engineering, and distributed systems.
Key Responsibilities:
Design and build AI-driven solutions that detect anomalies and deviations in data and raise real-time alerts, enabling quick responses and mitigating risk. For example, automatically identify and flag potential occurrences of sensitive information stored in plain text across diverse datasets. Work closely with data engineers and analysts to implement scalable solutions, optimize model performance, and enhance data quality monitoring.
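For a flavor of this kind of detection, here is a minimal rule-based sketch; the field names, patterns, and alert format are all hypothetical, and a production system would use a far richer ruleset or an ML classifier:

```python
import re

# Hypothetical patterns for common sensitive-data shapes; a real deployment
# would cover many more categories and locales.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_record(record: dict) -> list[dict]:
    """Return an alert for each text field that appears to hold sensitive data."""
    alerts = []
    for field, value in record.items():
        if not isinstance(value, str):
            continue
        for label, pattern in PATTERNS.items():
            if pattern.search(value):
                alerts.append({"field": field, "type": label})
    return alerts

if __name__ == "__main__":
    row = {"id": 42, "notes": "contact jane.doe@example.com for details"}
    print(scan_record(row))  # [{'field': 'notes', 'type': 'email'}]
```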
Apply machine learning techniques to forecast query execution time from factors such as query complexity, data volume, and system load, improving query scheduling and prioritization on large-scale data platforms and driving performance optimization in modern data ecosystems.
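As an illustration, a sketch of such a forecaster using a scikit-learn regressor on synthetic data; the feature set (join count, scanned volume, cluster load) and all numbers are assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic training data: each row is (join_count, scanned_gb, cluster_load);
# runtime grows with all three plus noise. Real features would come from the
# query planner and platform metrics.
X = rng.uniform(0, 1, size=(1000, 3)) * [10, 500, 1.0]
y = 2.0 * X[:, 0] + 0.05 * X[:, 1] + 30 * X[:, 2] + rng.normal(0, 2, 1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor().fit(X_train, y_train)

# Predicted runtime (seconds) for a new query: 4 joins, 200 GB scanned,
# cluster at 60% load. A scheduler could use this to prioritize work.
print(model.predict([[4, 200, 0.6]]))
```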
Use ML models to recommend optimal resource allocation for data pipelines based on past usage trends. By analyzing historical usage patterns and the current system state, the system anticipates future resource needs and optimizes allocation decisions; integrating these recommendations improves pipeline efficiency, cost-effectiveness, and scalability.
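One simple, hypothetical baseline for this kind of recommendation is a percentile rule over recent usage; a real system might fit a per-pipeline time-series model instead:

```python
import numpy as np

def recommend_allocation(usage_history: np.ndarray, headroom: float = 1.2) -> float:
    """Recommend capacity as the 95th percentile of recent usage plus headroom."""
    p95 = np.percentile(usage_history, 95)
    return float(p95 * headroom)

# Hypothetical hourly memory usage (GB) for one pipeline over the past week.
rng = np.random.default_rng(1)
usage = rng.gamma(shape=4.0, scale=2.0, size=24 * 7)
print(f"recommended memory: {recommend_allocation(usage):.1f} GB")
```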
Develop a failure prediction model that learns from historical pipeline failure patterns to proactively mitigate issues. This monitoring solution continuously analyzes performance metrics, log data, and system events to detect anomalies that signal impending failures; integrating these predictive insights enables proactive issue resolution and improves pipeline reliability.
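A minimal sketch of such a failure predictor, trained on synthetic run metrics; the features (runtime, retries, warning-log lines) are assumptions, and real labels would come from the orchestrator's run history:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# Synthetic run history: (runtime_minutes, retry_count, warn_log_lines).
# Label 1 = the run eventually failed.
X = np.column_stack([
    rng.normal(30, 10, 2000),   # runtime in minutes
    rng.poisson(0.5, 2000),     # retries
    rng.poisson(3, 2000),       # warning lines in logs
])
risk = 0.02 * X[:, 0] + 0.8 * X[:, 1] + 0.1 * X[:, 2]
y = (risk + rng.normal(0, 0.5, 2000) > 2.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Probability that an in-flight run with these metrics will fail;
# an alert could fire when this crosses a threshold.
print(clf.predict_proba([[55, 2, 8]])[0, 1])
```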
Build a Flask-based web application within Databricks to display the real-time status of key data products. The application will provide a centralized dashboard with visual indicators (e.g., green/red status, last refresh time) for monitoring data pipeline health and freshness.
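A minimal sketch of such a dashboard backend in Flask; the product names, in-memory status store, and /status route are hypothetical, and within Databricks the app would typically be populated from table metadata or job-run APIs and served through the driver proxy:

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical in-memory status store; a real app would read pipeline
# health and refresh times from platform metadata instead.
DATA_PRODUCTS = {
    "sales_daily": {"healthy": True,  "last_refresh": "2024-01-01T06:00:00Z"},
    "orders_raw":  {"healthy": False, "last_refresh": "2023-12-31T22:15:00Z"},
}

@app.route("/status")
def status():
    """Return one green/red entry per data product for the dashboard."""
    return jsonify({
        name: {
            "status": "green" if info["healthy"] else "red",
            "last_refresh": info["last_refresh"],
        }
        for name, info in DATA_PRODUCTS.items()
    })

if __name__ == "__main__":
    app.run(port=8080)
```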
Required Skills:
Programming proficiency in Python
Understanding of Machine Learning concepts and frameworks (PyTorch, TensorFlow)
Familiarity with LLMs and generative AI is a plus
Version control with Git
🔑 Good to have:
Familiarity with cloud platforms (AWS S3, EC2)
Experience with distributed data processing frameworks (e.g., Spark)
We will consider for employment all qualified applicants, including those with arrest records, conviction records, or other criminal histories, in a manner consistent with the requirements of any applicable state and local laws, including the National Vetting Bureau (Children and Vulnerable Persons) Act 2012, the Private Security Services Act 2004, and the Criminal Justice (Spent Convictions and Certain Disclosures) Act 2016.