Role: Sr. AI/ML Engineer
Location: Boston, MA (100% Onsite)
Role:
We are looking for a highly skilled Senior AI/ML Engineer with a strong background in designing, deploying, and operationalizing AI/ML services in production environments. You will be a key contributor in building and maintaining robust, scalable systems that support machine learning workflows, including Large Language Models (LLMs) and AI agent frameworks. This position requires deep expertise in MLOps, distributed systems, cloud infrastructure (particularly AWS), and modern software development practices. The candidate will need to collaborate with cross-functional teams, drive outcomes by working backward from business objectives, and deliver impactful results within specific timelines.
Key Responsibilities:
- Design & Implement AI/ML Solutions
  o Architect and develop end-to-end ML solutions, from data ingestion to model deployment, including LLM-based applications.
  o Evaluate and select appropriate frameworks, libraries, and tools to meet both short-term project goals and long-term scalability.
- LLM & Prompt Engineering
  o Develop and optimize prompts for Large Language Models (e.g., OpenAI/Claude/Llama) to improve the quality and relevance of outputs.
  o Conduct experiments to evaluate LLM performance and apply prompt engineering best practices to ensure high-impact results.
- AI Agent Frameworks
  o Incorporate AI agent frameworks (e.g., LangChain, AgentGPT, or similar) to enable autonomous or semi-autonomous decision-making within applications.
  o Integrate AI agents with existing systems, ensuring robust communication and secure data handling.
- MLOps & Production Operations
  o Set up and optimize CI/CD pipelines for ML models, ensuring continuous integration, testing, and deployment.
  o Monitor, troubleshoot, and refine production ML systems for performance, cost efficiency, and reliability.
- Cloud Development (AWS)
  o Leverage AWS services (e.g., EC2, S3, Lambda, SageMaker, EKS) to design and maintain scalable, secure, and cost-efficient ML infrastructure.
  o Implement best practices for cloud resource allocation, scaling, and maintenance.
- Software Engineering & Distributed Systems
  o Write clean, maintainable, and well-documented code in Python and other modern languages (e.g., Go, Java, or Rust).
  o Develop and maintain distributed systems, focusing on reliability, fault tolerance, and performance.
  o Work with databases (SQL/NoSQL) to handle large-scale data processing and storage.
- Front-End Integration
  o Collaborate on front-end projects using React/Next.js to build user interfaces or internal tools that interact with AI/ML services.
- Cross-Team Collaboration
  o Work closely with product managers, data scientists, DevOps engineers, and other stakeholders to define requirements and deliver high-impact solutions.
  o Communicate technical decisions effectively, balancing trade-offs between short-term needs and long-term product vision.
- Autonomy & Time Management
  o Operate with minimal supervision, proactively identifying issues and taking ownership to drive solutions.
  o Manage multiple priorities in a fast-paced environment and effectively escalate blockers to ensure timely delivery.
- Continuous Learning & Adaptability
  o Stay updated with emerging AI/ML technologies, LLM advancements, and best practices, sharing insights with the team.
  o Adapt quickly to new domains, frameworks, and technologies as project needs evolve.
Qualifications & Requirements:
- Experience: 8 years of professional software engineering experience, including distributed systems and databases.
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or a related field (or equivalent industry experience).
Technical Skills:
- Required: AWS (or other major cloud provider) with hands-on experience in deploying, monitoring, and scaling production services.
- Python (preferred) and proficiency in at least one other modern programming language (e.g., Go, Java, Rust).
- Strong understanding of MLOps concepts, CI/CD pipelines, containeriz