Asset Management - AI Systems Engineer – Associate/VP
JPMorgan Chase | Shanghai, China | Senior
Key Responsibilities
- Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
- GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
- Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (TTFT, TPS, GPU metrics).
Qualifications
1. Education & Experience:
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, or a related field.
- 3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.
2. Core Engineering & Systems Skills:
- Expert-level proficiency in Python and strong proficiency in C++ (essential for inference engines and CUDA integration).
- Deep understanding of Linux internals, networking, and distributed systems architecture.
- Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.
3. AI Infrastructure & Optimization Skills:
- Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs; or
- Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management; or
- Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).
4. Mindset & Soft Skills:
- A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
- Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).
Preferred
- Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
- Experience writing custom CUDA kernels or using Triton for operator fusion.
- Financial industry (Asset Management/Quant) experience is a plus.
- Language: Professional working proficiency in English to collaborate with global teams.