Asset Management - AI Systems Engineer – Associate/VP

Shanghai, China
Senior

Key Responsibilities

Inference Platform & Optimization: Build and optimize enterprise LLM serving platforms (e.g., vLLM, TensorRT-LLM) using techniques like PagedAttention, continuous batching, and quantization (AWQ/FP8) for high throughput and low latency.
GPU Pooling & AI Infra: Design GPU pooling, virtualization, and scheduling solutions on Kubernetes to maximize hardware utilization. Manage distributed training clusters and high-performance networking (RDMA/NCCL).
Model Deployment & MLOps: Streamline the CI/CD pipeline for AI models. Implement automated benchmarking, zero-downtime deployment, and comprehensive observability (time to first token (TTFT), tokens per second (TPS), GPU metrics).

Qualifications

1. Education & Experience:

Bachelor’s, Master’s, or Ph.D. in Computer Science, Computer Engineering, or a related field.
3+ years of experience in Backend Systems, Distributed Systems, or AI Infrastructure/MLOps, with at least 1-2 years specifically focused on LLM serving, GPU optimization, or ML Systems.

2. Core Engineering & Systems Skills:

Expert-level proficiency in Python and strong proficiency in C++ (essential for inference engines and CUDA integration).
Deep understanding of Linux internals, networking, and distributed systems architecture.
Hands-on experience with container orchestration (Kubernetes, Docker) and building custom K8s operators or controllers.

3. AI Infrastructure & Optimization Skills (at least one of the following):

Deep familiarity with LLM inference engines (vLLM, TensorRT-LLM, TGI) and understanding of their underlying architectural designs.
Solid understanding of GPU architecture (NVIDIA Ampere/Hopper), CUDA programming, and GPU memory management.
Experience with distributed training frameworks (DeepSpeed, Megatron-LM, Ray) and high-performance networking (RDMA, RoCE, InfiniBand).

4. Mindset & Soft Skills:

A "hacker" mindset with a passion for squeezing every drop of performance out of hardware.
Ability to collaborate effectively with AI Researchers (to understand their models) and Backend Engineers (to integrate AI into business systems).

Preferred

Contributions to open-source AI Infra projects (e.g., vLLM, Ray, PyTorch).
Experience writing custom CUDA kernels or using Triton for operator fusion.
Financial industry (Asset Management/Quant) experience is a plus.
Language: Professional working proficiency in English to collaborate with global teams.
