SaidGig

GPU Kernel Optimization Engineer

$80–$100/hr

Remote · Contract · Technology

About this role

Join a project with a leading AI lab as a GPU kernel optimization expert. This contract role is for freelancers with strong C++ skills and practical GPU programming experience. You will use profiler-guided analysis to evaluate and optimize GPU kernels across modern hardware environments. It suits specialists who want to get the most performance out of advanced GPU architectures.

Key Responsibilities
  • Analyze and optimize GPU kernels for performance, efficiency, and hardware utilization.
  • Utilize profiler metrics such as L2 cache hit rate, L2 throughput, occupancy, and related signals to guide kernel improvements.
  • Review GPU kernel implementations and identify bottlenecks without requiring extensive background in the underlying algorithms.
  • Write, modify, and reason about code in C++17, Python, and GPU programming languages.
  • Apply CUDA, HIP, shader programming, or related kernel programming expertise to improve performance outcomes.
  • Document optimization decisions clearly, including when specific profiler metrics are or are not useful.

Ideal Qualifications
  • Available to work at least 20 hours per week.
  • Fluent in core C++ features through C++17.
  • Working knowledge of Python and Git.
  • Fluent in at least one GPU programming model, such as CUDA, HIP, Slang, HLSL, GLSL, or related kernel programming.
  • At least 1 year of professional or graduate-level research experience working with GPUs.
  • Strong understanding of GPU profiler performance metrics and how to use them to optimize kernels.
  • Ability to optimize GPU kernels without needing deep prior context on every algorithm.
  • Experience with CUDA, HIP, CUDA C++ Core Libraries, inline PTX assembly, or tensor core-level optimization is a plus.
  • Experience optimizing kernels for NVIDIA Blackwell hardware is a plus.
  • Familiarity with Nsight Compute is a plus.
  • Prior experience at a GPU hardware vendor such as NVIDIA, AMD, or Qualcomm is a plus.
  • Open-source contributions related to GPU kernel optimization are a plus.

Application Process

To apply, submit your resume or relevant technical background. Qualified applicants may be asked to complete a brief technical assessment or provide additional information.
