At IBM Software, we transform client challenges into solutions, building the world's leading AI-powered, cloud-native products that shape the future of business and society. We are building the next generation of watsonx.data—a GPU-accelerated, open data lakehouse engineered to deliver category-leading price-performance for analytics and AI workloads. Working in Software means joining a team fueled by curiosity and collaboration, where you'll work deep inside Presto and Spark to build connectors, custom UDFs/UDAFs, optimizer rules, and performance-critical execution paths that shape query throughput and latency for customers at petabyte scale. With a culture that values innovation, growth, and continuous learning, IBM Software places you at the heart of IBM's product and technology landscape. Here, you'll have the tools and opportunities to advance your career while creating software that changes the world.
As a Software Engineer with deep Presto and/or Spark internals expertise, you will design, develop, test, and deliver connectors, optimizer extensions, and engine-level performance improvements that power the watsonx.data query layer.
You will work in an Agile, collaborative environment to understand stakeholder requirements and directly impact query throughput, latency, and reliability at petabyte scale.
Your primary responsibilities will include:
- Develop Engine Internals: Build and maintain Presto and/or Spark connectors, operator implementations, optimizer rules, and custom UDFs/UDAFs for open table formats (Iceberg, Delta, Hudi).
- Optimize Performance at Scale: Diagnose and resolve data skew, broadcast-join sizing, shuffle bottlenecks, and memory pressure; tune operator memory, spill thresholds, and off-heap usage using async-profiler and flamegraphs.
- Contribute to CI/CD & Benchmarks: Contribute to the automated CI/CD pipeline and maintain CI benchmarks that guard against regressions in query latency, throughput, and resource consumption.
- Support Production & Debug: Support Presto/Spark deployments on Kubernetes and bare metal, unit-test fixes for engine-related and customer-reported issues, and participate in on-call.
- Collaborate in Agile Environment: Partner with query optimization, storage, GPU acceleration, and AI/ML teams, conduct reviews with measurable acceptance criteria, and document connector interfaces and engine internals.
- Engine Development Experience: 6+ years of professional software engineering, including at least 2 years developing against Presto/Trino or Apache Spark internals.
- Language & Codebase Proficiency: Strong Java or Scala skills with comfort navigating and modifying a large, complex open-source codebase.
- Connectors & Optimizer Work: Hands-on experience building connectors, UDFs/UDAFs, or optimizer extensions, plus working knowledge of Spark/Presto query planning, physical execution, and the operator/stage model.
- Performance & Formats: Experience resolving data skew, shuffle bottlenecks, broadcast-join sizing, and memory pressure at scale; familiarity with an open table format (Iceberg, Delta, or Hudi); JVM tuning in production.
- Communication & Education: Clear written communication, with the ability to file actionable bugs, write design docs, and explain engine trade-offs; Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
- OSS & GPU Acceleration: Committer or significant contributor to Apache Spark, Trino, or Presto, and experience integrating GPU-accelerated execution (RAPIDS Accelerator, cuDF) into query paths.
- Vectorization & Multi-Tenancy: Familiarity with vectorized execution and columnar formats (Arrow, ORC, Parquet), ML feature and inference pipelines on Spark/Presto, FinOps cost modeling, and multi-tenant deployments with fairness scheduling and workload isolation.

United States | Software Engineering | Hybrid | Professional | Multiple Cities