ARU Internship Scheme 2026: Software Engineer Intern
Company name: Xonai Ltd
Pay rate: £13.45 per hour
Location: Remote
Hours: Full time – 37 hours per week
Term: 10 weeks
Start date: May 2026
Application deadline: 12th April 2026
Interview dates: April
Apache Spark powers data infrastructure worldwide, but at petabyte scale, performance bottlenecks translate directly into massive infrastructure costs. At Xonai, we're solving that with a novel engine purpose-built to dramatically accelerate Spark jobs at their core, without requiring data teams to change how they work. We've raised $4.5M in seed funding to build a best-in-class data infrastructure optimisation engine for the AI era.
The role
As a Software Engineer Intern, you will collaborate with the founding team to implement a novel accelerator for Apache Spark, the most widely used Big Data processing engine at petabyte scale. During your 10-week internship at Xonai, you’ll implement state-of-the-art techniques that directly improve execution performance and shape the design of our custom compiler for petabyte-scale data processing. Your contributions to our core IP will directly impact the infrastructure that transforms petabytes of data every day wherever Xonai is deployed.
We have a number of open projects, and together we will decide which one best fits your skills and preferences.
Project 1: Extend Xonai Custom Compiler for Optimised Columnar Processing
The Xonai accelerator replaces Spark’s row-based data format with a columnar data format for faster execution. This project focuses on designing a dedicated MLIR type that represents columnar data operations more efficiently inside Xonai’s custom compiler.
What you’ll do day to day
- Study the existing MLIR type system and how Xonai’s custom dialect is structured
- Understand how columnar data flows through the Spark acceleration pipeline
- Design a new MLIR type better suited to columnar database storage
- Update related operations in the custom dialect so they can recognise and manipulate the new type
- Write and run compiler tests to validate that the type behaves as expected
- Benchmark performance impacts of the new type on realistic Big Data workloads
- Work through iterative code reviews with senior compiler engineers
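For applicants unfamiliar with the row-based versus columnar distinction, the following is a minimal illustrative sketch in plain C++. It is not Xonai’s implementation and does not use MLIR; the names Row, ColumnBatch, to_columns and sum_price are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-based layout: each record keeps all of its fields together.
struct Row {
    std::int64_t id;
    double price;
};

// Columnar layout: each field lives in its own contiguous array.
struct ColumnBatch {
    std::vector<std::int64_t> id;
    std::vector<double> price;
};

// Hypothetical helper: transpose a vector of rows into columns.
ColumnBatch to_columns(const std::vector<Row>& rows) {
    ColumnBatch batch;
    batch.id.reserve(rows.size());
    batch.price.reserve(rows.size());
    for (const Row& r : rows) {
        batch.id.push_back(r.id);
        batch.price.push_back(r.price);
    }
    return batch;
}

// Aggregating one column reads only that column's contiguous memory,
// rather than striding over every field of every row.
double sum_price(const ColumnBatch& batch) {
    assert(batch.id.size() == batch.price.size());
    double total = 0.0;
    for (double p : batch.price) {
        total += p;
    }
    return total;
}
```

A query that touches a single field, such as a sum over price, scans one dense array in the columnar layout, which tends to be friendlier to caches and vectorisation than walking whole rows.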
Project 2: Extend Xonai Custom Compiler for Optimised String Processing
The Xonai accelerator has multiple string processing APIs that can be consolidated into a compiler-native approach for simpler and more optimised string processing. This project creates a custom MLIR type and operations that better handle large-scale string workloads commonly seen in Spark.
What you’ll do day to day
- Explore existing string processing APIs and why Big Data workloads stress them
- Design a specialised MLIR type tailored for distributed string operations
- Adapt existing MLIR data merging operations to support the new MLIR type
- Ensure those operations lower correctly through the pipeline toward LLVM
- Write unit tests and end-to-end compiler integration tests
- Use sample string-heavy datasets to measure performance differences
- Collaborate with the engine team to see how the improved string logic influences end-to-end Spark jobs
Project 3: Extend Xonai Custom Compiler for Optimised Boolean Arrays
Boolean arrays are common in query filters, mask operations and nullability tracking, to name a few. Storing them naively wastes huge amounts of memory at scale. This project focuses on optimising how boolean/bit arrays are represented and processed.
What you’ll do day to day
- Investigate how boolean arrays are currently stored within Xonai’s dialect
- Profile memory usage in real workloads to identify inefficiencies
- Design optimisations such as bitpacking or alternative memory layouts
- Update compiler operations to correctly read/write the new boolean array representation
- Ensure lowering to LLVM remains correct and efficient end-to-end
- Conduct memory and performance tradeoff experiments
- Document your changes clearly for future maintainability
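As a rough illustration of the bitpacking idea mentioned above, the sketch below stores one boolean per bit inside 64-bit words, instead of the one byte per value that a plain bool array typically uses. It is a minimal standalone C++ example under that assumption, not Xonai’s representation; the class name PackedBoolArray and its methods are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bit-packed boolean array: 64 values per 64-bit word.
class PackedBoolArray {
public:
    explicit PackedBoolArray(std::size_t size)
        : size_(size), words_((size + 63) / 64, 0) {}

    // Set or clear the bit at index i.
    void set(std::size_t i, bool value) {
        assert(i < size_);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) {
            words_[i / 64] |= mask;
        } else {
            words_[i / 64] &= ~mask;
        }
    }

    // Read the bit at index i.
    bool get(std::size_t i) const {
        assert(i < size_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // Count the number of true values by clearing the lowest set bit
    // of each word until the word is zero.
    std::size_t count() const {
        std::size_t total = 0;
        for (std::uint64_t w : words_) {
            while (w != 0) {
                w &= w - 1;
                ++total;
            }
        }
        return total;
    }

    std::size_t size() const { return size_; }

    // Bytes used by the packed storage.
    std::size_t bytes() const { return words_.size() * sizeof(std::uint64_t); }

private:
    std::size_t size_;
    std::vector<std::uint64_t> words_;
};
```

With this layout, 100 booleans occupy two 64-bit words (16 bytes). The tradeoff is that each read or write needs a shift and a mask, which is the kind of memory versus compute question the experiments in this project would examine.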
Project 4: Implement SQL Functions in C++ for Invocation from the DSL via FFI
Modern data pipelines rely heavily on complex imperative aggregate functions and string processing functions. This project involves implementing these functions in C++, or interfacing with existing libraries, and exposing them to Xonai’s domain-specific language (DSL).
What you’ll do day to day
- Identify target SQL functions and define their behaviour
- Implement the functions in optimised C++
- Build and maintain Foreign Function Interface (FFI) bindings
- Integrate these functions so they can be invoked seamlessly from within the DSL and Spark SQL
- Test correctness using real and synthetic datasets
- Compare results and performance against Spark built-in operations
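To give a flavour of this kind of work, here is a minimal sketch of a string function written in C++ and exported with C linkage, which is a common way to make native code callable through an FFI. The function example_count_occurrences, its signature and its error convention are assumptions made for illustration; they are not part of Xonai’s DSL or of Spark SQL.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

namespace {

// Count non-overlapping occurrences of pattern in text.
// An empty pattern is defined here to match zero times.
std::int64_t count_occurrences(std::string_view text, std::string_view pattern) {
    if (pattern.empty()) {
        return 0;
    }
    std::int64_t count = 0;
    std::size_t pos = 0;
    while ((pos = text.find(pattern, pos)) != std::string_view::npos) {
        ++count;
        pos += pattern.size();
    }
    return count;
}

}  // namespace

// Hypothetical FFI entry point. extern "C" disables C++ name mangling, and
// the parameters are plain pointers and lengths so that callers written in
// other languages can pass buffers that are not null-terminated.
// Returns -1 if either pointer is null (an assumed error convention).
extern "C" std::int64_t example_count_occurrences(const char* text,
                                                  std::size_t text_len,
                                                  const char* pattern,
                                                  std::size_t pattern_len) {
    if (text == nullptr || pattern == nullptr) {
        return -1;
    }
    return count_occurrences(std::string_view(text, text_len),
                             std::string_view(pattern, pattern_len));
}
```

Keeping the exported signature to C-compatible types leaves the C++ implementation free to use standard library facilities internally while presenting a stable boundary to the caller.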
Person spec
You may be a good fit if you:
- Are in your final year of an ARU degree in Computer Science or a related field
- Have experience with performance-driven personal, academic or professional projects
- Have experience with statically typed, compiled languages (in particular C++)
- Solve challenging problems independently and know when to pull others in
Strong candidates may have:
- Entrepreneurial spirit and previous internship experience in early-stage startups
- Experience or familiarity with modern compiler infrastructure such as LLVM
- Contributions to popular open-source projects (preferably in the domain of compilers)
Learning outcomes
By the end of the internship, the successful intern should have:
- Gained hands-on experience with MLIR, a modern compiler infrastructure in production use today
- Contributed to a real C++ codebase that processes petabyte-scale data in production
- Developed an understanding of how compiler techniques apply to Big Data query execution
- Owned a project end-to-end, from design through to implementation
- Worked directly with a founding team building a novel solution to optimise data infrastructure at scale
How to apply: Please click on the apply button and ensure your CV is up to date and showcases your relevant skills and experience for the role.
Important notice: When making your application, feel free to use AI to enhance your CV and cover letter, but make sure to include REAL-LIFE EXAMPLES that clearly demonstrate your skills, experience, and knowledge. Applications that appear overly generic or lack evidence will be referred to the Employability and Careers team for further support and will not progress to the next stage of recruitment.