Machine Learning Engineer

Neuphonic is building the future of on-device voice AI. We develop ultra-low-latency neural text-to-speech systems that enable super-realistic, human-like speech directly on devices. Our focus is on building efficient generative audio models that can run on CPU-constrained hardware, enabling real-time voice interaction without relying on large cloud infrastructure.

Neuphonic was founded in April 2024 and is backed by leading venture capital firms in Europe. Our customers include OEM handset manufacturers, chip manufacturers, and consumer AI companies building the next generation of voice-enabled products. Our vision is a world where voice becomes the most natural interface for AI, enabling seamless, intuitive interactions that are accessible to everyone.

To understand the technology you would be working on, please review our Hugging Face and GitHub repositories, as they will be part of the interview discussion:
https://huggingface.co/neuphonic
https://github.com/neuphonic

Role

We are looking for a Machine Learning Engineer to help advance the state of the art in speech synthesis. You will work on research and development across the full speech pipeline, from model architecture and training to dataset design and production deployment. The role combines applied research with real-world engineering, working closely with a small team pushing the boundaries of real-time speech systems. We are particularly interested in candidates with experience in text-to-speech systems or multimodal machine learning involving speech and audio.

Your work will include:
- Researching and developing state-of-the-art speech synthesis models
- Training and optimising models for high-quality, low-latency speech generation
- Building and curating high-quality proprietary speech datasets
- Improving model quality, expressiveness, and latency
- Working closely with engineers to bring research models into production systems
- Exploring multimodal approaches to speech and conversational AI

This role is best suited to candidates who have worked on research-grade machine learning models, rather than purely application-level ML systems.

You have
- An MSc or PhD in machine learning, speech processing, computer science, or a closely related field
- Strong experience training and evaluating deep learning models using frameworks such as PyTorch, JAX, or TensorFlow
- Several years of research or industry experience developing machine learning models (this is not a graduate or entry-level role)

In addition, you should have experience in at least one of the following areas:
- Text-to-speech (TTS) or speech synthesis, including model architecture, training, or evaluation, and/or
- Multimodal machine learning involving audio, such as models combining speech, text, or audio modalities

- Experience working in research-oriented ML environments, such as academic labs, advanced research teams, or deep-tech startups
- Familiarity with state-of-the-art approaches in speech generation, audio modelling, or multimodal systems
- Experience reading and implementing recent ML research papers
- Background from top universities, leading research groups, or equivalent research experience
- A strong interest in speech technology and conversational AI

- Help shape the company from the ground up: you’ll be joining as part of the founding team and will help define the culture and technical direction.
- Competitive salary and equity: we want everyone on the team to share meaningfully in the company’s success.
- Private health insurance: your health and wellbeing are important to us.
- Conference, travel, and development budget: we want to support your continued growth and ensure you have access to the best resources and research community.

Location: Greater London
Job Type: Full-time
