Ayaan Haque
Hi! I'm Ayaan Haque, a 21-year-old researcher at Luma AI working on training multimodal foundation models. I worked on the training team for Dream Machine, Luma's video generation foundation model.
Foundation Models
Dream Machine is a video generation model, built on a highly scalable and efficient transformer model trained directly on videos. Dream Machine is capable of generating physically accurate, consistent and eventful shots of 120 frames in under 120 seconds, and is a first step towards building a universal imagination engine.
Genie is a 3D foundation model that can generate high-fidelity 3D objects from text instructions. Genie generates objects in under 10 seconds, and its outputs can be refined into higher-quality assets.
Selected Publications
Terminal Velocity Matching (TVM) is a scalable, single-stage generative training method that delivers diffusion-level quality with 25× fewer inference steps, now trained at 10B+ scale.
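For context, TVM sits in the flow-matching family of generative training objectives. The sketch below is a standard flow-matching training step, not the TVM objective itself; the `model(x_t, t)` velocity-prediction interface and the batch shapes are assumptions for illustration.

```python
import torch

def flow_matching_loss(model, x0):
    # Minimal flow-matching loss (the family TVM builds on);
    # this is NOT the TVM objective. `model(x_t, t)` is a hypothetical
    # network that predicts the velocity field for a batch x_t at times t.
    noise = torch.randn_like(x0)                   # terminal sample x1 ~ N(0, I)
    t = torch.rand(x0.shape[0], device=x0.device)  # per-sample time in [0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # reshape for broadcasting
    x_t = (1 - t_) * x0 + t_ * noise               # point on the linear path
    target_v = noise - x0                          # path velocity d x_t / d t
    pred_v = model(x_t, t)
    return (pred_v - target_v).pow(2).mean()       # regress predicted velocity
```

At inference time, samples are drawn by integrating the learned velocity field from noise back toward data, which is what lets methods in this family trade integration steps for quality.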
We propose a method for editing NeRF scenes with text instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction.
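In pseudocode, the loop alternates between diffusion edits of individual dataset images and ordinary NeRF optimization. The sketch below is a hypothetical outline of that iterative dataset-update idea; `nerf`, `ip2p`, and the dataset interface are illustrative stand-ins, not the actual Instruct-NeRF2NeRF API.

```python
def edit_nerf(nerf, ip2p, dataset, instruction, num_steps=30_000, edit_every=10):
    # Hypothetical sketch of the iterative editing loop described above;
    # all interfaces here are stand-ins, not the real Instruct-NeRF2NeRF code.
    for step in range(num_steps):
        if step % edit_every == 0:
            # Periodically replace one training image with an edited version.
            # Conditioning the edit on the current render (plus the original
            # image) nudges successive edits toward a 3D-consistent scene.
            idx = (step // edit_every) % len(dataset)
            render = nerf.render(dataset.camera(idx))
            edited = ip2p.edit(image=render,
                               condition=dataset.original_image(idx),
                               prompt=instruction)
            dataset.replace_image(idx, edited)
        nerf.train_step(dataset.sample_rays())  # standard NeRF optimization step
    return nerf
```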
More recently at Luma (now a Series C company), my work has focused on multimodal (video, audio, text) pre-training, real-time autoregressive diffusion models, model architecture design, and training algorithms. Previously, I was a student researcher at Google DeepMind working on autoregressive video generation, and before Dream Machine, I worked on Genie, Luma's 3D foundation model.
I started my MS in EECS at UC Berkeley with Prof. Alexei Efros and will graduate in Spring 2026, after completing my BS there in Spring 2025. In the past, I worked on self-supervised and unsupervised representation learning; I interned at Samsung SDSA and got my research career jump-started back in high school with the Wang Group at Stanford.
In a past life, I was a builder and hacker (I'm an MLH Top-50 Hacker!), and now I'm exploring deep-tech startups. Other than that, I enjoy writing, watching/playing sports, eating out with friends, and just having a good time.
Projects
I've listed just a few of my favorite projects here; the rest are available on my GitHub.
We propose a method for editing 3D Gaussian Splatting (3DGS) scenes with text instructions. This project is a follow-up to our previous work, Instruct-NeRF2NeRF. We improve the visual quality of edits, the training time, and the rendering speed of the model.
Stack: Python, PyTorch
Implemented DreamFusion and CLIP-NeRF in Nerfstudio, an open-source framework for NeRF development. Contributed (in small part) to the large-scale open-source project by reviewing code, writing documentation, and implementing research methods.
Stack: Python, PyTorch