Ayaan Haque

Hi! I'm Ayaan Haque, a 20 y/o researcher. I'm currently a student researcher at Google DeepMind working on video models. Previously, I spent over a year as a research scientist at Luma AI. I worked on the training team for Dream Machine, Luma's video generation foundation model! Before Dream Machine, I worked on Genie, Luma's 3D foundation model.

I'm also a student at UC Berkeley studying EECS, where I'm advised by Angjoo Kanazawa. In the past, I worked on self-supervised and unsupervised representation learning. I interned at Samsung SDSA, and got my research career jumpstarted back in high school with the Wang Group at Stanford.

In a past life, I was a builder and hacker (I'm an MLH Top-50 Hacker!), and now I'm exploring deep-tech startups. Other than that, I enjoy writing, watching/playing sports, eating out with friends, and just having a good time. My ongoing goal and dream:

Twitter  /  Email  /  Google Scholar  /  Github  /  LinkedIn

Learning about learning 💯
Updates

  • [Sep 2024] Returning to school at Berkeley, joining Google DeepMind as a student researcher!
  • [June 2024] We released Dream Machine, Luma's video generation model, available to everyone now!
  • [Jan 2024] Joining Luma (now a Series B company) full-time!
  • [Nov 2023] We released Genie, Luma's 3D foundation model!
  • [Oct 2023] Gave oral talk on Instruct-NeRF2NeRF at ICCV in Paris!
  • [July 2023] Instruct-NeRF2NeRF accepted to ICCV 2023 (Oral)!
  • [May 2023] Joining Luma AI, a Series A startup building the future of 3D!
  • [Mar 2023] Released new pre-print Instruct-NeRF2NeRF!
  • [Aug 2022] Starting my undergrad at UC Berkeley to study EECS!

Foundation Models
Dream Machine

Dream Machine is a video generation model, built on a highly scalable and efficient transformer model trained directly on videos. Dream Machine is capable of generating physically accurate, consistent and eventful shots of 120 frames in under 120 seconds, and is a first step towards building a universal imagination engine.

Genie

Genie is a 3D foundation model that can generate high-fidelity 3D objects from text instructions. Genie generates objects in under 10 seconds, and can be refined into higher quality assets.

Research

Since I've worked on a wide variety of topics (and am still exploring new topics), I've split my publications into currently relevant works and previous works. Only relevant works are listed below. For a full, chronological list of my papers, visit my Google Scholar.

Instruct-NeRF2NeRF: Editing 3D Scenes with Instructions
Ayaan Haque, Matthew Tancik, Alexei A. Efros, Aleksander Holynski, Angjoo Kanazawa
UC Berkeley
ICCV, 2023 (Oral Presentation)
Project Page / ArXiv / Oral / Code

We propose a method for editing NeRF scenes with text-instructions. Given a NeRF of a scene and the collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D scene that respects the edit instruction.

Self-Supervised Contrastive Representation Learning for 3D Mesh Segmentation
Ayaan Haque, Hankyu Moon, Heng Hao, Sima Didari, Jae Oh Woo, Patrick Bangert
Samsung SDS Research America
AAAI, 2023
ArXiv

We introduce self-supervised MeshCNN, or SSL-MeshCNN, a novel mesh-specialized contrastive learning method to perform downstream segmentation with limited-labeled data. We create an augmentation policy tailored for meshes, enabling the network to learn efficient visual representations through contrastive pre-training.

EC-GAN: Low-Sample Classification using Semi-Supervised Algorithms and GANs
Ayaan Haque
Saratoga High School
AAAI, 2021 (Best Student Abstract Finalist, Oral Presentation)
Project Page / ArXiv / Oral / Code / Blog

We propose EC-GAN, which combines a Generative Adversarial Network with a classifier to leverage artificial GAN generations to increase the size of restricted, fully-supervised datasets using semi-supervised algorithms. Mentored by Microsoft Postdoc and Princeton University PhD Jordan T. Ash.

Previous Research
Deep Learning for Suicide and Depression Identification with Unsupervised Label Correction
Ayaan Haque*1, Viraaj Reddi*1, Tyler Giallanza2
Saratoga High School1, Princeton University2
ICANN, 2021
Project Page / ArXiv / Teaser Video / Code / Blog

We propose SDCNL to address the unexplored problem of classifying between depression and more severe suicidal tendencies using web-scraped data. Our method introduces a novel label correction method to remove inherent noise in web-scraped data using unsupervised learning combined with a deep-learning classifier based on pre-trained transformers.

MultiMix: Sparingly Supervised, Extreme Multitask Learning From Medical Images
Ayaan Haque*1, Abdullah-Al-Zubaer Imran*2,3, Adam Wang2, Demetri Terzopoulos3,4
Saratoga High School1, Stanford University2, University of California, Los Angeles3, VoxelCloud Inc.4
IEEE ISBI, 2021
Project Page / ArXiv / Code / Blog

We introduce MultiMix, a joint semi-supervised classification and segmentation model employing a confidence-based augmentation strategy for semi-supervised classification along with a novel saliency bridge module that guides segmentation and provides explainability for the joint tasks.

Noise2Quality: Non-Reference, Pixel-Wise Assessment of Low Dose CT Image Quality
Ayaan Haque1, 2, Adam Wang2, Abdullah-Al-Zubaer Imran2
Saratoga High School1, Stanford University2
SPIE Medical Imaging (SPIE), 2022
Project Page / Paper / Code

We propose Noise2Quality (N2Q), a novel, self-supervised IQA model which predicts SSIM Image Quality maps from low-dose CT. We propose a self-supervised regularization task of dose-level estimation creating a multi-tasking framework to improve performance.

Window Level is a Strong Denoising Surrogate
Ayaan Haque1, 2, Adam Wang2, Abdullah-Al-Zubaer Imran2
Saratoga High School1, Stanford University2
MICCAI MLMI, 2021
Project Page / ArXiv / Code / Blog

We introduce SSWL-IDN, a novel self-supervised CT denoising method that uses window-level prediction as a surrogate task. The surrogate task is directly related to the downstream denoising task, yielding improved performance over recent methods.

Generalized Multi-Task Learning from Substantially Unlabeled Multi-Source Medical Image Data
Ayaan Haque1, 2, Abdullah-Al-Zubaer Imran2,3, Adam Wang2, Demetri Terzopoulos3,4
Saratoga High School1, Stanford University2, University of California, Los Angeles3, VoxelCloud Inc.4
MELBA, 2021
Project Page / Journal Page / Paper / Code

We expand upon MultiMix (in ISBI 2021). Our extended manuscript contains a detailed explanation of the methods, saliency map visualizations from multiple datasets, and quantitative (performance metrics tables) and qualitative (mask predictions, Bland Altman plots, ROC curves, consistency plots) results.

Activities

Some activities I used to really enjoy.

Hackathons
Team Captain May 2019 - June 2022

  • πŸ† 33x Award Winner, 9x First Place, 22x Top 3, $10,000+ in earnings
  • Chosen for MLH Top 50 Hackers Class of 2021, one of five high schoolers
  • Shoutout Viraaj, Adithya, Ishaan, and Sajiv

Projects

I've just listed a few of my favorite projects, and the remaining are available on my Github. Check out this cool commit graph.

Instruct-GS2GS
Instruction-based editing of Gaussian Splatting scenes
Nov 2023 - Jan 2024
Website / Paper / Github

We propose a method for editing 3D Gaussian Splatting (3DGS) scenes with text-instructions. This project is a follow-up to our previous work, Instruct-NeRF2NeRF. We improve the visual quality of edits, the training time, and rendering speed of the model.

Stack: Python, PyTorch

Nerfstudio
A collaboration friendly studio for NeRFs
Jan 2023 - May 2023
Website / Github

Implemented DreamFusion and CLIP-NeRF in Nerfstudio, an open-source project for NeRF development. Contributed (in small parts) to a large-scale open-source project by reviewing code, writing documentation, and implementing research methods.

Stack: Python, PyTorch

Tickbird
Streamlined prescription analysis for visually impaired patients (Available on the App Store)
September 2019 - June 2020
Website / App Store / Demo / Github / Devpost

Tickbird is an advanced Swift mobile app allowing visually impaired patients to aurally understand their prescriptions or the labels on their pill bottles.

Stack: Swift, Xcode, iOS, Firebase, TesseractOCR, Ruby

Writing

I used to write a lot on Medium. Here are some of my favorite stories:

In Response to “What’s the F-ing Point?”
October 6th, 2021

A response to an article discussing our purpose in this world combined with a discussion of my own purpose

This article is a response to my friend's article, where he discusses critiques of our Saratoga society. In my article, I respond to his ideas and then share my own story of finding my purpose in life.

How Five High-Schoolers Won $9.5K From Hackathons in One Summer
Better Programming August 28th, 2020

Coding, winning prizes, and proving ourselves

Authored by Ayaan Haque, Adithya Peruvemba, Viraaj Reddi, Sajiv Shah, and Ishaan Bhandari

This article travels through the journey of my team, Haleakala Hacksquad, and how we became great hackers.


Website template from Jon Barron