Sagnik Majumder

I am a PhD student in Computer Science at UT Austin, advised by Prof. Kristen Grauman. Before this, I received my MS in Computer Science at UT. I am broadly interested in computer vision and machine learning. My current research focuses on embodied audio-visual understanding of 3D scenes, with applications in mobile robotics and AR/VR.

Previously, I worked with Prof. Visvanathan Ramesh at Goethe University on continual and meta learning for image recognition tasks. I also had the pleasure of collaborating with Prof. Christoph von der Malsburg at the Frankfurt Institute for Advanced Studies, investigating visual models motivated by neuroscience.

Earlier, I graduated from BITS Pilani.

Industrial research internships: I am looking for industrial research internships in audio-visual/multi-modal learning for Summer 2025. Please reach out if you think I would be a good fit.

Research collaborations: I am open to collaborating on research projects, and also to mentoring Master's and final (senior) year students on their theses. Shoot me an email to discuss more.

CV | E-Mail | Google Scholar | Github | Twitter

Affiliations
BITS Pilani
2014-2018
FIAS
Summer 2017
Goethe University
2018-2019
UT Austin
2019-present
Meta AI
2022-present

News
Mar 2024 Our paper Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos got accepted at CVPR 24!
Mar 2024 Our paper Ego-Exo4D got accepted at CVPR 24!
June 2023 Invited talk at CVPR 23 Sight and Sound Workshop, "Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations".
May 2023 Invited talk at JHU NSA Lab, "Efficiently understanding 3D scenes using sight and sound".
Mar 2023 Our paper Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations got accepted at CVPR 23!
Feb 2023 Co-organizing the SoundSpaces Challenge at the CVPR 2023 Embodied AI Workshop.
Dec 2022 Starting as a visiting researcher at Meta AI Research.
Oct 2022 Invited talk at ECCV 22 AV4D Workshop, "Active Audio-Visual Separation of Dynamic Sound Sources".
Sept 2022 Our paper Few-Shot Audio-Visual Learning of Environment Acoustics got accepted at NeurIPS 22!
Sept 2022 Continuing as a student researcher at Meta Reality Labs Redmond this Fall.
July 2022 Our paper Active Audio-Visual Separation of Dynamic Sound Sources got accepted at ECCV 22!
June 2022 Invited talk at CVPR 22 Sight and Sound Workshop, "Active Audio-Visual Separation of Dynamic Sound Sources" (Slides).
June 2022 Joined Meta Reality Labs Redmond as a research scientist intern this summer.
April 2022 Invited talk at Meta AI Research, "Active Audio-Visual Separation of Dynamic Sound Sources" (Slides).
Feb 2022 Co-organizing the SoundSpaces Challenge at the CVPR 2022 Embodied AI Workshop.
Publications

[NEW] Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
CVPR 2024
paper | project | code and data (coming soon!)

[NEW] Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, ..., Sagnik Majumder, ..., Mike Zheng Shou, Michael Wray
CVPR 2024
paper | project

[NEW] Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman*, Vamsi Krishna Ithapu*
*Equal contribution
CVPR 2023
paper | project | code and data

Retrospectives on the Embodied AI Workshop
Matt Deitke, Dhruv Batra, Yonatan Bisk, ..., Sagnik Majumder, ..., Luca Weihs, Jiajun Wu
arXiv
paper

Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder, Changan Chen*, Ziad Al-Halah*, Kristen Grauman
*Equal contribution
NeurIPS 2022
paper | project | code and data

Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder, Kristen Grauman
ECCV 2022
paper | project | code and data

Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
ICCV 2021
paper | project | code and data

Learning to Set Waypoints for Audio-Visual Navigation
Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
ICLR 2021
paper | project | code

Model Agnostic Answer Reranking System for Adversarial Question Answering
Sagnik Majumder, Chinmoy Samant, Greg Durrett
EACL SRW 2021
paper

Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset
Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh
CVPR 2019
paper | code and data

Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Martin Mundt, Sagnik Majumder, Iuliia Pliushch, Visvanathan Ramesh
arXiv 2019
paper | code

Open Set Recognition Through Deep Neural Network Uncertainty: Does Out-of-Distribution Detection Require Generative Classifiers?
Martin Mundt, Iuliia Pliushch, Sagnik Majumder, Visvanathan Ramesh
ICCV SDLCV Workshop 2019
paper | code

Rethinking Layer-wise Feature Amounts in Convolutional Neural Network Architectures
Martin Mundt, Sagnik Majumder, Tobias Weis, Visvanathan Ramesh
NeurIPS CRACT Workshop 2018
paper | code

Handwritten Digit Recognition by Elastic Matching
Sagnik Majumder, Christoph von der Malsburg, Aashish Richhariya, Surekha Bhanot
JCP 2018
paper | code


Template credits: Unnat, Changan and Jon