Sagnik Majumder

I am a PhD student in Computer Science at UT Austin, advised by Prof. Kristen Grauman. Before this, I received my MS in Computer Science from UT. I am broadly interested in computer vision and machine learning. My current line of research is embodied audio-visual understanding of 3D scenes, with applications in mobile robotics and AR/VR.

Previously, I worked with Prof. Visvanathan Ramesh at Goethe University on continual and meta-learning for image recognition. I also had the pleasure of collaborating with Prof. Christoph von der Malsburg at the Frankfurt Institute for Advanced Studies, investigating visual models motivated by neuroscience.

Earlier, I graduated from BITS Pilani.

Industrial research internships: I am looking for industrial research internships in audio-visual/multi-modal learning for Summer 2025. Please reach out if you think I would be a good fit.

Research collaborations: I am open to collaborating on research projects, and to mentoring Master's and final-year (senior) students on their theses. Shoot me an email to discuss.

CV | E-Mail | Google Scholar | Github | Twitter

Affiliations
BITS Pilani
2014-2018
FIAS
Summer 2017
Goethe University
2018-2019
UT Austin
2019-present
Meta AI
2022-present

News
July 2024 ActiveRIR is selected as an Oral at IROS 2024.
June 2024 Invited talks at the Sight and Sound and EgoVis workshops at CVPR 2024 on "Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos".
May 2024 ActiveRIR is covered by TechXplore.
May 2024 Ego-Exo4D is selected as an Oral (0.8% selection rate) at CVPR 2024.
Mar 2024 Two papers, Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos and Ego-Exo4D, were accepted at CVPR 2024.
June 2023 Invited talk at the CVPR 2023 Sight and Sound Workshop on "Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations".
May 2023 Invited talk at JHU NSA Lab on "Efficiently understanding 3D scenes using sight and sound".
Mar 2023 Our paper Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations was accepted at CVPR 2023.
Feb 2023 Co-organizing the CVPR 2023 SoundSpaces Challenge and the Embodied AI Workshop.
Dec 2022 Starting as a visiting researcher at Meta AI Research.
Oct 2022 Invited talk at the ECCV 2022 AV4D Workshop on "Active Audio-Visual Separation of Dynamic Sound Sources".
Sept 2022 Our paper Few-Shot Audio-Visual Learning of Environment Acoustics was accepted at NeurIPS 2022.
Sept 2022 Continuing as a student researcher at Meta Reality Labs Redmond this Fall.
July 2022 Our paper Active Audio-Visual Separation of Dynamic Sound Sources was accepted at ECCV 2022.
June 2022 Invited talk at the CVPR 2022 Sight and Sound Workshop on "Active Audio-Visual Separation of Dynamic Sound Sources" (Slides).
June 2022 Joined Meta Reality Labs Redmond as a research scientist intern this summer.
April 2022 Invited talk at Meta AI Research on "Active Audio-Visual Separation of Dynamic Sound Sources" (Slides).
Feb 2022 Co-organizing the SoundSpaces Challenge at the CVPR 2022 Embodied AI Workshop.
Publications

[NEW] ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling
Arjun Somayazulu, Sagnik Majumder, Changan Chen, Kristen Grauman
IROS 2024 (Oral)
paper | project | Media coverage: TechXplore

[NEW] Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
CVPR 2024
paper | project | code and data

[NEW] Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, ..., Sagnik Majumder, ..., Mike Zheng Shou, Michael Wray
CVPR 2024 (Oral)
paper | project

Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations
Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman*, Vamsi Krishna Ithapu*
*Equal contribution
CVPR 2023
paper | project | code and data

Retrospectives on the Embodied AI Workshop
Matt Deitke, Dhruv Batra, Yonatan Bisk, ..., Sagnik Majumder, ..., Luca Weihs, Jiajun Wu
arXiv
paper

Few-Shot Audio-Visual Learning of Environment Acoustics
Sagnik Majumder, Changan Chen*, Ziad Al-Halah*, Kristen Grauman
*Equal contribution
NeurIPS 2022
paper | project | code and data

Active Audio-Visual Separation of Dynamic Sound Sources
Sagnik Majumder, Kristen Grauman
ECCV 2022
paper | project | code and data

Move2Hear: Active Audio-Visual Source Separation
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
ICCV 2021
paper | project | code and data

Learning to Set Waypoints for Audio-Visual Navigation
Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman
ICLR 2021
paper | project | code

Model Agnostic Answer Reranking System for Adversarial Question Answering
Sagnik Majumder, Chinmoy Samant, Greg Durrett
EACL SRW 2021
paper

Meta-learning Convolutional Neural Architectures for Multi-target Concrete Defect Classification with the COncrete DEfect BRidge IMage Dataset
Martin Mundt, Sagnik Majumder, Sreenivas Murali, Panagiotis Panetsos, Visvanathan Ramesh
CVPR 2019
paper | code and data

Unified Probabilistic Deep Continual Learning through Generative Replay and Open Set Recognition
Martin Mundt, Sagnik Majumder, Iuliia Pliushch, Visvanathan Ramesh
arXiv 2019
paper | code

Open Set Recognition Through Deep Neural Network Uncertainty: Does Out-of-Distribution Detection Require Generative Classifiers?
Martin Mundt, Iuliia Pliushch, Sagnik Majumder, Visvanathan Ramesh
ICCV SDLCV Workshop 2019
paper | code

Rethinking Layer-wise Feature Amounts in Convolutional Neural Network Architectures
Martin Mundt, Sagnik Majumder, Tobias Weis, Visvanathan Ramesh
NeurIPS CRACT Workshop 2018
paper | code

Handwritten Digit Recognition by Elastic Matching
Sagnik Majumder, Christoph von der Malsburg, Aashish Richhariya, Surekha Bhanot
JCP 2018
paper | code


Template credits: Unnat, Changan and Jon