Sateesh Kumar

I am a CS Ph.D. student at The University of Texas at Austin, advised by Prof. Roberto Martín-Martín and Prof. Georgios Pavlakos. My research focuses on the intersection of Robotics and Computer Vision. I hold a Master's degree from The University of California, San Diego, where I was advised by Prof. Xiaolong Wang, and a Bachelor's degree from FAST NUCES Karachi.

In addition to my academic pursuits, I have gained industry experience as a Generative AI Researcher at TikTok and as a Computer Vision Research Engineer at Retrocausal.

Email / CV / Google Scholar / Twitter

Publications and Preprints

Papers are in reverse chronological order. '*' denotes equal contribution.

	The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering Haichao Yu, Yu Tian, Sateesh Kumar, Linjie Yang, Heng Wang International Conference on Computer Vision (ICCV) DataComp Workshop, 2023 (Ranked 1st in the DataComp challenge) arXiv We introduce a three-stage filtering strategy for enhancing model performance. It focuses on single-modality filtering, cross-modality filtering, and data distribution alignment. The proposed approach significantly surpasses previous methods on the DataComp benchmark.
	Graph Inverse Reinforcement Learning from Diverse Videos Sateesh Kumar, Jonathan Zamora, Nicklas Hansen, Rishabh Jangir, Xiaolong Wang Conference on Robot Learning (CoRL), 2022 (Oral) project page / arXiv GraphIRL is a self-supervised method for learning a task reward solely from videos. We build an object-centric graph abstraction from video demonstrations and then learn an embedding space that captures task progression in a self-supervised manner by exploiting the temporal cue in the videos.
	Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering Sateesh Kumar, Sanjay Haresh, Awais Ahmed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran CVPR, 2022 project page / arXiv We propose temporal optimal transport for jointly learning representations and performing online clustering in an unsupervised manner. The approach learns prototype vectors via backpropagation. The prototype vectors are initialized at random and act as cluster centroids.
	Learning by Aligning Video in Time Sateesh Kumar, Sanjay Haresh, Huseyin Coskun, Shahram N. Syed, Andrey Konin, M. Zeeshan Zia, Quoc-Huy Tran CVPR, 2021 project page / arXiv We propose alignment as a pretext task for self-supervised video representation learning. The proposed approach leverages differentiable dynamic time warping for learning global alignment across pairs of videos.
	Towards Anomaly Detection in Dashcam Videos Sateesh Kumar, Sanjay Haresh, M. Zeeshan Zia, Quoc-Huy Tran IV, 2020 talk / arXiv We collect a video dataset of road-based anomalies. We propose an object-object interaction reasoning approach for detecting anomalies without additional supervision. We experiment with reconstruction-based and one-class-classification-based approaches.

Website layout is from Jon Barron