<div align="center">

<img align="left" width="100" height="100" src="https://github.com/user-attachments/assets/1834fc25-42ef-4237-9feb-53a01c137e83" alt="">

# SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

[Cheng-Yen Yang](https://yangchris11.github.io), [Hsiang-Wei Huang](https://hsiangwei0903.github.io/), [Wenhao Chai](https://rese1f.github.io/), [Zhongyu Jiang](https://zhyjiang.github.io/#/), [Jenq-Neng Hwang](https://people.ece.uw.edu/hwang/)

[Information Processing Lab, University of Washington](https://ipl-uw.github.io/)

</div>
[Visual Object Tracking on LaSOT-ext](https://paperswithcode.com/sota/visual-object-tracking-on-lasot-ext?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on GOT-10k](https://paperswithcode.com/sota/visual-object-tracking-on-got-10k?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on NFS](https://paperswithcode.com/sota/visual-object-tracking-on-needforspeed?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on LaSOT](https://paperswithcode.com/sota/visual-object-tracking-on-lasot?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on OTB-2015](https://paperswithcode.com/sota/visual-object-tracking-on-otb-2015?p=samurai-adapting-segment-anything-model-for-1)

[[arXiv]](https://arxiv.org/abs/2411.11922) [[Project Page]](https://yangchris11.github.io/samurai/) [[Raw Results]](https://drive.google.com/drive/folders/1ssiDmsC7mw5AiItYQG4poiR1JgRq305y?usp=sharing)
This repository is the official implementation of **SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory**.

https://github.com/user-attachments/assets/9d368ca7-2e9b-4fed-9da0-d2efbf620d88
## Getting Started
#### SAMURAI Installation
SAM 2 must be installed before use. The code requires `python>=3.10`, as well as `torch>=2.3.1` and `torchvision>=0.18.1`. Please follow the instructions [here](https://github.com/facebookresearch/sam2?tab=readme-ov-file) to install both PyTorch and TorchVision dependencies. You can install **the SAMURAI version** of SAM 2 on a GPU machine using:
```
cd sam2
pip install -e .
pip install -e ".[notebooks]"
```
Please see [INSTALL.md](https://github.com/facebookresearch/sam2/blob/main/INSTALL.md) from the original SAM 2 repository for FAQs on potential issues and solutions.
Then install the additional SAMURAI dependencies:
```
pip install matplotlib==3.7 tikzplotlib jpeg4py opencv-python lmdb pandas scipy
```
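If the installation succeeded, a quick sanity check can be run from Python (a minimal sketch; `sam2` is the package name exposed by the editable install above, everything else is standard PyTorch):
```
# Quick sanity check of the SAMURAI / SAM 2 environment.
import torch
import torchvision

print(f"torch {torch.__version__} (need >= 2.3.1)")
print(f"torchvision {torchvision.__version__} (need >= 0.18.1)")
print(f"CUDA available: {torch.cuda.is_available()}")

# The editable install above exposes the `sam2` package.
import sam2  # noqa: F401
print("sam2 imported successfully")
```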
#### SAM 2.1 Checkpoint Download
```
cd checkpoints && \
./download_ckpts.sh && \
cd ..
```
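To confirm the download, a small sketch that checks for the standard SAM 2.1 Hiera checkpoints (the filenames below are the usual SAM 2.1 releases and are an assumption; adjust them if `download_ckpts.sh` fetches a different set):
```
from pathlib import Path

ckpt_dir = Path("checkpoints")
# Standard SAM 2.1 Hiera checkpoint names (assumed; see download_ckpts.sh).
expected = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]

for name in expected:
    path = ckpt_dir / name
    if path.exists():
        print(f"{name}: {path.stat().st_size / 1e6:.0f} MB")
    else:
        print(f"{name}: MISSING")
```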
#### Data Preparation
Please prepare the data in the following format:
```
data/LaSOT
├── airplane/
│ ├── airplane-1/
│ │ ├── full_occlusion.txt
│ │ ├── groundtruth.txt
│ │ ├── img
│ │ ├── nlp.txt
│ │ └── out_of_view.txt
│ ├── airplane-2/
│ ├── airplane-3/
│ ├── ...
├── basketball
├── bear
├── bicycle
...
├── training_set.txt
└── testing_set.txt
```
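Before running inference, the layout can be validated with a short sketch (file names follow the tree above; the `data/LaSOT` root is an assumption, so adjust the path to your setup):
```
from pathlib import Path

root = Path("data/LaSOT")
required = ["full_occlusion.txt", "groundtruth.txt", "nlp.txt", "out_of_view.txt"]

# Iterate over sequence folders such as airplane/airplane-1/.
for seq in sorted(p for p in root.glob("*/*") if p.is_dir()):
    missing = [f for f in required if not (seq / f).exists()]
    if not (seq / "img").is_dir():
        missing.append("img/")
    if missing:
        print(f"{seq}: missing {', '.join(missing)}")
```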
#### Main Inference
```
python scripts/main_inference.py
```
## Demo on Custom Video
To run the demo on your own video or directory of frames, use one of the following commands:
**Note:** The `.txt` file contains a single line with the bounding box of the first frame in `xywh` format.
### Input is Video File
```
python scripts/demo.py --video_path <your_video.mp4> --txt_path <path_to_first_frame_bbox.txt>
```
### Input is Frame Folder
```
python scripts/demo.py --video_path <your_frame_directory> --txt_path <path_to_first_frame_bbox.txt>
```
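For reference, a minimal sketch that writes such a bounding-box file (the comma-separated `x,y,w,h` layout is an assumption; check `scripts/demo.py` if the file isn't parsed as expected):
```
# Write a first-frame bounding box in xywh format (assumed comma-separated).
x, y, w, h = 100, 150, 80, 60  # top-left corner, width, height (pixels)

with open("first_frame_bbox.txt", "w") as f:
    f.write(f"{x},{y},{w},{h}\n")
```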
## Acknowledgment
SAMURAI is built on top of [SAM 2](https://github.com/facebookresearch/sam2?tab=readme-ov-file) by Meta FAIR.

The VOT evaluation code is modified from the [VOT Toolkit](https://github.com/votchallenge/toolkit) by Luka Čehovin Zajc.
## Citation
Please consider citing our paper and the wonderful `SAM 2` if you find our work interesting and useful.
```
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}

@misc{yang2024samurai,
  title={SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory},
  author={Cheng-Yen Yang and Hsiang-Wei Huang and Wenhao Chai and Zhongyu Jiang and Jenq-Neng Hwang},
  year={2024},
  eprint={2411.11922},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.11922}
}
```