<div align="center">

<img align="left" width="100" height="100" src="https://github.com/user-attachments/assets/1834fc25-42ef-4237-9feb-53a01c137e83" alt="">

# SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory

[Cheng-Yen Yang](https://yangchris11.github.io), [Hsiang-Wei Huang](https://hsiangwei0903.github.io/), [Wenhao Chai](https://rese1f.github.io/), [Zhongyu Jiang](https://zhyjiang.github.io/#/), [Jenq-Neng Hwang](https://people.ece.uw.edu/hwang/)

[Information Processing Lab, University of Washington](https://ipl-uw.github.io/)

</div>
[Visual Object Tracking on LaSOT-ext](https://paperswithcode.com/sota/visual-object-tracking-on-lasot-ext?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on GOT-10k](https://paperswithcode.com/sota/visual-object-tracking-on-got-10k?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on NFS](https://paperswithcode.com/sota/visual-object-tracking-on-needforspeed?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on LaSOT](https://paperswithcode.com/sota/visual-object-tracking-on-lasot?p=samurai-adapting-segment-anything-model-for-1)
[Visual Object Tracking on OTB-2015](https://paperswithcode.com/sota/visual-object-tracking-on-otb-2015?p=samurai-adapting-segment-anything-model-for-1)

[[arXiv]](https://arxiv.org/abs/2411.11922) [[Project Page]](https://yangchris11.github.io/samurai/) [[Raw Results]](https://drive.google.com/drive/folders/1ssiDmsC7mw5AiItYQG4poiR1JgRq305y?usp=sharing)
This repository is the official implementation of **SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory**.

https://github.com/user-attachments/assets/9d368ca7-2e9b-4fed-9da0-d2efbf620d88
## Getting Started
#### SAMURAI Installation
SAM 2 must be installed before use. The code requires `python>=3.10`, as well as `torch>=2.3.1` and `torchvision>=0.18.1`. Please follow the instructions [here](https://github.com/facebookresearch/sam2?tab=readme-ov-file) to install both PyTorch and TorchVision dependencies. You can install **the SAMURAI version** of SAM 2 on a GPU machine using:
```
cd sam2
pip install -e .
pip install -e ".[notebooks]"
```
Please see [INSTALL.md](https://github.com/facebookresearch/sam2/blob/main/INSTALL.md) from the original SAM 2 repository for FAQs on potential issues and solutions.
Then install the additional SAMURAI dependencies:
```
pip install matplotlib==3.7 tikzplotlib jpeg4py opencv-python lmdb pandas scipy
```
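If the installation succeeded, a quick sanity check can be run from Python (a minimal sketch; `sam2` is the package name exposed by the editable install above, everything else is standard PyTorch):
```
# Quick sanity check of the SAMURAI / SAM 2 environment.
import torch
import torchvision

print(f"torch {torch.__version__} (need >= 2.3.1)")
print(f"torchvision {torchvision.__version__} (need >= 0.18.1)")
print(f"CUDA available: {torch.cuda.is_available()}")

# The editable install above exposes the `sam2` package.
import sam2  # noqa: F401
print("sam2 imported successfully")
```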
#### SAM 2.1 Checkpoint Download
```
cd checkpoints && \
./download_ckpts.sh && \
cd ..
```
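To confirm the download, a small sketch that checks for the standard SAM 2.1 Hiera checkpoints (the filenames below are the usual SAM 2.1 releases and are an assumption; adjust them if `download_ckpts.sh` fetches a different set):
```
from pathlib import Path

ckpt_dir = Path("checkpoints")
# Standard SAM 2.1 Hiera checkpoint names (assumed; see download_ckpts.sh).
expected = [
    "sam2.1_hiera_tiny.pt",
    "sam2.1_hiera_small.pt",
    "sam2.1_hiera_base_plus.pt",
    "sam2.1_hiera_large.pt",
]

for name in expected:
    path = ckpt_dir / name
    if path.exists():
        print(f"{name}: {path.stat().st_size / 1e6:.0f} MB")
    else:
        print(f"{name}: MISSING")
```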
#### Data Preparation
Please prepare the data in the following format:
```
data/LaSOT
├── airplane/
│ ├── airplane-1/
│ │ ├── full_occlusion.txt
│ │ ├── groundtruth.txt
│ │ ├── img
│ │ ├── nlp.txt
│ │ └── out_of_view.txt
│ ├── airplane-2/
│ ├── airplane-3/
│ ├── ...
├── basketball
├── bear
├── bicycle
...
├── training_set.txt
└── testing_set.txt
```
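Before running inference, the layout can be validated with a short sketch (file names follow the tree above; the `data/LaSOT` root is an assumption, so adjust the path to your setup):
```
from pathlib import Path

root = Path("data/LaSOT")
required = ["full_occlusion.txt", "groundtruth.txt", "nlp.txt", "out_of_view.txt"]

# Iterate over sequence folders such as airplane/airplane-1/.
for seq in sorted(p for p in root.glob("*/*") if p.is_dir()):
    missing = [f for f in required if not (seq / f).exists()]
    if not (seq / "img").is_dir():
        missing.append("img/")
    if missing:
        print(f"{seq}: missing {', '.join(missing)}")
```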
#### Main Inference
```
python scripts/main_inference.py
```
## Demo on Custom Video
To run the demo on your own video or directory of frames, use one of the following commands:
**Note:** The `.txt` file contains a single line with the bounding box of the first frame in `xywh` format.
### Input is Video File
```
python scripts/demo.py --video_path <your_video.mp4> --txt_path <path_to_first_frame_bbox.txt>
```
### Input is Frame Folder
```
python scripts/demo.py --video_path <your_frame_directory> --txt_path <path_to_first_frame_bbox.txt>
```
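For reference, a minimal sketch that writes such a bounding-box file (the comma-separated `x,y,w,h` layout is an assumption; check `scripts/demo.py` if the file isn't parsed as expected):
```
# Write a first-frame bounding box in xywh format (assumed comma-separated).
x, y, w, h = 100, 150, 80, 60  # top-left corner, width, height (pixels)

with open("first_frame_bbox.txt", "w") as f:
    f.write(f"{x},{y},{w},{h}\n")
```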
## Acknowledgment
SAMURAI is built on top of [SAM 2](https://github.com/facebookresearch/sam2?tab=readme-ov-file) by Meta FAIR.

The VOT evaluation code is modified from the [VOT Toolkit](https://github.com/votchallenge/toolkit) by Luka Čehovin Zajc.
## Citation
Please consider citing our paper and the wonderful `SAM 2` if you find our work interesting and useful.
```
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}

@misc{yang2024samurai,
  title={SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory},
  author={Cheng-Yen Yang and Hsiang-Wei Huang and Wenhao Chai and Zhongyu Jiang and Jenq-Neng Hwang},
  year={2024},
  eprint={2411.11922},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.11922}
}
```