# SAM 2: Segment Anything in Images and Videos

**[AI at Meta, FAIR](https://ai.meta.com/research/)**

[Nikhila Ravi](https://nikhilaravi.com/), [Valentin Gabeur](https://gabeur.github.io/), [Yuan-Ting Hu](https://scholar.google.com/citations?user=E8DVVYQAAAAJ&hl=en), [Ronghang Hu](https://ronghanghu.com/), [Chaitanya Ryali](https://scholar.google.com/citations?user=4LWx24UAAAAJ&hl=en), [Tengyu Ma](https://scholar.google.com/citations?user=VeTSl0wAAAAJ&hl=en), [Haitham Khedr](https://hkhedr.com/), [Roman Rädle](https://scholar.google.de/citations?user=Tpt57v0AAAAJ&hl=en), [Chloe Rolland](https://scholar.google.com/citations?hl=fr&user=n-SnMhoAAAAJ), [Laura Gustafson](https://scholar.google.com/citations?user=c8IpF9gAAAAJ&hl=en), [Eric Mintun](https://ericmintun.github.io/), [Junting Pan](https://junting.github.io/), [Kalyan Vasudev Alwala](https://scholar.google.co.in/citations?user=m34oaWEAAAAJ&hl=en), [Nicolas Carion](https://www.nicolascarion.com/), [Chao-Yuan Wu](https://chaoyuan.org/), [Ross Girshick](https://www.rossgirshick.info/), [Piotr Dollár](https://pdollar.github.io/), [Christoph Feichtenhofer](https://feichtenhofer.github.io/)

[[`Paper`](https://ai.meta.com/research/publications/sam-2-segment-anything-in-images-and-videos/)] [[`Project`](https://ai.meta.com/sam2)] [[`Demo`](https://sam2.metademolab.com/)] [[`Dataset`](https://ai.meta.com/datasets/segment-anything-video)] [[`Blog`](https://ai.meta.com/blog/segment-anything-2)] [[`BibTeX`](#citing-sam-2)]

![SAM 2 architecture](assets/model_diagram.png?raw=true)

**Segment Anything Model 2 (SAM 2)** is a foundation model towards solving promptable visual segmentation in images and videos. We extend SAM to video by treating an image as a video with a single frame. The model design is a simple transformer architecture with streaming memory for real-time video processing. We build a model-in-the-loop data engine, which improves both the model and the data via user interaction, to collect [**our SA-V dataset**](https://ai.meta.com/datasets/segment-anything-video), the largest video segmentation dataset to date. SAM 2 trained on our data provides strong performance across a wide range of tasks and visual domains.

![SA-V dataset](assets/sa_v_dataset.jpg?raw=true)

## Installation

SAM 2 must be installed before use. The code requires `python>=3.10`, as well as `torch>=2.3.1` and `torchvision>=0.18.1`. Please follow the instructions [here](https://pytorch.org/get-started/locally/) to install both PyTorch and TorchVision dependencies. You can install SAM 2 on a GPU machine using:

```bash
git clone https://github.com/facebookresearch/segment-anything-2.git

cd segment-anything-2 && pip install -e .
```
If you are installing on Windows, it's strongly recommended to use [Windows Subsystem for Linux (WSL)](https://learn.microsoft.com/en-us/windows/wsl/install) with Ubuntu.

To use the SAM 2 predictor and run the example notebooks, `jupyter` and `matplotlib` are required and can be installed by:

```bash
pip install -e ".[notebooks]"
```

Note:
1. It's recommended to create a new Python environment via [Anaconda](https://www.anaconda.com/) for this installation and install PyTorch 2.3.1 (or higher) via `pip` following https://pytorch.org/. If you have a PyTorch version lower than 2.3.1 in your current environment, the installation command above will try to upgrade it to the latest PyTorch version using `pip`.
2. The step above requires compiling a custom CUDA kernel with the `nvcc` compiler. If it isn't already available on your machine, please install the [CUDA toolkits](https://developer.nvidia.com/cuda-toolkit-archive) with a version that matches your PyTorch CUDA version.
3. If you see a message like `Failed to build the SAM 2 CUDA extension` during installation, you can ignore it and still use SAM 2 (some post-processing functionality may be limited, but it doesn't affect the results in most cases).

Please see [`INSTALL.md`](./INSTALL.md) for FAQs on potential issues and solutions.
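
As a quick sanity check before installing, the version requirements above can be verified with a few lines of Python. This is only a minimal sketch: `parse_version` and `check_environment` are hypothetical helper names, not part of the SAM 2 package, and the version parsing is deliberately simple (it ignores local build suffixes such as `+cu121`).

```python
import re
import sys

# Minimum versions quoted in the installation section above.
MIN_PYTHON = (3, 10)
MIN_TORCH = (2, 3, 1)
MIN_TORCHVISION = (0, 18, 1)


def parse_version(text):
    """Turn a string such as '2.3.1+cu121' into the tuple (2, 3, 1)."""
    core = text.split("+", 1)[0]  # drop a local build suffix such as '+cu121'
    numbers = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    return tuple(numbers)


def check_environment(torch_version, torchvision_version, python_version=None):
    """Return a list of unmet requirements (empty if everything is fine)."""
    if python_version is None:
        python_version = sys.version_info[:2]
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("python>=3.10 is required")
    if parse_version(torch_version) < MIN_TORCH:
        problems.append("torch>=2.3.1 is required")
    if parse_version(torchvision_version) < MIN_TORCHVISION:
        problems.append("torchvision>=0.18.1 is required")
    return problems
```

With PyTorch and TorchVision already installed, this could be called as `check_environment(torch.__version__, torchvision.__version__)`.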

## Getting Started

### Download Checkpoints

First, we need to download a model checkpoint. All the model checkpoints can be downloaded by running:

```bash
cd checkpoints && \
./download_ckpts.sh && \
cd ..
```

or individually from:

- [sam2.1_hiera_tiny.pt](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt)
- [sam2.1_hiera_small.pt](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt)
- [sam2.1_hiera_base_plus.pt](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt)
- [sam2.1_hiera_large.pt](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt)

(note that these are the improved checkpoints denoted as SAM 2.1; see [Model Description](#model-description) for details.)
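
If only one checkpoint is needed, the download can also be scripted. The sketch below builds the URL and the matching config path from the names listed in this README; the helper names (`checkpoint_url`, `config_path`, `download_checkpoint`) are hypothetical and not part of the SAM 2 package, and the config path follows the form used in the code snippets in this README.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL of the SAM 2.1 checkpoints listed above.
BASE_URL = "https://dl.fbaipublicfiles.com/segment_anything_2/092824"

# Checkpoint size -> config file suffix, as used in the Model Description tables.
CONFIG_SUFFIX = {"tiny": "t", "small": "s", "base_plus": "b+", "large": "l"}


def checkpoint_name(size):
    """File name of the SAM 2.1 checkpoint for a given size."""
    if size not in CONFIG_SUFFIX:
        raise ValueError(f"unknown size {size!r}; expected one of {sorted(CONFIG_SUFFIX)}")
    return f"sam2.1_hiera_{size}.pt"


def checkpoint_url(size):
    """Download URL of the SAM 2.1 checkpoint for a given size."""
    return f"{BASE_URL}/{checkpoint_name(size)}"


def config_path(size):
    """Config path that pairs with the checkpoint of the same size."""
    checkpoint_name(size)  # validates the size
    return f"configs/sam2.1/sam2.1_hiera_{CONFIG_SUFFIX[size]}.yaml"


def download_checkpoint(size, dest_dir="checkpoints"):
    """Download one checkpoint unless it is already present; return its path."""
    dest = Path(dest_dir) / checkpoint_name(size)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(checkpoint_url(size), str(dest))
    return dest
```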

Then SAM 2 can be used in a few lines as follows for image and video prediction.

### Image prediction

SAM 2 has all the capabilities of [SAM](https://github.com/facebookresearch/segment-anything) on static images, and we provide image prediction APIs that closely resemble SAM for image use cases. The `SAM2ImagePredictor` class has an easy interface for image prompting.

```python
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

Please refer to the examples in [image_predictor_example.ipynb](./notebooks/image_predictor_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb)) for static image use cases.

SAM 2 also supports automatic mask generation on images just like SAM. Please see [automatic_mask_generator_example.ipynb](./notebooks/automatic_mask_generator_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/automatic_mask_generator_example.ipynb)) for automatic mask generation in images.
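
The snippet above discards the second and third return values of `predict`. Assuming the second value holds one quality score per returned mask (as in the example notebook), a common follow-up is to rank the candidate masks by that score. The helper below is a hypothetical sketch written against plain sequences, so it makes no assumption about the array type of the masks.

```python
def rank_masks(masks, scores):
    """Return (mask, score) pairs sorted from highest to lowest score.

    `masks` and `scores` are assumed to be index-aligned sequences, e.g. the
    first two values returned by `predictor.predict(...)`.
    """
    if len(masks) != len(scores):
        raise ValueError("masks and scores must have the same length")
    # Sort indices by score so the masks themselves never need to be compared.
    order = sorted(range(len(scores)), key=lambda i: float(scores[i]), reverse=True)
    return [(masks[i], float(scores[i])) for i in order]
```

The first pair, `rank_masks(masks, scores)[0]`, is then the highest-scoring candidate.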

### Video prediction

For promptable segmentation and tracking in videos, we provide a video predictor with APIs to, for example, add prompts and propagate masklets throughout a video. SAM 2 supports video inference on multiple objects and uses an inference state to keep track of the interactions in each video.

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

checkpoint = "./checkpoints/sam2.1_hiera_large.pt"
model_cfg = "configs/sam2.1/sam2.1_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```

Please refer to the examples in [video_predictor_example.ipynb](./notebooks/video_predictor_example.ipynb) (also in Colab [here](https://colab.research.google.com/github/facebookresearch/segment-anything-2/blob/main/notebooks/video_predictor_example.ipynb)) for details on how to add click or box prompts, make refinements, and track multiple objects in videos.
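
The propagation loop above leaves its body open. One way to fill it in, sketched here under stated assumptions, is to collect the per-frame results into a dictionary keyed by frame index and object id. `collect_masklets` is a hypothetical helper, not part of the SAM 2 API; it assumes the third value yielded per frame is indexable by object position and holds mask logits that become a binary mask when thresholded at `0.0`, as done in the example notebook.

```python
def collect_masklets(frames, threshold=0.0):
    """Gather propagation results as {frame_idx: {object_id: mask}}.

    `frames` is any iterable yielding (frame_idx, object_ids, mask_logits),
    for example `predictor.propagate_in_video(state)`.
    """
    segments = {}
    for frame_idx, object_ids, mask_logits in frames:
        segments[frame_idx] = {
            # With tensors, `logits > threshold` gives a boolean mask;
            # with plain floats it gives a single bool.
            obj_id: mask_logits[i] > threshold
            for i, obj_id in enumerate(object_ids)
        }
    return segments
```

Inside the `torch.inference_mode()` block this could be called as `collect_masklets(predictor.propagate_in_video(state))`.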

## Load from 🤗 Hugging Face

Alternatively, models can also be loaded from [Hugging Face](https://huggingface.co/models?search=facebook/sam2) (requires `pip install huggingface_hub`).

For image prediction:

```python
import torch
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(<your_image>)
    masks, _, _ = predictor.predict(<input_prompts>)
```

For video prediction:

```python
import torch
from sam2.sam2_video_predictor import SAM2VideoPredictor

predictor = SAM2VideoPredictor.from_pretrained("facebook/sam2-hiera-large")

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    state = predictor.init_state(<your_video>)

    # add new prompts and instantly get the output on the same frame
    frame_idx, object_ids, masks = predictor.add_new_points_or_box(state, <your_prompts>)

    # propagate the prompts to get masklets throughout the video
    for frame_idx, object_ids, masks in predictor.propagate_in_video(state):
        ...
```

## Model Description

### SAM 2.1 checkpoints

The table below shows the improved SAM 2.1 checkpoints released on September 29, 2024.

|      **Model**       | **Size (M)** |    **Speed (FPS)**     | **SA-V test (J&F)** | **MOSE val (J&F)** | **LVOS v2 (J&F)** |
| :------------------: | :----------: | :--------------------: | :-----------------: | :----------------: | :---------------: |
|   sam2.1_hiera_tiny <br /> ([config](sam2/configs/sam2.1/sam2.1_hiera_t.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_tiny.pt))    |     38.9     |          47.2          |        76.5         |        71.8        |       77.3        |
|   sam2.1_hiera_small <br /> ([config](sam2/configs/sam2.1/sam2.1_hiera_s.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_small.pt))   |      46      | 43.3 (53.0 compiled\*) |        76.6         |        73.5        |       78.3        |
| sam2.1_hiera_base_plus <br /> ([config](sam2/configs/sam2.1/sam2.1_hiera_b+.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_base_plus.pt)) |     80.8     | 34.8 (43.8 compiled\*) |        78.2         |        73.7        |       78.2        |
|   sam2.1_hiera_large <br /> ([config](sam2/configs/sam2.1/sam2.1_hiera_l.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/092824/sam2.1_hiera_large.pt))   |    224.4     | 24.2 (30.2 compiled\*) |        79.5         |        74.6        |       80.6        |

### SAM 2 checkpoints

The previous SAM 2 checkpoints released on July 29, 2024 can be found as follows:

|      **Model**       | **Size (M)** |    **Speed (FPS)**     | **SA-V test (J&F)** | **MOSE val (J&F)** | **LVOS v2 (J&F)** |
| :------------------: | :----------: | :--------------------: | :-----------------: | :----------------: | :---------------: |
|   sam2_hiera_tiny <br /> ([config](sam2/configs/sam2/sam2_hiera_t.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_tiny.pt))   |     38.9     |          47.2          |        75.0         |        70.9        |       75.3        |
|   sam2_hiera_small <br /> ([config](sam2/configs/sam2/sam2_hiera_s.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_small.pt))   |      46      | 43.3 (53.0 compiled\*) |        74.9         |        71.5        |       76.4        |
| sam2_hiera_base_plus <br /> ([config](sam2/configs/sam2/sam2_hiera_b+.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_base_plus.pt)) |     80.8     | 34.8 (43.8 compiled\*) |        74.7         |        72.8        |       75.8        |
|   sam2_hiera_large <br /> ([config](sam2/configs/sam2/sam2_hiera_l.yaml), [checkpoint](https://dl.fbaipublicfiles.com/segment_anything_2/072824/sam2_hiera_large.pt))   |    224.4     | 24.2 (30.2 compiled\*) |        76.0         |        74.6        |       79.8        |

\* Compile the model by setting `compile_image_encoder: True` in the config.
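
The tables trade speed against accuracy, so the choice of checkpoint depends on the frame rate a given application needs. As a rough illustration, the sketch below copies the uncompiled speed and SA-V test numbers from the SAM 2.1 table above and picks the most accurate checkpoint that still meets a frame-rate budget. `pick_checkpoint` is a hypothetical helper, and the speeds are the ones reported here, which may differ on other hardware.

```python
# (name, size in M, FPS without compilation, SA-V test J&F),
# copied from the SAM 2.1 table above.
SAM21_CHECKPOINTS = [
    ("sam2.1_hiera_tiny", 38.9, 47.2, 76.5),
    ("sam2.1_hiera_small", 46.0, 43.3, 76.6),
    ("sam2.1_hiera_base_plus", 80.8, 34.8, 78.2),
    ("sam2.1_hiera_large", 224.4, 24.2, 79.5),
]


def pick_checkpoint(min_fps):
    """Return the most accurate checkpoint running at >= min_fps, or None."""
    candidates = [row for row in SAM21_CHECKPOINTS if row[2] >= min_fps]
    if not candidates:
        return None
    # Among the fast-enough models, prefer the highest SA-V test score.
    return max(candidates, key=lambda row: row[3])[0]
```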

## Segment Anything Video Dataset

See [sav_dataset/README.md](sav_dataset/README.md) for details.

## Training SAM 2

You can train or fine-tune SAM 2 on custom datasets of images, videos, or both. Please check the training [README](training/README.md) on how to get started.

## License

The SAM 2 model checkpoints, SAM 2 demo code (front-end and back-end), and SAM 2 training code are licensed under [Apache 2.0](./LICENSE). However, the [Inter Font](https://github.com/rsms/inter?tab=OFL-1.1-1-ov-file) and [Noto Color Emoji](https://github.com/googlefonts/noto-emoji) used in the SAM 2 demo code are made available under the [SIL Open Font License, version 1.1](https://openfontlicense.org/open-font-license-official-text/).

## Contributing

See [contributing](CONTRIBUTING.md) and the [code of conduct](CODE_OF_CONDUCT.md).

## Contributors

The SAM 2 project was made possible with the help of many contributors (alphabetical):

Karen Bergan, Daniel Bolya, Alex Bosenberg, Kai Brown, Vispi Cassod, Christopher Chedeau, Ida Cheng, Luc Dahlin, Shoubhik Debnath, Rene Martinez Doehner, Grant Gardner, Sahir Gomez, Rishi Godugu, Baishan Guo, Caleb Ho, Andrew Huang, Somya Jain, Bob Kamma, Amanda Kallet, Jake Kinney, Alexander Kirillov, Shiva Koduvayur, Devansh Kukreja, Robert Kuo, Aohan Lin, Parth Malani, Jitendra Malik, Mallika Malhotra, Miguel Martin, Alexander Miller, Sasha Mitts, William Ngan, George Orlin, Joelle Pineau, Kate Saenko, Rodrick Shepard, Azita Shokrpour, David Soofian, Jonathan Torres, Jenny Truong, Sagar Vaze, Meng Wang, Claudette Ward, Pengchuan Zhang.

Third-party code: we use a GPU-based connected component algorithm adapted from [`cc_torch`](https://github.com/zsef123/Connected_components_PyTorch) (with its license in [`LICENSE_cctorch`](./LICENSE_cctorch)) as an optional post-processing step for the mask predictions.

## Citing SAM 2

If you use SAM 2 or the SA-V dataset in your research, please use the following BibTeX entry.

```bibtex
@article{ravi2024sam2,
  title={SAM 2: Segment Anything in Images and Videos},
  author={Ravi, Nikhila and Gabeur, Valentin and Hu, Yuan-Ting and Hu, Ronghang and Ryali, Chaitanya and Ma, Tengyu and Khedr, Haitham and R{\"a}dle, Roman and Rolland, Chloe and Gustafson, Laura and Mintun, Eric and Pan, Junting and Alwala, Kalyan Vasudev and Carion, Nicolas and Wu, Chao-Yuan and Girshick, Ross and Doll{\'a}r, Piotr and Feichtenhofer, Christoph},
  journal={arXiv preprint arXiv:2408.00714},
  url={https://arxiv.org/abs/2408.00714},
  year={2024}
}
```