
Ronghang Hu 393ae336a7 SAM 2 Update 12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking (#486)

This PR provides new features and updates for SAM 2:

- We now support `torch.compile` of the entire SAM 2 model on videos, which can be turned on by setting `vos_optimized=True` in `build_sam2_video_predictor` (it uses the new `SAM2VideoPredictorVOS` predictor class in `sam2/sam2_video_predictor.py`).
* Compared to the previous setting (which only compiles the image encoder backbone), the new full model compilation gives a major speedup in inference FPS.
* In the VOS prediction script `tools/vos_inference.py`, you can enable this option via the `--use_vos_optimized_video_predictor` flag.
* Note that turning on this flag might introduce a small variance in the predictions due to numerical differences caused by `torch.compile` of the full model.
* **PyTorch 2.5.1 is the minimum version for full support of this feature**. (Earlier PyTorch versions might run into compilation errors in some cases.) Therefore, we have updated the minimum PyTorch version to 2.5.1 accordingly in the installation scripts.
- We also update the implementation of the `SAM2VideoPredictor` class for the SAM 2 video prediction in `sam2/sam2_video_predictor.py`, which allows for independent per-object inference. Specifically, in the new `SAM2VideoPredictor`:
* Now **we handle the inference of each object independently** (as if we are opening a separate session for each object) while sharing their backbone features.
* This change allows us to relax the assumption of prompting for multi-object tracking. Previously (due to the batching behavior in inference), if a video frame receives clicks for only a subset of objects, the rest of the (non-prompted) objects are assumed to be non-existent in this frame (i.e., in such frames, the user is telling SAM 2 that the rest of the objects don't appear). Now, if a frame receives clicks for only a subset of objects, we do not make any assumptions about the remaining (non-prompted) objects (i.e., now each object is handled independently and is not affected by how other objects are prompted). As a result, **we allow adding new objects after tracking starts** after this change (which was previously a restriction on usage).
* We believe that the new version is a more natural inference behavior and have therefore made it the default. The previous implementation of `SAM2VideoPredictor` is backed up in `sam2/sam2_video_predictor_legacy.py`. All the VOS inference results using `tools/vos_inference.py` should remain the same after this change to the `SAM2VideoPredictor` class.
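
As a hedged illustration of the first bullet above, requesting the fully compiled predictor might look like the sketch below. `build_sam2_video_predictor` and its `vos_optimized` flag come from the notes above; the import path `sam2.build_sam`, the helper name `load_video_predictor`, and the placeholder paths in the trailing comment are assumptions to adapt to your checkout.

```python
def load_video_predictor(config_file, ckpt_path, vos_optimized=True):
    """Build a SAM 2 video predictor, optionally with full-model torch.compile.

    Hypothetical helper: only `build_sam2_video_predictor` and its
    `vos_optimized` flag are taken from the update notes above.
    """
    # Imported lazily so the heavy dependency is only needed when called.
    # Assumption: the builder is importable from `sam2.build_sam`.
    from sam2.build_sam import build_sam2_video_predictor

    return build_sam2_video_predictor(
        config_file,
        ckpt_path,
        vos_optimized=vos_optimized,  # True requests SAM2VideoPredictorVOS
    )


# Example call (placeholder paths, not verified against any checkout):
# predictor = load_video_predictor("path/to/config.yaml", "path/to/checkpoint.pt")
```

Passing `vos_optimized=False` should fall back to the default `SAM2VideoPredictor`, which is useful when the small numerical variance mentioned above matters or when running on a PyTorch version older than 2.5.1.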

2024-12-11 15:00:55 -08:00

4.7 KiB


SAM 2 Demo

Welcome to the SAM 2 Demo! This project consists of a frontend built with React TypeScript and Vite and a backend service using Python Flask and Strawberry GraphQL. Both components can be run in Docker containers or locally on MPS (Metal Performance Shaders) or CPU. However, running the backend service on MPS or CPU devices may result in significantly slower performance (FPS).

Prerequisites

Before you begin, ensure you have the following installed on your system:

Docker and Docker Compose
[OPTIONAL] Node.js and Yarn for running frontend locally
[OPTIONAL] Anaconda for running backend locally
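
Before continuing, it can help to confirm which of these tools are already on your `PATH`. The following is a minimal sketch using only the Python standard library; the executable names (`docker`, `node`, `yarn`, `conda`) and the helper names are assumptions about a typical installation, not something this project ships.

```python
import shutil

# Executable names are assumptions about how each tool appears on PATH.
REQUIRED_TOOLS = ["docker"]
OPTIONAL_TOOLS = ["node", "yarn", "conda"]


def missing_tools(tools):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def report():
    """Print missing prerequisites; optional ones are reported separately."""
    for tool in missing_tools(REQUIRED_TOOLS):
        print(f"missing required tool: {tool}")
    for tool in missing_tools(OPTIONAL_TOOLS):
        print(f"missing optional tool: {tool}")
```

Calling `report()` prints nothing when every tool is found. A missing optional tool only matters if you plan to run the frontend or backend outside Docker.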

Installing Docker

To install Docker, follow these steps:

Go to the Docker website.
Follow the installation instructions for your operating system.

[OPTIONAL] Installing Node.js and Yarn

To install Node.js and Yarn, follow these steps:

Go to the Node.js website.
Follow the installation instructions for your operating system.
Once Node.js is installed, open a terminal or command prompt and run the following command to install Yarn:

```
npm install -g yarn
```

[OPTIONAL] Installing Anaconda

To install Anaconda, follow these steps:

Go to the Anaconda website.
Follow the installation instructions for your operating system.

Quick Start

To get both the frontend and backend running quickly using Docker, you can use the following command:

```
docker compose up --build
```

Warning

On macOS, Docker containers only support running on CPU. MPS is not supported through Docker. If you want to run the demo backend service on MPS, you will need to run it locally (see "Running the Backend Locally" below).

This will build and start both services. You can access them at:

Frontend: http://localhost:7262
Backend: http://localhost:7263/graphql
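
To check from a script that the backend is answering, one option is to send a trivial GraphQL query to the URL above. This is a minimal sketch, assuming the endpoint accepts a JSON `POST` body with a `query` field (the usual GraphQL-over-HTTP convention); the helper names are hypothetical.

```python
import json
import urllib.request

# Backend URL taken from the list above.
BACKEND_GRAPHQL_URL = "http://localhost:7263/graphql"


def build_graphql_request(url, query):
    """Build (but do not send) a JSON POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe_backend(url=BACKEND_GRAPHQL_URL, timeout=5.0):
    """Send a trivial query and return the decoded JSON response.

    Raises urllib.error.URLError if the backend is not reachable.
    """
    request = build_graphql_request(url, "{ __typename }")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`{ __typename }` is used because it is valid against any GraphQL schema, so the probe does not depend on the demo's own query names.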

Running Backend with MPS Support

MPS (Metal Performance Shaders) is not supported with Docker. To use MPS, you need to run the backend on your local machine.

Setting Up Your Environment

Create Conda environment

Create a new Conda environment for this project by running the following command, or use your existing Conda environment for SAM 2:
```
conda create --name sam2-demo python=3.10 --yes
```
This will create a new environment named sam2-demo with Python 3.10 as the interpreter.
Activate the Conda environment:
```
conda activate sam2-demo
```
Install ffmpeg:
```
conda install -c conda-forge ffmpeg
```
Install SAM 2 demo dependencies:

Install project dependencies by running the following command in the SAM 2 checkout root directory:

```
pip install -e '.[interactive-demo]'
```

Running the Backend Locally

Download the SAM 2 checkpoints:

```
(cd ./checkpoints && ./download_ckpts.sh)
```

Use the following command to start the backend with MPS support:

```
cd demo/backend/server/

PYTORCH_ENABLE_MPS_FALLBACK=1 \
APP_ROOT="$(pwd)/../../../" \
API_URL=http://localhost:7263 \
MODEL_SIZE=base_plus \
DATA_PATH="$(pwd)/../../data" \
DEFAULT_VIDEO_PATH=gallery/05_default_juggle.mp4 \
gunicorn \
    --worker-class gthread app:app \
    --workers 1 \
    --threads 2 \
    --bind 0.0.0.0:7263 \
    --timeout 60
```

Options for the MODEL_SIZE argument are "tiny", "small", "base_plus" (default), and "large".
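
If you launch the backend from a script instead of typing the command, the environment variables above can be assembled and validated first. The sketch below mirrors the values in the command; `backend_env` is a hypothetical helper, and it assumes `repo_root` points at the SAM 2 checkout root, so that `APP_ROOT` is the root and `DATA_PATH` resolves to `demo/data`.

```python
from pathlib import Path

# Allowed values as listed above.
VALID_MODEL_SIZES = ("tiny", "small", "base_plus", "large")


def backend_env(repo_root, model_size="base_plus"):
    """Return the environment variables used by the launch command above.

    Hypothetical helper; raises ValueError for an unknown MODEL_SIZE.
    """
    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(
            f"MODEL_SIZE must be one of {VALID_MODEL_SIZES}, got {model_size!r}"
        )
    root = Path(repo_root)
    return {
        "PYTORCH_ENABLE_MPS_FALLBACK": "1",
        "APP_ROOT": str(root),
        "API_URL": "http://localhost:7263",
        "MODEL_SIZE": model_size,
        "DATA_PATH": str(root / "demo" / "data"),
        "DEFAULT_VIDEO_PATH": "gallery/05_default_juggle.mp4",
    }
```

The returned dictionary can be merged into a copy of `os.environ` and passed as the `env` argument of `subprocess.run` when starting `gunicorn`, so a typo in the model size fails early instead of at model load time.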

Warning

Running the backend service on MPS devices can cause fatal crashes with the Gunicorn worker due to insufficient MPS memory. Try switching to CPU devices by setting the SAM2_DEMO_FORCE_CPU_DEVICE=1 environment variable.
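
The effect of this variable can be pictured with a small sketch. This is not the backend's actual code: the function name is hypothetical, and the CUDA, then MPS, then CPU preference order is an assumption made for illustration.

```python
import os


def select_device(cuda_available, mps_available, environ=None):
    """Pick a device name, honoring SAM2_DEMO_FORCE_CPU_DEVICE=1.

    The CUDA > MPS > CPU preference order is an assumption of this sketch.
    """
    env = os.environ if environ is None else environ
    if env.get("SAM2_DEMO_FORCE_CPU_DEVICE", "0") == "1":
        return "cpu"  # explicit override wins over any accelerator
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```

With PyTorch installed, the two flags would typically come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.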

Starting the Frontend

If you wish to run the frontend separately (useful for development), follow these steps:

Navigate to the demo frontend directory:
```
cd demo/frontend
```
Install dependencies:
```
yarn install
```
Start the development server:
```
yarn dev --port 7262
```

This will start the frontend development server on http://localhost:7262.

Docker Tips

To rebuild the Docker containers (useful if you've made changes to the Dockerfile or dependencies):
```
docker compose up --build
```
To stop the Docker containers:
```
docker compose down
```

Contributing

Contributions are welcome! Please read our contributing guidelines to get started.

License

See the LICENSE file for details.

By following these instructions, you should have a fully functional development environment for both the frontend and backend of the SAM 2 Demo. Happy coding!
