Grounded-SAM-2

Author	SHA1	Message	Date
Ronghang Hu	393ae336a7	SAM 2 Update 12/11/2024 -- full model compilation for a major VOS speedup and a new SAM2VideoPredictor to better handle multi-object tracking (#486) This PR provides new features and updates for SAM 2: - We now support `torch.compile` of the entire SAM 2 model on videos, which can be turned on by setting `vos_optimized=True` in `build_sam2_video_predictor` (it uses the new `SAM2VideoPredictorVOS` predictor class in `sam2/sam2_video_predictor.py`). * Compared to the previous setting (which only compiles the image encoder backbone), the new full model compilation gives a major speedup in inference FPS. * In the VOS prediction script `tools/vos_inference.py`, you can specify this option via the `--use_vos_optimized_video_predictor` flag. * Note that turning on this flag might introduce a small variance in the predictions due to numerical differences caused by `torch.compile` of the full model. * PyTorch 2.5.1 is the minimum version for full support of this feature. (Earlier PyTorch versions might run into compilation errors in some cases.) Therefore, we have updated the minimum PyTorch version to 2.5.1 accordingly in the installation scripts. - We also update the implementation of the `SAM2VideoPredictor` class for the SAM 2 video prediction in `sam2/sam2_video_predictor.py`, which allows for independent per-object inference. Specifically, in the new `SAM2VideoPredictor`: * Now we handle the inference of each object independently (as if we are opening a separate session for each object) while sharing their backbone features. * This change allows us to relax the assumption of prompting for multi-object tracking. Previously (due to the batching behavior in inference), if a video frame received clicks for only a subset of objects, the rest of the (non-prompted) objects were assumed to be non-existent in this frame (i.e., in such frames, the user was telling SAM 2 that the rest of the objects don't appear).
Now, if a frame receives clicks for only a subset of objects, we do not make any assumptions about the remaining (non-prompted) objects (i.e., each object is now handled independently and is not affected by how other objects are prompted). As a result, after this change we allow adding new objects after tracking starts (which was previously a restriction on usage). * We believe that the new version is a more natural inference behavior and have therefore switched to it as the default behavior. The previous implementation of `SAM2VideoPredictor` is backed up in `sam2/sam2_video_predictor_legacy.py`. All the VOS inference results using `tools/vos_inference.py` should remain the same after this change to the `SAM2VideoPredictor` class.	2024-12-11 15:00:55 -08:00
Roman Rädle	ff9704fc0e	[sam2][demo][1/x] Fix file upload Summary: The Strawberry GraphQL library recently disabled multipart requests by default. This resulted in a video upload request returning "Unsupported content type" instead of uploading the video, processing it, and returning the video path. This issue was raised in #361. A forward fix is to add `multipart_uploads_enabled=True` to the endpoint view. Test Plan: Tested locally with cURL and upload succeeds Request ``` curl http://localhost:7263/graphql \ -F operations='{ "query": "mutation($file: Upload!){ uploadVideo(file: $file) { path } }", "variables": { "file": null } }' \ -F map='{ "file": ["variables.file"] }' \ -F file=@video.mov ``` Response ``` {"data": {"uploadVideo": {"path": "uploads/<HASH>.mp4"}}} ```	2024-10-08 14:58:28 -07:00
Haitham Khedr	8bf0920e66	Add MANIFEST.in (#353)	2024-10-03 10:40:13 -07:00
Ronghang Hu	98fcb164bf	Update links after renaming the repo from `segment-anything-2` to `sam2` (#341) This PR updates repo links after we renamed the repo from `segment-anything-2` to `sam2`. It also changes `NAME` in setup.py to `SAM-2` (which is already the name used in pip setup since Python package names don't allow whitespace)	2024-09-30 20:27:44 -07:00
Haitham Khedr	aa9b8722d0	SAM2.1 SAM2.1 checkpoints + training code + Demo	2024-09-29 05:49:56 +00:00
Ronghang Hu	7e1596c0b6	open `README.md` with unicode (to support Hugging Face emoji); fix various typos (#218) (close #217, #66, #67, #69, #91, #126, #127, #145)	2024-08-14 09:06:25 -07:00
Ronghang Hu	dce7b5446f	improving warning message and adding further tips for installation (#204)	2024-08-12 11:37:41 -07:00
Ronghang Hu	d421e0b040	add Colab support to the notebooks; pack config files in `sam2_configs` package during installation (#176)	2024-08-08 11:03:22 -07:00
Ronghang Hu	6186d1529a	also catch errors during installation in case `CUDAExtension` cannot be loaded (#175) Previously we only caught build errors in `BuildExtension` in https://github.com/facebookresearch/segment-anything-2/pull/155. However, in some cases, the `CUDAExtension` instance might not load. So in this PR, we also catch such errors for `CUDAExtension`.	2024-08-07 12:26:11 -07:00
Ronghang Hu	6f7e700c37	Make it optional to build CUDA extension for SAM 2; also fall back to all available kernels if Flash Attention fails (#155) In this PR, we make it optional to build the SAM 2 CUDA extension, having observed that many users encounter difficulties with the CUDA compilation step. 1. During installation, we catch build errors and print a warning message. We also allow explicitly turning off the CUDA extension building with `SAM2_BUILD_CUDA=0`. 2. At runtime, we catch CUDA kernel errors from connected components and print a warning on skipping the post processing step. We also fall back to all available kernels if the Flash Attention kernel fails.	2024-08-06 10:52:01 -07:00
Haitham Khedr	0c5f8c5432	Initial commit	2024-07-29 21:54:20 +00:00
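
The optional CUDA build described in commit 6f7e700c37 (and extended in 6186d1529a) can be sketched roughly as follows. This is a minimal, self-contained illustration and not the repository's `setup.py`. Only the `SAM2_BUILD_CUDA` variable comes from the commit message. The helper names `cuda_build_requested` and `build_or_warn` are hypothetical, and so is the assumption that the build is attempted by default and that only the value `0` disables it.

```python
import os
import warnings


def cuda_build_requested(env=None):
    """Return False only when SAM2_BUILD_CUDA is explicitly set to "0".

    Assumption: the build is attempted by default (the commit says it can be
    "explicitly" turned off); the real setup script may parse the value
    differently.
    """
    env = os.environ if env is None else env
    return env.get("SAM2_BUILD_CUDA", "1") != "0"


def build_or_warn(build_fn, env=None):
    """Run a (hypothetical) build callable; on failure, warn instead of aborting."""
    if not cuda_build_requested(env):
        return None  # user opted out, so skip the extension entirely
    try:
        return build_fn()
    except Exception as exc:  # deliberately broad: any build failure is non-fatal
        warnings.warn(f"CUDA extension build failed, continuing without it: {exc}")
        return None
```

The point of the pattern is that a failed or skipped build leaves the install usable; per the commit message, the affected post-processing step is then skipped at runtime with a warning.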

11 Commits
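
The change in prompting semantics described in commit 393ae336a7 can be illustrated with a toy model. The sketch below does not use or reproduce the `SAM2VideoPredictor` code; the function `interpret_frame_prompts` and its status labels are hypothetical names chosen only to contrast the legacy and new interpretations of a frame where a subset of objects receives clicks.

```python
def interpret_frame_prompts(tracked_ids, clicked_ids, legacy=False):
    """Toy model of how one frame's clicks affect each tracked object.

    Returns a dict mapping object id -> one of:
      "prompted"        the object received clicks on this frame
      "assumed_absent"  legacy behavior: no clicks means "not in this frame"
      "unconstrained"   new behavior: no assumption is made about the object
    """
    clicked = set(clicked_ids)
    unprompted = "assumed_absent" if legacy else "unconstrained"
    return {
        obj_id: "prompted" if obj_id in clicked else unprompted
        for obj_id in tracked_ids
    }


if __name__ == "__main__":
    # Three tracked objects, clicks only for object 2 on this frame.
    print(interpret_frame_prompts([1, 2, 3], [2], legacy=True))
    print(interpret_frame_prompts([1, 2, 3], [2]))
```

Under the legacy interpretation, objects 1 and 3 are treated as absent from the frame; under the new one they are left to be tracked independently, which is what makes it possible to add new objects after tracking has started.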