diff --git a/README.md b/README.md
index cb798a8..096a5f9 100644
--- a/README.md
+++ b/README.md
@@ -21,6 +21,7 @@ Grounded SAM 2 does not introduce significant methodological changes compared to
   - [Grounded SAM 2 Video Object Tracking Demo (with Grounding DINO 1.5 & 1.6)](#grounded-sam-2-video-object-tracking-demo-with-grounding-dino-15--16)
   - [Grounded SAM 2 Video Object Tracking with Custom Video Input (using Grounding DINO)](#grounded-sam-2-video-object-tracking-demo-with-custom-video-input-with-grounding-dino)
   - [Grounded SAM 2 Video Object Tracking with Custom Video Input (using Grounding DINO 1.5 & 1.6)](#grounded-sam-2-video-object-tracking-demo-with-custom-video-input-with-grounding-dino-15--16)
+  - [Grounded SAM 2 Video Object Tracking with Continuous ID (using Grounding DINO)](#grounded-sam-2--video-object-tracking-with-continuous-id-with-grounding-dino)
   - [Citation](#citation)
@@ -167,7 +168,10 @@ And we will automatically save the tracking visualization results in `OUTPUT_VID
 > [!WARNING]
 > We initialize the box prompts on the first frame of the input video. If you want to start from different frame, you can refine `ann_frame_idx` by yourself in our code.
 
-### Grounded-SAM-2 Video Object Tracking with Continuous ID (with Grounding DINO)
+### Grounded-SAM-2 Video Object Tracking with Continuous ID (with Grounding DINO)
+
+In the above demos, we only prompt Grounded SAM 2 on a specific frame, which makes it hard to detect new objects that appear later in the video. In this demo, we try to **find new objects** across the whole video and assign them new IDs. This feature is **still under development** and is not very stable yet.
+
 Users can upload their own video files and specify custom text prompts for grounding and tracking using the Grounding DINO and SAM 2 frameworks.
 To do this, execute the script:
@@ -186,12 +190,8 @@ You can customize various parameters including:
 - `text_threshold`: text threshold for groundingdino model
 
 Note: This method supports only the mask type of text prompt.
-The demo video is:
-[![car tracking demo data](./assets/tracking_car_1.jpg)](./assets/tracking_car.mp4)
-
-
 After running our demo code, you can get the tracking results as follows:
-[![car tracking result data](./assets/tracking_car_mask_1.jpg)](./assets/tracking_car_output.mp4)
+[![car tracking result data](./assets/tracking_car_mask_1.jpg)](https://github.com/user-attachments/assets/141594a2-1451-4d2e-a91b-7941284c2c13)
 
 ### Citation
diff --git a/assets/tracking_car_output.mp4 b/assets/tracking_car_output.mp4
deleted file mode 100644
index 05aa993..0000000
Binary files a/assets/tracking_car_output.mp4 and /dev/null differ
diff --git a/assets/zebra_output.mp4 b/assets/zebra_output.mp4
deleted file mode 100644
index e980dd4..0000000
Binary files a/assets/zebra_output.mp4 and /dev/null differ
diff --git a/grounded_sam2_tracking_demo_with_continuous_id.py b/grounded_sam2_tracking_demo_with_continuous_id.py
index 3568ae3..f620c28 100644
--- a/grounded_sam2_tracking_demo_with_continuous_id.py
+++ b/grounded_sam2_tracking_demo_with_continuous_id.py
@@ -68,7 +68,7 @@ frame_names.sort(key=lambda p: int(os.path.splitext(p)[0]))
 
 # init video predictor state
 inference_state = video_predictor.init_state(video_path=video_dir)
-step = 10 # the step to sample frames for groundedDino predictor
+step = 25 # the step to sample frames for Grounding DINO predictor
 sam2_masks = MaskDictionatyModel()
 PROMPT_TYPE_FOR_VIDEO = "mask" # box, mask or point
@@ -195,4 +195,4 @@ Step 6: Draw the results and save the video
 """
 
 CommonUtils.draw_masks_and_box(video_dir, mask_data_dir, json_data_dir, result_dir)
-create_video_from_images(result_dir, output_video_path, frame_rate=15)
\ No newline at end of file
+create_video_from_images(result_dir, output_video_path, frame_rate=30)
\ No newline at end of file
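The `step` change above raises the detector sampling interval from every 10th to every 25th frame. As a rough sketch of what that parameter controls (the function name below is hypothetical, not from the repo), step-based sampling just picks every `step`-th frame index at which Grounding DINO would be re-run to look for new objects:

```python
# Hypothetical sketch: which frame indices get a fresh Grounding DINO pass,
# assuming a fixed detection interval like `step = 25` in the diff above.
def sample_prompt_frames(num_frames: int, step: int = 25) -> list:
    """Return the frame indices at which the detector is re-run."""
    return list(range(0, num_frames, step))

print(sample_prompt_frames(100, step=25))  # [0, 25, 50, 75]
```

A larger `step` means fewer detector calls (faster overall), but a new object can go undetected for up to `step` frames before it is assigned an ID.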