From bd61f5009143302fa20c6dc21d982eb840d8c3b3 Mon Sep 17 00:00:00 2001
From: SlongLiu
Date: Wed, 12 Apr 2023 18:40:11 +0800
Subject: [PATCH] update tips

---
 README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 18664fb..25a17ec 100644
--- a/README.md
+++ b/README.md
@@ -57,12 +57,14 @@ Marrying Grounding DINO
 gd_gligen
-## :star: Explanation/Tips for Grounding DINO Inputs and Outputs
-- Grounding DINO accepts with a `(image, text)` pair as inputs.
-- It will outputs `900` (by default) object boxes. Each box has a similarity scores across all input words.
+## :star: Explanations/Tips for Grounding DINO Inputs and Outputs
+- Grounding DINO accepts an `(image, text)` pair as input.
+- It outputs `900` (by default) object boxes. Each box has similarity scores across all input words (as shown in the figures below).
 - We defaultly choose the boxes whose highest similarities are higher than a `box_threshold`.
-- We clip the words whose similarities are higher than the `text_threshold` as predicted labels.
-- If you want to obtain objects of certain phrases, like the `dogs` in the sentence `two dogs with a stick.`, you can select the boxes with highest text similarities with `dogs` as final outputs.
+- We extract the words whose similarities are higher than the `text_threshold` as predicted labels.
+- If you want to obtain objects of specific phrases, like the `dogs` in the sentence `two dogs with a stick.`, you can select the boxes with the highest text similarities to `dogs` as final outputs.
+- Note that each word can be split into **more than one** token by different tokenizers, so the number of words in a sentence may not equal the number of text tokens.
+- We suggest separating different category names with `.` for Grounding DINO.
 ![model_explain1](.asset/model_explan1.PNG)
 ![model_explain2](.asset/model_explan2.PNG)
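The selection rules described in these tips can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Grounding DINO's own code. The function names `build_caption`, `filter_predictions`, and `select_phrase`, the toy similarity matrix, the box format, and the threshold values 0.35 and 0.25 are all hypothetical. Similarities are indexed per text token rather than per word, because a word may map to several tokens; in the toy data each word happens to be a single token.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format (cx, cy, w, h); the exact layout does not matter here.
Box = Tuple[float, float, float, float]


def build_caption(categories: Sequence[str]) -> str:
    """Join category names with '.' as the tips suggest (the spacing is an assumption)."""
    return " . ".join(name.strip() for name in categories) + " ."


def filter_predictions(
    scores: Sequence[Sequence[float]],  # one row per box, one column per text token
    boxes: Sequence[Box],
    tokens: Sequence[str],
    box_threshold: float,
    text_threshold: float,
) -> List[Tuple[Box, float, str]]:
    """Keep boxes whose highest token similarity exceeds box_threshold and
    label each kept box with the tokens whose similarity exceeds text_threshold."""
    kept = []
    for box, row in zip(boxes, scores):
        best = max(row)
        if best <= box_threshold:
            continue  # the box matches no token strongly enough
        label = " ".join(tok for tok, s in zip(tokens, row) if s > text_threshold)
        kept.append((box, best, label))
    return kept


def select_phrase(
    scores: Sequence[Sequence[float]],
    boxes: Sequence[Box],
    phrase_token_ids: Sequence[int],  # a single word may map to several token columns
    top_k: int,
) -> List[Box]:
    """Return the top_k boxes ranked by their highest similarity to the phrase's tokens."""
    ranked = sorted(
        zip(boxes, scores),
        key=lambda pair: max(pair[1][i] for i in phrase_token_ids),
        reverse=True,
    )
    return [box for box, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy data for the caption "two dogs with a stick." (one token per word here;
    # a real tokenizer may split a word into several tokens).
    tokens = ["two", "dogs", "with", "a", "stick", "."]
    boxes = [
        (0.30, 0.50, 0.20, 0.30),
        (0.70, 0.50, 0.20, 0.30),
        (0.50, 0.80, 0.40, 0.10),
        (0.10, 0.10, 0.05, 0.05),
    ]
    scores = [
        [0.10, 0.82, 0.05, 0.02, 0.10, 0.00],
        [0.30, 0.71, 0.04, 0.03, 0.08, 0.00],
        [0.05, 0.10, 0.10, 0.05, 0.64, 0.00],
        [0.10, 0.20, 0.05, 0.05, 0.12, 0.00],
    ]
    for box, best, label in filter_predictions(scores, boxes, tokens, 0.35, 0.25):
        print(box, best, label)
    # Boxes most similar to the token "dogs" (column 1).
    print(select_phrase(scores, boxes, phrase_token_ids=[1], top_k=2))
    print(build_caption(["dog", "stick"]))
```

Running the script keeps three boxes labelled `dogs`, `two dogs`, and `stick`. It drops the fourth box, whose best score of 0.20 does not exceed the 0.35 box threshold. It then returns the two `dogs` boxes for the phrase query. Ranking a phrase by the maximum over its token columns is one possible choice; averaging those columns would be an equally reasonable assumption.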