distillation/easydistill/mmkd/infer.log

INFO 08-03 20:27:56 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 08-03 20:27:56 [__init__.py:239] Automatically detected platform cuda.
2025-08-03 20:27:58,078 - INFO - Generating distillation data from the teacher model!
2025-08-03 20:27:58,384 - INFO - Loading processor & vLLM model from Qwen/Qwen2.5-VL-32B-Instruct
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
2025-08-03 20:28:00,580 - INFO - Initial eos_token_id 151645 from tokenizer
2025-08-03 20:28:00,580 - INFO - processor.tokenizer eos_token: <|im_end|>, eos_token_id: 151645
INFO 08-03 20:28:09 [config.py:717] This model supports multiple tasks: {'reward', 'classify', 'score', 'generate', 'embed'}. Defaulting to 'generate'.
INFO 08-03 20:28:09 [config.py:2003] Chunked prefill is enabled with max_num_batched_tokens=16384.
INFO 08-03 20:28:11 [core.py:58] Initializing a V1 LLM engine (v0.8.5) with config: model='Qwen/Qwen2.5-VL-32B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-VL-32B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=16000, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=Qwen/Qwen2.5-VL-32B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 08-03 20:28:12 [utils.py:2522] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x72908ff5c0d0>
INFO 08-03 20:28:13 [parallel_state.py:1004] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0
INFO 08-03 20:28:13 [cuda.py:221] Using Flash Attention backend on V1 engine.
WARNING 08-03 20:28:20 [topk_topp_sampler.py:69] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 08-03 20:28:20 [gpu_model_runner.py:1329] Starting to load model Qwen/Qwen2.5-VL-32B-Instruct...
WARNING 08-03 20:28:20 [vision.py:93] Current `vllm-flash-attn` has a bug inside vision module, so we use xformers backend instead. You can run `pip install flash-attn` to use flash-attention backend.
INFO 08-03 20:28:20 [config.py:3614] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 264, 272, 280, 288, 296, 304, 312, 320, 328, 336, 344, 352, 360, 368, 376, 384, 392, 400, 408, 416, 424, 432, 440, 448, 456, 464, 472, 480, 488, 496, 504, 512] is overridden by config [512, 384, 256, 128, 4, 2, 1, 392, 264, 136, 8, 400, 272, 144, 16, 408, 280, 152, 24, 416, 288, 160, 32, 424, 296, 168, 40, 432, 304, 176, 48, 440, 312, 184, 56, 448, 320, 192, 64, 456, 328, 200, 72, 464, 336, 208, 80, 472, 344, 216, 88, 120, 480, 352, 248, 224, 96, 488, 504, 360, 232, 104, 496, 368, 240, 112, 376]
INFO 08-03 20:28:21 [weight_utils.py:265] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/18 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/18 [00:01<00:18, 1.10s/it]
Loading safetensors checkpoint shards: 11% Completed | 2/18 [00:01<00:12, 1.23it/s]
Loading safetensors checkpoint shards: 17% Completed | 3/18 [00:02<00:14, 1.02it/s]
Loading safetensors checkpoint shards: 22% Completed | 4/18 [00:04<00:14, 1.07s/it]
Loading safetensors checkpoint shards: 28% Completed | 5/18 [00:05<00:14, 1.12s/it]
Loading safetensors checkpoint shards: 33% Completed | 6/18 [00:06<00:13, 1.15s/it]
Loading safetensors checkpoint shards: 39% Completed | 7/18 [00:07<00:12, 1.17s/it]
Loading safetensors checkpoint shards: 44% Completed | 8/18 [00:08<00:11, 1.17s/it]
Loading safetensors checkpoint shards: 50% Completed | 9/18 [00:10<00:10, 1.18s/it]
Loading safetensors checkpoint shards: 56% Completed | 10/18 [00:11<00:09, 1.19s/it]
Loading safetensors checkpoint shards: 61% Completed | 11/18 [00:12<00:08, 1.19s/it]
Loading safetensors checkpoint shards: 67% Completed | 12/18 [00:13<00:07, 1.20s/it]
Loading safetensors checkpoint shards: 72% Completed | 13/18 [00:15<00:06, 1.23s/it]
Loading safetensors checkpoint shards: 78% Completed | 14/18 [00:16<00:04, 1.25s/it]
Loading safetensors checkpoint shards: 83% Completed | 15/18 [00:17<00:03, 1.26s/it]
Loading safetensors checkpoint shards: 89% Completed | 16/18 [00:18<00:02, 1.26s/it]
Loading safetensors checkpoint shards: 94% Completed | 17/18 [00:19<00:01, 1.17s/it]
Loading safetensors checkpoint shards: 100% Completed | 18/18 [00:21<00:00, 1.18s/it]
Loading safetensors checkpoint shards: 100% Completed | 18/18 [00:21<00:00, 1.17s/it]
INFO 08-03 20:28:42 [loader.py:458] Loading weights took 21.13 seconds
INFO 08-03 20:28:42 [gpu_model_runner.py:1347] Model loading took 62.4365 GiB and 21.912121 seconds
INFO 08-03 20:28:46 [gpu_model_runner.py:1620] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
INFO 08-03 20:29:09 [backends.py:420] Using cache directory: /home/nguyendc/.cache/vllm/torch_compile_cache/1fe259ecb1/rank_0_0 for vLLM's torch.compile
INFO 08-03 20:29:09 [backends.py:430] Dynamo bytecode transform time: 19.39 s
INFO 08-03 20:29:22 [backends.py:118] Directly load the compiled graph(s) for shape None from the cache, took 11.165 s
INFO 08-03 20:29:24 [monitor.py:33] torch.compile takes 19.39 s in total
INFO 08-03 20:29:29 [kv_cache_utils.py:634] GPU KV cache size: 38,016 tokens
INFO 08-03 20:29:29 [kv_cache_utils.py:637] Maximum concurrency for 16,000 tokens per request: 2.38x
INFO 08-03 20:30:08 [gpu_model_runner.py:1686] Graph capturing finished in 39 secs, took 0.96 GiB
INFO 08-03 20:30:08 [core.py:159] init engine (profile, create kv cache, warmup model) took 86.30 seconds
INFO 08-03 20:30:12 [core_client.py:439] Core engine process 0 ready.
2025-08-03 20:30:12,647 - INFO - Qwen2.5-VL vLLM model loaded successfully
Generating responses: 0%| | 0/40 [00:00<?, ?it/s]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.96s/it, est. speed input: 272.51 toks/s, output: 20.14 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.96s/it, est. speed input: 272.51 toks/s, output: 20.14 toks/s]
Generating responses: 2%|▎ | 1/40 [00:16<10:29, 16.13s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.37s/it, est. speed input: 333.81 toks/s, output: 20.48 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.37s/it, est. speed input: 333.81 toks/s, output: 20.48 toks/s]
Generating responses: 5%|▌ | 2/40 [00:23<06:58, 11.01s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.54s/it, est. speed input: 364.77 toks/s, output: 20.02 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.54s/it, est. speed input: 364.77 toks/s, output: 20.02 toks/s]
Generating responses: 8%|▊ | 3/40 [00:31<05:50, 9.47s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.43s/it, est. speed input: 343.57 toks/s, output: 20.31 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.43s/it, est. speed input: 343.57 toks/s, output: 20.31 toks/s]
Generating responses: 10%|█ | 4/40 [00:38<05:12, 8.69s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.27s/it, est. speed input: 564.53 toks/s, output: 18.27 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.27s/it, est. speed input: 564.53 toks/s, output: 18.27 toks/s]
Generating responses: 12%|█▎ | 5/40 [00:47<05:02, 8.64s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.35s/it, est. speed input: 307.93 toks/s, output: 20.56 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.35s/it, est. speed input: 307.93 toks/s, output: 20.56 toks/s]
Generating responses: 15%|█▌ | 6/40 [00:54<04:39, 8.21s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.26s/it, est. speed input: 565.20 toks/s, output: 18.29 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.26s/it, est. speed input: 565.20 toks/s, output: 18.29 toks/s]
Generating responses: 18%|█▊ | 7/40 [01:03<04:34, 8.32s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.53s/it, est. speed input: 363.87 toks/s, output: 20.05 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.53s/it, est. speed input: 363.87 toks/s, output: 20.05 toks/s]
Generating responses: 20%|██ | 8/40 [01:10<04:19, 8.10s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.25s/it, est. speed input: 565.62 toks/s, output: 18.30 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:08<00:00, 8.25s/it, est. speed input: 565.62 toks/s, output: 18.30 toks/s]
Generating responses: 22%|██▎ | 9/40 [01:19<04:15, 8.24s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.63s/it, est. speed input: 395.25 toks/s, output: 19.80 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:07<00:00, 7.63s/it, est. speed input: 395.25 toks/s, output: 19.80 toks/s]
Generating responses: 25%|██▌ | 10/40 [01:27<04:02, 8.08s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:17<00:00, 17.76s/it, est. speed input: 293.12 toks/s, output: 20.05 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:17<00:00, 17.76s/it, est. speed input: 293.12 toks/s, output: 20.05 toks/s]
Generating responses: 28%|██▊ | 11/40 [01:45<05:22, 11.13s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.45s/it, est. speed input: 276.12 toks/s, output: 20.48 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.45s/it, est. speed input: 276.12 toks/s, output: 20.48 toks/s]
Generating responses: 30%|███ | 12/40 [01:57<05:24, 11.57s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:15<00:00, 15.51s/it, est. speed input: 226.26 toks/s, output: 20.76 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:15<00:00, 15.51s/it, est. speed input: 226.26 toks/s, output: 20.76 toks/s]
Generating responses: 32%|███▎ | 13/40 [02:13<05:45, 12.81s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.27s/it, est. speed input: 278.40 toks/s, output: 20.45 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.27s/it, est. speed input: 278.40 toks/s, output: 20.45 toks/s]
Generating responses: 35%|███▌ | 14/40 [02:25<05:29, 12.69s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.43s/it, est. speed input: 279.05 toks/s, output: 20.35 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.43s/it, est. speed input: 279.05 toks/s, output: 20.35 toks/s]
Generating responses: 38%|███▊ | 15/40 [02:38<05:16, 12.66s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.17s/it, est. speed input: 282.32 toks/s, output: 20.38 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.17s/it, est. speed input: 282.32 toks/s, output: 20.38 toks/s]
Generating responses: 40%|████ | 16/40 [02:50<05:01, 12.55s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.67s/it, est. speed input: 293.00 toks/s, output: 20.30 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.67s/it, est. speed input: 293.00 toks/s, output: 20.30 toks/s]
Generating responses: 42%|████▎ | 17/40 [03:02<04:43, 12.33s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.65s/it, est. speed input: 293.52 toks/s, output: 20.34 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.65s/it, est. speed input: 293.52 toks/s, output: 20.34 toks/s]
Generating responses: 45%|████▌ | 18/40 [03:14<04:27, 12.17s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.07s/it, est. speed input: 402.59 toks/s, output: 19.30 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.07s/it, est. speed input: 402.59 toks/s, output: 19.30 toks/s]
Generating responses: 48%|████▊ | 19/40 [03:26<04:16, 12.23s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.97s/it, est. speed input: 264.77 toks/s, output: 20.50 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.98s/it, est. speed input: 264.77 toks/s, output: 20.50 toks/s]
Generating responses: 50%|█████ | 20/40 [03:39<04:09, 12.49s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.02s/it, est. speed input: 285.08 toks/s, output: 20.37 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.03s/it, est. speed input: 285.08 toks/s, output: 20.37 toks/s]
Generating responses: 52%|█████▎ | 21/40 [03:51<03:55, 12.39s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.66s/it, est. speed input: 358.37 toks/s, output: 19.62 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.66s/it, est. speed input: 358.37 toks/s, output: 19.62 toks/s]
Generating responses: 55%|█████▌ | 22/40 [04:05<03:51, 12.86s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.66s/it, est. speed input: 271.47 toks/s, output: 20.14 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.66s/it, est. speed input: 271.47 toks/s, output: 20.14 toks/s]
Generating responses: 57%|█████▊ | 23/40 [04:18<03:38, 12.84s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.27s/it, est. speed input: 279.72 toks/s, output: 20.37 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.27s/it, est. speed input: 279.72 toks/s, output: 20.37 toks/s]
Generating responses: 60%|██████ | 24/40 [04:31<03:23, 12.72s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.48s/it, est. speed input: 275.49 toks/s, output: 20.43 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.48s/it, est. speed input: 275.49 toks/s, output: 20.43 toks/s]
Generating responses: 62%|██████▎ | 25/40 [04:43<03:10, 12.69s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.66s/it, est. speed input: 293.27 toks/s, output: 20.24 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.66s/it, est. speed input: 293.27 toks/s, output: 20.24 toks/s]
Generating responses: 65%|██████▌ | 26/40 [04:55<02:53, 12.42s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.04s/it, est. speed input: 264.57 toks/s, output: 20.48 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.04s/it, est. speed input: 264.57 toks/s, output: 20.48 toks/s]
Generating responses: 68%|██████▊ | 27/40 [05:08<02:44, 12.65s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.62s/it, est. speed input: 294.71 toks/s, output: 20.31 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.62s/it, est. speed input: 294.71 toks/s, output: 20.31 toks/s]
Generating responses: 70%|███████ | 28/40 [05:20<02:28, 12.38s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.67s/it, est. speed input: 271.64 toks/s, output: 20.44 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.67s/it, est. speed input: 271.64 toks/s, output: 20.44 toks/s]
Generating responses: 72%|███████▎ | 29/40 [05:33<02:17, 12.51s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.62s/it, est. speed input: 279.70 toks/s, output: 20.36 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.62s/it, est. speed input: 279.70 toks/s, output: 20.36 toks/s]
Generating responses: 75%|███████▌ | 30/40 [05:46<02:05, 12.59s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:14<00:00, 14.51s/it, est. speed input: 243.13 toks/s, output: 20.61 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:14<00:00, 14.51s/it, est. speed input: 243.13 toks/s, output: 20.61 toks/s]
Generating responses: 78%|███████▊ | 31/40 [06:00<01:58, 13.21s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:14<00:00, 14.03s/it, est. speed input: 247.45 toks/s, output: 20.60 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:14<00:00, 14.03s/it, est. speed input: 247.45 toks/s, output: 20.60 toks/s]
Generating responses: 80%|████████ | 32/40 [06:14<01:47, 13.50s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.95s/it, est. speed input: 286.65 toks/s, output: 20.33 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.95s/it, est. speed input: 286.65 toks/s, output: 20.33 toks/s]
Generating responses: 82%|████████▎ | 33/40 [06:26<01:31, 13.08s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.84s/it, est. speed input: 289.14 toks/s, output: 20.35 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.84s/it, est. speed input: 289.14 toks/s, output: 20.35 toks/s]
Generating responses: 85%|████████▌ | 34/40 [06:38<01:16, 12.75s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.17s/it, est. speed input: 262.10 toks/s, output: 20.49 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:13<00:00, 13.17s/it, est. speed input: 262.10 toks/s, output: 20.49 toks/s]
Generating responses: 88%|████████▊ | 35/40 [06:52<01:04, 12.92s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.06s/it, est. speed input: 285.71 toks/s, output: 20.39 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:12<00:00, 12.07s/it, est. speed input: 285.71 toks/s, output: 20.39 toks/s]
Generating responses: 90%|█████████ | 36/40 [07:04<00:50, 12.70s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.67s/it, est. speed input: 293.09 toks/s, output: 20.31 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.67s/it, est. speed input: 293.09 toks/s, output: 20.31 toks/s]
Generating responses: 92%|█████████▎| 37/40 [07:16<00:37, 12.43s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:18<00:00, 18.11s/it, est. speed input: 196.61 toks/s, output: 20.87 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:18<00:00, 18.11s/it, est. speed input: 196.61 toks/s, output: 20.87 toks/s]
Generating responses: 95%|█████████▌| 38/40 [07:34<00:28, 14.18s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:18<00:00, 18.12s/it, est. speed input: 196.61 toks/s, output: 20.92 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:18<00:00, 18.12s/it, est. speed input: 196.61 toks/s, output: 20.92 toks/s]
Generating responses: 98%|█████████▊| 39/40 [07:52<00:15, 15.40s/it]
Processed prompts: 0%| | 0/1 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.61s/it, est. speed input: 294.54 toks/s, output: 20.33 toks/s]
Processed prompts: 100%|██████████| 1/1 [00:11<00:00, 11.61s/it, est. speed input: 294.54 toks/s, output: 20.33 toks/s]
Generating responses: 100%|██████████| 40/40 [08:04<00:00, 14.31s/it]
Generating responses: 100%|██████████| 40/40 [08:04<00:00, 12.11s/it]
2025-08-03 20:38:17,297 - INFO - Data successfully written to ./mllm_demo_distill.json