nohup: ignoring input
Loading embeddings from /home/nguyendc/sonnh/embedding-clustering/extract/embeddings_factures_osteopathie_1k_qwen.json...
Loaded 2800 samples with embedding dimension 2048

======================================================================
RUNNING GAUSSIAN MIXTURE MODEL CLUSTERING WITH OPTIMIZED GRID SEARCH
======================================================================
Optimized parameter combinations:
- n_components: 11 values [2, 3, 4, 5, 6, 8, 10, 11, 14, 17, 20]
- covariance_types: 2 options ['full', 'diag']
- reg_covar: 3 values [1e-05, 0.0001, 0.001]
- n_init: 2 values [1, 5]
- init_params: 2 options ['kmeans', 'k-means++']
- max_iter: 2 values [100, 300]
Total combinations: 528 (optimized for speed)
Estimated runtime: 4.4 minutes
This should be much faster...
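
Note: the six value lists above multiply out to the 528 combinations the log reports (11 x 2 x 3 x 2 x 2 x 2). The following is a minimal sketch of how such a grid could be enumerated, assuming a plain Cartesian product over the listed values. The names `param_grid` and `combinations` are illustrative and are not taken from gmm_extensive.py.

```python
from itertools import product

# Value lists copied from the log header above; the dict layout is an assumption.
param_grid = {
    "n_components": [2, 3, 4, 5, 6, 8, 10, 11, 14, 17, 20],
    "covariance_type": ["full", "diag"],
    "reg_covar": [1e-05, 0.0001, 0.001],
    "n_init": [1, 5],
    "init_params": ["kmeans", "k-means++"],
    "max_iter": [100, 300],
}

# One dict per parameter combination, in Cartesian-product order.
combinations = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

print(len(combinations))  # 528
```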

Progress: 50/528 (9.5%) - Best scores so far: BIC=17260132.61, Silhouette=0.376
Progress: 100/528 (18.9%) - Best scores so far: BIC=17260132.61, Silhouette=0.376
Progress: 150/528 (28.4%) - Best scores so far: BIC=17260132.61, Silhouette=0.376
Progress: 200/528 (37.9%) - Best scores so far: BIC=17260132.61, Silhouette=0.376
Progress: 250/528 (47.3%) - Best scores so far: BIC=17260132.61, Silhouette=0.376
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=kmeans: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089173.91, AIC=13040529.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=2, cov=diag, init=k-means++: BIC=13089203.91, AIC=13040559.00, silhouette=0.3697
n_components=3, cov=diag, init=kmeans: BIC=12693850.34, AIC=12620880.00, silhouette=0.3761
n_components=3, cov=diag, init=kmeans: BIC=12693850.34, AIC=12620880.00, silhouette=0.3761
n_components=3, cov=diag, init=kmeans: BIC=12699627.34, AIC=12626657.00, silhouette=0.3761
n_components=3, cov=diag, init=kmeans: BIC=12699627.34, AIC=12626657.00, silhouette=0.3761
n_components=3, cov=diag, init=kmeans: BIC=12718245.34, AIC=12645275.00, silhouette=0.3761
n_components=3, cov=diag, init=kmeans: BIC=12718245.34, AIC=12645275.00, silhouette=0.3761
Progress: 300/528 (56.8%) - Best scores so far: BIC=11770626.34, Silhouette=0.376
n_components=4, cov=diag, init=kmeans: BIC=11525150.76, AIC=11427855.00, silhouette=0.3090
n_components=4, cov=diag, init=kmeans: BIC=11525150.76, AIC=11427855.00, silhouette=0.3090
n_components=4, cov=diag, init=kmeans: BIC=11530927.76, AIC=11433632.00, silhouette=0.3090
n_components=4, cov=diag, init=kmeans: BIC=11530927.76, AIC=11433632.00, silhouette=0.3090
n_components=4, cov=diag, init=kmeans: BIC=11549555.76, AIC=11452260.00, silhouette=0.3090
n_components=4, cov=diag, init=kmeans: BIC=11549555.76, AIC=11452260.00, silhouette=0.3090
n_components=5, cov=diag, init=kmeans: BIC=10641753.18, AIC=10520132.00, silhouette=0.3119
n_components=5, cov=diag, init=kmeans: BIC=10641753.18, AIC=10520132.00, silhouette=0.3119
n_components=5, cov=diag, init=kmeans: BIC=10647529.18, AIC=10525908.00, silhouette=0.3119
n_components=5, cov=diag, init=kmeans: BIC=10647529.18, AIC=10525908.00, silhouette=0.3119
n_components=5, cov=diag, init=kmeans: BIC=10666196.18, AIC=10544575.00, silhouette=0.3119
n_components=5, cov=diag, init=kmeans: BIC=10666196.18, AIC=10544575.00, silhouette=0.3119
Progress: 350/528 (66.3%) - Best scores so far: BIC=9931250.18, Silhouette=0.376
Progress: 400/528 (75.8%) - Best scores so far: BIC=8401628.46, Silhouette=0.376
Progress: 450/528 (85.2%) - Best scores so far: BIC=7579813.73, Silhouette=0.376
Progress: 500/528 (94.7%) - Best scores so far: BIC=6988291.27, Silhouette=0.376
Progress: 528/528 (100.0%) - Best scores so far: BIC=6849987.05, Silhouette=0.376

======================================================================
GAUSSIAN MIXTURE MODEL GRID SEARCH ANALYSIS
======================================================================
Total parameter combinations tested: 413
Combinations with valid clustering: 413

Model Selection Metrics:
Best BIC score: 6849987.05
Best AIC score: -11119584.00
Best Log-Likelihood: 6594.97

Clustering Quality Metrics:
Best silhouette score: 0.3761
Mean silhouette score: 0.2317
Best Calinski-Harabasz score: 1331.69
Best Davies-Bouldin score: 0.7860

Top 5 results by BIC (lower is better):
n_comp=20, cov=diag: BIC=6849987.05, AIC=6363484.50
n_comp=20, cov=diag: BIC=6849987.05, AIC=6363484.50
n_comp=20, cov=diag: BIC=6849987.05, AIC=6363484.50
n_comp=20, cov=diag: BIC=6849987.05, AIC=6363484.50
n_comp=20, cov=diag: BIC=6855879.05, AIC=6369376.50

Top 5 results by AIC (lower is better):
n_comp=4, cov=full: BIC=38759701.15, AIC=-11119584.00
n_comp=4, cov=full: BIC=38759701.15, AIC=-11119584.00
n_comp=3, cov=full: BIC=26462676.38, AIC=-10946786.00
n_comp=3, cov=full: BIC=26462676.38, AIC=-10946786.00
n_comp=5, cov=full: BIC=54230057.92, AIC=-8119050.00

Top 5 results by Silhouette Score:
n_comp=3, cov=diag: silhouette=0.3761
n_comp=3, cov=diag: silhouette=0.3761
n_comp=3, cov=diag: silhouette=0.3761
n_comp=3, cov=diag: silhouette=0.3761
n_comp=3, cov=diag: silhouette=0.3761

Component count analysis (top 10 by BIC):
20.0 components: BIC=6849987.05, AIC=6363484.50, silhouette=0.1770
17.0 components: BIC=6988291.27, AIC=6574765.00, silhouette=0.2085
14.0 components: BIC=7179637.00, AIC=6839087.00, silhouette=0.2119
11.0 components: BIC=7579813.73, AIC=7312240.00, silhouette=0.2577
10.0 components: BIC=7737961.30, AIC=7494713.00, silhouette=0.2863
8.0 components: BIC=8401628.46, AIC=1051428.00, silhouette=0.2748
6.0 components: BIC=9102218.61, AIC=-6065602.00, silhouette=0.2707
5.0 components: BIC=9931250.18, AIC=-8119050.00, silhouette=0.3163
4.0 components: BIC=10865268.76, AIC=-11119584.00, silhouette=0.3110
3.0 components: BIC=11686081.34, AIC=-10946786.00, silhouette=0.3761
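
Note: the BIC and AIC values logged for the `diag` models can be cross-checked against each other. Assuming the script uses the standard definitions (BIC = -2 ln L + k ln n, AIC = -2 ln L + 2k, which is also what scikit-learn's `GaussianMixture.bic` and `GaussianMixture.aic` compute), the gap BIC - AIC depends only on the free-parameter count k and the sample count n. The sketch below uses hypothetical helper names and the 2800 samples and 2048 dimensions reported at the top of the log. It applies to rows where both values come from the same fitted model, such as the per-run lines and the "Top 5 results by BIC" entries.

```python
import math


def gmm_diag_n_parameters(n_components, n_features):
    """Free parameters of a diagonal-covariance GMM (illustrative helper)."""
    means = n_components * n_features
    variances = n_components * n_features
    weights = n_components - 1  # mixture weights sum to 1
    return means + variances + weights


def bic_minus_aic(n_parameters, n_samples):
    """BIC - AIC = k * (ln n - 2) under the standard definitions."""
    return n_parameters * (math.log(n_samples) - 2.0)


n_samples, n_features = 2800, 2048

# (BIC, AIC) pairs copied from the log for diag-covariance runs.
logged = {
    2: (13089173.91, 13040529.00),
    3: (12693850.34, 12620880.00),
    20: (6849987.05, 6363484.50),
}

gaps = {}
for n_components, (bic, aic) in logged.items():
    k = gmm_diag_n_parameters(n_components, n_features)
    gaps[n_components] = (bic - aic, bic_minus_aic(k, n_samples))
    print(n_components, k, round(gaps[n_components][0], 2), round(gaps[n_components][1], 2))
```

For these three rows the logged gap and the computed gap agree to within rounding, which is consistent with the assumed definitions. The large k relative to n = 2800 also means the penalty terms are substantial at this embedding dimension.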

📁 SAVING DETAILED RESULTS...
==============================
Detailed grid search results saved to: gmm_grid_search_detailed_20250801_015245.json
Grid search summary CSV saved to: gmm_grid_search_summary_20250801_015245.csv

Best GMM result by BIC:
Parameters: {'n_components': 20, 'covariance_type': 'diag', 'reg_covar': 1e-05, 'n_init': 1, 'init_params': 'kmeans', 'max_iter': 100}
BIC score: 6849987.05

Best GMM result by AIC:
Parameters: {'n_components': 4, 'covariance_type': 'full', 'reg_covar': 0.0001, 'n_init': 5, 'init_params': 'kmeans', 'max_iter': 100}
AIC score: -11119584.00

Best GMM result by Silhouette:
Parameters: {'n_components': 3, 'covariance_type': 'diag', 'reg_covar': 1e-05, 'n_init': 1, 'init_params': 'kmeans', 'max_iter': 100}
Silhouette score: 0.3761
Visualization saved as 'gmm_clustering_results.png'
Final clustering results (bic) saved to: gmm_final_results_bic_20250801_015247.json
Traceback (most recent call last):
  File "/home/nguyendc/sonnh/embedding-clustering/cluster/gmm_extensive.py", line 646, in <module>
    main()
  File "/home/nguyendc/sonnh/embedding-clustering/cluster/gmm_extensive.py", line 640, in main
    clustering.save_clustering_results(results)
  File "/home/nguyendc/sonnh/embedding-clustering/cluster/gmm_extensive.py", line 614, in save_clustering_results
    json.dump({
  File "/home/nguyendc/miniconda3/envs/cluster/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/home/nguyendc/miniconda3/envs/cluster/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/home/nguyendc/miniconda3/envs/cluster/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/home/nguyendc/miniconda3/envs/cluster/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/home/nguyendc/miniconda3/envs/cluster/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type float32 is not JSON serializable
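
Note: the traceback shows `json.dump` in `save_clustering_results` failing on a NumPy `float32` value, which the standard `json` encoder does not know how to serialize. One common way to handle this is to pass a `default=` fallback that converts NumPy scalars and arrays to built-in Python types. The following is a minimal sketch, not the script's actual code: the `to_builtin` helper and the keys in `results` are hypothetical stand-ins for whatever the real results dict contains.

```python
import json

import numpy as np


def to_builtin(obj):
    """Fallback for json.dump/json.dumps: convert NumPy values to built-ins."""
    if isinstance(obj, np.generic):  # np.float32, np.int64, np.bool_, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(
        f"Object of type {obj.__class__.__name__} is not JSON serializable"
    )


# Hypothetical results dict; the real keys in gmm_extensive.py are not shown in the log.
results = {
    "silhouette": np.float32(0.3761),
    "labels": np.array([0, 2, 1]),
}

# Without default=to_builtin this call raises the TypeError seen above.
encoded = json.dumps(results, default=to_builtin)
print(encoded)
```

The same `default=to_builtin` argument works with `json.dump(obj, fp, ...)`. Converting values at the point where they are collected, for example with `float(score)`, is an alternative if only a few fields are affected.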