7.3 Topic Modeling with Gensim
7.3.2 Choosing the Optimal Number of Topics with Perplexity and Topic Coherence
from gensim.models import CoherenceModel
cm = CoherenceModel(model=model, corpus=corpus, coherence='u_mass')
coherence = cm.get_coherence()
print(coherence)
"""
-1.7493528544065975
"""
import matplotlib.pyplot as plt
from gensim.models import CoherenceModel, LdaModel


# Train one LDA model per topic count, then plot perplexity and coherence.
def show_coherence(corpus, dictionary, start=6, end=15):
    iter_num = []
    per_value = []
    coh_value = []

    for i in range(start, end + 1):
        model = LdaModel(corpus=corpus, id2word=dictionary,
                         chunksize=1000, num_topics=i,
                         random_state=7)
        iter_num.append(i)

        # log_perplexity returns the per-word likelihood bound, not the
        # perplexity itself; a higher bound means a lower perplexity.
        pv = model.log_perplexity(corpus)
        per_value.append(pv)

        # u_mass coherence: values closer to 0 indicate more coherent topics.
        cm = CoherenceModel(model=model, corpus=corpus,
                            coherence='u_mass')
        cv = cm.get_coherence()
        coh_value.append(cv)
        print(f'num_topics: {i}, perplexity: {pv:0.3f}, coherence: {cv:0.3f}')

    plt.plot(iter_num, per_value, 'g-')
    plt.xlabel("num_topics")
    plt.ylabel("perplexity")
    plt.show()

    plt.plot(iter_num, coh_value, 'r--')
    plt.xlabel("num_topics")
    plt.ylabel("coherence")
    plt.show()


show_coherence(corpus, dictionary, start=6, end=15)
"""
WARNING:gensim.models.ldamodel:too few updates, training might not converge; consider increasing the number of passes or iterations to improve accuracy
num_topics: 6, perplexity: -7.035, coherence: -1.701
num_topics: 7, perplexity: -7.023, coherence: -1.735
num_topics: 8, perplexity: -7.023, coherence: -1.547
num_topics: 9, perplexity: -7.007, coherence: -1.891
num_topics: 10, perplexity: -6.996, coherence: -1.888
num_topics: 11, perplexity: -7.027, coherence: -2.164
num_topics: 12, perplexity: -7.019, coherence: -2.018
num_topics: 13, perplexity: -7.025, coherence: -2.255
num_topics: 14, perplexity: -7.020, coherence: -2.082
num_topics: 15, perplexity: -7.019, coherence: -2.521
"""
※ This post summarizes what I learned while studying from the book <파이썬 텍스트 마이닝 완벽 가이드>.