Extracting Topics with Jointly Embedded Topics

What is Top2vec?

  • Get the number of detected topics.
  • Get topics.
  • Get topic sizes.
  • Get hierarchical topics.
  • Search topics by keywords.
  • Search documents by topic.
  • Search documents by keywords.
  • Find similar words.
  • Find similar documents.

Model Description:-


as we see from the figure the words that represent the best are close enough

Step 2:-

Step 3:-

Red areas are the noises and the other colors are the labels from HDBSCAN

Step 4:-

Red outliers are not used to calculate centroid.

Install top2vec

pip install top2vec
from top2vec import Top2Vec
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)




I build product with passion. Follow me for product related blogs.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Deep Learning — Introduction

Estimating constrained precision matrices with TensorFlow


NFL Play Prediction Using Computer Vision

Combining Satellite Imagery and machine learning to predict poverty

Take a Deep Dive into NLP at ODSC APAC 2021

Handling Text Data Quality Issues: Can yuo h andle thsi?

Identify Fish using Machine Learning

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Abhishek Patnaik

Abhishek Patnaik

I build product with passion. Follow me for product related blogs.

More from Medium

Text Preprocessing in NLP With Python — II

Term Frequency — Inverse Document Frequency

An image displaying a few sentences


AI Application to Demonstrate K-Means Clustering Using H2O Wave