Extracting Topics with Jointly Embedded Topics

What is Top2Vec?

Top2Vec is a topic modeling algorithm that leverages joint document and word semantic embedding to find topic vectors. Once a model is trained, you can:

  • Get the number of detected topics.
  • Get topics.
  • Get topic sizes.
  • Get hierarchical topics.
  • Search topics by keywords.
  • Search documents by topic.
  • Search documents by keywords.
  • Find similar words.
  • Find similar documents.
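As a rough sketch of how a few of the operations listed above look in code, the helper below gathers several query results into one dictionary. It assumes `model` is an already trained Top2Vec instance; the helper name `summarize_topics`, the `top_n` parameter, and the dictionary keys are made up for this illustration, while the method names come from the top2vec library.

```python
def summarize_topics(model, keyword, top_n=3):
    """Collect a few Top2Vec query results in a plain dict.

    `model` is assumed to be a trained top2vec.Top2Vec instance.
    """
    # Get the number of detected topics.
    num_topics = model.get_num_topics()

    # Get topic sizes: how many documents belong to each topic.
    topic_sizes, topic_nums = model.get_topic_sizes()

    # Get topics: each row of topic_words holds the words closest to one topic vector.
    topic_words, word_scores, _ = model.get_topics(num_topics)

    # Search topics by keywords.
    _, _, topic_scores, matched_topics = model.search_topics(
        keywords=[keyword], num_topics=min(top_n, num_topics)
    )

    # Search documents by topic; return_documents=False asks for scores and ids only.
    doc_scores, doc_ids = model.search_documents_by_topic(
        topic_num=matched_topics[0], num_docs=top_n, return_documents=False
    )

    # Find similar words.
    similar, similar_scores = model.similar_words(keywords=[keyword], num_words=top_n)

    return {
        "num_topics": num_topics,
        "topic_sizes": dict(zip(topic_nums, topic_sizes)),
        "first_topic_words": list(topic_words[0]),
        "best_topic_for_keyword": matched_topics[0],
        "doc_ids_in_best_topic": list(doc_ids),
        "similar_words": list(similar),
    }
```

Note that the keyword has to be a word the model saw during training; otherwise the library is expected to raise an error.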

Model Description:-

Let's walk through how Top2Vec works. Building the model takes the following four steps.

Step 1:-

Creating Semantic Embeddings

As the figure shows, documents and words are embedded in the same space, and the words that best represent a document lie close to it.
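To make "close" concrete, here is a minimal sketch that ranks words by cosine similarity to a document vector in a shared space. The three-dimensional vectors and the words are invented for illustration; real embeddings are learned from the corpus and have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(vector, word_vectors, top_n=2):
    """Return the top_n words whose vectors are most similar to `vector`."""
    ranked = sorted(
        word_vectors,
        key=lambda word: cosine_similarity(vector, word_vectors[word]),
        reverse=True,
    )
    return ranked[:top_n]

# Toy joint embedding (made-up values): one document and three words.
doc_vector = [0.9, 0.1, 0.0]
word_vectors = {
    "engine": [0.8, 0.2, 0.1],
    "wheel": [0.7, 0.0, 0.2],
    "galaxy": [0.0, 0.1, 0.9],
}

closest = nearest_words(doc_vector, word_vectors)
print(closest)
```

The same ranking applied to a topic vector instead of a document vector is what yields the words describing a topic.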

Step 2:-

Creating a low dimensional vector space

Step 3:-

Finding the dense cloud

Red points are noise (outliers); the other colors are the cluster labels assigned by HDBSCAN.

Step 4:-

Calculate Centroids in Original Dimensional Space

The red outliers are not used when calculating the centroids.
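The centroid step can be sketched in a few lines of plain Python. The sketch assumes the cluster labels were found on the low-dimensional vectors, that noise points carry the label -1 (the HDBSCAN convention), and that averaging happens over the original, full-dimensional document vectors. The function name and the toy data are illustrative only.

```python
from collections import defaultdict

NOISE_LABEL = -1  # HDBSCAN assigns -1 to points it treats as noise

def topic_centroids(doc_vectors, labels):
    """Average the original document vectors of each cluster, skipping noise."""
    grouped = defaultdict(list)
    for vector, label in zip(doc_vectors, labels):
        if label == NOISE_LABEL:
            continue  # outliers do not contribute to any centroid
        grouped[label].append(vector)

    centroids = {}
    for label, vectors in grouped.items():
        dim = len(vectors[0])
        centroids[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return centroids

# Toy data: four clustered documents and one outlier (values are made up).
doc_vectors = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [9.0, 9.0]]
labels = [0, 0, 1, 1, -1]

centroids = topic_centroids(doc_vectors, labels)
print(centroids)
```

Each centroid serves as a topic vector, and the outlier at [9.0, 9.0] has no effect on either one.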

Install top2vec

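pip install top2vec

Train a model on the 20 Newsgroups dataset:

from top2vec import Top2Vec
from sklearn.datasets import fetch_20newsgroups

# Load every post, stripping headers, footers and quoted replies
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

# speed="learn" trades training time for better vectors; workers sets the number of threads
model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)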

Abhishek Patnaik
