Extracting Topics with Jointly Embedded Topics

What is Top2vec?

  • Get the number of detected topics.
  • Get topics.
  • Get topic sizes.
  • Get hierarchical topics.
  • Search topics by keywords.
  • Search documents by topic.
  • Search documents by keywords.
  • Find similar words.
  • Find similar documents.

Model Description:-

Step1:-

as we see from the figure the words that represent the best are close enough

Step 2:-

Step 3:-

Red areas are the noises and the other colors are the labels from HDBSCAN

Step 4:-

Red outliers are not used to calculate centroid.

Install top2vec

pip install top2vec
from top2vec import Top2Vec
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))

model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)

--

--

--

I build product with passion. Follow me for product related blogs.

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Using XTREME For Evaluating Cross-lingual Generalization

Federated Learning 101 with FEDn

Opendoor’s algorithm

K-Nearest Neighbors Classification from Scratch with NumPy

Neural Machine Translation

“Job-Posting Prediction using Naive Bayes Classifier”

Neural Networks Project

Summary of BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Abhishek Patnaik

Abhishek Patnaik

I build product with passion. Follow me for product related blogs.

More from Medium

Creating a chatbot from scratch in Python using NLTK — Data Science

Classify text by language in a column of DataFrame with Python. NLP process.

Fraud Detection with Machine Learning Algorithms — Saxon AI

Creating an E-Commerce Product Category Classifier using Deep Learning — Part 2