How to sort references of a specific topic by the number of citations they have? #30
Hey! In the Topic modeling example (https://nlesc.github.io/litstudy/example.html#Topic-modeling) and the Advanced topic modeling example, we can show the top N papers that most strongly belong to a specific topic in order to check the results. Is there an implemented method that shows the papers in a particular topic sorted by the number of citations, so that the most cited works are shown first? Thank you for the help.
Replies: 1 comment
Hi.

Unfortunately, there is no built-in method for this in litstudy, but it is easy to do by hand in Python.

The difficult part is that topic modeling gives a "fuzzy" assignment of topics to documents. Each document does not belong to one single topic; instead, it has weights towards several topics. For example, a document can be 30% on topic A and 70% on topic B.

There are two solutions:

Solution 1

The first solution is to get the best topic for each document (the topic with the highest weight) and then, for each topic, select its documents and sort them by citation count:

    # Select best topic for each document
    best_topics = model.best_topic_for_documents()

    # Iterate over topics
    for topic_id in range(num_topics):
        # Select documents that belong to topic `topic_id`
        topic_docs = docs[best_topics == topic_id]

        # Sort the selected documents by citation count, most cited first
        # (a missing citation count is treated as 0)
        sorted_docs = sorted(topic_docs, key=lambda doc: doc.citation_count or 0, reverse=True)

Solution 2

The second solution is to set a threshold (for example, 10%) and then, for each topic, select the documents that score above this threshold and sort them by citation count. With this approach, a document can appear under more than one topic:

    # Threshold of 10%
    threshold = 0.1

    # Iterate over topics
    for topic_id in range(num_topics):
        # Select weights of documents towards this topic
        topic_weights = model.doc2topic[:, topic_id]

        # Select documents that belong to topic `topic_id`
        topic_docs = docs[topic_weights > threshold]

        # Sort the selected documents by citation count, most cited first
        # (a missing citation count is treated as 0)
        sorted_docs = sorted(topic_docs, key=lambda doc: doc.citation_count or 0, reverse=True)
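To see how the two approaches differ, here is a minimal, self-contained sketch in plain Python that does not need litstudy. Everything in it is an assumption for illustration only: `Paper` is a hypothetical stand-in for a litstudy document, the nested list `weights` stands in for the document-topic weight matrix, and the two helper functions are not part of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    # Hypothetical stand-in for a document; not a litstudy class.
    title: str
    citation_count: Optional[int]  # None when no count is available


def top_cited_by_best_topic(papers, weights):
    """Assign each paper to its highest-weight topic, most cited first."""
    num_topics = len(weights[0])
    groups = {topic_id: [] for topic_id in range(num_topics)}
    for paper, row in zip(papers, weights):
        best = max(range(num_topics), key=lambda t: row[t])
        groups[best].append(paper)
    for members in groups.values():
        # Treat a missing citation count as 0 so sorting does not fail.
        members.sort(key=lambda p: p.citation_count or 0, reverse=True)
    return groups


def top_cited_above_threshold(papers, weights, threshold=0.1):
    """List each paper under every topic whose weight exceeds the threshold."""
    num_topics = len(weights[0])
    groups = {}
    for topic_id in range(num_topics):
        members = [p for p, row in zip(papers, weights) if row[topic_id] > threshold]
        groups[topic_id] = sorted(
            members, key=lambda p: p.citation_count or 0, reverse=True
        )
    return groups


# Made-up data: four papers and their weights towards two topics.
papers = [Paper("A", 12), Paper("B", 250), Paper("C", None), Paper("D", 40)]
weights = [
    [0.70, 0.30],
    [0.30, 0.70],
    [0.95, 0.05],
    [0.55, 0.45],
]

by_best = top_cited_by_best_topic(papers, weights)
by_threshold = top_cited_above_threshold(papers, weights, threshold=0.1)

for topic_id, members in by_threshold.items():
    print(topic_id, [(p.title, p.citation_count) for p in members])
```

With the best-topic approach, every paper shows up under exactly one topic. With the threshold approach, a highly cited paper that only partly belongs to a topic (paper "B" for topic 0 in the made-up data) can still end up at the top of that topic's list, so the threshold is worth tuning.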