EXTRACTING TOPICS FROM A TV CHANNEL'S FACEBOOK PAGE USING CONTEXTUALIZED DOCUMENT EMBEDDING
Keywords: AraBERT, ELMo, Neural topic model, LDA, ProdLDA, Topic coherence
Abstract. Topic models extract meaningful words from text collections, allowing for a better understanding of the data. However, the resulting topics are often not coherent enough and are therefore harder to interpret. Adding contextual knowledge to the model can enhance coherence. In recent years, neural network-based topic models have become available, and their quality has improved further thanks to BERT-based representations. In this study, we propose a model to extract topics from news posts on the Aljazeera Facebook page. Our approach combines a neural topic model (ProdLDA) with the Arabic pre-trained BERT transformer model (AraBERT). The proposed model produces more expressive and coherent topics than ELMo-based variants across different topic model algorithms (ProdLDA and LDA), reaching a topic coherence of 0.883.
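The topic coherence figure reported above is typically an NPMI-style measure over top-word co-occurrence. As a rough illustration only (not necessarily the exact metric used in the paper, which may be a C_v variant computed with a library such as gensim), the following is a minimal sketch of document-level NPMI coherence; the example topics and documents are invented for demonstration:

```python
from itertools import combinations
from math import log

def npmi_coherence(topics, docs, eps=1e-12):
    """Average NPMI topic coherence over document-level co-occurrence.

    topics: list of topics, each a list of top words.
    docs: list of tokenized documents (lists of words).
    Returns the mean over topics of the mean pairwise NPMI in [-1, 1].
    """
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    def prob(*words):
        # fraction of documents containing all the given words
        return sum(all(w in s for w in words) for s in doc_sets) / n_docs

    topic_scores = []
    for topic in topics:
        pair_scores = []
        for wi, wj in combinations(topic, 2):
            p_ij = prob(wi, wj)
            if p_ij == 0:
                pair_scores.append(-1.0)  # words never co-occur: minimum NPMI
                continue
            pmi = log(p_ij / (prob(wi) * prob(wj)))
            pair_scores.append(pmi / (-log(p_ij) + eps))
        topic_scores.append(sum(pair_scores) / len(pair_scores))
    return sum(topic_scores) / len(topic_scores)
```

A coherent topic (words that tend to appear in the same posts) scores near 1, while a topic whose words never co-occur scores -1; the 0.883 reported above would indicate highly co-occurring top words under such a measure.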