Johns Hopkins University
SCALE 2022
Research Scholar
Jun 2022 - Sept 2022
- Using PyTorch, developed and trained a BERT-based deep learning model that better handles multi-authored documents
- Developed a system that determines whether a set of Reddit posts has multiple contributing authors by analyzing the spread and clustering of embeddings generated by a BERT-based model
- Significantly outperformed the baseline, achieving 96.5% accuracy when training and testing on a dataset that groups only posts from the same subreddit, and 99% accuracy when training and testing on a dataset with no restrictions on how posts are grouped
- When training and testing on the more difficult dataset with the number of collected embeddings limited to only eight per group, my model still achieved 91.95% accuracy
- Developed a model that detects the number of authors who contributed to a set of documents
- When training and testing on a dataset of documents written by one, two, or three authors, my model achieved 90.85% accuracy
- SCALE 2022 is still in progress, and we hope to report further results