Critique of TikTok Dataset & Collection
The TikTok dataset was collected using ParseHub, a web-scraping tool. It includes columns for video links, usernames, comments, and posting dates, providing a snapshot of online conversations that use the term “situationship”. The nearly 15,000 comments, drawn from three heavily commented videos, give insight into how the term is used and perceived. The dataset lets us track how often “situationship” occurs in these conversations and in what contexts, which speaks to its meaning and relevance.

However, our approach has several limitations. The most significant is the narrow scope of data collection. Because the analysis focuses on just three videos, we may be missing the bigger picture: the broader and more varied conversations happening across TikTok. Such a small sample may yield biased or incomplete insights, since it is unlikely to represent how different groups of users discuss the term. Moreover, because the three videos were chosen for their high engagement, the results may skew toward viral content and fail to reflect everyday use of “situationship”. Data cleaning, although necessary, also strips away some of the contextual richness of social media interactions, particularly when filler words and emojis are removed. Many comments relied on emojis and other colloquial expressions for nuance; once these are removed, that nuance is lost, leaving a shallower understanding of how the term is used.

On the positive side, Python offers mature natural language processing tools for text analysis. Once cleaned and organized in a CSV file, the data allowed us to run sentiment analysis on comments about “situationship” and to identify its key themes and frequencies.
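The cleaning trade-off described above can be made concrete with a short Python sketch. It is not the project's actual pipeline: the CSV column name `comment`, the small stopword list, the emoji pattern, and the helper names `load_comments`, `clean_comment`, and `term_stats` are all hypothetical, chosen only for illustration. The sketch counts how often “situationship” appears and shows two comments whose emojis signal opposite tones collapsing into identical tokens once emojis are stripped.

```python
import csv
import re

# Hypothetical stopword list for illustration; a real pipeline would likely
# use a fuller list (e.g. NLTK's English stopwords).
STOPWORDS = {"a", "an", "the", "is", "it", "in", "i", "my", "me", "this",
             "so", "and", "was", "just", "of", "to", "be"}

# Rough pattern for common emoji blocks (an assumption; not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def load_comments(lines, column="comment"):
    """Read comment text from CSV lines (an open file or a list of strings).

    The column name "comment" is hypothetical; a ParseHub export uses
    whatever names were set up in the scraping project.
    """
    return [row[column] for row in csv.DictReader(lines) if row.get(column)]


def clean_comment(text):
    """Lowercase, replace emojis with spaces, tokenize, and drop stopwords."""
    text = EMOJI_RE.sub(" ", text.lower())
    tokens = re.findall(r"[a-z']+", text)
    return [t for t in tokens if t not in STOPWORDS]


def term_stats(comments, term="situationship"):
    """Count comments that mention the term and its total mentions.

    Tokens that start with the term are counted, so plurals such as
    "situationships" are included.
    """
    with_term = 0
    mentions = 0
    for comment in comments:
        hits = sum(1 for t in clean_comment(comment) if t.startswith(term))
        mentions += hits
        if hits:
            with_term += 1
    return {"comments": len(comments), "with_term": with_term, "mentions": mentions}


if __name__ == "__main__":
    rows = ["username,comment",
            "user1,this situationship is killing me 💀",
            "user2,this situationship is killing me 😍",
            "user3,love this video"]
    comments = load_comments(rows)
    cleaned = [clean_comment(c) for c in comments]
    # The emojis signal opposite tones, yet the cleaned tokens are identical.
    print(cleaned[0], cleaned[0] == cleaned[1])
    print(term_stats(comments))
```

Run as a script, it reports that the first two cleaned comments are identical, so any sentiment score computed on the cleaned text cannot tell them apart. Keeping emojis as tokens rather than deleting them is one way to preserve that signal.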