NLP + Culture
Let’s keep an open mind and think of culture, conceptually, as something arbitrary. Culture is a set of “agreed upon” rules: groups set them, and we all adhere to them. If that process is somewhat arbitrary, then culture is defined by where you are, when in history you live, and what your community looks like.
Let’s think about how that has helped people survive throughout history. If you are constantly changing and updating how you view your world, you can do more with it. I have a fairly formal education in the English language; my mother was an English teacher. And I would say that how the structure of the language has changed during my lifetime (e.g., emojis, sentences shortened to phrases) is a reflection of our world. TL;DR tells us a lot about our culture. And that isn’t a bad thing.
Let’s go ahead and temporarily set aside ethical or political considerations around how we say things, and think about this from a scientific point of view. Why wouldn’t we embrace the changes in our language, and thus embrace the changes in our culture and communities? I would argue this means we are doing more with what we have than any other culture in history. We may not be reading as many books as people did a decade ago, but we are consuming content in greater quantities than any generation before us.
So now, let’s think about the power of social media in terms of natural language processing (NLP). What exactly is embedded in these terms? A basic NLP pipeline breaks text into words and counts how often each occurs in a sentence, a paragraph, a document, or a corpus (e.g., scikit-learn’s TfidfVectorizer, which weights those counts by how rare a word is across documents). We can also remove common words (stop words), or score sentiment with a lexicon such as VADER, whose word ratings were collected from human raters on Amazon Mechanical Turk. But what else can be done?
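To make the counting idea concrete, here is a minimal sketch of TF-IDF written in plain Python. It is an illustration under simple assumptions (whitespace tokenization, unsmoothed idf), not a reproduction of scikit-learn's TfidfVectorizer, which applies smoothing and normalization by default and so produces different numbers. The function names and the three sample posts are made up for the example.

```python
import math
from collections import Counter

PUNCT = ".,!?:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip leading/trailing punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the doc / tokens in the doc
    idf = log(N / df), where df = number of docs containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights

# Hypothetical sample posts, invented for illustration.
docs = [
    "tl;dr the meeting ran long",
    "the meeting was short",
    "tl;dr lol",
]
weights = tf_idf(docs)
```

In this toy corpus, a word that appears in only one post ("ran") gets a higher weight than a word shared by several posts ("the"), which is the behavior the counting step is meant to capture.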
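- Stemming reduces a word to its base form to get at the root (e.g., “running” and “runs” both become “run”). We could analyze the stems and group words by root, and then by the language each root comes from. A stemmer alone does not identify a root’s language of origin, so that second step would need an etymological resource. The grouping would help trace where the main ideas are coming from, and thus trace them back to their cultural identity.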
- The number 3 is a magical number across most cultures. And 30 seems to be the maximum number of people you can know intimately at any one time. Therefore, let’s group words in variations of threes. This can be done with the n-gram setting of a vectorizer (e.g., ngram_range in scikit-learn). But instead of just tuning this hyperparameter, let’s actually analyze what those groups of 3 words look like.
- Let’s view sentiment in another fashion. We can break expressions in any language, from any period, into things (nouns), actions (verbs), and the words describing them (adjectives and adverbs). We can then sort and group these adjectives and adverbs into positive, negative, or neutral (+1, -1, 0), with the scores determined across users within the context of a platform.
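The three ideas above can be sketched in a few lines of plain Python. Everything here is a simplifying assumption made for illustration: `crude_stem` is a toy suffix stripper standing in for a real stemmer, `LEXICON` is a hand-made stand-in for scores that would really be determined across a platform's users, and the sample posts are invented. The sketch also skips part-of-speech tagging and simply scores any word found in the lexicon.

```python
from collections import Counter

# Hypothetical lexicon of describing words: +1 positive, -1 negative, 0 neutral.
LEXICON = {"great": 1, "happy": 1, "slow": -1, "terrible": -1, "okay": 0}
SUFFIXES = ("ing", "ed", "s")
PUNCT = ".,!?:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip leading/trailing punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def crude_stem(word):
    """Strip one common suffix, keeping at least three letters (toy stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_by_stem(tokens):
    """Map each stem to the set of surface forms that reduce to it."""
    groups = {}
    for t in tokens:
        groups.setdefault(crude_stem(t), set()).add(t)
    return groups

def trigrams(tokens):
    """Return every run of three consecutive tokens."""
    return list(zip(tokens, tokens[1:], tokens[2:]))

def sentiment(tokens, lexicon=LEXICON):
    """Sum the lexicon scores of the tokens; unknown words count as 0."""
    return sum(lexicon.get(t, 0) for t in tokens)

# Hypothetical sample posts, invented for illustration.
posts = [
    "Shipping was slow but support was great",
    "Great support, shipping was slow",
]
trigram_counts = Counter()
for post in posts:
    trigram_counts.update(trigrams(tokenize(post)))

stem_groups = group_by_stem(["walk", "walked", "walking", "walks"])
scores = [sentiment(tokenize(p)) for p in posts]
```

Looking at `trigram_counts.most_common()` is the "actually analyze the groups of 3" step: the repeated phrases themselves, rather than a tuned hyperparameter, are what get read. A real analysis would swap in a proper stemmer and a part-of-speech tagger for the toy pieces used here.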
I hope this stirs some ideas on how to bring the ever-changing context of cultures into NLP analysis. Viewing language as a window into culture and ideas is especially important in light of the misinformation that circulates. For companies and businesses looking to build chatbots from data alone: a chatbot confined to mathematics, without cultural context, will struggle to move a conversation forward. Chatbots must learn why their customers ask the way they do, not just what they ask.