RE: Lessons in Machine-Learnt Racism
Good article. Our machine learning algorithms have no concept of words intended more to cause injury and emotional distress than to communicate. The mathematical models are built from a corpus of human-created text, and the model will later select any word that appeared with some frequency in that corpus.
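To make the mechanism concrete, here is a minimal sketch using a toy bigram model (the corpus and all names in it are hypothetical, invented for illustration; real systems are far larger but share the same indifference to intent):

    import random
    from collections import defaultdict

    # Hypothetical toy corpus standing in for scraped social-media text.
    # Any word present here, abusive or not, becomes a candidate for
    # generation -- the model has no notion of intent or harm.
    corpus = "you are great . you are awful . you are awful ."

    # Build bigram counts: for each word, record every successor seen.
    successors = defaultdict(list)
    tokens = corpus.split()
    for current, nxt in zip(tokens, tokens[1:]):
        successors[current].append(nxt)

    def generate(start, length=5):
        """Sample a word sequence; successors are drawn in proportion
        to their frequency in the training corpus, nothing more."""
        word, output = start, [start]
        for _ in range(length):
            choices = successors.get(word)
            if not choices:
                break
            # Duplicates in the list make this frequency-weighted.
            word = random.choice(choices)
            output.append(word)
        return " ".join(output)

    print(generate("you"))

Because "awful" follows "are" twice as often as "great" in the corpus, the model tends to emit it twice as often. Filter nothing out of the training data, and nothing is filtered out of the output.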
The lesson is that we shouldn't use populist communication channels (like Twitter and Facebook) to create a training corpus. Too many people in the world fall into low-brow behavior, and their low thoughts infect social media. Of course, it doesn't help that the social media companies have been utterly irresponsible in not guarding against abusive language, wrapping themselves instead in the complacency of greed.
You might find my latest blog post of interest: it addresses the lack of transparency in big data and the problems that biased data collection is causing.