Fight Linguistic Bias in NLP and Make Your First WordCloud

There is a large disparity at play in data science within the realm of Natural Language Processing (NLP). And as both an aspiring data scientist and a linguaphile, I’m concerned. The NLP sector of data science is overwhelmingly focused on English, so much so that Professor Emily M. Bender of the University of Washington felt the need to tweet this:

And thus began the #BenderRule. This was not a hashtag that Professor Bender created herself in order to become a trending topic, but rather a personal view of hers that struck a chord with many other linguists and computer scientists who felt Bender had simply stated it best. In response to the question “Is there a formal statement of the Bender rule?” …


Think Ethically from the Beginning

As an aspiring data scientist currently immersed in a data science bootcamp, I keep asking myself: “How can I work ethically from the get-go?” That’s the question that pervaded my mind as I watched The Social Dilemma and listened to Tristan Harris and Cathy O’Neil, among many others, discuss the dangers of big data and the unmanned, black-box algorithms that penetrate our subconscious minds and make decisions for us that we no longer understand. The stories of these whistleblowers, lumped into a non-existent group informally known as the “Conscience of Silicon Valley,” are disturbing. But they aren’t wrong, and they’re speaking with the authority of hindsight. So many of the data scientists and programmers interviewed throughout the documentary mentioned their lack of foresight into what these algorithms could become, or noted that when an ethical dilemma was posed, the financial bottom line was the knockout punch. In other words, it was too late. …

Geography is making a comeback, and it owes a big “thank you” to Data Science

Photo by Annie Spratt on Unsplash

Geography, That’s Like Maps and Stuff, Right?

If your experience learning geography growing up was anything like mine, then you too can name (almost) all 50 states and their corresponding capitals. Perhaps, if you’re not American, you are able to name the different provinces of your nation and their capitals, and probably some more. Let’s give ourselves a pat on the back for that! It probably won’t shock you, then, that when I told people I had decided to major in the field of “naming US capitals,” more broadly known as “Geography,” I was greeted with a lot of blank stares. From what I can tell, most Americans are surprised you can still major in Geography at a university level. And believe me, so was I! …

Network graph of a GitHub repository

If you’re anything like me, then when you were getting started on your data science journey, GitHub loomed before you as this pervasive treasure trove of open source code. You weren’t exactly sure how to get involved, but you knew that one day future employers would look to it as your portfolio of sorts. You aspired to one day join the ranks of GitHub and share some of your own elegant code for the world to pore over and glean some meaningful insight or inspiration from. Just me?

Well, if that was you, then like me, you didn’t really start to scratch the surface of the beauty of GitHub until you began to understand its use as a tool for team collaboration. When that sank in, it took you back to the first time you used a Google Doc and your friend typed the words “Hi {your name here}” from another computer, to your utter amazement. There is something still so incredible to me about the ability to collaborate with anyone, in any place, at any time! …


David Bruce

Data Scientist. Lover of language. Always learning.
