[Crypto data science #1] Programmatic analysis of 200k tweets BEFORE and AFTER the correction started. What has changed?

in #cryptocurrency, 7 years ago

Wondering how people in general feel about Bitcoin? Bullish or bearish? Let's write a Twitter analyzer :) To find out, you obviously need a large amount of REAL DATA. And where can we usually find such data? On social media, of course... I still sometimes can't believe we give away private information about our lives daily... and for free. Two of the hottest topics in IT over the last couple of years have been machine learning and artificial intelligence. Combining a large amount of social media data with machine learning knowledge can bring some interesting insights into WHAT HAS CHANGED IN BITCOIN SINCE THE DECEMBER BULLRUN. Let's get into it.

"We've harvested tens of millions of Facebook profiles to change how the average American thinks and specifically targeted every individual according to his online actions"... Only several days ago, a former Cambridge Analytica employee revealed how social media data is being misused...

Comparison - what Twitter users say about Bitcoin BEFORE and AFTER the correction

Steps

  1. Define the time intervals (dates as dd.mm.yyyy)

              Before correction          Correction
    Bitcoin   11.09.2017 - 16.12.2017    17.12.2017 - 22.03.2018
    Ethereum  15.11.2017 - 14.01.2018    15.01.2018 - 22.03.2018

  2. Get the data

    The first problem was getting historical Twitter data, because the Twitter API is limited to the last two weeks. In the end, I used this Git project, which works around the Twitter API by requesting historical tweets directly with curl. I downloaded a thousand tweets for every day in the interval and worked only with English tweets.
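    The post doesn't name the Git project or show how it was called, so the following is only a sketch of the sampling plan, assuming a scraper that accepts one `since`/`until` date pair per request (a hypothetical interface). It splits each Bitcoin interval from step 1 into one-day windows and counts how many tweets a sample of 1,000 per day yields:

```python
from datetime import date, timedelta

TWEETS_PER_DAY = 1000  # per-day sample size mentioned in the post


def parse_ddmmyyyy(s: str) -> date:
    """Parse the post's dd.mm.yyyy date format."""
    d, m, y = (int(part) for part in s.split("."))
    return date(y, m, d)


def daily_windows(start: date, end: date):
    """Yield (since, until) pairs for every day from start to end, inclusive.

    `until` is the following day (exclusive); this is an assumption about
    the hypothetical scraper interface, not the Git project's real API.
    """
    day = start
    while day <= end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


# Bitcoin intervals from step 1.
intervals = {
    "before": ("11.09.2017", "16.12.2017"),
    "correction": ("17.12.2017", "22.03.2018"),
}

total_days = 0
for name, (a, b) in intervals.items():
    windows = list(daily_windows(parse_ddmmyyyy(a), parse_ddmmyyyy(b)))
    total_days += len(windows)
    print(name, len(windows), "days,", len(windows) * TWEETS_PER_DAY, "tweets")

print("total tweets:", total_days * TWEETS_PER_DAY)
```

    With inclusive end dates this gives 97 + 96 = 193 days, i.e. about 193,000 tweets, which is consistent with the roughly 200k in the title.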

  3. Create a wordcloud from frequently used words

    I created a basic script that converts each tweet to an array of words, which the wordcloud constructor required as a parameter. I really could not believe the constructor asked for such input, which in my eyes is not right at all. Transforming tweets into an array is just an extra step for the user, but whatever...
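    The wordcloud library isn't named in the post, so here is a hedged sketch of the tweet-to-words step and the word counting behind a cloud, using only the Python standard library. The regex tokenizer is an assumption, not the author's script. If you use the Python `wordcloud` package, a word-to-count mapping like this one can be passed to `WordCloud.generate_from_frequencies`:

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters, digits, '#', '@' and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tweet_to_words(tweet: str) -> list[str]:
    """Convert one tweet into a list of lowercase word tokens."""
    return TOKEN_RE.findall(tweet.lower())


def word_frequencies(tweets) -> Counter:
    """Count how often each token appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet_to_words(tweet))
    return counts


sample = ["Earn money with bitcoin NOW!", "Bitcoin WILL rise again #btc"]
freqs = word_frequencies(sample)
print(freqs.most_common(3))
```

    Because '#' is kept inside tokens, keeping only the words that start with '#' gives the input for a hashtag cloud like the one at the end of the post.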

    Bitcoin

    Comparison of wordclouds from before and after the start of the correction:

    As you can see, both clouds contain many words typical of crypto-related tweets. To get a more interesting output, I had to get rid of these. At first I considered filtering out some percentage of the most frequently used words, but that might affect the results, so in the end I decided to just ignore specific words such as bitcoin, blockchain, cryptocurrency, crypto, ethereum, btc, https, twitter... After filtering these out, the results immediately look much more interesting:
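    A minimal sketch of both options, with toy counts made up for illustration. `filter_ignored` uses the words the post names (the post's list trails off, so the real list was probably longer), while the hypothetical `drop_top_fraction` shows why the percentage approach was risky: it removes words by rank alone.

```python
from collections import Counter

# Words listed in the post as too common in crypto tweets to be informative.
IGNORED = {"bitcoin", "blockchain", "cryptocurrency", "crypto",
           "ethereum", "btc", "https", "twitter"}


def filter_ignored(counts: Counter, ignored=IGNORED) -> Counter:
    """Drop an explicit list of words (the approach the post settled on)."""
    return Counter({w: c for w, c in counts.items() if w not in ignored})


def drop_top_fraction(counts: Counter, fraction: float) -> Counter:
    """Drop the most frequent `fraction` of distinct words (the rejected option).

    Informative words such as "now" disappear along with "bitcoin" if they
    happen to be frequent, while common but uninformative words can survive.
    """
    n_drop = int(len(counts) * fraction)
    dropped = {w for w, _ in counts.most_common(n_drop)}
    return Counter({w: c for w, c in counts.items() if w not in dropped})


counts = Counter({"bitcoin": 50, "now": 30, "btc": 20, "cash": 10})
print(filter_ignored(counts))
print(drop_top_fraction(counts, 0.5))
```

    In this toy example the ignore list keeps "now" and "cash", whereas dropping the top half removes "now" and keeps "btc", which is exactly the kind of distortion the post wanted to avoid.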

  4. Analysis. What probably can't go unnoticed:
    • Now vs. Will - I think the fall/gain of these two words was caused by sentences like "Future is happening NOW" or "Earn money with bitcoin NOW!", which were so popular during the whole bullrun. After the correction started, tweets like "Bitcoin WILL rise again" or "It is just a matter of time till Bitcoin WILL go higher again" became common.
    • Cash & Money - Both of these words gained massively in popularity after the correction. This is probably because many people switched back to fiat currencies and locked in their profits as cash.
    • Coinbase vs. Binance - I think the initial popularity of the word Coinbase was caused by sooo many new people entering the crypto world, most of whom probably used Coinbase to buy their first bitcoins; this was happening until mid-December, during the bitcoin bullrun. But in the second half of December and in January, because of the crazy altcoin boom, people started moving more towards Binance, which had a crazy rise, is one of the biggest exchanges right now, and lets people buy most of the TOP100 altcoins.
    • I feel there are more interesting patterns in these wordclouds, but my eyes might just see what my brain expects, so I'll leave it to you: let me know if you spot something more.
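    The "fall/gain" comparisons above can be made concrete by comparing each word's share of all words in the two periods. This is a hedged sketch, not the author's method (the post compares the clouds visually), and the counts below are invented toy numbers, not results:

```python
from collections import Counter


def relative_change(before: Counter, after: Counter, min_count: int = 1) -> dict:
    """Return {word: share_after / share_before} for words seen in both periods.

    Shares are normalized by each period's total word count, so periods with
    different numbers of tweets can still be compared. A ratio above 1 means
    the word gained popularity after the correction started.
    """
    total_before = sum(before.values())
    total_after = sum(after.values())
    ratios = {}
    for word in before.keys() & after.keys():
        if before[word] >= min_count and after[word] >= min_count:
            share_before = before[word] / total_before
            share_after = after[word] / total_after
            ratios[word] = share_after / share_before
    return ratios


before = Counter({"now": 30, "will": 10, "other": 60})
after = Counter({"now": 10, "will": 30, "other": 60})
for word, ratio in sorted(relative_change(before, after).items(), key=lambda kv: kv[1]):
    print(f"{word:>6}: x{ratio:.2f}")
```

    A `min_count` threshold is worth raising on real data, because rare words produce large, noisy ratios.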
  5. Execute sentiment analysis

    I trained 4 machine learning algorithms to score each tweet on a scale from 0 (negative) to 1 (positive). To train them, I used the Sentiment140 dataset, which contains 1.6 million labeled tweets. After the training phase, once the algorithms were clever enough, I fed them our Bitcoin tweets and let them evaluate these.
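    The post doesn't say which 4 algorithms were used or how their scores were combined. As one plausible example, here is a minimal multinomial Naive Bayes classifier in plain Python whose score is the estimated probability that a tweet is positive, so it lands on the same 0 to 1 scale. Sentiment140 stores polarity as 0 (negative) and 4 (positive); the tiny training set below is made up and only stands in for the 1.6 million labeled tweets:

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9#@']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def label_from_sentiment140(polarity: int) -> int:
    """Map Sentiment140 polarity (0 = negative, 4 = positive) to 0/1."""
    return 1 if polarity == 4 else 0


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        self.n_docs = sum(self.doc_counts.values())
        return self

    def _log_joint(self, tokens, c):
        """log P(class) + sum of log P(token | class)."""
        logp = math.log(self.doc_counts[c] / self.n_docs)
        for tok in tokens:
            logp += math.log((self.word_counts[c][tok] + 1)
                             / (self.totals[c] + self.vocab_size))
        return logp

    def score(self, text: str) -> float:
        """Return P(positive | text), a number between 0 and 1."""
        tokens = tokenize(text)
        diff = self._log_joint(tokens, 0) - self._log_joint(tokens, 1)
        diff = max(min(diff, 700.0), -700.0)  # keep math.exp within float range
        return 1.0 / (1.0 + math.exp(diff))


train = [(4, "i love this, great day"), (4, "so happy and excited"),
         (0, "this is awful"), (0, "i hate losing money")]
model = NaiveBayesSentiment().fit(
    [text for _, text in train],
    [label_from_sentiment140(p) for p, _ in train],
)
print(round(model.score("great and happy"), 3))
print(round(model.score("awful, i hate it"), 3))
```

    Working in log space avoids underflow when a tweet has many tokens; the final sigmoid turns the difference of the two log scores back into a probability.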

    At first glance, the graph doesn't seem to change much, but it's important to realize that Bitcoin is a worldwide phenomenon and everyone talks about it, so its sentiment doesn't oscillate as wildly as that of some shitcoin (which only 20 people talk about). Despite this, we can clearly follow the events of the last months and see them affecting sentiment, pushing it up and down between scores of 0.6 and 0.7:

    • During the whole bullrun towards the end of 2017, sentiment almost never dropped below the "moving average" line
    • We can clearly see a drastic drop in sentiment during the "Bitcoin Cash weekend", when it fell to a score of 0.5 and the two camps waged a war on social media
    • Sentiment got much worse already BEFORE the correction! COULD WE HAVE PREDICTED THIS???
    • During the whole correction, the sentiment of the 3-day groups was almost always below the moving average.
    • The only exception was the first bull trap, when people immediately got excited because we were "mooning" again :D
    • After February 6-7, there was a huge sentiment boost as we bounced strongly off 6k. Sentiment grew from under 0.6 to 0.7 (roughly a 20% gain)
    • The general trend is hard to see on a small time scale, but if we "zoom out", we can see an uptrend
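    The graph's exact settings aren't given, so this is a sketch under assumptions: daily mean scores are averaged into non-overlapping 3-day groups, and the "moving average" is taken to be a trailing simple moving average whose length (the hypothetical `window` parameter) you choose. The flags mark groups that sit below that line, the pattern described above for the correction period:

```python
from statistics import mean


def group_means(daily_scores, group_size=3):
    """Average consecutive daily scores in non-overlapping groups.

    If the number of days isn't a multiple of group_size, the last group
    is simply shorter.
    """
    return [mean(daily_scores[i:i + group_size])
            for i in range(0, len(daily_scores), group_size)]


def moving_average(values, window):
    """Trailing simple moving average; the first window-1 entries are None."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / window if i >= window - 1 else None)
    return out


def below_average_flags(values, window):
    """True where a value is below its moving average (False while undefined)."""
    ma = moving_average(values, window)
    return [m is not None and v < m for v, m in zip(values, ma)]


daily = [0.66, 0.68, 0.67, 0.64, 0.62, 0.63, 0.61, 0.60, 0.62]
groups = group_means(daily)
print(groups)
print(below_average_flags(groups, window=2))
```

    A trailing average only uses past values, so the "below the line" signal could in principle be computed live rather than only in hindsight.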

    This graph looks a bit weird, but getting tweets for all those years would surely kill my PC :D On a bigger timeframe, we can see an uptrend in average sentiment. Also, don't get angry at me: I'm actually not that negative on Bitcoin Cash, it's just for fun :D
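    One simple way to put a number on an "uptrend" is the slope of an ordinary least-squares line through the averaged scores. This is a hedged addition (the post judges the trend by eye), and the sample values below are made up:

```python
def ols_slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    cov = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    var = sum((i - x_mean) ** 2 for i in range(n))
    return cov / var


monthly = [0.61, 0.63, 0.62, 0.66, 0.65, 0.68]
slope = ols_slope(monthly)
print(f"trend: {slope:+.4f} per period ({'up' if slope > 0 else 'down or flat'})")
```

    On Python 3.10 and later, `statistics.linear_regression` returns the same slope together with the intercept.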
  6. WOW, as always with coding, this took muuuuuch more time than originally estimated. I wanted to do an ETHEREUM analysis as well, but it'll have to be in another post cuz I'm straight up DEAD! The next coffee would kill me :)

    Last but not least, proof of the originality of my work: a wordcloud of the most popular hashtags (hashcloud?) over all 6 analyzed months:

    Thanks for reading!
    Martin

    ***


    You can find my latest posts here:

    1. DROPSHIPPING mania. How people passively earn thousands of $$$ and why I find it DISGUSTING...!!
    2. Are we getting more and more STUPID!? ...DEFENSE of 2 latest generations and the cloud storage
    3. [Skydive jump] How 10 minutes destroyed my next 2 weeks, but enhanced many of upcoming years
    4. 10minute freewrite SCHIZOPHRENIC entry - Same topic, 2 points of view, 2 entries...CHOOSE YOUR ATTITUDE!
    5. 12 hours in train - 1 weird comparison and yet another reason why I HATE alcohol


Oh nice, I really appreciate your work. Man, you are right: we are giving out our data to the public free of cost and anyone can use it. If someone uses it in a good manner, he/she can see the real benefits.

One question: are you a computer science graduate?

EDIT: WTF dude, don't lie... you don't appreciate shit... I've just checked your stats: 90.20% self-upvoting... You crazy? Like this, Steemit won't get anywhere... gtfo really... also your question is horseshit, where would I learn to do such stuff if I wasn't a computer science student?!? ...D.I.S.G.U.S.T.I.N.G

Really great information. Hopefully I can make it work for me. Or maybe I'll just keep sitting here strengthening my hands...

Haha, sadly my arms have been stronger than ever during the whole correction so far. Funnily enough, I wish I had been much weaker in January and sold once the correction started forming :)

This was a really interesting post! I did not think of analyzing the crypto market with social media opinions. This is a great way imo, because the market follows the hype and is not really stable but very volatile, driven by people's feelings. I bet it took you a lot of time to set everything up for this post.

Upvoted and Resteemed!

Thx a lot!... you're a tech guy yourself so you know that stuff always takes more time than originally planned!

The biggest problem was actually getting the damn historical Twitter data older than 2 weeks... They know this data is worth a fortune, so they don't give it out easily :D

Oh yes, you wouldn't believe how often you want to repair something "simple" on motorcycles or other machines and it ends in chaos because it NEVER goes as planned.

But you got the data and made this great post. I bet there are many, many ways you can use your program to analyze the market. I will read them all.

Great analysis @matkodurko. The video you included was also really interesting. Thank you for sharing all of this information. Resteeming ;0)

Glad you liked it and thanks! :) It was a LOOOT of work, much more than I expected haha... yepp, the video is fucked up, crazy what's really going on "behind the curtain"

Edit: Heyy, there was some weird problem and most of the text was missing; not sure if you've seen it all, so maybe check once again :D thx

The dedication in your work is evident, love the wordcloud script btw. As far as fakebook and other social networking sites go, I personally find it hard to believe people didn't expect their data to be used as it is. Certainly not saying I agree with what these companies are doing with their users' personal data and info, I just believe it is important we acknowledge our participation throughout this entire process of "changing culture", as it is described in the video.


You got a 29.02% upvote from @steembloggers courtesy of @matkodurko!

Thanks for taking us step by step through the process! I hate when posts show you a conclusion but fail to show you how they reached it. Please continue to share posts like these. I plan to start doing similar analysis, and I will be sure to share my findings with you and the community. If you are interested, please let me know...

Very high-level analysis. I am very happy that I follow you because of the grappling-related topic :D
