Python Script to Save All Your Posts As MarkDown or Html Files

gadrian (66) in SteemDevs • 5 years ago (edited)

I always wanted to be able to save my Steem posts locally. Once they are saved, better search tools are available than the ones we have at the blockchain level.
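
As a rough illustration of local searching, here is a minimal sketch that scans saved files for a term. The function name `search_posts` and the directory layout are assumptions for illustration only; any desktop search tool or `grep` would do the same job.

```python
import os


def search_posts(root_dir, term):
    """Return sorted paths of saved .md/.html files containing term (case-insensitive)."""
    matches = []
    needle = term.lower()
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            # only look at the two file types the script saves
            if not name.endswith(('.md', '.html')):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                if needle in f.read().lower():
                    matches.append(path)
    return sorted(matches)
```

For example, `search_posts('steem-posts-testuser123', 'python')` would list every saved post under that directory that mentions "python".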

I have only started poking around the development APIs for Steem, and this is the first script with a real purpose I've done in Python. On top of that, I'm also kind of new to Ubuntu. :)

If you are a dev and have been doing this for a while, you probably can write a more efficient script.

I wasn't looking for efficiency when I wrote it. I was interested in learning, and maybe others who are also Python beginners or haven't tried coding with the Steem APIs can learn from it too. Hence the extensive comments.

Features and options:

saves all your markdown posts as .md files
saves all your raw HTML posts as .html files
you can set a main sub-directory or sub-path in the current directory where the files will be placed
posts will be placed in subdirectories based on the creation date (year-month) or primary tag - option to set at the beginning of the script
you can save the posts for any account
you can save resteemed posts as well or not
you can add tags at the end of the post or not
title is automatically added as H1 at the beginning of the post
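
To make the directory option concrete, here is a small sketch of how a post could be mapped to its subdirectory. The helper name `subdir_for_post` is hypothetical, and the sample timestamp is an assumed example of the `created` field; the `created` and `category` keys are the ones the script reads.

```python
def subdir_for_post(comment, dir_struct_option):
    """Return the relative subdirectory for one post.

    comment is assumed to be a dict with 'created' and 'category' keys.
    """
    if dir_struct_option == 'primary-tag':
        return 'tags/' + comment['category']
    if dir_struct_option == 'year-month':
        # the first 7 characters of the timestamp are assumed to be YYYY-MM
        return 'date/' + comment['created'][:7]
    raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)


# assumed sample data, not taken from a real post
example = {'created': '2019-08-14T10:22:33', 'category': 'python'}
print(subdir_for_post(example, 'year-month'))
print(subdir_for_post(example, 'primary-tag'))
```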

I've tested the script on Python 3.7.4, but I believe it should work on earlier Python 3 versions (it relies on FileExistsError, which Python 2 doesn't have). The script is written for Linux/Ubuntu; for Windows you will need to adapt the parts that handle paths and directory creation.
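
One possible way to make the path handling portable is `pathlib`, which picks the right separator for the platform. This is only a sketch under that assumption, not what the script does; the helper name `post_file_path` is hypothetical.

```python
import tempfile
from pathlib import Path


def post_file_path(main_save_dir, subdir_name, filename):
    """Build the target path for a post and create its directory if needed."""
    # split on '/' so 'date/2019-08' becomes separate path components
    target_dir = Path(main_save_dir).joinpath(*subdir_name.split('/'))
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename


# demo in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    path = post_file_path(Path(tmp) / 'steem-posts-testuser123', 'date/2019-08', 'my-post.md')
    path.write_text('# Title\n\nBody', encoding='utf-8')
    print(path.relative_to(tmp))
```

`mkdir(parents=True, exist_ok=True)` covers both the "create" and "already exists" cases in one call, at the cost of losing the separate status messages.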

You will also need a good Markdown viewer/editor to see the saved files. I used Typora, but it looks like it will become paid software when it leaves beta, so a good free alternative would be nice.

So, here's the Python script. Note that the settings are hard-coded, so you'll have to change them manually.

While I'm far from a Python or Steem dev expert, if you have questions let me know.

Feedback from more experienced devs on how to improve it is welcome as well. :)

import os
import json
from steem import Steem

# connect using the library's default settings
s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
# primary-tag - posts are saved under their primary tag subdirectory
# year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine if to save resteemed posts of other authors as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters
#

#create main save directory (as a subdirectory or sub-path of the current directory)
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + os.curdir)
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + os.curdir)
except OSError:
    print('Directory ' + main_save_dir + ' couldn\'t be created in current directory ' + os.curdir)

#save current dir (os.curdir is the constant '.', a relative reference to the working directory)
cur_dir_saved = os.curdir

# loops through all the posts of the given author
# we break out of the loop after we reach the last post of the author
i = 1
while True:
    
    #retrieve current blog post info
    #theoretically we can retrieve more than one blog entry per call; in my tests anything more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print('Couldn\'t get blog #' + str(i) + '. Trying again. Ctrl+C to interrupt.')
        continue
    #is it empty? then we reached the end and we should break out of the loop
    if blogs == []: break

    #is it the author's post or a resteem?
    #if it's a resteem and resteems are not to be included, skip to the next iteration
    if blogs[0]['comment']['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Skipping it.')
            i += 1
            continue
        else:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Including it.')

    #choose the name of the subdir where to place the saved posts
    #(i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = 'tags/' + blogs[0]['comment']['category']
    elif dir_struct_option == 'year-month':
        subdir_name = 'date/' + blogs[0]['comment']['created'][0:7]
    else:
        #fail early on a mistyped option instead of hitting a NameError below
        raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)

    #build the path of the subdir relative to the saved current dir
    #(os.path.join inserts the separators, so no special cases are needed)
    dir_name = os.path.join(cur_dir_saved, main_save_dir, subdir_name)

    #create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + ' couldn\'t be created.')
        #re-raise the original exception so its details are not lost
        raise

    #deserialize json_metadata (treat an empty string as an empty dict)
    json_metadata_str = blogs[0]['comment']['json_metadata']
    json_metadata_dict = json.loads(json_metadata_str) if json_metadata_str else {}

    try:
        format = json_metadata_dict['format']
    except KeyError:
        print('No "format" key in json_metadata. Defaulting to "markdown+html".')
        format = 'markdown+html'

    #is the post markdown?
    if format == 'markdown+html' or format == 'markdown':
        #choose the filename as the blog post's permlink + ".md" extension
        filename = blogs[0]['comment']['permlink'] + '.md'
        
        if (adding_tags_to_saved_post):
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '#' + x + ' '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include the title at the beginning as H1 and the tags (with #) at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str
    #or is the post raw html?
    else:
        #choose the filename as the blog post's permlink + ".html" extension
        filename = blogs[0]['comment']['permlink'] + '.html'

        if (adding_tags_to_saved_post):
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '<a id="' + x + '" href="#' + x + '">' + x + '</a> '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include the title at the beginning as H1 and the tags (as links) at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    #write post to file (overwrite if exists)
    try:
        #"with" closes the file even if the write fails; utf-8 keeps non-ASCII characters intact
        with open(dir_name + '/' + filename, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + dir_name + '/' + filename + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + dir_name + '/' + filename)
        #re-raise the original exception so its details are not lost
        raise

    i+=1

print('No (more) posts.')
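
The loop above retries a failed API call until you press Ctrl+C. If you would rather give up after a few attempts, a bounded retry could look like the sketch below. The helper `call_with_retries` and its default values are assumptions for illustration, not part of the script or of the steem library.

```python
import time


def call_with_retries(fetch, max_attempts=5, delay_seconds=2.0):
    """Call fetch() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            if attempt == max_attempts:
                # out of attempts: let the last error propagate
                raise
            print('Attempt ' + str(attempt) + ' failed (' + str(exc) + '). Retrying.')
            time.sleep(delay_seconds)
```

In the script this could wrap the existing call, for example `blogs = call_with_retries(lambda: s.get_blog(author_name, i, 1))`.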

Update: I edited the post because the original contained some errors introduced when I copy-pasted the code into HTML, which I hadn't tested at first.

#python #save-allposts #markdown #html #technology #neoxian #palnet

petertag (60) 5 years ago

As a note, I use VS Code (because I'm a dev I guess) w/ an extension to preview .md files as I write them (basically like writing a post with preview), probably similar free apps to do it with that aren't as massive as VS Code though.

gadrian (66) 5 years ago

Yes, I used VS Code to write this Python script as well. Didn't try it for md though, but I will. Thanks for mentioning it.

petertag (60) 5 years ago

Just checked it, I was using Markdown Preview Enhanced for the extension, looks like there are a few though. No problem, nice script man!

gadrian (66) 5 years ago

Great, I'll check it out. Thanks again!

sathyasankar (63) 5 years ago

Great.. I will try this out.

hasanraza90 (25) 5 years ago (edited)

Thanks for explaining in such great depths, I work for web application development company like GoodCore and such information really comes handy at times. Thanks again
