UPDATE: Photobucket ripper

Important: a new version has been released to match the new PB design. Read more about it here: http://9v.lt/blog/photobucket-ripper-update/


I first wrote about a method to grab someone else’s pictures from Photobucket and provided a script to do so. The method worked, but the script was rather primitive and not automated at all… well, it was OK for my one-time use. But now that I have more stuff to grab and have to update the album constantly, it is not very nice to do it all over again each time.
So I decided to make a better, fully automated script, and here it is.

To use the script you will need to install the “mechanize” library for Python (and of course Python itself, lol), but don’t worry, I provided everything.
As a developer I like to keep things simple, and with this script ripping those pictures really is simple.
If a file already exists, the script will skip it.
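
The skip check boils down to testing whether the target path already exists before fetching anything. Here is a minimal sketch of that idea. The helper name `pending_downloads` and the example.com URLs are made up for illustration; the real script does this check inline while downloading.

```python
import os
import tempfile

def pending_downloads(urls, folder):
    """Return (url, path) pairs for files that are not in folder yet."""
    pending = []
    for url in urls:
        # Name the local file after the last path segment of the URL.
        file_name = url.rstrip("/").split("/")[-1]
        full_path = os.path.join(folder, file_name)
        if os.path.exists(full_path):
            # Already grabbed on an earlier run, so skip it.
            continue
        pending.append((url, full_path))
    return pending

# Hypothetical example: one file is already on disk, one is new.
demo_folder = tempfile.mkdtemp()
open(os.path.join(demo_folder, "a.jpg"), "w").close()
demo_urls = ["http://example.com/albums/a.jpg", "http://example.com/albums/b.jpg"]
demo_pending = pending_downloads(demo_urls, demo_folder)
print([os.path.basename(path) for _, path in demo_pending])
```

Only "b.jpg" is left to fetch, so re-running against an album that has grown only pulls the new files.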

The script with required libraries can be downloaded here: PhotobucketGetter.zip

Now the usage is very simple. Run it with a -h parameter to see the help.

$ python PhotobucketGetter.py -h
usage: PhotobucketGetter.py [-h] -u  [-p] [-f] [-d] [-n] [-t]

Script to grab and save pics from a photobucket album automatically

optional arguments:
  -h, --help       show this help message and exit
  -u , --url       Album URL
  -p , --passwd    Album password (if any)
  -f , --filter    What to download (pic/vid/all)
  -d , --dir       Where to download (folder name)
  -n, --nofolder   If this is used, then downloaded files will not be put in
                   separate folders
  -t, --terminate  If to terminate on error or continue grabbing

So, if the album you want to rip doesn’t have a password, just run

PhotobucketGetter.py -u URL

If the album has a password, add a “-p mypasswd” parameter.

The other options are optional. Add “-f pic” to download just pictures, “-f vid” to download just videos, or “-f all” (or leave it out altogether) to download everything.
The “-d” parameter sets the name of the folder where everything will be put; the default is “PhotobucketGetter”.
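
To see how those options end up inside the script, here is a trimmed-down sketch of the same parser, fed a hand-written argument list instead of the real command line. The album URL and password are placeholders, and `build_parser` is just a name picked for this example.

```python
from argparse import ArgumentParser

def build_parser():
    """Build a parser with the same options the script defines."""
    parser = ArgumentParser(description="Script to grab and save pics from a photobucket album automatically")
    parser.add_argument('-u', '--url', help='Album URL', required=True, metavar="")
    parser.add_argument('-p', '--passwd', help='Album password (if any)', metavar="")
    parser.add_argument('-f', '--filter', help='What to download (pic/vid/all)', default="all", metavar="")
    parser.add_argument('-d', '--dir', help='Where to download (folder name)', default="PhotobucketGetter", metavar="")
    parser.add_argument('-n', '--nofolder', action="store_true")
    parser.add_argument('-t', '--terminate', action="store_true")
    return parser

# Placeholder album URL and password, not real ones.
demo_args = build_parser().parse_args(["-u", "http://example.com/album", "-p", "mypasswd", "-f", "pic"])
print("%s %s %s" % (demo_args.filter, demo_args.dir, demo_args.passwd))

# With only -u given, the defaults apply and the password is None.
bare_args = build_parser().parse_args(["-u", "http://example.com/album"])
print("%s %s %s" % (bare_args.filter, bare_args.dir, bare_args.passwd))
```

Note that argparse leaves an omitted “-p” as None rather than an empty string.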

Now the script itself is a bit complicated, but easy to understand (if that makes sense :P)

'''
Script to grab pics from a photobucket album automatically
and save them locally. Provide password if the album is protected
and the album URL.
======
If you really like this script, then consider a small donation for
my sleepless efforts keeping this script working and up-to-date.
Go to my website: http://9v.lt and press a Donate button on the right :)
======
Author: Kulverstukas
Website: http://9v.lt
Shouts to Evilzone.org and Programisiai.lt
 
Version: 0.5
 
http://9v.lt/blog/update-photobucket-ripper/
'''
 
import os
import sys
import re
import urllib
import mechanize
from argparse import ArgumentParser
 
#============================================
class ImageMethods:
    def downloadImages(self, links, browser):
        errorCounter = 0
        linksList = [i.strip() for i in re.split(',\s{2,}', links)]
        print "* Found "+str(len(linksList))+" images..."
 
        print "* Compiling regex patterns and downloading the pictures..."
        picUrl = re.compile('url: "(https?://(.*?)\.photobucket\.com/albums/(.*?))",')
        picName = re.compile('title: "(.*?)"')
 
        counter = 1
        for link in linksList:
            name = picName.search(link).group(0).replace("title: \"","")[:-1]
            name = stripSymbols(name)
            if (noFolder == False):
                if (name == ""):
                    name = noNamePic
                if (os.path.exists(mainFolder+'/'+name) == False):
                    os.mkdir(mainFolder+'/'+name)
            picLink = picUrl.search(link)
            picLink = picLink.group(0).replace("url: \"", "")[:-2]
            fileName = os.path.basename(picLink)
            fullPath = ""
            if (noFolder):
                fullPath = "%s/%s" % (mainFolder, fileName)
                name = mainFolder
            else:
                fullPath = "%s/%s/%s" % (mainFolder, name, fileName)
            if (os.path.exists(fullPath)):
                print '%d. Retrieving "%s" into "%s" folder' % (counter, fileName, name)
                print "*** "+name+'/'+fileName+" exists. Skipping..."
            else:
                try:
                    size = CalculateSize().calculateSize(browser.open(picLink).info().get("Content-Length"))
                    print '%d. Retrieving "%s" into "%s" folder -- Size: %s' % (counter, fileName, name, size)
                    urllib.urlretrieve(picLink, fullPath)
                except KeyboardInterrupt:
                    print " Terminating..."
                    sys.exit(0)
                except Exception as e:
                    if (terminate):
                        print " Terminating with message: %s" % e
                        sys.exit(0)
                    else:
                        print " Error grabbing this image. Continuing..."
                        errorCounter += 1
            counter += 1
        return errorCounter
 
    def grabSlideshowData(self, htmlCode):
        data = re.search("PB\.Slideshow\.data \= \[\n.*\];", htmlCode)
        if (data == None):
            print "*** Something went wrong grabbing picture data. Terminating..."
            sys.exit(0)
        data = data.group(0).replace("PB.Slideshow.data = [", "").replace("];", "").strip()
 
        return data
#============================================
class VideoMethods:
    def downloadVideos(self, list, browser):
        errorCounter = 0
        counter = 1
        print "* Found "+str(len(list))+" videos..."
        for item in list:
            url = item[0]
            name = item[1]
            name = stripSymbols(name)
            if (noFolder == False):
                if (name == ""):
                    name = noNamePic
                if (os.path.exists(mainFolder+'/'+name) == False):
                    os.mkdir(mainFolder+'/'+name)
            fileName = os.path.basename(url)
            fullPath = ""
            if (noFolder):
                fullPath = "%s/%s" % (mainFolder, fileName)
                name = mainFolder
            else:
                fullPath = "%s/%s/%s" % (mainFolder, name, fileName)
            if (os.path.exists(fullPath)):
                print '%d. Retrieving "%s" into "%s" folder' % (counter, fileName, name)
                print "*** "+name+'/'+fileName+" exists. Skipping..."
            else:
                try:
                    size = CalculateSize().calculateSize(browser.open(url).info().get("Content-Length"))
                    print '%d. Retrieving "%s" into "%s" folder -- Size: %s' % (counter, fileName, name, size)
                    urllib.urlretrieve(url, fullPath)
                except KeyboardInterrupt:
                    print " Terminating..."
                    sys.exit(0)
                except Exception as e:
                    if (terminate):
                        print " Terminating with message: %s" % e
                        sys.exit(0)
                    else:
                        print " Error grabbing this video. Continuing..."
                        errorCounter += 1
            counter += 1
        return errorCounter
 
    def grabVideoLinks(self, html):
        list = []
        pattern = "<img src\=\""+album.replace(".", "\.")+".*/>"
        matchObj = re.search("http://(.*?)\.", pattern)
        pattern = pattern.replace(matchObj.group(0), "http://[\w\d]*\.")
        rawList = re.findall(pattern, html)
        videoName = ""
        videoUrl = ""
        for link in rawList:
            # grab the video name and trim crap from it
            videoName = re.search("title=\"(.*?)\"", link).group(0).replace("title=\"","")[:-1]
            # grab the video url and leave only the URL to video
            videoUrl = re.search("alt=\"(.*?)\"", link).group(0).replace("alt=\"","")
            videoUrl = videoUrl.replace(re.search("\s(.*?)\"", videoUrl).group(0), "")
            # join URL parts with "/" explicitly; os.path.join would use "\" on Windows
            videoUrl = album.rstrip("/") + "/" + videoUrl
            list.append((videoUrl, videoName))
 
        return list
#============================================
class CalculateSize:
    def calculateSize(self, bytes):
        abbrevs = ["kB", "MB", "GB"]
        if (bytes == None):
            size = "0 kB"
        else:
            bytes = float(bytes)
            if (bytes < 1024.0):
                size = "%d B" % (bytes)
            else:
                for abbrev in abbrevs:
                    if (bytes >= 1024.0):
                        bytes = bytes / 1024.0
                        size = "%.2f %s" % (bytes, abbrev)
        return size
#============================================
def stripSymbols(input):
    # replace characters that Windows does not allow in file names
    badSymbols = ['\\', '/', ':', '*', '?', '"', '<', '>', '|']
    replacement = '~'
    for i in badSymbols:
        input = input.replace(i, replacement)
    return input
#============================================
def begin():
    print '\n* Creating "%s" folder...' % mainFolder
    if (os.path.exists(mainFolder)):
        print "*** Folder exists. Skipping..."
    else:
        os.mkdir(mainFolder)
 
    print "* Initiating connection to Photobucket..."
    browser = mechanize.Browser()
    browser.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')]
    browser.set_handle_equiv(False)
    #browser.set_debug_http(True)
    browser.set_handle_robots(False)
    try:
        browser.open(album)
    except KeyboardInterrupt:
        print " Terminating..."
        sys.exit(0)
    except Exception as e:
        print " Terminating with message: %s" % e
        sys.exit(0)
 
    # see if the album has a password field
    rawHtml = ""
    for form in browser.forms():
        if (form.name == "frmLogin"):
            if (not passwd):  # args.passwd is None when -p is omitted
                print "*** Album requires password, none given. Terminating..."
                sys.exit(0)
            print "* Album requires password... using '"+passwd+"'"
            browser.select_form(name="frmLogin")
            browser.form["loginForm[password]"] = passwd
            print "* Submitting password..."
            browser.submit()
            break
 
    if ((filter == "pic") or (filter == "all")):
        print "* Reading image HTML code..."
        rawHtml = browser.open(album+slideshowFilter).read()
        imgMethods = ImageMethods()
        slideshowData = imgMethods.grabSlideshowData(rawHtml)
        errors = imgMethods.downloadImages(slideshowData, browser)
        if (terminate == False):
            print "     There were %d skipped images while grabbing" % errors
        print "     Done grabbing images!"
 
    if ((filter == "vid") or (filter == "all")):
        print "* Reading video HTML code..."
        rawHtml = browser.open(album+videoFilter).read()
        vidMethods = VideoMethods()
        links = vidMethods.grabVideoLinks(rawHtml)
        errors = vidMethods.downloadVideos(links, browser)
        if (terminate == False):
            print "     There were %d skipped videos while grabbing" % errors
        print "     Done grabbing videos!"
#=============================================
parser = ArgumentParser(description="Script to grab and save pics from a photobucket album automatically")
parser.add_argument('-u', '--url', help='Album URL', required=True, metavar="")
parser.add_argument('-p', '--passwd', help='Album password (if any)', metavar="")
parser.add_argument('-f', '--filter', help='What to download (pic/vid/all)', default="all", metavar="")
parser.add_argument('-d', '--dir', help='Where to download (folder name)', default="PhotobucketGetter", metavar="")
parser.add_argument('-n', '--nofolder', help='If this is used, then downloaded files will not be put in separate folders', action="store_true")
parser.add_argument('-t', '--terminate', help='If to terminate on error or continue grabbing', action="store_true")
args = parser.parse_args()
 
#====== global vars, change values here ======
noNamePic = 'NoName'
slideshowFilter = "?albumview=slideshow"
videoFilter = "?mediafilter=videos"
mainFolder = args.dir
album = args.url
passwd = args.passwd
filter = args.filter
noFolder = args.nofolder
terminate = args.terminate
if ((filter != 'pic') and (filter != 'vid') and (filter != 'all')):
    filter = 'all'
#=============================================
 
begin()

27 comments

  1. Kulverstukas says:

    Little update: there was a small error with Regex patterns and other small bugs. Now it should work flawlessly :)

  2. Kulverstukas says:

    Another little update: Script has been fixed to work with the new Photobucket Beta. New website caused the script to crash at first. This has been fixed by changing the User-Agent to Google bot’s.

  3. Kulverstukas says:

    Another update: changed the script a bit to compensate for small images (less than a kilobyte). Script was crashing when it had to calculate size for images less than a kilobyte.
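
    A rough sketch of that size handling, written as a standalone function rather than the script's CalculateSize class. The name `format_size` is made up for this example, and the value passed in is assumed to be whatever the Content-Length header held (a string, or None if the header was missing):

```python
def format_size(content_length):
    """Turn a Content-Length value into a short human-readable string."""
    if content_length is None:
        # Same fallback the script uses when the header is missing.
        return "0 kB"
    size = float(content_length)
    if size < 1024.0:
        # Files under a kilobyte are reported in plain bytes.
        return "%d B" % size
    for unit in ("kB", "MB", "GB"):
        size /= 1024.0
        # Stop at the first unit that fits, or at the largest unit.
        if size < 1024.0 or unit == "GB":
            return "%.2f %s" % (size, unit)

print(format_size("500"))
print(format_size("2048"))
print(format_size(5 * 1024 * 1024))
```

    The first call is the case that used to crash: anything under 1024 bytes now gets reported in bytes instead of going through the unit loop.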

  4. Kulverstukas says:

    Ok, the script is now updated once again. It will replace all forbidden symbols with “~”, because PB allowed file names to contain symbols that Windows does not allow, which made the script crash.
    Also in this version 2 new options were added: -n and -t.
    Read the OP for more information :)

    Those two options were added after a user reported a crash on some PB links he tried.
    For -n: every file was put in a separate folder, and each folder had just one file, so I figured it would be much nicer to have all those files in one folder instead of hundreds.
    For -t: I had a very weird error on one image for which I did not find a solution, so I implemented a workaround, which IMO is just as good…

    Download from the same location
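
    For the curious, the symbol replacement can be sketched like this. This version uses a single regular expression instead of the loop over a list that the script has, and `strip_symbols` is just a name for this example:

```python
import re

# Characters that Windows does not allow in file names.
FORBIDDEN = r'[\\/:*?"<>|]'

def strip_symbols(name, replacement="~"):
    """Replace every forbidden character in name with the replacement."""
    return re.sub(FORBIDDEN, replacement, name)

print(strip_symbols('my: "best" pic?'))
```

    Names without any forbidden symbols come back unchanged.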

  5. Six says:

    Very nice Kulverstukas. Thank you it’s appreciated.

  6. Six says:

    One suggestion: the ability to read in a text file containing album links would be nice. That and/or being able to grab everything from the root of the account.

  7. Kulverstukas says:

    Thanks, I’ll look into this :)

  8. derp says:

    I noticed that it misses videos. -f all or -f vid or with the -f option omitted all failed to grab videos on a bucket.

    A little bug is that when you use the -n option to specify a folder to download into, it still says: “Retrieving (filename) into "NoName" folder”. Maybe because it creates a folder named “NoName” instead of just the folder you want the files to download to.

  9. Kulverstukas says:

    It works fine for me… Might be a compatibility issue with some albums I haven’t tried. If you could give me the link of the album you’re trying to download, that would make it much easier for me to debug the problems :)
    Because it works well with albums I have as samples… If you don’t want to post it publicly then send it to my email.

    By the way, make sure you run the script with Python version no lower than 2.7. Versions up to 3 should be OK.

  10. leo says:

    Hi Kulverstukas, I was wondering if you could give me some minor assistance as I’m having trouble getting the script to work. Here’s the string I’m entering into terminal as well as the output:

    I’m not too sure what I’m doing wrong, is it my url? Thank you in advance, and thank you for taking the time to create such a script.

    python Photobucketgetter.py -u http://s1304.photobucket.com/user/artboy365/library/

    * Creating “PhotobucketGetter” folder…
    *** Folder exists. Skipping…
    * Initiating connection to Photobucket…
    * Reading image HTML code…
    *** Something went wrong grabbing picture data. Terminating…

  11. Kulverstukas says:

    Thanks for the feedback :) I’ll look into that as soon as I can. Probably today. Check this post for details.

    EDIT: Ok so, it seems Photobucket has moved out of BETA to the regular site and the script will not work anymore.
    I will try to re-write it to be compatible with new Photobucket when I have time to do so :/ thank you for reporting this!

  12. leo says:

    No problem at all, I hope it isn’t too bothersome for you.

  13. Kyle says:

    I’ve fixed the login portion. My regex-fu is not strong enough for the rest.

    The slideshow data now shows up as “Pb.Data.add('libraryAlbumsPageCollectionData', {”, with the link to each image being part of CollectionData, and fullsizeURL holding the direct link.

    It would probably be easiest to grab everything that is part of libraryAlbumsPageCollectionData, parse it into something like XML, and then reference each attribute needed by name.
    —————————–

    for form in browser.forms():
        if (form.attrs["id"] == "guestLoginForm"):
            if (passwd == ""):
                print "*** Album requires password, none given. Terminating..."
                sys.exit(0)
            print "* Album requires password... using '"+passwd+"'"
            browser.select_form(predicate=lambda form: 'id' in form.attrs and form.attrs['id'] == "guestLoginForm")
            browser.form["visitorPassword"] = passwd
            print "* Submitting password..."
            browser.submit()
            break

  14. Kulverstukas says:

    Thank you Kyle :)

    Yes I too noticed that PB has put slideshow data in the album code, so it’s not hard to parse it all. I’ll probably take a crack at this next week :)

  15. Kulverstukas says:

    BUMP: Photobucket said they’ll implement the slideshows in the upcoming weeks, so let’s wait and see how that goes. If they work the way they did before, the script will start working again and I won’t have to rewrite it.

    You can read about it here:
    http://photobucket.zendesk.com/entries/22494178-current-list-of-missing-features

  16. WYP says:

    They finally added slideshows!

  17. Kulverstukas says:

    Thank you for letting me know :) looks like they are doing it completely differently now. I’ll have to think of new methods to get all the images and videos…

  18. Pika says:

    So the current method won’t work anymore? I tried to run it and I get the following error: “Terminating with message: can’t fetch relative reference: not viewing any documents”

  19. Pika says:

    Never mind, got it to work. I didn’t know I had to put “http://” at the beginning, or else it won’t establish a connection properly. Also, when I didn’t input “-p password”, it didn’t mention that the program was terminated due to a missing password, it just said “Terminating…”. Probably just something on my end, but maybe you should look into that just in case.

    Either way, great script, works like a charm.

  20. Kulverstukas says:

    Thank you, I’ll look into it. But I have released a new version of the script, as the website was updated and this version became obsolete. You can find the new version here: http://9v.lt/blog/photobucket-album-downloader/

  21. macromed says:

    So what is required for the updated version? I don’t understand what Kyle has written. Thanks for the script so far, I have tried it out on open profiles and it works great.

  22. Kulverstukas says:

    What Kyle has written was intended for me :) updated version works with the new photobucket design, because this script, version 0.5, is obsolete after the PB update.
    The script works when you download because I have replaced the download link with a new script. If you would open the script with notepad, you would see Version 0.7 in the script. Code posted here is for version 0.5 :)
    Heh I kinda forgot I have replaced the files :P so downloading from either blog post will give you an updated version.

  23. macromed says:

    I got it to work and it works just great, super fast too. Thanks a lot. Is this script able to collect images from private accounts if messed around with?

  24. Kulverstukas says:

    Yes it should work with any album, as long as you have a link to it and a password if it is protected :)

    Thanks for the feedback ;)

  25. Hokidoki says:

    Hi, I tried using the script and I’m getting a “terminating with message: can’t fetch relative reference: not viewing any document” error.

    Do you happen to have a solution for this?

    the URL: s297.photobucket.com/user/bowen_shade/library

    Let me know if you have any thoughts..

  26. John says:

    Can you PLEASE make a video tutorial for this? Yea I’m a noob and I don’t even know what Python is! I installed it but idk how to “install a ‘mechanize’ library for Python”. From there I am completely lost… Please help me lol

  27. Kulverstukas says:

    You should be able to google your way into making it work if you put more effort into it. However, most likely this won’t work anymore, because PB changed so much since then.
