
Important: new version released along with new PB design. Read more about it here: http://9v.lt/blog/photobucket-ripper-update/
Earlier I wrote about a method for grabbing someone else’s pictures from Photobucket and provided a script to do so. The method worked, but the script was rather primitive and not automated at all… well, it was OK for my one-time use. Now that I have more stuff to grab and have to update the album constantly, it is not very nice to do it all over again each time.
So I decided to make a better, fully automated script, and here it is.
To use the script you will need to install the “mechanize” library for Python (and of course Python itself, lol), but don’t worry, I provided everything.
As a developer I like to keep things simple and with this script, ripping those pictures is really simple.
If the file already exists, it will skip it.
The script with required libraries can be downloaded here: PhotobucketGetter.zip
Now the usage is very simple. Run it with a -h parameter to see the help.
$ python PhotobucketGetter.py -h
usage: PhotobucketGetter.py [-h] -u [-p] [-f] [-d] [-n] [-t]

Script to grab and save pics from a photobucket album automatically

optional arguments:
  -h, --help       show this help message and exit
  -u , --url       Album URL
  -p , --passwd    Album password (if any)
  -f , --filter    What to download (pic/vid/all)
  -d , --dir       Where to download (folder name)
  -n, --nofolder   If this is used, then downloaded files will not be put in
                   separate folders
  -t, --terminate  If to terminate on error or continue grabbing
So, if the album you want to rip doesn’t have a password, just input
PhotobucketGetter.py -u URL
If the album has a password, add an additional “-p mypasswd” parameter.
The other parameters are optional: add “-f pic” to download just pictures, “-f vid” to download just videos, or “-f all” (or leave it out entirely) to download everything.
The “-d” parameter sets the name of the folder where everything will be put; the default is “PhotobucketGetter”.
Now the script itself is a bit complicated, but easy to understand (if that makes sense :P)
'''
    Script to grab pics from a photobucket album automatically
    and save them locally.
    Provide password if the album is protected and the album URL.
    ======
    If you really like this script, then consider a small donation
    for my sleepless efforts keeping this script working and up-to-date.
    Go to my website: http://9v.lt and press a Donate button on the right :)
    ======
    Author: Kulverstukas
    Website: http://9v.lt
    Shouts to Evilzone.org and Programisiai.lt
    Version: 0.5
    http://9v.lt/blog/update-photobucket-ripper/
'''

import os
import sys
import re
import urllib
import mechanize
from argparse import ArgumentParser

#============================================
class ImageMethods:

    def downloadImages(self, links, browser):
        errorCounter = 0
        linksList = [i.strip() for i in re.split(',\s{2,}', links)]
        print "* Found "+str(len(linksList))+" images..."
        print "* Compiling regex patterns and downloading the pictures..."
        picUrl = re.compile('url: "(https?://(.*?)\.photobucket\.com/albums/(.*?))",')
        picName = re.compile('title: "(.*?)"')
        counter = 1
        for link in linksList:
            name = picName.search(link).group(0).replace("title: \"","")[:-1]
            name = stripSymbols(name)
            if (noFolder == False):
                if (name == ""):
                    name = noNamePic
                if (os.path.exists(mainFolder+'/'+name) == False):
                    os.mkdir(mainFolder+'/'+name)
            picLink = picUrl.search(link)
            picLink = picLink.group(0).replace("url: \"", "")[:-2]
            fileName = os.path.basename(picLink)
            fullPath = ""
            if (noFolder):
                fullPath = "%s/%s" % (mainFolder, fileName)
                name = mainFolder
            else:
                fullPath = "%s/%s/%s" % (mainFolder, name, fileName)
            if (os.path.exists(fullPath)):
                print '%d. Retrieving "%s" into "%s" folder' % (counter, fileName, name)
                print "*** "+name+'/'+fileName+" exists. Skipping..."
            else:
                try:
                    size = CalculateSize().calculateSize(browser.open(picLink).info().get("Content-Length"))
                    print '%d. Retrieving "%s" into "%s" folder -- Size: %s' % (counter, fileName, name, size)
                    urllib.urlretrieve(picLink, fullPath)
                except KeyboardInterrupt:
                    print " Terminating..."
                    sys.exit(0)
                except Exception as e:
                    if (terminate):
                        print " Terminating with message: %s" % e
                        sys.exit(0)
                    else:
                        print " Error grabbing this image. Continuing..."
                        errorCounter += 1
            counter += 1
        return errorCounter

    def grabSlideshowData(self, htmlCode):
        data = re.search("PB\.Slideshow\.data \= \[\n.*\];", htmlCode)
        if (data == None):
            print "*** Something went wrong grabbing picture data. Terminating..."
            sys.exit(0)
        data = data.group(0).replace("PB.Slideshow.data = [", "").replace("];", "").strip()
        return data

#============================================
class VideoMethods:

    def downloadVideos(self, list, browser):
        errorCounter = 0
        counter = 1
        print "* Found "+str(len(list))+" videos..."
        for item in list:
            url = item[0]
            name = item[1]
            name = stripSymbols(name)
            if (noFolder == False):
                if (name == ""):
                    name = noNamePic
                if (os.path.exists(mainFolder+'/'+name) == False):
                    os.mkdir(mainFolder+'/'+name)
            fileName = os.path.basename(url)
            fullPath = ""
            if (noFolder):
                fullPath = "%s/%s" % (mainFolder, fileName)
                name = mainFolder
            else:
                fullPath = "%s/%s/%s" % (mainFolder, name, fileName)
            if (os.path.exists(fullPath)):
                print '%d. Retrieving "%s" into "%s" folder' % (counter, fileName, name)
                print "*** "+name+'/'+fileName+" exists. Skipping..."
            else:
                try:
                    size = CalculateSize().calculateSize(browser.open(url).info().get("Content-Length"))
                    print '%d. Retrieving "%s" into "%s" folder -- Size: %s' % (counter, fileName, name, size)
                    urllib.urlretrieve(url, fullPath)
                except KeyboardInterrupt:
                    print " Terminating..."
                    sys.exit(0)
                except Exception as e:
                    if (terminate):
                        print " Terminating with message: %s" % e
                        sys.exit(0)
                    else:
                        print " Error grabbing this video. Continuing..."
                        errorCounter += 1
            counter += 1
        return errorCounter

    def grabVideoLinks(self, html):
        list = []
        pattern = "<img src\=\""+album.replace(".", "\.")+".*/>"
        matchObj = re.search("http://(.*?)\.", pattern)
        pattern = pattern.replace(matchObj.group(0), "http://[\w\d]*\.")
        rawList = re.findall(pattern, html)
        videoName = ""
        videoUrl = ""
        for link in rawList:
            # grab the video name and trim crap from it
            videoName = re.search("title=\"(.*?)\"", link).group(0).replace("title=\"","")[:-1]
            # grab the video url and leave only the URL to video
            videoUrl = re.search("alt=\"(.*?)\"", link).group(0).replace("alt=\"","")
            videoUrl = videoUrl.replace(re.search("\s(.*?)\"", videoUrl).group(0), "")
            videoUrl = os.path.join(album, videoUrl)
            list.append((videoUrl, videoName))
        return list

#============================================
class CalculateSize:

    def calculateSize(self, bytes):
        abbrevs = ["kB", "mB", "gB"]
        if (bytes == None):
            size = "0 kB"
        else:
            bytes = float(bytes)
            if (bytes < 1024.0):
                size = "%d B" % (bytes)
            else:
                for abbrev in abbrevs:
                    if (bytes >= 1024.0):
                        bytes = bytes / 1024.0
                        size = "%.2f %s" % (bytes, abbrev)
        return size

#============================================
def stripSymbols(input):
    badSymbols = ['\\', '/', ':', '*', '?', '"', '<', '>', '|']
    replacement = '~';
    i = ''
    for i in badSymbols:
        input = input.replace(i, replacement);
    return input

#============================================
def begin():
    print '\n* Creating "%s" folder...' % mainFolder
    if (os.path.exists(mainFolder)):
        print "*** Folder exists. Skipping..."
    else:
        os.mkdir(mainFolder)

    print "* Initiating connection to Photobucket..."
    browser = mechanize.Browser()
    browser.addheaders = [('User-Agent', 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')]
    browser.set_handle_equiv(False)
    #browser.set_debug_http(True)
    browser.set_handle_robots(False)
    try:
        browser.open(album)
    except KeyboardInterrupt:
        print " Terminating..."
        sys.exit(0)
    except Exception as e:
        print " Terminating with message: %s" % e
        sys.exit(0)

    # see if the album has a password field
    rawHtml = ""
    for form in browser.forms():
        if (form.name == "frmLogin"):
            if (passwd == ""):
                print "*** Album requires password, none given. Terminating..."
                sys.exit(0)
            print "* Album requires password... using '"+passwd+"'"
            browser.select_form(name="frmLogin")
            browser.form["loginForm[password]"] = passwd
            print "* Submitting password..."
            browser.submit()
            break

    if ((filter == "pic") or (filter == "all")):
        print "* Reading image HTML code..."
        rawHtml = browser.open(album+slideshowFilter).read()
        imgMethods = ImageMethods()
        slideshowData = imgMethods.grabSlideshowData(rawHtml)
        errors = imgMethods.downloadImages(slideshowData, browser)
        if (terminate == False):
            print " There were %d skipped images while grabbing" % errors
        print " Done grabbing images!"

    if ((filter == "vid") or (filter == "all")):
        print "* Reading video HTML code..."
        rawHtml = browser.open(album+videoFilter).read()
        vidMethods = VideoMethods()
        links = vidMethods.grabVideoLinks(rawHtml)
        errors = vidMethods.downloadVideos(links, browser)
        if (terminate == False):
            print " There were %d skipped videos while grabbing" % errors
        print " Done grabbing videos!"

#=============================================
parser = ArgumentParser(description="Script to grab and save pics from a photobucket album automatically")
parser.add_argument('-u', '--url', help='Album URL', required=True, metavar="")
parser.add_argument('-p', '--passwd', help='Album password (if any)', metavar="")
parser.add_argument('-f', '--filter', help='What to download (pic/vid/all)', default="all", metavar="")
parser.add_argument('-d', '--dir', help='Where to download (folder name)', default="PhotobucketGetter", metavar="")
parser.add_argument('-n', '--nofolder', help='If this is used, then downloaded files will not be put in separate folders', action="store_true")
parser.add_argument('-t', '--terminate', help='If to terminate on error or continue grabbing', action="store_true")
args = parser.parse_args()

#====== global vars, change values here ======
noNamePic = 'NoName'
slideshowFilter = "?albumview=slideshow"
videoFilter = "?mediafilter=videos"
mainFolder = args.dir
album = args.url
passwd = args.passwd
filter = args.filter
noFolder = args.nofolder
terminate = args.terminate
if ((filter != 'pic') and (filter != 'vid') and (filter != 'all')):
    filter = 'all'
#=============================================

begin()
Little update: there was a small error with Regex patterns and other small bugs. Now it should work flawlessly :)
Another little update: Script has been fixed to work with the new Photobucket Beta. New website caused the script to crash at first. This has been fixed by changing the User-Agent to Google bot’s.
Another update: changed the script a bit to compensate for small images (less than a kilobyte). Script was crashing when it had to calculate size for images less than a kilobyte.
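The size fix described above can be sketched roughly like this. It is a standalone approximation of the idea, not the script's exact code: the name `format_size` and the "unknown" fallback are made up for illustration, and it is written to run under both Python 2.7 and Python 3.

```python
def format_size(content_length):
    """Turn a Content-Length header value into a short human-readable string."""
    if content_length is None:
        # The server did not send the header, so the size is unknown.
        return "unknown"
    size = float(content_length)
    if size < 1024.0:
        # Small files: report plain bytes instead of dividing.
        return "%d B" % size
    for unit in ("kB", "MB", "GB"):
        size /= 1024.0
        # Stop at the first unit that fits, or at the largest unit we know.
        if size < 1024.0 or unit == "GB":
            return "%.2f %s" % (size, unit)
```

Handling the "less than a kilobyte" case first means the unit loop only ever sees values that need at least one division.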
OK, the script is now updated once again. It will replace all forbidden symbols with “~”, because PB allows names to contain symbols that Windows does not allow in file and folder names, which made the script crash.
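A compact way to express that replacement, as a sketch only: the names `FORBIDDEN` and `strip_symbols` are illustrative, and the character set is the one Windows rejects in file names.

```python
# Characters that Windows does not allow in file or folder names.
FORBIDDEN = '\\/:*?"<>|'

def strip_symbols(title, replacement="~"):
    """Replace every forbidden character in a title with the replacement."""
    return "".join(replacement if ch in FORBIDDEN else ch for ch in title)
```

For example, `strip_symbols('a/b:c?')` gives `a~b~c~`, which is safe to pass to `os.mkdir`.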
Also in this version 2 new options were added: -n and -t.
Read the OP for more information :)
Those two options were added after a user had reported a crash on some PB links he tried.
For -n: every file was put in a separate folder, and each folder held just one file, so I figured it would be much nicer to have all those files in one folder instead of hundreds.
For -t, I had a very weird error on one image to which I did not find a solution, so I implemented a workaround, which IMO is just as good…
Download from the same location
Very nice Kulverstukas. Thank you it’s appreciated.
One suggestion: the ability to read in a text file containing album links would be nice. That, and/or being able to grab everything from the root of the account.
Thanks, I’ll look into this :)
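The text-file idea suggested above could look something like the following. This is a hypothetical helper, not part of the script: the name `parse_album_links` and the file layout (one URL per line, blank lines and lines starting with "#" ignored) are assumptions.

```python
def parse_album_links(lines):
    """Return album URLs from an iterable of lines, skipping blanks and # comments."""
    links = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            links.append(line)
    return links
```

Because an open file is an iterable of lines, something like `with open("albums.txt") as f: links = parse_album_links(f)` would feed it directly, and each link could then be handed to the downloader in turn.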
I noticed that it misses videos. With -f all, with -f vid, and with the -f option omitted, it failed to grab videos from a bucket.
A little bug is that when you use the -n option to specify a folder to download into, it still says “Retrieving (filename) into “NoName” folder”. Maybe that is because it creates a folder named “NoName” instead of just using the folder you want the files to download to.
It works fine for me… Might be a compatibility issue with some albums I haven’t tried. If you could give me the link of the album you’re trying to download, that would make it much easier for me to debug the problems :)
Because it works well with albums I have as samples… If you don’t want to post it publicly then send it to my email.
By the way, make sure you run the script with a Python version no lower than 2.7. Versions from 2.7 up to, but not including, 3 should be OK (the script uses Python 2 print statements).
Hi Kulverstukas, I was wondering if you could give me some minor assistance as I’m having trouble getting the script to work. Here’s the string I’m entering into terminal as well as the output:
I’m not too sure what I’m doing wrong, is it my url? Thank you in advance, and thank you for taking the time to create such a script.
python Photobucketgetter.py -u http://s1304.photobucket.com/user/artboy365/library/
* Creating "PhotobucketGetter" folder...
*** Folder exists. Skipping...
* Initiating connection to Photobucket...
* Reading image HTML code...
*** Something went wrong grabbing picture data. Terminating...
Thanks for the feedback :) I’ll look into that as soon as I can. Probably today. Check this post for details.
EDIT: Ok so, it seems Photobucket has transformed to normal from BETA and the script will not work anymore.
I will try to re-write it to be compatible with new Photobucket when I have time to do so :/ thank you for reporting this!
No problem at all, I hope it isn’t too bothersome for you.
I’ve fixed the login portion. My regex-fu is not strong enough for the rest.
The Slideshow data now shows up as “Pb.Data.add(‘libraryAlbumsPageCollectionData’, {” with the link to each image part of CollectionData and then fullsizeURL for the direct link.
It would probably be easiest to grab everything part of libraryAlbumsPageCollectionData and parse it in to something like XML and then reference each attribute needed by name.
—————————–
for form in browser.forms():
    if (form.attrs["id"] == "guestLoginForm"):
        if (passwd == ""):
            print "*** Album requires password, none given. Terminating..."
            sys.exit(0)
        print "* Album requires password... using '"+passwd+"'"
        browser.select_form(predicate=lambda form: 'id' in form.attrs and form.attrs['id'] == "guestLoginForm")
        browser.form["visitorPassword"] = passwd
        print "* Submitting password..."
        browser.submit()
        break
Thank you Kyle :)
Yes I too noticed that PB has put slideshow data in the album code, so it’s not hard to parse it all. I’ll probably take a crack at this next week :)
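The parsing idea discussed above (grab everything in libraryAlbumsPageCollectionData, then read attributes by name) might look roughly like this. It is a sketch under loud assumptions: it assumes the object passed to `Pb.Data.add` is valid JSON, the sample page and its nesting are made up, and the key is matched case-insensitively because the exact spelling on the real site has not been verified.

```python
import json

MARKER = "Pb.Data.add('libraryAlbumsPageCollectionData',"

def extract_collection_data(html):
    """Return the object passed to Pb.Data.add(...) as a Python value, or None."""
    start = html.find(MARKER)
    if start == -1:
        return None
    brace = html.find("{", start + len(MARKER))
    if brace == -1:
        return None
    # raw_decode parses one JSON value starting at `brace` and ignores what follows.
    data, _end = json.JSONDecoder().raw_decode(html, brace)
    return data

def find_values(node, wanted_key):
    """Recursively collect every value stored under wanted_key (case-insensitive)."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.lower() == wanted_key.lower():
                found.append(value)
            else:
                found.extend(find_values(value, wanted_key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, wanted_key))
    return found

# Made-up sample page; the real layout may differ.
sample = """<script>
Pb.Data.add('libraryAlbumsPageCollectionData', {"collectionData": {"items": [
  {"title": "one", "fullsizeUrl": "http://example.com/albums/a/one.jpg"},
  {"title": "two", "fullsizeUrl": "http://example.com/albums/a/two.jpg"}
]}});
</script>"""

urls = find_values(extract_collection_data(sample), "fullsizeURL")
```

Walking the structure recursively avoids hard-coding where the image entries sit, which seems safer given how often the page layout has changed.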
BUMP: Photobucket said they’ll implement the slideshows in the upcoming weeks so let’s wait and see how that will go. If it will be as it was then the script will start to work and I won’t have to rewrite it again.
You can read about it here:
http://photobucket.zendesk.com/entries/22494178-current-list-of-missing-features
They finally added slideshows!
Thank you for letting me know :) looks like they are doing it completely differently now. I’ll have to think of new methods to get all the images and videos…
So the current method won’t work anymore? I tried to run it and I get the following error: “Terminating with message: can’t fetch relative reference: not viewing any documents”
Never mind, got it to work. I didn’t know I had to put “http://” at the beginning, or else it won’t establish a connection properly. Also, when I didn’t input “-p password”, it didn’t mention that the program was terminated due to a missing password, it just said “Terminating…” Probably just something on my end, but maybe you should look into that just in case.
Either way, great script, works like a charm.
Thank you, I’ll look into it. But I have released a new version of the script, as the website was updated and this version became obsolete. You can find the new version here: http://9v.lt/blog/photobucket-album-downloader/
So what is required for the updated version? I don’t understand what Kyle has written. Thanks for the script so far, I have tried it out on open profiles and it works great.
What Kyle has written was intended for me :) updated version works with the new photobucket design, because this script, version 0.5, is obsolete after the PB update.
The script works when you download because I have replaced the download link with a new script. If you would open the script with notepad, you would see Version 0.7 in the script. Code posted here is for version 0.5 :)
Heh I kinda forgot I have replaced the files :P so downloading from either blog post will give you an updated version.
I got it to work and it works just great, super fast too. Thanks a lot. Is this script able to collect images from private accounts if messed around with?
Yes it should work with any album, as long as you have a link to it and a password if it is protected :)
Thanks for the feedback ;)
Hi, I tried using the script and I’m getting a “terminating with message: can’t fetch relative reference: not viewing any document”
Do you happen to have a solution for this?
the URL: s297.photobucket.com/user/bowen_shade/library
Let me know if you have any thoughts..
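As noted earlier in the comments, this error shows up when the album URL is given without “http://” at the beginning. A small guard along these lines would avoid it; `ensure_scheme` is a hypothetical helper, not something the script currently contains.

```python
def ensure_scheme(url, default="http://"):
    """Prefix the URL with a scheme if the user left it out."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return url
    return default + url
```

With that, `ensure_scheme("s297.photobucket.com/user/bowen_shade/library")` returns the same address with `http://` in front, which could then be passed to `-u`.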
Can you PLEASE make a video tutorial for this? Yea I’m a noob and I don’t even know what Python is! I installed it but idk how to “install a ‘mechanize’ library for Python”. From there I am completely lost… Please help me lol
You should be able to google your way into making it work if you put more effort into it. However, most likely this won’t work anymore, because PB changed so much since then.