Photobucket ripper update

pb-ripper

I wrote about my own ripper in past posts (here and here), but ever since PB updated their design, my implementation has been broken: the method it relied on no longer works. The tool was very popular, was reposted a few times, and I received lots of much-valued feedback for it :) but I didn’t keep it going because, among other things, I didn’t need it myself.

However, my dear friend Daxda was kind enough to find a new way to do it, and he rewrote the tool in a more error-tolerant manner that I believe should keep working for much longer than mine did.
Right now it’s still in development, but it more or less works and is very simple to use. It’s called “PB Shovel” (https://github.com/Daapii/pb_shovel).

Here’s the script’s help output:

usage: pb_shovel.py [-h] [-r] [-o OUTPUT_DIRECTORY] [--omit-existing]
                    [-v VERBOSE] (-f FILE | -u URLS [URLS ...])
                    [--images-only | --videos-only] [-n USERNAME]
                    [-p PASSWORD]

optional arguments:
  -h, --help            show this help message and exit
  -r, --recursive       Recursively extracts images and videos from all passed
                        sources.
  -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                        The directory the extracted images getting saved in.
  --omit-existing
  -v VERBOSE, --verbose VERBOSE
  -f FILE, --file FILE  A file containing one or more Photobucket links which
                        you want to download.
  -u URLS [URLS ...], --urls URLS [URLS ...]
                        One or more links which point to an album or image
                        which is hosted on Photobucket.
  --images-only         Do not download any other filetype besides image.
  --videos-only         Do not download any other filetype besides video.

Authentication:
  -n USERNAME, --username USERNAME
                        The username or email which is used to authenticate
                        with Photobucket.
  -p PASSWORD, --password PASSWORD
                        The matching password for your account.

For a private album the format is like this:

python pb_shovel.py -u "password@http://photobucket.com/example"
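
As a rough illustration of that format, the sketch below splits a "password@URL" string into its two parts. The helper name split_guest_password is hypothetical and written for this post; it is not taken from pb_shovel.py, and the real script may parse the argument differently.

```python
import re

# Hypothetical helper, not part of pb_shovel.py: it only illustrates the
# "password@http://..." convention shown above.
_PREFIXED = re.compile(r"^(?P<password>.+?)@(?P<url>https?://.+)$")


def split_guest_password(source):
    """Return (password, url); password is None when no prefix is present."""
    match = _PREFIXED.match(source)
    if match is None:
        return None, source
    return match.group("password"), match.group("url")


if __name__ == "__main__":
    print(split_guest_password("password@http://photobucket.com/example"))
    print(split_guest_password("http://photobucket.com/example"))
```

Anchoring on "@http://" or "@https://" rather than on the first "@" means a plain URL without a password passes through unchanged.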

To run it, you will need two libraries, “Requests” and “BeautifulSoup”. For your convenience both have been uploaded here; simply extract the archive into the “PB Shovel” folder and you’re done.
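
If you are unsure whether the libraries ended up in the right place, a quick check like the one below can help. It is only a sketch: missing_modules is a hypothetical helper, and it assumes the usual import names, requests for Requests and bs4 for BeautifulSoup 4.

```python
import importlib


def missing_modules(names=("requests", "bs4")):
    """Return the module names from `names` that cannot be imported.

    Hypothetical helper for this post; "requests" and "bs4" are the
    usual import names for Requests and BeautifulSoup 4.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing libraries: " + ", ".join(absent))
    else:
        print("Both libraries can be imported.")
```

Run it from inside the “PB Shovel” folder so that the extracted library folders are on the import path.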

Edit 2014.04.17: added an option to skip existing images. They will be skipped if image names match. If that option is not set then the script will not replace or skip images. To skip images just add “--omit-existing” to the command line.
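
The name-matching rule can be pictured roughly as follows. This is a sketch under assumptions, not the script’s actual code: should_download and its parameters are hypothetical, and it only models the case where the option is set.

```python
import os


def should_download(filename, output_dir, omit_existing=False):
    """Decide whether a file needs to be fetched.

    Hypothetical helper, not taken from pb_shovel.py. With omit_existing
    set, a file is skipped when a file of the same name already exists
    in output_dir. What happens on a name clash without the flag is left
    to the caller and is not modelled here.
    """
    target = os.path.join(output_dir, filename)
    if omit_existing and os.path.exists(target):
        return False
    return True
```

Only the file name is compared, so a different image that happens to share a name with an existing file would be skipped as well.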

Edit 2014.04.07: lots of things have changed and a lot has been added, so it’s now more functional than it was. It supports downloading from your own account and downloading locked (private) archives, lets you choose which types of files (video or image) to get, and can download whole albums, including sub-albums.

So, to use it, typically you run this command:

python pb_shovel.py -u "http://photobucket.com/example" -r -o PB_Shovel

If anything comes up, please post here so the tool could be improved :)

68 Comments
Old Windways
6 years ago

I tried running the tool to pull an album (starting with a small one to test), and got the following output in the command line:

Estimated files in current album: 3
Traceback (most recent call last):
  File "pb_shovel.py", line 355, in <module>
    pb.extract()
  File "pb_shovel.py", line 132, in extract
    image_links = self._album(source)
  File "pb_shovel.py", line 332, in _album
    obj["likeCount"], obj["commentCount"],
KeyError: 'likeCount'

I’m not sure why it seems to be choking on likeCount…

Old Windways
6 years ago

I had to install beautifulsoup4, but it seems to be working now (both using a single link, and with a file listing 4 links)! Hooray!!

Sing
6 years ago

Hi, me again.

I noticed your script is not downloading the original size of photos.
The original size path should be: [comment image] ~original

By the way, the extension I made has a free version.
It is limited to 100 photos per album and to the compressed size of photos (fullsizeUrl).
http://goo.gl/2aGxLC

The extension is written entirely in JavaScript, but it will not be open-sourced
until Photobucket officially releases a download-album function :)

Dude
6 years ago

Hi just stopped by to say thanks, you just saved me hours of boring work.
Keep rocking

Ripper
6 years ago

Any way to skip already downloaded images?

KL
6 years ago

Does PB Shovel work for albums that are private and don’t have an input form for a password? Do you have to get the password for private albums?

phatWares
6 years ago

Just wanted to say “thanks” where it’s due. In this case, it’s thanks to YOU (and “Daxda” maybe) for leading me to how to login to guest-pass protected albums. I hadn’t really been sweating it much since most people tend not to have guest passes to use. But it’s a nice feature to have either way.
I’m not using Python, so it took a bit of translation. But your code was clean and well-commented which made life very easy. :-)
If you can compile this (so people don’t need to download a Python interpreter and various libraries) and put it inside a nice GUI … in other words dummy-proofing it for the general public … you might be able to compete with me. ;-)
Cheers!

phatWares
6 years ago

Oh, and if anyone out there DOES have the hacking wizardry skills to crack private albums and is interested in being buried in money, feel free to click on my name above to go to my home page and click on the “Contact” button. My old hacking partner decided to take his ball and go home (immigrated to the US and left bucket hacking behind). So I am searching for a new partner.
I’m a developer primarily and NOT a hacker. I’ll admit it; no shame in my game. ;-)

Jango
6 years ago

So I couldn’t actually get this to download any private albums. It would connect and all that but it would just say download files 0/0. I tried a bunch of private profiles but no luck. Does it no longer work?

kraz
6 years ago

Yea im having the same 0/0 issue. Not sure if private albums need a specific url or not. No issue with publics. This returned no results “http://s69.photobucket.com/user/xxxxx/library/”

zeu
6 years ago

Any words on an update for this problem of 0/0 downloads?

kraz
6 years ago

^yea I think guest pass isn’t working as a whole for pb.com. doesn’t work through the webpage either

jojo
6 years ago

Hey could anyone “shovel” this for me? http://s275.photobucket.com/user/vanessa60201/library/

leroy
6 years ago

Does this still work? I was using your old one and just started getting back into PB. I only need it to download complete buckets because the download button sometimes gets stuck in a loop.

leroy
6 years ago

do you have a tutorial on how to make this work? not my field of expertise python wise

JJ
6 years ago

Just keep getting syntax errors from ‘pb_shovel.py’ :[

JJ
6 years ago

Wish I could help more but I’m not a tech person to be honest. This is one of the first times I’m using the cmd line to do anything. I installed python fine. According to your last post I should input this into the command line [python pb_shovel.py -u "http://photobucket.com/example" -r -o PB_Shovel], substituting, of course, the actual photobucket address I want to use.

It says file (stdin) line 1.
Invalid syntax

leroy
6 years ago

Still can’t get it to work. I’m working on a mac and I downloaded canopy. Then I opened shovel.
In the top box I get a bunch of code and in the lower box is where I can type in the request.

I have one folder; in there I have the file pb_shovel.py and I have 2 folders, bs4 and request.

when I enter

python pb_shovel.py -u http://s1237.photobucket.com/user/fcknbrii/library/ -r -o PB_Shovel in the lower box I get syntaxerror: invalid syntax same for
python pb_shovel.py -u http://s1237.photobucket.com/user/fcknbrii/library/ and
-u http://s1237.photobucket.com/user/fcknbrii/library/

Any suggestions ?

leroy
6 years ago

[two attached images]

this is what i’m getting (btw thanks for the fast reply)

sur
6 years ago

I am getting the following error:

  File "pb_shovel.py", line 6, in <module>
    from urlparse import urljoin, urlparse
ImportError: No module named 'urlparse'

leroy
6 years ago

do you know how to get it to work on a mac? I’m using python 2.7.8

Karma
6 years ago

Hi,
Is there an argument to download full size files? I’m only getting the compressed versions. I can click through on the webpage to the fullsizes. Thanks.

JJ
6 years ago

I’ve actually read and followed your instructions quite a few times. I always take my time to read over and I’m still stumped. I have installed the libs into the folder and still cant get it.

http://imgur.com/Gn6OxyA – Installation folder

http://imgur.com/Z9Tozk6 – The Invalid syntax error

JJ
6 years ago

Un-Installed 3.4 and Installed 2.7.

Now I’m getting this

http://imgur.com/WWbJDBi

This is from the command line; I stopped running python prior to the command.

JJ
6 years ago

Sorry. I did say earlier I had never used command prompt before, but I figured out what you meant. It’s now working, but I had not read the earlier comments where people were getting 0/0 downloads from private photobuckets without the password. But thanks for your patience. Much appreciated!

MTAK
6 years ago

I’ve just tried this today and I’m having an odd issue. I am using the -r recursive switch and then just -u and the URL of a user — something like “http://s105.photobucket.com/user/BlahBlah/library/?sort=3&page=1”

When I do this, it says “Processing” the URL and then says a quick summary:
Images: 273
Sub-albums: 0
Videos: 17

It then loops forever doing:
Collected links: 24
Images: 273
Sub-albums: 0
Videos: 17

Then “Collected links: 48”… Then 72… And so on.

The problem is that it goes on forever. It never seems to find the end of the album even when it passes the 290 mark (which is the 273 images plus the 17 videos). It never stops.

So, after I know it is past the 290 mark, I send a Ctrl-C and it then says:

Downloaded files: 0/600 (where 600 is where I finally stopped it from looping)

It will then download the entire thing — so it “works” — but it just gets confused once it gets past the 290 actual files.

When it finishes downloading the actual files, it gives an error:

‘page’ is not recognized as an internal or external command.

Basically, it looks like it gets confused looking for some keyword and then eventually tries to actually execute the word “page”.

Also, taking a quick look at the code, it seems to be looping looking for “end of album” to exist in the HTML. But, looking at the PB source, I don’t see that that ever occurs (which would explain the endless looping).

Maybe an update is needed?

EDIT: Upon further review, it looks to me like it actually only does the FIRST page of downloads — then tries to do them over and over again. So, something is definitely broken with the paging — it doesn’t find the “next” page to get more images — and it doesn’t find the “last” page to know when to stop looking.
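
One common guard against the looping described above is to stop paging as soon as a page contributes no links that have not already been collected. A minimal sketch, assuming a hypothetical fetch_page(page_number) callable that returns the list of media links on that page; neither the name nor the approach is taken from pb_shovel.py:

```python
def collect_links(fetch_page, max_pages=1000):
    """Collect unique links page by page.

    fetch_page is a hypothetical callable: it takes a 1-based page number
    and returns a list of links. Paging stops when a page adds nothing
    new (for example because the first page keeps being returned) or when
    max_pages is reached.
    """
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        added = 0
        for link in fetch_page(page):
            if link not in seen:
                seen.add(link)
                ordered.append(link)
                added += 1
        if added == 0:
            break  # nothing new on this page: treat it as the end
    return ordered


if __name__ == "__main__":
    # A source that serves the same page forever no longer loops.
    print(collect_links(lambda page: ["a.jpg", "b.jpg", "c.jpg"]))
```

The max_pages cap is a second safety net in case every page keeps returning at least one new link.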