{"id":1492,"date":"2012-10-23T12:06:04","date_gmt":"2012-10-23T09:06:04","guid":{"rendered":"http:\/\/9v.lt\/blog\/?p=1492"},"modified":"2022-01-19T08:34:41","modified_gmt":"2022-01-19T06:34:41","slug":"update-photobucket-ripper","status":"publish","type":"post","link":"https:\/\/9v.lt\/blog\/update-photobucket-ripper\/","title":{"rendered":"UPDATE: Photobucket ripper"},"content":{"rendered":"<p><em><strong>Important<\/strong>: a new version has been released along with the new Photobucket design. Read more about it here: <a href=\"http:\/\/9v.lt\/blog\/photobucket-ripper-update\/\">http:\/\/9v.lt\/blog\/photobucket-ripper-update\/<\/a><\/em><\/p>\n<p>I previously wrote about a <a href=\"http:\/\/9v.lt\/blog\/downloading-someone-elses-images-from-photobucket\/\" title=\"Downloading someone elses images from Photobucket\" target=\"_blank\" rel=\"noopener\">method to grab someone else&#8217;s pictures from Photobucket<\/a> and provided a script to do so. The method worked, but the script was rather primitive and not automated at all&#8230; which was fine for my one-time use. 
But now I have more stuff to grab and I have to keep my copy of the album up to date, so doing it all over again by hand every time gets tedious.<br \/>\nSo I decided to write a better, fully automated script, and here it is.<br \/>\n<!--more--><br \/>\nTo use the script you will need the &#8220;<a href=\"http:\/\/wwwsearch.sourceforge.net\/mechanize\/\" target=\"_blank\" rel=\"noopener\">mechanize<\/a>&#8221; library for Python (and of course Python itself, lol), but don&#8217;t worry, I bundled everything.<br \/>\nAs a developer I like to keep things simple, and this script makes ripping those pictures really simple.<br \/>\nFiles that already exist locally are skipped, so re-running it only downloads what you don&#8217;t have yet.<\/p>\n<p>The script with the required libraries can be downloaded here: <a href=\"http:\/\/9v.lt\/projects\/python\/PhotobucketGetter.zip\" target=\"_blank\" rel=\"noopener\">PhotobucketGetter.zip<\/a><\/p>\n<p>Usage is very simple. Run it with the -h parameter to see the help.<\/p>\n<pre>\r\n$ python PhotobucketGetter.py -h\r\nusage: PhotobucketGetter.py [-h] -u  [-p] [-f] [-d] [-n] [-t]\r\n\r\nScript to grab and save pics from a photobucket album automatically\r\n\r\noptional arguments:\r\n  -h, --help       show this help message and exit\r\n  -u , --url       Album URL\r\n  -p , --passwd    Album password (if any)\r\n  -f , --filter    What to download (pic\/vid\/all)\r\n  -d , --dir       Where to download (folder name)\r\n  -n, --nofolder   If this is used, then downloaded files will not be put in\r\n                   separate folders\r\n  -t, --terminate  If to terminate on error or continue grabbing\r\n<\/pre>\n<p>So, if the album you want to rip doesn&#8217;t have a password, just run<\/p>\n<pre>PhotobucketGetter.py -u URL<\/pre>\n<p>If the album has a password, add the &#8220;-p mypasswd&#8221; parameter.<\/p>\n<p>Everything else is optional: add &#8220;-f pic&#8221; to download only pictures, &#8220;-f vid&#8221; to download only videos, or &#8220;-f all&#8221;, or don&#8217;t 
add it at all, to download everything.<br \/>\nThe &#8220;-d&#8221; parameter sets the name of the folder everything is saved into (default: &#8220;PhotobucketGetter&#8221;). By default each file goes into a subfolder named after its title; &#8220;-n&#8221; puts all files directly into that folder instead, and &#8220;-t&#8221; stops at the first download error rather than skipping the file and continuing.<\/p>\n<p>The script itself is a bit complicated, but easy to understand (if that makes sense :P)<\/p>\n<pre lang=\"python\">\r\n'''\r\nScript to grab pics from a photobucket album automatically\r\nand save them locally. Provide the album URL, plus the password\r\nif the album is protected.\r\n======\r\nIf you really like this script, then consider a small donation for\r\nmy sleepless efforts keeping this script working and up-to-date.\r\nGo to my website: http:\/\/9v.lt and press the Donate button on the right :)\r\n======\r\nAuthor: Kulverstukas\r\nWebsite: http:\/\/9v.lt\r\nShouts to Evilzone.org and Programisiai.lt\r\n \r\nVersion: 0.5\r\n \r\nhttps:\/\/9v.lt\/blog\/update-photobucket-ripper\/\r\n'''\r\n \r\nimport os\r\nimport sys\r\nimport re\r\nimport urllib\r\nimport mechanize\r\nfrom argparse import ArgumentParser\r\n \r\n#============================================\r\nclass ImageMethods:\r\n    def downloadImages(self, links, browser):\r\n        errorCounter = 0\r\n        linksList = [i.strip() for i in re.split(',\\s{2,}', links)]\r\n        print \"* Found \"+str(len(linksList))+\" images...\"\r\n   \r\n        print \"* Compiling regex patterns and 
downloading the pictures...\"\r\n        picUrl = re.compile('url: \"(https?:\/\/(.*?)\\.photobucket\\.com\/albums\/(.*?))\",')\r\n        picName = re.compile('title: \"(.*?)\"')\r\n   \r\n        counter = 1\r\n        for link in linksList:\r\n            # group(1) holds just the captured title text\r\n            name = stripSymbols(picName.search(link).group(1))\r\n            if not noFolder:\r\n                if name == \"\":\r\n                    name = noNamePic\r\n                if not os.path.exists(mainFolder+'\/'+name):\r\n                    os.mkdir(mainFolder+'\/'+name)\r\n            # group(1) holds just the captured image URL\r\n            picLink = picUrl.search(link).group(1)\r\n            fileName = os.path.basename(picLink)\r\n            if noFolder:\r\n                fullPath = \"%s\/%s\" % (mainFolder, fileName)\r\n                name = mainFolder\r\n            else:\r\n                fullPath = \"%s\/%s\/%s\" % (mainFolder, name, fileName)\r\n            if os.path.exists(fullPath):\r\n                print '%d. Retrieving \"%s\" into \"%s\" folder' % (counter, fileName, name)\r\n                print \"*** \"+name+'\/'+fileName+\" exists. Skipping...\"\r\n            else:\r\n                try:\r\n                    size = CalculateSize().calculateSize(browser.open(picLink).info().get(\"Content-Length\"))\r\n                    print '%d. 
Retrieving \"%s\" into \"%s\" folder -- Size: %s' % (counter, fileName, name, size)\r\n                    urllib.urlretrieve(picLink, fullPath)\r\n                except KeyboardInterrupt:\r\n                    print \" Terminating...\"\r\n                    sys.exit(0)\r\n                except Exception as e:\r\n                    if terminate:\r\n                        print \" Terminating with message: %s\" % e\r\n                        sys.exit(1)  # non-zero exit code signals failure\r\n                    else:\r\n                        print \" Error grabbing this image. Continuing...\"\r\n                        errorCounter += 1\r\n            counter += 1\r\n        return errorCounter\r\n \r\n    def grabSlideshowData(self, htmlCode):\r\n        data = re.search(\"PB\\.Slideshow\\.data \\= \\[\\n.*\\];\", htmlCode)\r\n        if data is None:\r\n            print \"*** Something went wrong grabbing picture data. Terminating...\"\r\n            sys.exit(1)\r\n        data = data.group(0).replace(\"PB.Slideshow.data = [\", \"\").replace(\"];\", \"\").strip()\r\n       \r\n        return data\r\n#============================================\r\nclass VideoMethods:\r\n    def downloadVideos(self, videos, browser):\r\n        errorCounter = 0\r\n        counter = 1\r\n        print \"* Found \"+str(len(videos))+\" videos...\"\r\n        for url, name in videos:\r\n            name = stripSymbols(name)\r\n            if not noFolder:\r\n                if name == \"\":\r\n                    name = noNamePic\r\n                if not os.path.exists(mainFolder+'\/'+name):\r\n                    os.mkdir(mainFolder+'\/'+name)\r\n            fileName = os.path.basename(url)\r\n            if noFolder:\r\n                fullPath = \"%s\/%s\" % (mainFolder, fileName)\r\n                name = mainFolder\r\n            else:\r\n                fullPath = \"%s\/%s\/%s\" % (mainFolder, name, 
fileName)\r\n            if os.path.exists(fullPath):\r\n                print '%d. Retrieving \"%s\" into \"%s\" folder' % (counter, fileName, name)\r\n                print \"*** \"+name+'\/'+fileName+\" exists. Skipping...\"\r\n            else:\r\n                try:\r\n                    size = CalculateSize().calculateSize(browser.open(url).info().get(\"Content-Length\"))\r\n                    print '%d. Retrieving \"%s\" into \"%s\" folder -- Size: %s' % (counter, fileName, name, size)\r\n                    urllib.urlretrieve(url, fullPath)\r\n                except KeyboardInterrupt:\r\n                    print \" Terminating...\"\r\n                    sys.exit(0)\r\n                except Exception as e:\r\n                    if terminate:\r\n                        print \" Terminating with message: %s\" % e\r\n                        sys.exit(1)\r\n                    else:\r\n                        print \" Error grabbing this video. Continuing...\"\r\n                        errorCounter += 1\r\n            counter += 1\r\n        return errorCounter\r\n   \r\n    def grabVideoLinks(self, html):\r\n        list = []\r\n        # match image tags that point into this album; the server subdomain\r\n        # may differ from the album URL's, so it is replaced with a wildcard\r\n        # (works for both http and https album URLs)\r\n        pattern = \"<img src\\=\\\"\"+album.replace(\".\", \"\\.\")+\".*?\/>\"\r\n        matchObj = re.search(\"https?:\/\/(.*?)\\.\", pattern)\r\n        pattern = pattern.replace(matchObj.group(0), \"https?:\/\/[\\w\\d]*\\.\")\r\n        rawList = re.findall(pattern, html)\r\n        for link in rawList:\r\n            # grab the video name (group(1) is the bare title text)\r\n            videoName = re.search(\"title=\\\"(.*?)\\\"\", link).group(1)\r\n            # the alt text starts with the file name; keep everything before the first whitespace\r\n            altText = re.search(\"alt=\\\"(.*?)\\\"\", link).group(1)\r\n            videoUrl = re.split(r\"\\s\", altText, 1)[0]\r\n            videoUrl = 
album.rstrip('\/') + '\/' + videoUrl  # join by hand: os.path.join uses backslashes on Windows\r\n            list.append((videoUrl, videoName))\r\n           \r\n        return list\r\n#============================================\r\nclass CalculateSize:\r\n    def calculateSize(self, numBytes):\r\n        # numBytes is the Content-Length header value (a string), or None if the server sent none\r\n        if numBytes is None:\r\n            return \"unknown\"\r\n        numBytes = float(numBytes)\r\n        if numBytes < 1024.0:\r\n            return \"%d B\" % numBytes\r\n        for abbrev in [\"KB\", \"MB\", \"GB\"]:\r\n            if numBytes >= 1024.0:\r\n                numBytes = numBytes \/ 1024.0\r\n                size = \"%.2f %s\" % (numBytes, abbrev)\r\n        return size\r\n#============================================\r\ndef stripSymbols(text):\r\n    # replace characters that are not allowed in Windows file names with '~'\r\n    badSymbols = ['\\\\', '\/', ':', '*', '?', '\"', '<', '>', '|']\r\n    for symbol in badSymbols:\r\n        text = text.replace(symbol, '~')\r\n    return text\r\n#============================================\r\ndef begin():\r\n    print '\\n* Creating \"%s\" folder...' % mainFolder\r\n    if os.path.exists(mainFolder):\r\n        print \"*** Folder exists. 
Skipping...\"\r\n    else:\r\n        os.mkdir(mainFolder)\r\n   \r\n    print \"* Initiating connection to Photobucket...\"\r\n    browser = mechanize.Browser()\r\n    browser.addheaders = [('User-Agent', 'Mozilla\/5.0 (compatible; Googlebot\/2.1; +http:\/\/www.google.com\/bot.html)')]\r\n    browser.set_handle_equiv(False)\r\n    #browser.set_debug_http(True)\r\n    browser.set_handle_robots(False)\r\n    try:\r\n        browser.open(album)\r\n    except KeyboardInterrupt:\r\n        print \" Terminating...\"\r\n        sys.exit(0)\r\n    except Exception as e:\r\n        print \" Terminating with message: %s\" % e\r\n        sys.exit(1)\r\n   \r\n    # see if the album has a password field\r\n    for form in browser.forms():\r\n        if form.name == \"frmLogin\":\r\n            # -p defaults to None when not given, so check for any empty value\r\n            if not passwd:\r\n                print \"*** Album requires password, none given. Terminating...\"\r\n                sys.exit(1)\r\n            print \"* Album requires password... 
using '\"+passwd+\"'\"\r\n            browser.select_form(name=\"frmLogin\")\r\n            browser.form[\"loginForm[password]\"] = passwd\r\n            print \"* Submitting password...\"\r\n            browser.submit()\r\n            break\r\n   \r\n    if filter in (\"pic\", \"all\"):\r\n        print \"* Reading image HTML code...\"\r\n        rawHtml = browser.open(album+slideshowFilter).read()\r\n        imgMethods = ImageMethods()\r\n        slideshowData = imgMethods.grabSlideshowData(rawHtml)\r\n        errors = imgMethods.downloadImages(slideshowData, browser)\r\n        if not terminate:\r\n            print \"     There were %d skipped images while grabbing\" % errors\r\n        print \"     Done grabbing images!\"\r\n \r\n    if filter in (\"vid\", \"all\"):\r\n        print \"* Reading video HTML code...\"\r\n        rawHtml = browser.open(album+videoFilter).read()\r\n        vidMethods = VideoMethods()\r\n        links = vidMethods.grabVideoLinks(rawHtml)\r\n        errors = vidMethods.downloadVideos(links, browser)\r\n        if not terminate:\r\n            print \"     There were %d skipped videos while grabbing\" % errors\r\n        print \"     Done grabbing videos!\"\r\n#=============================================\r\nparser = ArgumentParser(description=\"Script to grab and save pics from a photobucket album automatically\")\r\nparser.add_argument('-u', '--url', help='Album URL', required=True, metavar=\"\")\r\nparser.add_argument('-p', '--passwd', help='Album password (if any)', metavar=\"\")\r\nparser.add_argument('-f', '--filter', help='What to download (pic\/vid\/all)', default=\"all\", metavar=\"\")\r\nparser.add_argument('-d', '--dir', help='Where to download (folder name)', default=\"PhotobucketGetter\", metavar=\"\")\r\nparser.add_argument('-n', '--nofolder', help='If this is used, then downloaded files will not be put in separate folders', 
action=\"store_true\")\r\nparser.add_argument('-t', '--terminate', help='If to terminate on error or continue grabbing', action=\"store_true\")\r\nargs = parser.parse_args()\r\n \r\n#====== global vars, change values here ======\r\nnoNamePic = 'NoName'\r\nslideshowFilter = \"?albumview=slideshow\"\r\nvideoFilter = \"?mediafilter=videos\"\r\nmainFolder = args.dir\r\nalbum = args.url\r\npasswd = args.passwd\r\nfilter = args.filter\r\nnoFolder = args.nofolder\r\nterminate = args.terminate\r\n# fall back to downloading everything if -f got an unknown value\r\nif filter not in ('pic', 'vid', 'all'):\r\n    filter = 'all'\r\n#=============================================\r\n \r\nbegin()\r\n\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Important: a new version has been released along with the new Photobucket design. Read more about it here: http:\/\/9v.lt\/blog\/photobucket-ripper-update\/<\/p>\n","protected":false},"author":2,"featured_media":1656,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,750],"tags":[831,108,838],"class_list":["post-1492","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects","category-software-projects","tag-photobucket","tag-python","tag-ripper"],"_links":{"self":[{"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/posts\/1492","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/comments?post=1492"}],"version-history":[{"count":0,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/posts\/1492\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/media\/1656"}],"wp:attachment":[{"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/media?pa
rent=1492"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/categories?post=1492"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/9v.lt\/blog\/wp-json\/wp\/v2\/tags?post=1492"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}