Offline backup mediawiki with httrack

I recently needed to restore the contents of a wiki running MediaWiki. Unfortunately there were no backups, and my only option was to restore from an outdated version that was still available in Google’s cache.

The problem was that I only had the rendered HTML output, and copy-pasting it into the wiki source at restore time lost all formatting and links.

So I came up with the following script, meant to be run from cron to make systematic backups in the background: both an offline-viewable version of the wiki as static HTML pages, and the wiki pages’ sources, for eventual restoration.

It uses the marvelous httrack and wget tools.

Here we go:

#! /bin/sh

site=wiki.my.site
topurl=http://$site

backupdir=/home/me/backup-websites/$site

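# Mirror the wiki as static HTML, starting from the list of all pages.
# The quoted patterns are httrack scan filters: "+" includes, "-" excludes.
# The namespace names (Utilisateur:, Aide:, ...) are those of a French wiki;
# adapt them to the language of your installation.
httrack -%i -w $topurl/index.php/Special:Allpages \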
-O "$backupdir" -%P -N0 -s0 -p7 -S -a -K0 -%k -A25000 \
-F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F '' \
-%s -x -%x -%u \
"-$site/index.php/Special:*" \
"-$site/index.php?title=Special:*" \
"+$site/index.php/Special:Recentchanges" \
"-$site/index.php/Utilisateur:*" \
"-$site/index.php/Discussion_Utilisateur:*" \
"-$site/index.php/Aide:*" \
"+*.css" \
"-$site/index.php?title=*&oldid=*" \
"-$site/index.php?title=*&action=edit" \
"-$site/index.php?title=*&curid=*" \
"+$site/index.php?title=*&action=history" \
"-$site/index.php?title=*&action=history&*" \
"-$site/index.php?title=*&curid=*&action=history*" \
"-$site/index.php?title=*&limit=*&action=history"

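# Fetch the raw wikitext of every page httrack mirrored (Special: pages excluded).
# MediaWiki expects the page name in the "title" parameter for action=raw.
for page in $(grep "link updated: $site/index.php/" "$backupdir/hts-log.txt" | sed "s,^.*link updated: $site/index.php/,," | sed 's/ ->.*//' | grep -v Special:)
do
    wget -nv -O "$backupdir/$site/index.php/${page}_raw.txt" "$topurl/index.php?title=$page&action=raw"
done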
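
To see what the last loop works from, here is a small sketch that runs the same grep/sed pipeline on a few made-up log lines instead of a real hts-log.txt. The sample lines only imitate the "link updated:" entries the script greps for, and the `extract_pages` helper is a name invented for this illustration. It prints the raw-source URLs that would be fetched, using MediaWiki's `title` parameter for `action=raw`, without downloading anything.

```shell
#!/bin/sh
# Illustration only: values and log lines below are hypothetical.
site=wiki.my.site
topurl=http://$site

# Made-up lines imitating the "link updated:" entries in hts-log.txt.
sample_log="12:00:01 Info: link updated: $site/index.php/Main_Page -> $site/index.php/Main_Page.html
12:00:02 Info: link updated: $site/index.php/Special:Allpages -> $site/index.php/Special_Allpages.html
12:00:03 Info: some unrelated log entry"

# Same pipeline as in the backup script, reading from stdin:
# keep "link updated" lines, strip everything up to the page name,
# strip the " -> target" part, and drop Special: pages.
extract_pages() {
    grep "link updated: $site/index.php/" \
        | sed "s,^.*link updated: $site/index.php/,," \
        | sed 's/ ->.*//' \
        | grep -v Special:
}

pages=$(printf '%s\n' "$sample_log" | extract_pages)

# Print the URL wget would be asked to fetch for each page.
for page in $pages
do
    echo "$topurl/index.php?title=$page&action=raw"
done
```

Running the pipeline this way against your own hts-log.txt (replace the `printf` with `cat "$backupdir/hts-log.txt"`) is a quick check that the page names come out as expected before letting wget loose on them.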

Hope this helps,


4 Responses to Offline backup mediawiki with httrack

  1. Oliver says:

    I hope you don’t mind that I changed your script to reflect a German MediaWiki installation and published it on GitHub.

    http://github.com/oschrenk/scripts/blob/master/backup-mediawiki-de.sh

    I linked to this blog post in the script, but if you feel I violated some copyright I will take it down.

  2. No problem at all, go ahead… it’s meant to be used ;)

  3. Pingback: Eric Blue’s Blog » Blog Archive » Knowlege To Go: Put Your Wiki On Your IPhone

  4. Robert says:

    Hi Olivier,

    thanks for sharing your script. It helps me toward a solution for an offline wiki.
    The only problem I have is that the files attached in the wiki are not available. They are just exported as FILENAME.html.

    I just want to create an offline know-how database for our field engineers, to be used on their Android tablets.

    Do you have an idea what I could change in the script?

    Kind Regards,

    Robert
