WebLog Pro Olivier Berger

30/05/2008

Offline backup of a MediaWiki with httrack

Filed under: Uncategorized, by Olivier Berger @ 13:33

I recently needed to restore the contents of a wiki that ran MediaWiki. Unfortunately there were no backups, and my only option was to restore from an outdated version that was available in Google’s cache.

The problem was that I only had the rendered HTML “output” version, and copy-pasting it into the wiki source at restore time lost all formatting and links.

So I’ve come up with the following script, which is run from cron to make systematic backups in the background. It saves both an offline-viewable version of the wiki, as static HTML pages, and the source of each wiki page, for later restoration.
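As a rough sketch of the cron side, the snippet below builds a crontab line that runs the backup every night at 03:30. The script path, the log path and the schedule are all assumptions made for illustration; none of them come from the setup described here.

```shell
#!/bin/sh
# Hypothetical locations: adjust to wherever the backup script really lives.
script=/home/me/bin/backup-mediawiki.sh
log=/home/me/backup-websites/backup.log

# minute hour day-of-month month day-of-week command
cron_line="30 3 * * * $script >> $log 2>&1"

# Print the entry so it can be reviewed before installing it.
printf '%s\n' "$cron_line"
```

The printed line can be pasted into the editor opened by `crontab -e`. Sending both stdout and stderr to a log file keeps cron from mailing the output of every run.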

It uses the marvelous httrack and wget tools.

Here we go:

#! /bin/sh

site=wiki.my.site
topurl=http://$site

backupdir=/home/me/backup-websites/$site

httrack -%i -w $topurl/index.php/Special:Allpages \
-O "$backupdir" -%P -N0 -s0 -p7 -S -a -K0 -%k -A25000 \
-F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F '' \
-%s -x -%x -%u \
"-$site/index.php/Special:*" \
"-$site/index.php?title=Special:*" \
"+$site/index.php/Special:Recentchanges" \
"-$site/index.php/Utilisateur:*" \
"-$site/index.php/Discussion_Utilisateur:*" \
"-$site/index.php/Aide:*" \
"+*.css" \
"-$site/index.php?title=*&oldid=*" \
"-$site/index.php?title=*&action=edit" \
"-$site/index.php?title=*&curid=*" \
"+$site/index.php?title=*&action=history" \
"-$site/index.php?title=*&action=history&*" \
"-$site/index.php?title=*&curid=*&action=history*" \
"-$site/index.php?title=*&limit=*&action=history"

# Fetch the raw wikitext of every page that httrack mirrored (page names come from its log)
for page in $(grep "link updated: $site/index.php/" "$backupdir/hts-log.txt" | sed "s,^.*link updated: $site/index.php/,," | sed 's/ ->.*//' | grep -v Special:)
do
    wget -nv -O "$backupdir/$site/index.php/${page}_raw.txt" "$topurl/index.php?title=$page&action=raw"
done
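To see what the page-name extraction in the loop does, here is a small dry run of the same grep/sed pipeline on a few sample lines. The log lines are made up, shaped only after the "link updated: ... ->" pattern the script searches for; a real hts-log.txt may look different around that pattern.

```shell
#!/bin/sh
site=wiki.my.site

# Made-up sample lines for illustration, not copied from a real httrack log.
sample_log=$(cat <<EOF
12:00:01 Info: link updated: $site/index.php/Main_Page -> $site/index.php/Main_Page.html
12:00:02 Info: link updated: $site/index.php/Special:Recentchanges -> $site/index.php/Special:Recentchanges.html
12:00:03 Info: link updated: $site/index.php/Project_Notes -> $site/index.php/Project_Notes.html
EOF
)

# Same pipeline as in the script: keep the "link updated" lines, strip everything
# up to the page name, strip the " -> target" suffix, then drop Special: pages.
pages=$(printf '%s\n' "$sample_log" \
    | grep "link updated: $site/index.php/" \
    | sed "s,^.*link updated: $site/index.php/,," \
    | sed 's/ ->.*//' \
    | grep -v Special:)

printf '%s\n' "$pages"
```

On these sample lines the pipeline keeps Main_Page and Project_Notes and discards the Special: page. Because the loop iterates over an unquoted command substitution, it relies on page names containing no whitespace, which holds for MediaWiki URLs since they use underscores.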

Hope this helps,

3 Comments »

  1. I hope you don’t mind that I changed your script to fit a German MediaWiki installation and published it on GitHub.

    http://github.com/oschrenk/scripts/blob/master/backup-mediawiki-de.sh

    I linked to this blog post in the script, but if you feel I violated some copyright I will take it down.

    Comment by Oliver — 27/11/2009 @ 3:28

  2. No problem at all, go ahead… it’s meant to be used ;)

    Comment by Olivier Berger — 2/12/2009 @ 16:02

  3. [...] it worked.  What I wanted was a command-line script to run the backup.  Luckily, I found a couple websites that have used HTTrack for this purpose and decided to use for my own needs.  Here is a copy of the script i used to [...]

    Pingback by Eric Blue’s Blog » Blog Archive » Knowlege To Go: Put Your Wiki On Your IPhone — 14/12/2009 @ 7:39
