User:Robamler
Wikipedia on DVD
I recently had the thought that Wikipedia really should have a Wikipedia-on-DVD distribution. When I searched for projects that aim to build one, I couldn't find any. So I started writing a draft about why I think a Wikipedia-on-DVD distribution is important and how it could be achieved. Then I happened to see that User:Magnus Manske is developing a similar system for Mandrakesoft (see also: Wikimedia and Mandrakesoft).
However, since I had already written this draft and I think it includes some interesting ideas, I decided to publish it here on my user page. If you have any ideas or if you disagree with something on this page, please add a comment anywhere on this page. This page is still not much more than a list of relatively isolated thoughts. Please help to improve it.
PS: I'm not publishing this draft on my user page because I want my name to be remembered in connection with other people's work, but simply because I'm not sure whether other Wikipedians might dislike this draft. If you think this page should be moved somewhere else, please move it.
Motivation
Why is a Wikipedia-on-DVD distribution important?
- Ease of use, also for newcomers. Many people are used to accessing a computer encyclopedia by inserting a CD or DVD into their drive.
- Speed up both looking up and writing articles. An offline application would significantly speed up both looking something up in Wikipedia and editing articles. (Currently, even if you only want to make a minor change, you have to visit the article, wait, click on "edit", wait, edit and click on "preview", wait, check results and click on "save", and wait again.)
- Make Wikipedia well-known. A platform-independent version of Wikipedia-on-DVD could be distributed both by Mandrakesoft, which has already asked Wikimedia to provide a master DVD for their next Linux version, and in computer magazines. This would make Wikipedia far more popular, and perhaps some people would start writing their own articles.
- It could even take load off the Wikipedia servers. As articles could be downloaded as raw wiki-text only, and most articles would not have to be downloaded at all, the expensive step of generating HTML pages out of the wiki-text could be done on the client side.
- Wiki-nature would not be lost. The client software should have support for writing Wikipedia articles. This way, in fact, more Wikipedians will find it easy to edit/write articles.
Why yet another Wikipedia-on-DVD-project?
- There are already lots of experimental Wikipedia-offline projects. Many of them are really great work, but none is mature enough to meet the goals described below. (At least, none that I could find. If there is a stable Wikipedia-offline project available, please tell me.)
- Most are developed by individual hackers, not by a larger team.
- I think it is important that the coordination of this project is first discussed publicly on a wiki page like this one, before we start developing the system. This way, more people can contribute (including non-programmers who bring in their ideas), and it becomes less likely that the project ends up as an unmaintained experimental prototype.
Goals
What is important for a Wikipedia-on-DVD distribution?
- Ease of use. Wikipedia should be just as easy to use as the CDs/DVDs produced by traditional encyclopedia publishers. It should be as easy as inserting the DVD, waiting for autostart, installing only if you want to, searching for a keyword and getting related articles immediately.
- Read and write support. As Wikipedia is not static, it is important that users who read articles on a Wikipedia DVD can edit or correct them by uploading an edited version to the Wikipedia servers.
- The client application accesses the local sqldump (on DVD or hard drive) by default. If the user requests it, or if the client application detects that a newer version is available, the article is updated from the internet.
- Platform independence
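The "local by default, update when newer" goal could be sketched roughly as follows. This is only an illustration: the `Revision` record, the `load_article` function and the `fetch_remote` callback are hypothetical names, and it assumes every revision carries a UTC timestamp in one fixed ISO 8601 format, so that comparing the strings compares the dates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Revision:
    """One stored version of an article (hypothetical record layout)."""
    title: str
    timestamp: str  # assumed format: "2005-03-01T12:00:00Z" (sorts chronologically as text)
    text: str       # raw wiki-text


def load_article(
    title: str,
    local: Dict[str, Revision],
    fetch_remote: Optional[Callable[[str], Optional[Revision]]] = None,
) -> Optional[Revision]:
    """Return the local revision unless a newer one can be fetched."""
    local_rev = local.get(title)
    if fetch_remote is None:
        # Offline use: the data on the DVD or hard disk is all we consult.
        return local_rev
    try:
        remote_rev = fetch_remote(title)
    except OSError:
        # Network trouble must never prevent reading the local copy.
        return local_rev
    if remote_rev is None:
        return local_rev
    if local_rev is None or remote_rev.timestamp > local_rev.timestamp:
        return remote_rev
    return local_rev
```

Passing no `fetch_remote` corresponds to pure offline use; a real client would plug in a function that talks to the Wikipedia servers.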
Technical details of the goals
The Wikipedia-DVD should contain
- an SQL-dump of the current encyclopedia
- client applications (wiki-reader/writer) for many operating systems (at least: Windows, Linux, Mac)
- possible add on in future: remastered Knoppix
When inserting the DVD, the user should be asked how they want to install Wikipedia.
- a) no installation, run from DVD only (updating articles from the internet should be possible anyway, without storing the updated data)
- b) install only the client application on the hard disk, so that it can store local settings
- c) copy the client application for the user's OS plus all wiki-text to the hard disk, but without images
- d) copy everything to the hard disk (data + images + client application for the user's OS)
Editing pages:
- For a start, only a simple text box without WYSIWYG, though this could come later. Displaying some sort of online help for formatting (both technical requirements and style conventions) shouldn't be that difficult.
- The preview should be generated on the client side, in order both to make the editing process faster (!) and to take load off the Wikipedia servers.
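To give an idea of what a client-side preview involves, here is a minimal sketch that turns a tiny subset of wiki-text into HTML. It is not the real MediaWiki parser: it only handles headings on their own paragraph, bold, italics and internal links, and the function name `render_preview` is made up for this example.

```python
import html
import re


def _link(match: "re.Match[str]") -> str:
    """Turn [[Target]] or [[Target|label]] into an HTML anchor."""
    target = match.group(1).strip()
    label = (match.group(2) or target).strip()
    href = target.replace(" ", "_").replace('"', "&quot;")
    return f'<a href="{href}">{label}</a>'


def render_preview(wikitext: str) -> str:
    """Render a very small subset of wiki-text as HTML (sketch only)."""
    out = []
    # Blocks are separated by blank lines; a heading must be its own block.
    for block in re.split(r"\n\s*\n", wikitext.strip()):
        # Escape first, so that user-typed <tags> cannot end up in the preview.
        line = html.escape(block.strip(), quote=False)
        heading = re.fullmatch(r"(={2,6})\s*(.+?)\s*\1", line)
        if heading:
            level = len(heading.group(1))
            out.append(f"<h{level}>{heading.group(2)}</h{level}>")
            continue
        line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)  # bold before italics
        line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
        line = re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", _link, line)
        out.append(f"<p>{line}</p>")
    return "\n".join(out)
```

Because nothing here needs the network, the preview can be refreshed on every keystroke if desired; templates, tables and images would need a far more complete parser.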
Other requirements:
- The GUI of the client application should be easily localizable/translatable
Implementation
How can the #Goals be achieved? This is only a proposal (as is everything on this page). Please write down your ideas here, too.
- The client application generates HTML out of wiki-text and displays it in a built-in rendering engine (browser), e.g. wxHtmlWindow.
- Comment: I've played a bit with wxHtmlWindow. This control is really easy to use but, in my opinion, doesn't support HTML and above all CSS very well. However, I found another useful tool: wxMozilla [1] is a library for embedding Gecko in a wxWindows application (Gecko is the page-rendering engine used in Mozilla and new versions of Netscape). This really rules: it fits into the concept of wxWindows, gives the complete power of the Mozilla browser and is so easy to use that I got my own application running with it relatively quickly even though I'm an amateur.
- updating wiki-pages from internet: two possibilities:
- a) get raw wiki-text from server (preferred by robamler)
- This could work similarly to the way XML data can already be downloaded by accessing http://en.wikipedia.org/wiki/Special:Export/Title_of_the_article.
- Scripts to generate HTML from wiki-text are already available on the internet, see: http://en.wikipedia.org/wiki/Wikipedia:Database_download#Static_HTML_tree_dumps_for_mirroring_or_CD_distribution. Any experiences with one of these?
- b) get ordinary HTML pages from server (not preferred by robamler, as it slows down the server, which has to create the HTML code dynamically, and it is not so easy to provide write access to these pages.)
- The different types of installation mentioned in #Technical details of the goals are most easily manageable if the original DVD data is stored separately from data received from the internet. Obviously, this is required anyway if only the client application is installed on the hard disk and it accesses data on the DVD.
- when updating a file: delete the old file if it is stored on the hard disk (i.e. only keep current files on the hard disk).
- platform independence: I think wxWidgets (formerly known as wxWindows) is the easiest way to achieve this.
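Option a) above could be sketched like this. The URL scheme is the Special:Export address mentioned above; the XML layout (`page` elements containing `title` and `text`) is an assumption that should be checked against a real export, and the names `export_url` and `parse_export` are invented for this example.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import Dict

EXPORT_BASE = "http://en.wikipedia.org/wiki/Special:Export/"


def export_url(title: str) -> str:
    """Build the Special:Export address for one article title."""
    return EXPORT_BASE + urllib.parse.quote(title.replace(" ", "_"))


def _local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def parse_export(xml_text: str) -> Dict[str, str]:
    """Map each exported page title to its raw wiki-text."""
    pages: Dict[str, str] = {}
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if _local_name(page.tag) != "page":
            continue
        title = None
        text = ""
        for node in page.iter():
            name = _local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                # If several revisions are present, the last one wins.
                text = node.text or ""
        if title is not None:
            pages[title] = text
    return pages
```

A client would download `export_url(title)` (for instance with `urllib.request.urlopen`), pass the response body to `parse_export`, and store the result apart from the original DVD data.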
What needs to be done?
- before starting: discuss coordination here.
- proposal:
- create small SQL-database which includes only about 100 pages with images for better handling
- start with building an ugly mini-application which can: search the mini-database, generate HTML out of one database entry, view the HTML-pages in embedded HTML-viewer, edit pages and upload them to the Wikipedia servers.
- If this sort of minimalistic prototype runs reasonably stably, we should start a project at sourceforge.net. The infrastructure provided at sourceforge.net (bug tracker, CVS, ...) could make developing and maintaining it a lot easier.
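The first two steps of the proposal (a small database and a keyword search) might start out like the sketch below. It uses SQLite from Python's standard library purely as a stand-in for the real dump format; the table layout and the names `build_mini_db` and `search_titles` are assumptions made for this example.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_mini_db(pages: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory database holding (title, wiki-text) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (title TEXT PRIMARY KEY, text TEXT NOT NULL)")
    conn.executemany("INSERT INTO page (title, text) VALUES (?, ?)", pages)
    conn.commit()
    return conn


def search_titles(conn: sqlite3.Connection, keyword: str, limit: int = 10) -> List[str]:
    """Return titles containing the keyword (case-insensitive for ASCII)."""
    # Escape LIKE wildcards so that the keyword is matched literally.
    escaped = (
        keyword.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT title FROM page WHERE title LIKE ? ESCAPE '\\' "
        "ORDER BY title LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return [row[0] for row in rows]
```

For example, with the pages "DVD", "DVD-Video" and "Compact disc" loaded, `search_titles(conn, "dvd")` finds the first two. Full-text search over article bodies would need an index and is left out here.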
Problems that may appear:
- There could be licensing problems with a large share of the images on Wikipedia. To improve this, the WWW version of Wikipedia could add a note at every place where a non-GFDL image or its thumbnail appears in an article. This note should ask the visitor to replace the image with one that may be published under the GFDL (e.g. own photographs, drawings, ...), if available.
Could someone who is familiar with the database system on the Wikipedia servers please say whether something like this would be practicable?
- The sqldump of the current Wikipedia articles [2] is 842 MB in size. I don't know whether this can be handled efficiently on a normal end-user desktop computer. Any experiences?
Don't forget to
Please write down anything here that comes to your mind related to a Wikipedia-on-DVD distribution and should be considered at a later stage of the project, but isn't worth discussing right now.
How about a WIKI on floppy rather than a DVD?
That's what I'm looking for. Or, since floppy drives are bygone, how about a WIKI on a smart stick? CDs of course.
Most WIKI uses are going to be small. All are going to start out small. Most of the heavy lifting is done at the outset. Would it make sense to have tools that allow for easy start-up rather than work on something that is already quite large?
How about Bittorrent and GNU project servers?
Let's say a user finds an obsolete wikipedia.org DVD and wants to update it. Would it be good to have an automatic BitTorrent download happen in that circumstance? I imagine that scenario would happen often enough to warrant the use of the BitTorrent protocol.
I saw the suggestion to make this a Sourceforge project. As an alternative I would like to suggest making it a GNU project. The GNU project has similar project servers that could host the project. And since the organization behind the GNU project is FSF, and FSF is the organization behind the GPL and FDL licenses, I would imagine they would gladly host such a project as this. http://meta.wikimedia.org/wiki/User_talk:Tommy