
wminfo Plugins-HOWTO


This is the short instruction for wminfo plugins writers.  The wminfo
program displays dynamically different frequently changing information.  It
can be data got from the system as well as an information got from Internet.

The empty wminfo window looks like:

    +---------+
    |         |
    |         |
    |         |
    |         |
    |         |
    +---------+

It has five lines each of nine characters.  In the case of the plugins which
display the information got from the system or some precisely defined
information got from Internet such as weather reports, currency exchange
rates etc. the task is to put these information into 5x9 window in a
reasonable and intelligible way.  In the case of the plugins displaying the
information got from Internet websites such as system updates, headlines
news etc. it usually doesn't fit 9-characters wide window and is longer than
5 lines.  In such a case you have to scroll the contents of the wminfo
window in order to read the entire information.

To write wminfo plugins it is enough to know the basics of bash and some
frequently used commands.  The more you know the shell and the commands the
better your plugins are.  Poorly designed plugins consume a lot of the
system resources (mostly the CPU power).  Well designed plugins can be
really state of the art products.

To learn bash read:

  - man bash,

  - ``Bash Guide for Beginners'' by Machtelt Garrels
    (see: http://tldp.org/LDP/Bash-Beginners-Guide/html/),

  - ``Advanced Bash-Scripting Guide'' by Mendel Cooper
    (see: http://tldp.org/LDP/abs/html/),

  - ``GNU/Linux Command-Line Tools Summary'' by Gareth Anderson
    (see: http://tldp.org/LDP/GNU-Linux-Tools-Summary/html/).

Since version 4.0.0 wminfo runs not only bash scripts but also the scripts
written in different languages as well as binary programs.  The languages
such as Lua, AWK or C allow to write the plugins which have much better
performance than their equivalents written in bash.

The plugins which display the information got from the system use the
commands or programs displaying such an information and format these data to
fit them in wminfo window.  It is difficult to describe the general strategy
for such situations because it depends on the format of the information
provided by these commands or programs.

All the plugins which display the information got from Internet websites do
basically the same: read HTML code and extract from it the valuable data. 
The present document describes the methods which can be used in that
process.

                                      *
                                    *   *

In order to display the particular information got from the website or FTP
server it is enough to: download the source code of the site, change it to
one line and then insert a line break before each HTML tag, look up for the
required information and pass over the useless information, and finally
remove all HTML tags leaving the ASCII text only.

In the case of the websites in languages using diacritic characters or
special alphabets additional modifications of the code are required in order
to translate it to one of the ISO encodings used by the program.


1.

The first step is to grab the code of the website.  Usually lynx --source
command does the job but in some cases links -source command is required --
sometimes with the additional switches.  The commands to download the source
code are:

    lynx --source http://www.linuxquestions.org/questions/slackware-14/

or:

    links -source -http-bugs.no-compression 1 http://slashdot.org/

It is possible to download HTML code of the website or FTP server.

In the case of HTML code using DOS CR+LF end-of-lines the following
modification is allowed:

    tr -d '\r'

In the case of HTML code using Mac OS CR end-of-lines the following
modification is required:

    tr '\r' '\n'


2.

The second step is to clean the code formatting.  A convenient method is
to convert the code into one line by replacing new line characters with
spaces and then to insert new line characters before each HTML tag:

    sed 's/\n/ /g;s/</\n</g'

In some rare cases sed refuses to work but perl does the job:

    perl -pe 's/\n/ /g;s/</\n</g'

If, after that command it is impossible to distinguish the tags including
the information we want from the other tags, it is necessary to use the
trick implemented in the following plugins: bbc-mundo.wmi, cnet.wmi,
commentcamarche.wmi, dpreview.wmi, kommersant.wmi, and wyborcza.wmi.

In such a case we split the above command into two parts.

At the beginning we use the command:

    sed 's/\n/ /g'

Then we seek for a pair of tags in which one tag includes some keyword
and the other tag includes our information.  Next we change all these
following pairs of the HTML tags into one tag nonsensical from HTML
validity point of view but helpful when the script has to identify the
information using the mentioned keyword.  The sample command joins the
tag ending with 'class="title">' with the tag beginning with '<a href=':

    sed 's/class="title"><a href=/class="title" a href=/g'

At the end we insert new line characters before each HTML tag:

    sed 's/</\n</g'


3.

The third step is to select the information we look for.  There are two
methods –- one is to include some information and the other is to omit
some information.

To include and omit the information we use grep -- in the other case with
-v switch:

Including lines with 'onclick' word:

    grep 'onclick'

Omitting lines with 'NONE' word:

    grep -v 'NONE'

There is no need to omit empty lines -- wminfo does it for us -- though we
can do it for the cosmetics purposes:

    grep -v '^$'


4.

The fourth step is to remove all HTML tags:

    sed 's/<.*>//g'


5.

The processing of the websites written in Western European or Eastern
European languages as well as the websites using Cyrillic alphabet requires
a few additional steps.  The wminfo program recognizes diacritics used in
those languages according to ISO-8859-1 and ISO-8859-2 encodings and
Cyrillic alphabet according to ISO-8859-5 encoding but it requires the
additional conversion when the website uses the other encoding such as
UTF-8, CP-1250, CP-1251, or KOI8-R.

The samples directory contains two multilingual HTML files and some
corresponding wminfo plugins for educational purposes.

5.1

The simplest case is with English.  To display the punctuation marks
properly it is enough to filter the output of the plugin through the
following script:

    punctuation

5.2.

The other Western European languages require a few filters...

This is the code for the website in mentioned languages encoded in UTF-8:

    utf8-iso1 | punctuation

In the case of the websites using HTML entities such as: &nbsp;, &cent;,
&sect;, &Oacute;, &oacute;,  etc. the additional filtering is required:

    html-iso1

5.3.

The Eastern European languages are a more complicated case...

This is the code for Eastern European language website encoded in UTF-8:

    utf8-iso2 | punctuation

this is the code for Eastern European language website encoded in
CP-1250:

    piconv -f CP-1250 -t ISO-8859-2 | punctuation

In the case of the websites using HTML entities such as: &nbsp;, &#728;,
&sect;, &Oacute;, &oacute;,  etc. the additional filtering is required:

    html-iso2

5.4.

The most complicated case are languages using the Cyrillic alphabet...

This is the code for website using the Cyrillic alphabet encoded in UTF-8:

    utf8-iso5 | punctuation

this is the code for website using the Cyrillic alphabet encoded in KOI8-R:

    piconv -f KOI8-R -t ISO-8859-5 | punctuation

this is the code for website using the Cyrillic alphabet encoded in CP-1251:

    piconv -f CP-1251 -t ISO-8859-5 | punctuation

In the case of the websites using HTML entities such as: &nbsp;, &#1026;,
&#1075;, &#1107;, &sect;,  etc. the additional filtering is required:

    html-iso5


6.

If there is a risk that the output of some plugin can be sometimes empty --
for example after the upgrade of the system the list of the patches is empty
-- to avoid the error message and the program break it is enough to put at
the end of such a plugin the command:

    echo

The better method is to use the following commands:

    echo "         "
    echo "         "
    echo "         "
    echo "         "
    echo "         "

or:

    echo -e "         \n         \n         \n         \n         \n"

because they clear the application window when the former information
disappears.

If the input does not end with the end-of-line mark to prevent the error
message and the program break the same command is necessary:

    echo

All the above commands are not the parts of the pipeline but the separate
commands put at the end of the plugin.


7.

The above instruction covers the basics of the wminfo plugins writing.  In
the specific situations the optimal order of the steps may be different and
the other operations may be required.  In plugins.online and plugins.offline
directories there is a lot of the instructive plugins demonstrating the
different Internet-related scripting methods.


8.

Instead of grabbing the information with the plugin it is possible to grab
it with the script registered in the crontab and store it locally.  In such
a case wminfo plugins can use these local mirrors of the different websites.

The following crontab tasks run online script every six minutes when the
system is on-line:

    */6 * * * * if [ "`route | grep 'default'`" != "" ] ; then /usr/local/bin/online ; fi

or:

    */6 * * * * if [ "`route | grep -E 'pan0|wlan0'`" != "" ] ; then /usr/local/bin/online ; fi

or the similar specific for your system.

As a result the crontab runs online script every six minutes and that script
runs the registered commands.  Because the crontab task recognizes the
Internet connection and runs the online script only if the system is on-line
wminfo is able to display the information even without the Internet
connection using old temporary and mirror files.  That method results in the
more stable work of the program because it can display the information
regardless of the Internet connection.  It prevents aborting wminfo when the
timeout is too long and prevents stucking of the scrolling text when the
program gets and processes the information from the website.

Some websites do not allow to download the source code with lynx --source
command.  In such a case the command links -source is helpful.  Sometimes it
requires the additional switches.  After mirroring the site locally with the
links -source command the valid command to download the locally mirrored
source code is lynx --source.


9.

The websites change from time to time.  Some changes can cause the
particular wminfo plugin stop to work.  In such a situation try to remove
the consecutive commands one by one starting from the end of the plugin.
You can redirect the output of the plugin to the file or display it with
less.  When you will see the correct output recreate the plugin using the
new set of the commands.

