Vim as XML Editor: More Setup

The tools listed in this chapter are less essential than those in the previous chapter, and are optional for many users.

Ruby

Ruby is a very nice object-oriented programming language from Japan. Some scripts in this howto are written in Ruby, so I recommend installing it. Alternatively, you could translate the scripts to your favourite language.

home

www.ruby-lang.org/en/

books

www.rubygarden.org/ruby?RubyBookList

Make sure that you have the latest version, or at least 1.8:
$ ruby -v
Otherwise, install it.
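The same check can also be done from inside a Ruby script, which is handy for the scripts in this howto. A minimal sketch; version_at_least? is a hypothetical helper name, and it compares the numeric parts of the dotted version strings, so 1.10 counts as newer than 1.8:

```ruby
# Compare dotted version strings numerically, part by part.
# Array#<=> treats a longer array with an equal prefix as greater,
# so "1.8.1" is at least "1.8".
def version_at_least?(version, minimum)
  have = version.split('.').map { |s| s.to_i }
  want = minimum.split('.').map { |s| s.to_i }
  (have <=> want) >= 0
end

puts "Ruby #{RUBY_VERSION}: " +
  (version_at_least?(RUBY_VERSION, '1.8') ? 'OK' : 'please upgrade')
```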

On Linux

The latest stable version of Ruby (1.8.1) is available from various places; some are listed below. The first one is a redirector; the last one is the original location, which should not be used if possible. The MD5 checksum is 5d52c7d0e6a6eb6e3bc68d77e794898e.
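If md5sum isn't at hand (for example on Windows), the checksum can also be verified with Ruby's standard digest/md5 library. A minimal sketch; md5_ok? and the file name check_md5.rb used below are hypothetical:

```ruby
require 'digest/md5'

# True if the MD5 hex digest of the file at +path+ equals +expected+
# (compared case-insensitively), e.g. the checksum published for a download.
def md5_ok?(path, expected)
  actual = File.open(path, 'rb') { |f| Digest::MD5.hexdigest(f.read) }
  actual == expected.downcase
end

# Command-line use: ruby check_md5.rb FILE MD5SUM
if __FILE__ == $0 && ARGV.length == 2
  path, sum = ARGV
  puts(md5_ok?(path, sum) ? "#{path}: OK" : "#{path}: FAILED")
end
```

For the archive above that would be ruby check_md5.rb ruby-1.8.1.tar.gz 5d52c7d0e6a6eb6e3bc68d77e794898e.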

After downloading and unpacking the archive, read the README section "How to compile and install". If you're on Mac OS X, check Rich Kilmer's "Building Ruby 1.8.1 on Panther".

Here's how I installed Ruby: First I installed readline-devel (I'm not sure this was necessary, since readline was already installed). Then I did the following (the output of some commands is omitted):
$ mkdir del/compile/ruby
$ cd del/compile/ruby
$ wget http://www.approximity.com/ruby/mirror/ruby-1.8.1.tar.gz
$ md5sum --check
5d52c7d0e6a6eb6e3bc68d77e794898e *ruby-1.8.1.tar.gz
ruby-1.8.1.tar.gz: OK
$ tar -xzf ruby-1.8.1.tar.gz
$ cd ruby-1.8.1/
$ mkdir /home/tobi/bulk/run/ruby
$ mkdir /home/tobi/bulk/run/ruby/1_8_1
$ autoconf
$ ./configure --prefix=/home/tobi/bulk/run/ruby/1_8_1
$ make
$ make test
test succeeded
$ make install
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/ruby "$@"
.
w /home/tobi/data/commands/ruby_1.8.1
60
q
$ chmod 700 ~/data/commands/ruby_1.8.1
$ ruby_1.8.1 -v
ruby 1.8.1 (2003-12-25) [i686-linux]
$ ruby_1.8.1 test/runner.rb
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/irb "$@"
.
w /home/tobi/data/commands/irb_1.8.1
59
q
$ chmod 700 ~/data/commands/irb_1.8.1
$ irb_1.8.1
irb(main):001:0> puts 6
6
=> nil
irb(main):002:0> puts 6
6
=> nil
With the Ruby that came with my distribution, readline doesn't work in IRB: pressing [up] inserts ^[[A. With the Ruby I compiled, IRB history works (although I have to press [escape] before [up]).
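Each ed session above writes a two-line sh wrapper that forwards all arguments to the real binary, and then makes it executable; the Tidy section below uses the same pattern. If you create many such wrappers, a small Ruby helper can do both steps. A sketch; write_wrapper is a hypothetical name, and the paths in the example are the ones from the transcript:

```ruby
# Write a two-line sh wrapper at +wrapper_path+ that runs +target+
# with all arguments ("$@"), then make it executable for the owner only.
def write_wrapper(wrapper_path, target)
  File.open(wrapper_path, 'w') do |f|
    f.puts '#!/usr/bin/env sh'
    f.puts "#{target} \"$@\""
  end
  File.chmod(0700, wrapper_path)
end

# Example, equivalent to the first ed session above (adjust the paths):
# write_wrapper(File.expand_path('~/data/commands/ruby_1.8.1'),
#               '${HOME}/bulk/run/ruby/1_8_1/bin/ruby')
```

The target is written literally, so ${HOME} is expanded by sh when the wrapper runs, just as in the ed version; the resulting file has the same 60 bytes that ed reported.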

On Windows

Run the latest rubyversion.exe from rubyinstaller.sf.net/, restart, then test with
>ruby -v

Cross-OS Tool Calls

Unfortunately, Ruby doesn't yet fully support Windows. The following file can help make system calls more portable; all you need to do is require it in your scripts.

cross_os_calls.rb

#!/usr/bin/env ruby

# based on code by Park Heesob (http://www.ruby-talk.org/10006)
# (also see http://www.ruby-talk.org/9739 and
# http://www.ruby-talk.org/81128)
# please feed back improvements: tobiasreif pinkjuice com
# Before using this, please confirm that you have the latest version
# of Ruby, and that the problem still exists.

######################################################################
# problem
# (if this type of test works for you, you don't need to
# require this file)

# puts `tidy -v`
# ruby 1.8.0 (2003-08-04) [i386-mswin32]
# =>
# cross_os_calls.rb:15:in ``': No such file or directory -
# tidy -v (Errno::ENOENT)

######################################################################
# workaround

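require 'rbconfig'

# true on native Windows Rubies (mswin, mingw, bccwin) and on Cygwin;
# a plain /win/ would also match Mac OS X ("darwin")
def windows?
  if Config::CONFIG["arch"] =~ /mswin|mingw|bccwin|cygwin/
    true
  else
    false
  end
end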

alias oldSystem system
def system(command)
  if windows?
    require 'Win32API'
    Win32API.new("crtdll", "system", ['P'], 'L').Call(command)
  else
    oldSystem command
  end
end

alias oldBacktick `
def `(command)
  if windows?
    require 'Win32API'
    popen = Win32API.new("crtdll", "_popen", ['P','P'], 'L')
    pclose = Win32API.new("crtdll", "_pclose", ['L'], 'L')
    fread = Win32API.new("crtdll", "fread", ['P','L','L','L'], 'L')
    feof = Win32API.new("crtdll", "feof", ['L'], 'L')
    saved_stdout = $stdout.clone
    psBuffer = " " * 128
    rBuffer = ""
    f = popen.Call(command,"r")
    while feof.Call( f )==0
      l = fread.Call( psBuffer,1,128,f )
      rBuffer += psBuffer[0...l]
    end
    pclose.Call f
    $stdout.reopen(saved_stdout)
    rBuffer
  else
    oldBacktick command
  end
end

TempDir =
if ENV['TMP']
  ENV['TMP']
elsif windows?
  `echo %temp%`.strip
else
  '/tmp'
end

DirSep =
if File::ALT_SEPARATOR
  File::ALT_SEPARATOR
# work around cygwin returning '/'
elsif windows?
  '\\'
else
  File::SEPARATOR
end

# puts TempDir
# puts DirSep

######################################################################
# tests
# ... pass in 1.6.5 and 1.8.0 (rubyinstaller.sf.net) on Windows ME

# puts `tidy -v`

# if Config::CONFIG["arch"] =~ /win/
#   temp_path_command = 'echo %temp%'
#   p `#{temp_path_command}`.strip
# end
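On newer Rubies, which provide RbConfig (the later name of Config) and the tmpdir library, the TempDir and DirSep constants can be replaced by Dir.tmpdir and File.join. A sketch under that assumption; windows_host? and temp_file_name are hypothetical names, and this does not replace the system and backtick patches in cross_os_calls.rb:

```ruby
require 'rbconfig'
require 'tmpdir'

# Same intent as windows? above, but via RbConfig and host_os;
# the pattern deliberately avoids matching "darwin".
def windows_host?
  !!(RbConfig::CONFIG['host_os'] =~ /mswin|mingw|cygwin|bccwin/)
end

# A unique temp file name built like the one in the jing script
# (prefix + process id + time), from Dir.tmpdir instead of TempDir.
# File.join always uses '/', which Ruby's file methods accept on Windows too.
def temp_file_name(prefix)
  File.join(Dir.tmpdir, "#{prefix}#{Process.pid}#{Time.now.to_f}")
end
```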

Jing

The normative schemas of many XML standards will be written in RELAX NG (RNG), and Jing is an RNG validator written by James Clark, one of the main creators of RNG.

home

www.thaiopensource.com/relaxng/jing.html

man

See readme.html in the top-level directory of the package, and doc/jing.html.

Jing doesn't (yet) support stdin, so for now I use a simple Ruby script to fake it.
Make sure you have the required version of the JRE. I recommend the latest version, currently 1.4. Check with
java -version
After downloading jing-version.zip from www.thaiopensource.com/download/ and unzipping it, save the following Ruby script. On Windows, add the suffix .rb to the file name so that you get \any\directory\jing.rb; on Linux, put it into a directory on the system path and do
$ chmod 700 jing

jing

#!/usr/bin/env ruby

# jing - faking stdin for Jing

Jing_Jar =
'/path/to/jing-version/bin/jing.jar'
$:<<'/path/to/ruby/shared/'
require 'cross_os_calls.rb'

files=ARGV.grep(/^[^-]/)
argument_string=ARGV.join(' ')
tempfile=TempDir+DirSep+'jing'+$$.to_s+Time.now.to_f.to_s
JING='java -jar '+Jing_Jar+' '

case files.length
when 1 then
  # If there's only one file arg given, it is the RNG;
  # stdin will be the XML doc to validate.
  # $stdin.read returns "" (not nil) when there is no input,
  # so test for an empty string.
  stdin = $stdin.read
  if !stdin.empty?
    tf = File.new(tempfile,'w')
    tf.flock(File::LOCK_EX)
    tf.write stdin; tf.close
    command = JING+argument_string+' '+tempfile
    system command
    File.delete tempfile
  else
    puts "If you supply only one file arg (the RNG)\n"+
      'you must supply stdin (the XML).'
  end
else
  # If there are zero or more than one file args,
  # pass all args to Jing.
  command = JING+argument_string
  system command
end
Adjust the two paths in the script. On Windows, put the following batch script into a directory on the system path:

jing.bat

@echo off
ruby /path/to/jing.rb %1 %2 %3 %4 %5 %6 %7 %8 %9
To test it, go to jing-version/doc/xhtml/ and run
jing xhtml-strict.rng index.html
on the command line. Make index.html invalid and run jing again: you should see errors. Change it back to its original state and validate again: there should be no output.
To validate XHTML documents you can do
:%w !jing /path/to/xhtml-strict.rng
in Vim. If the document has a DOCTYPE declaration referencing an online DTD, Jing will fetch the DTD from the web, which takes a while. You can comment the declaration out if it's not needed for entity declarations or attribute value defaults, or exclude it by sending only the root element and its contents, e.g. (if the root element starts on line 6)
:6,$w !jing /path/to/xhtml-strict.rng
As the schema you can use jing-version/doc/xhtml/xhtml-strict.rng.
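The stdin trick in the jing script can also be written with Ruby's Tempfile class, passing the command to system as separate arguments so that no shell re-parses file names containing spaces. A sketch, not a tested replacement for the script above; JING_JAR is a placeholder path like Jing_Jar, and jing_argv and run_jing are hypothetical names:

```ruby
require 'tempfile'

# Placeholder; adjust like Jing_Jar in the jing script.
JING_JAR = '/path/to/jing-version/bin/jing.jar'

# Build the java command as an array. If +xml_path+ is given (a temp
# file holding stdin), it is appended after the user's arguments.
def jing_argv(args, xml_path = nil)
  argv = ['java', '-jar', JING_JAR] + args
  argv << xml_path if xml_path
  argv
end

# Exactly one non-option argument: it is the RNG and the XML comes
# from stdin. Otherwise pass all arguments through to Jing unchanged.
def run_jing(args)
  files = args.grep(/\A[^-]/)
  return system(*jing_argv(args)) unless files.length == 1

  input = $stdin.read
  if input.empty?
    warn 'If you supply only one file arg (the RNG) you must supply stdin (the XML).'
    return false
  end
  Tempfile.open('jing') do |tf|
    tf.write(input)
    tf.flush
    system(*jing_argv(args, tf.path))
  end
end

run_jing(ARGV) if __FILE__ == $0 && !ARGV.empty?
```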

Note

Although the name suggests otherwise, xhtml-strict.rng is based on XHTML 1.1, not on XHTML 1.0 Strict, and therefore excludes the lang attribute, among others; for details see XHTML 1.1 Appendix A.

As an ad hoc solution for validating SVG you can download the SVG 1.1 RNG via
wget -q -nd -A rng -l 1 -r http://www.w3.org/Graphics/SVG/1.1/rng/
wget is available for many operating systems, including Windows; check the wget home page (alternative wget home page). A zip file has since been added; check the directory or try www.w3.org/Graphics/SVG/1.1/rng/rng.zip. If the files still contain the following line
<!DOCTYPE grammar SYSTEM "../relaxng.dtd">
delete it from all files with the following command (on Windows, replace the single quotes with double quotes):
ruby -ni.bak -e 'print if not /^<!DOCTYPE/' *.rng
or change the path to a real system identifier (URI or canonical URL).
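To avoid the quoting difference between Windows and Linux shells altogether, the same edit can live in a small script. A sketch; strip_doctype, strip_doctype_in_files and the file name strip_doctype.rb are hypothetical. Like the one-liner, it drops every line starting with <!DOCTYPE and keeps a .bak copy:

```ruby
# Return +text+ without the lines that start with <!DOCTYPE.
def strip_doctype(text)
  text.lines.reject { |line| line =~ /\A<!DOCTYPE/ }.join
end

# Rewrite each file in place, saving the original as FILE.bak first.
def strip_doctype_in_files(paths)
  paths.each do |path|
    original = File.read(path)
    File.open(path + '.bak', 'w') { |f| f.write(original) }
    File.open(path, 'w') { |f| f.write(strip_doctype(original)) }
  end
end

# Usage: ruby strip_doctype.rb *.rng
# Patterns are expanded with Dir[] in case the shell didn't expand them.
if __FILE__ == $0
  strip_doctype_in_files(ARGV.map { |pattern| Dir[pattern] }.flatten)
end
```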

XMLStarlet

From the web site:
"XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands."

It's fast and promising.

home

xmlstar.sourceforge.net/

doc

xmlstar.sourceforge.net/docs.php

On Linux

Here's the script that I use to install XMLStarlet:

install_xmlstar

#!/bin/bash -x

# This is just an example you could use as basis for your script.
# (do not run it without having revised and adjusted it)

# The --with-[...]-src paths must point to the libxml and libxslt
# sources.
# The sources are available after install_libxml finished, for
# example.
# Set the version numbers below.
# Be online, then do
# tobi ~/del $ ~/data/run/install_xmlstar

# this doesn't really make sense ...
av_command="antivir -rs -z"

my_home=/home/tobi

if [ "$HOME" != "$my_home" ]; then
  exit
fi
if [ "$(whoami)" != 'tobi' ]; then
  exit
fi

# set these:
ver_xmlstar=0.8.1
ver_libxml=2.6.5
ver_libxslt=1.1.2

run_top=${HOME}/bulk/run/xmlstar
run=${run_top}/${ver_xmlstar}
compile=${HOME}/del/compile_libxml
command=${HOME}/data/commands/xmlstar

if [ -d $run ]; then
  echo ${run}' exists, exiting'
  exit
else
  if [ ! -d $run_top ]; then
    mkdir $run_top
  fi
  if [ ! -d $run ]; then
    mkdir $run
  fi
fi
cd $compile

######################################################################

# based on
# http://xmlstar.sourceforge.net/doc/run-xmlstarlet-build

url_xmlstar="http://xmlstar.sourceforge.net/downloads/\
xmlstarlet-${ver_xmlstar}.tar.gz"

file_xmlstar=`basename ${url_xmlstar}`

if [ ! -f download/$file_xmlstar ]; then
  cd download
  wget $url_xmlstar
  $av_command $file_xmlstar
  # if [ $? != 0 ]; then
  if [ $? -ne 0 ]; then
    exit
  fi
  cd ../
fi

tar -xzf download/${file_xmlstar}

cd xmlstarlet-${ver_xmlstar}
./configure --prefix=${run} \
  --with-libxml-src=${compile}/libxml2-${ver_libxml} \
  --with-libxslt-src=${compile}/libxslt-${ver_libxslt}
make
make tests
make install

######################################################################

# if [ ! -f $command ]; then
  cat > $command << EOF
#!/usr/bin/env sh
# may get overwritten
${run}/bin/xml "\$@"
EOF
  chmod 700 $command
# fi

xmlstar --version

On Windows

Installation is very simple. After downloading and unzipping XMLStarlet (xmlstarlet-version-win32.zip), I added the directory containing xml.exe to the system path. This makes the system path longer and requires a restart, but batch files support only up to nine arguments, which is often not enough when using XMLStarlet. I think that xml is a confusingly generic name for a command, so I renamed xml.exe to xmlstar.exe.

Try it out

Caution

Whenever you filter your data through a tool, it can get corrupted. If something goes wrong, you can press u to undo the filtering.

XMLStarlet can be used to remove all nodes matching an XPath expression, e.g. all style attributes from an XHTML document. Paste the following into Vim:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>foo</title>
  </head>
  <body>
    <div style="text-align:center">
      <p id="foo" style="color:green" class="blammo">
        foo
      </p>
    </div>
  </body>
</html>
Then do
:%!xmlstar ed --delete //@style
You should get something like this:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>foo</title>
  </head>
  <body>
    <div>
      <p id="foo" class="blammo">
        foo
      </p>
    </div>
  </body>
</html>
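Where XMLStarlet isn't installed, a similar filter can be sketched with REXML, the XML library that ships with Ruby (as a bundled gem in recent versions). A sketch, assuming REXML is available; the helper names and the file name del_attr.rb are hypothetical. Note that REXML writes attribute values in single quotes, so the output differs cosmetically from XMLStarlet's:

```ruby
require 'rexml/document'

# Remove the attribute +name+ from +element+ and all descendant elements.
def delete_attribute_recursively(element, name)
  element.delete_attribute(name)
  element.children.each do |child|
    delete_attribute_recursively(child, name) if child.is_a?(REXML::Element)
  end
end

# Return +xml+ with every +name+ attribute removed, similar in effect
# to: xmlstar ed --delete //@name
def delete_attribute_everywhere(xml, name)
  doc = REXML::Document.new(xml)
  return xml if doc.root.nil?
  delete_attribute_recursively(doc.root, name)
  out = String.new
  doc.write(out)
  out
end

# From Vim: :%!ruby del_attr.rb style
if __FILE__ == $0 && ARGV.length == 1
  print delete_attribute_everywhere($stdin.read, ARGV[0])
end
```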

Tidy

Sometimes I receive HTML files generated by Microsoft Word; often they are very bloated. Tidy can make them around five times smaller and can help with turning them into valid XHTML. The result isn't guaranteed to be good code in terms of semantics and structure, but the files become much easier to work with.

Current Home

tidy.sourceforge.net/

Original Home

www.w3.org/People/Raggett/tidy/

Project

sourceforge.net/projects/tidy/

On Linux

Here's how I installed Tidy:
$ tidy -help
bash: tidy: command not found
$ cd bulk/run/
$ mkdir tidy && cd tidy
$ wget http://tidy.sourceforge.net/cf/tidy_linux_x86.tgz
$ md5sum --check
476326c3d44292108111841a42bd27f6 *tidy_linux_x86.tgz
tidy_linux_x86.tgz: OK
$ tar -xzf tidy_linux_x86.tgz
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/tidy/bin/tidy "$@"
.
w /home/tobi/data/commands/tidy
54
q
$ chmod 700 ~/data/commands/tidy
$ tidy -v
HTML Tidy for Linux/x86 released on 1st November 2003
$

On Windows

A tidy.bat could look like this: (two lines)

tidy.bat

@echo off
\path\to\tidy.exe -config /path/to/tidyrc.txt -f /log/errors/here/tidyerrs.txt %1 %2 %3 %4 %5 %6 %7 %8 %9
(put it in a directory which is on the system path)

Settings

Sample tidyrc.txt:
word-2000: yes
clean: yes
doctype: strict
bare: yes
drop-font-tags: yes
drop-proprietary-attributes: yes
enclose-block-text: yes
escape-cdata: yes
logical-emphasis: yes
output-xhtml: yes