Vim as XML Editor: More Setup
The tools listed in this chapter are less essential than those in the previous chapter, and are optional for many users.
Ruby is a very nice object-oriented programming language from Japan. Some scripts in this howto are written in Ruby, so I recommend installing it. Alternatively you could translate the scripts to your favourite language.
Check whether Ruby is already installed:
$ ruby -v
If this prints a version number you are set; otherwise install it.
On Linux
- www.ruby-lang.org/cgi-bin/download-1.8.1.mrb
- www.approximity.com/ruby/mirror/ruby-1.8.1.tar.gz
- rubyforge.org/download.php/262/ruby-1.8.1.tar.gz
- www.ruby-doc.org/downloads/ruby-1.8.1.tar.gz
- ftp://ftp.ruby-lang.org/pub/ruby/ruby-1.8.1.tar.gz
After downloading and unpacking the archive, read the section "How to compile and install" in the README. If you're on Mac OS X, check Rich Kilmer's "Building Ruby 1.8.1 on Panther".
$ mkdir del/compile/ruby
$ cd del/compile/ruby
$ wget http://www.approximity.com/ruby/mirror/ruby-1.8.1.tar.gz
$ md5sum --check
5d52c7d0e6a6eb6e3bc68d77e794898e *ruby-1.8.1.tar.gz
ruby-1.8.1.tar.gz: OK
$ tar -xzf ruby-1.8.1.tar.gz
$ cd ruby-1.8.1/
$ mkdir /home/tobi/bulk/run/ruby
$ mkdir /home/tobi/bulk/run/ruby/1_8_1
$ autoconf
$ ./configure --prefix=/home/tobi/bulk/run/ruby/1_8_1
$ make
$ make test
test succeeded
$ make install
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/ruby "$@"
.
w /home/tobi/data/commands/ruby_1.8.1
60
q
$ chmod 700 ~/data/commands/ruby_1.8.1
$ ruby_1.8.1 -v
ruby 1.8.1 (2003-12-25) [i686-linux]
$ ruby_1.8.1 test/runner.rb
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/ruby/1_8_1/bin/irb "$@"
.
w /home/tobi/data/commands/irb_1.8.1
59
q
$ chmod 700 ~/data/commands/irb_1.8.1
$ irb_1.8.1
irb(main):001:0> puts 6
6
=> nil
irb(main):002:0> puts 6
6
=> nil
With the Ruby that came with my distro, readline doesn't work; [up] results in ^[[A. With the Ruby I installed, IRB works (although I have to hit [escape] before entering [up]).
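If md5sum is not available on your system, the checksum test from the transcript above can also be sketched in Ruby (once some Ruby is installed). This is only a minimal sketch: it assumes a Ruby recent enough to provide Digest::MD5.file in its digest library, and md5_ok? is a made-up helper name.

```ruby
require 'digest'
require 'digest/md5'

# Returns true if the file at +path+ has the expected MD5 hex digest.
# The comparison ignores case and surrounding whitespace, because
# published checksums are not always lower-case.
def md5_ok?(path, expected_hex)
  Digest::MD5.file(path).hexdigest == expected_hex.strip.downcase
end

# Hypothetical usage, with the checksum quoted in the transcript:
#   md5_ok?('ruby-1.8.1.tar.gz', '5d52c7d0e6a6eb6e3bc68d77e794898e')
```

If the method returns false, download the archive again rather than compiling it.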
On Windows
Cross-OS Tool Calls
cross_os_calls.rb
#!/usr/bin/env ruby
# based on code by Park Heesob (http://www.ruby-talk.org/10006)
# (also see http://www.ruby-talk.org/9739 and
# http://www.ruby-talk.org/81128)
# please feed back improvements: tobiasreif pinkjuice com
# Before using this, please confirm that you have the latest version
# of Ruby, and that the problem still exists.
######################################################################
# problem
# (if this type of test works for you, you don't need to
# require this file)
# puts `tidy -v`
# ruby 1.8.0 (2003-08-04) [i386-mswin32]
# =>
# cross_os_calls.rb:15:in ``': No such file or directory -
# tidy -v (Errno::ENOENT)
######################################################################
# workaround
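require 'rbconfig'
def windows?
  # Match Windows builds (and cygwin) only; a bare /win/ would also
  # match "darwin" on Mac OS X.
  if Config::CONFIG["arch"] =~ /mswin|mingw|cygwin|bccwin/
    true
  else
    false
  end
end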
alias oldSystem system
def system(command)
if windows?
require 'Win32API'
Win32API.new("crtdll", "system", ['P'], 'L').Call(command)
else
oldSystem command
end
end
alias oldBacktick `
def `(command)
if windows?
require 'Win32API'
popen = Win32API.new("crtdll", "_popen", ['P','P'], 'L')
pclose = Win32API.new("crtdll", "_pclose", ['L'], 'L')
fread = Win32API.new("crtdll", "fread", ['P','L','L','L'], 'L')
feof = Win32API.new("crtdll", "feof", ['L'], 'L')
saved_stdout = $stdout.clone
psBuffer = " " * 128
rBuffer = ""
f = popen.Call(command,"r")
while feof.Call( f )==0
l = fread.Call( psBuffer,1,128,f )
rBuffer += psBuffer[0...l]
end
pclose.Call f
$stdout.reopen(saved_stdout)
rBuffer
else
oldBacktick command
end
end
TempDir =
if ENV['TMP']
ENV['TMP']
elsif windows?
`echo %temp%`.strip
else
'/tmp'
end
DirSep =
if File::ALT_SEPARATOR
File::ALT_SEPARATOR
# work around cygwin returning '/'
elsif windows?
'\\'
else
File::SEPARATOR
end
# puts TempDir
# puts DirSep
######################################################################
# tests
# ... pass in 1.6.5 and 1.8.0 (rubyinstaller.sf.net) on Windows ME
# puts `tidy -v`
# if Config::CONFIG["arch"] =~ /win/
# temp_path_command = 'echo %temp%'
# p `#{temp_path_command}`.strip
# end
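On newer Ruby versions the three helper values defined in cross_os_calls.rb (Windows check, temp dir, directory separator) can probably be obtained from the standard library alone. The following is a sketch under that assumption, not a replacement for the script above: it uses RbConfig and Dir.tmpdir, which are not available under these names in Ruby 1.8.1, and the method names windows_host?, temp_dir and dir_sep are made up for illustration.

```ruby
require 'rbconfig'
require 'tmpdir'

# True on native Windows builds and on Cygwin; "darwin" must not match.
def windows_host?(host_os = RbConfig::CONFIG['host_os'])
  !!(host_os =~ /mswin|mingw|cygwin|bccwin/i)
end

# Directory for temporary files, as determined by Ruby's tmpdir library.
def temp_dir
  Dir.tmpdir
end

# Directory separator preferred by the platform: a backslash on
# Windows (including Cygwin), a slash everywhere else.
def dir_sep
  File::ALT_SEPARATOR || (windows_host? ? '\\' : File::SEPARATOR)
end
```

The explicit list of platform names is the design choice that matters here: matching just "win" would also classify Mac OS X ("darwin") as Windows.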
The normative schemas of many XML standards will be written in RNG, and Jing is an RNG validator written by one of the main creators of RNG, James Clark.
- home
- man
See readme.html in the toplevel dir of the package and doc/jing.html.
java -version
After having downloaded version.zip from www.thaiopensource.com/download/ and having unzipped it, save the following Ruby script. On Windows add the suffix rb to the file name so that you get \any\directory\jing.rb; on Linux put it into a directory which is on the system path and do
$ chmod 700 jing
jing
#!/usr/bin/env ruby
# jing - faking stdin for Jing
Jing_Jar =
'/path/to/jing-version/bin/jing.jar'
$:<<'/path/to/ruby/shared/'
require 'cross_os_calls.rb'
files=ARGV.grep(/^[^-]/)
argument_string=ARGV.join(' ')
tempfile=TempDir+DirSep+'jing'+$$.to_s+Time.now.to_f.to_s
JING='java -jar '+Jing_Jar+' '
case files.length
when 1 then
# If there's only one file arg given it is the RNG;
# stdin will be the XML doc to validate.
if stdin = $stdin.read
tf = File.new(tempfile,'w')
tf.flock(File::LOCK_EX)
tf.write stdin; tf.close
command = JING+argument_string+' '+tempfile
system command
File.delete tempfile
else
puts "If you supply only one file arg (the RNG)\n"+
'you must supply stdin (the XML).'
end
else
# If there are zero or more than one file args,
# pass all args to Jing.
command = JING+argument_string
system command
end
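The temp file handling in the script above (build a unique name, write stdin, delete afterwards) could alternatively be left to Ruby's tempfile library. The following is a minimal sketch, assuming a Ruby new enough to have Tempfile.create and IO.copy_stream (Ruby 1.8.1 has neither); with_spooled_input is a made-up name, and the jing call in the comment only mirrors the script above.

```ruby
require 'tempfile'

# Copies everything readable from +io+ into a temporary file, yields
# the file's path, and removes the file afterwards, even if the block
# raises. Returns the value of the block.
def with_spooled_input(io, prefix = 'jing')
  Tempfile.create(prefix) do |tf|
    tf.binmode
    IO.copy_stream(io, tf)
    tf.close # flush to disk and release the handle before others read it
    yield tf.path
  end
end

# Hypothetical usage inside a wrapper like the one above:
#   with_spooled_input($stdin) do |path|
#     system(JING + argument_string + ' ' + path)
#   end
```

Compared to a hand-built file name, this leaves the choice of directory and of a unique name to the library, and the file is cleaned up even when the validator call fails.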
jing.bat
@echo off
ruby /path/to/jing.rb %1 %2 %3 %4 %5 %6 %7 %8 %9
version/doc/xhtml/
and do
jing xhtml-strict.rng index.html
in the command line.
Change index.html to be invalid and run jing again: you should see errors.
Change it back to its original state and validate again: there should be no output.
:%w !jing /path/to/xhtml-strict.rng
in Vim.
If the doc has a doctype declaration referencing an online DTD, Jing will fetch it from the web, which takes a while. You could comment it out if it's not needed for entity declarations or attribute value defaults, or you can exclude it by sending just the root element with contents, e.g.
:6,$w !jing /path/to/xhtml-strict.rng
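The line number in the range (6 in the example) depends on where the root element starts in your document. As a rough sketch, it could be found like this; root_element_line is a made-up helper, and it assumes that the root start tag begins on a line of its own and that no comment line before it starts with something that looks like a tag.

```ruby
# Returns the 1-based number of the first line that starts with an
# element start tag, or nil if there is none. Lines starting with
# "<?" or "<!" (XML declaration, doctype, comments) are skipped.
def root_element_line(text)
  text.each_line.with_index(1) do |line, number|
    return number if line =~ /\A\s*<[A-Za-z_:]/
  end
  nil
end

# Hypothetical usage:
#   n = root_element_line(File.read('index.html'))
#   # then in Vim: :n,$w !jing /path/to/xhtml-strict.rng  (with n filled in)
```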
As schema you can use
jing-version/doc/xhtml/xhtml-strict.rng.
Note
Although the name suggests otherwise, it is based on XHTML 1.1, not on 1.0 Strict, and thus excludes the attribute lang etc.; for details see XHTML 1.1 Appendix A.
wget -q -nd -A rng -l 1 -r http://www.w3.org/Graphics/SVG/1.1/rng/
wget is available for many OSs including Windows; check the wget home page (alternative wget home page). Recently they added a zip file; check the directory or try www.w3.org/Graphics/SVG/1.1/rng/rng.zip.
If the files still contain the following line
<!DOCTYPE grammar SYSTEM "../relaxng.dtd">
delete it from all files via the following command (on Windows, replace the apostrophes with double quotes):
ruby -ni.bak -e 'print if not /^<!DOCTYPE/' *.rng
or change the path to a real system identifier (URI or canonical URL).
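If the shell quoting differences get in the way, the same clean-up can be put into a small script file. This is a sketch of what the one-liner above does, not a tested replacement; the method names are made up, and like the one-liner it only handles doctype declarations that fit on a single line starting with <!DOCTYPE.

```ruby
require 'fileutils'

# Removes every line that starts with "<!DOCTYPE" from +text+.
def without_doctype_lines(text)
  text.lines.reject { |line| line.start_with?('<!DOCTYPE') }.join
end

# Rewrites each file in place, keeping the original as <name>.bak,
# similar to what the -i.bak switch does.
def strip_doctype_lines!(paths)
  paths.each do |path|
    original = File.read(path)
    FileUtils.cp(path, "#{path}.bak")
    File.write(path, without_doctype_lines(original))
  end
end

# Hypothetical usage from the directory holding the schemas:
#   strip_doctype_lines!(Dir.glob('*.rng'))
```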
"XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands."
It's fast and promising.
On Linux
install_xmlstar
#!/bin/bash -x
# This is just an example you could use as basis for your script.
# (do not run it without having revised and adjusted it)
# The --with-[...]-src paths must point to the libxml and libxslt
# sources.
# The sources are available after install_libxml finished, for
# example.
# Set the version numbers below.
# Be online, then do
# tobi ~/del $ ~/data/run/install_xmlstar
# this doesn't really make sense ...
av_command="antivir -rs -z"
my_home=/home/tobi
if [ "$HOME" != "$my_home" ]; then
  exit 1
fi
if [ "$(whoami)" != 'tobi' ]; then
  exit 1
fi
# set these:
ver_xmlstar=0.8.1
ver_libxml=2.6.5
ver_libxslt=1.1.2
run_top=${HOME}/bulk/run/xmlstar
run=${run_top}/${ver_xmlstar}
compile=${HOME}/del/compile_libxml
command=${HOME}/data/commands/xmlstar
if [ -d $run ]; then
echo ${run}' exists, exiting'
exit
else
if [ ! -d $run_top ]; then
mkdir $run_top
fi
if [ ! -d $run ]; then
mkdir $run
fi
fi
cd $compile
######################################################################
# based on
# http://xmlstar.sourceforge.net/doc/run-xmlstarlet-build
url_xmlstar="http://xmlstar.sourceforge.net/downloads/\
xmlstarlet-${ver_xmlstar}.tar.gz"
file_xmlstar=`basename ${url_xmlstar}`
if [ ! -f download/$file_xmlstar ]; then
cd download
wget $url_xmlstar
$av_command $file_xmlstar
# if [ $? != 0 ]; then
if [ $? -ne 0 ]; then
exit
fi
cd ../
fi
tar -xzf download/${file_xmlstar}
cd xmlstarlet-${ver_xmlstar}
./configure --prefix=${run} \
--with-libxml-src=${compile}/libxml2-${ver_libxml} \
--with-libxslt-src=${compile}/libxslt-${ver_libxslt}
make
make tests
make install
######################################################################
# if [ ! -f $command ]; then
cat > $command << EOF
#!/usr/bin/env sh
# may get overwritten
${run}/bin/xml "\$@"
EOF
chmod 700 $command
# fi
xmlstar --version
On Windows
version-win32.zip)
I added the directory containing xml.exe to the system path. This makes the system path longer and requires a restart, but batch files support only up to nine arguments, which often is not enough when using XMLStarlet. I think that xml is a confusingly generic name for a command, so I renamed it to xmlstar by renaming xml.exe to xmlstar.exe.
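Another way around the nine-argument limit would be a wrapper written in Ruby instead of a batch file, since a Ruby script sees all of its arguments in ARGV. A minimal sketch, assuming a Ruby whose system call works on your Windows version (see the cross-OS notes above); forward is a made-up name and the path in the comment is a placeholder.

```ruby
# Runs +executable+ with every argument in +args+, however many there
# are. Passing the arguments as separate strings bypasses the shell,
# so no extra quoting is needed.
# Returns true only if the command ran and exited with status 0.
def forward(executable, args)
  system(executable, *args) == true
end

# Hypothetical wrapper body:
#   exit(forward('\path\to\xml.exe', ARGV) ? 0 : 1)
```

The wrapper still has to be started somehow (for example via a file association for .rb files), so adding the directory to the system path remains the simpler setup.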
Try it out
Caution
Whenever you filter your data through a tool it can get corrupted. If something went wrong you can use u to undo the filtering.
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>foo</title>
</head>
<body>
<div style="text-align:center">
<p id="foo" style="color:green" class="blammo">
foo
</p>
</div>
</body>
</html>
Then do
:%!xmlstar ed --delete //@style
You should get something like this:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>foo</title>
</head>
<body>
<div>
<p id="foo" class="blammo">
foo
</p>
</div>
</body>
</html>
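For comparison, roughly the same edit can be sketched with REXML, the XML library that comes with Ruby (as a bundled gem in recent versions). This is only an illustration of what deleting //@style means, under the assumption that REXML is installed; the method names are made up, and the serialized result may differ from XMLStarlet's output in details such as attribute quoting.

```ruby
require 'rexml/document'

# Removes the attribute +name+ from +element+ and all its descendants.
def strip_attribute(element, name)
  element.delete_attribute(name)
  element.elements.each { |child| strip_attribute(child, name) }
end

# Parses +xml+, deletes every attribute called +name+, and returns
# the resulting document as a string.
def delete_attributes(xml, name)
  doc = REXML::Document.new(xml)
  strip_attribute(doc.root, name)
  doc.to_s
end

# Hypothetical usage as a filter script (e.g. called via :%! in Vim):
#   print delete_attributes($stdin.read, 'style')
```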
Sometimes I receive HTML files generated by Microsoft Word; often they are very bloated. Tidy can make them around five times smaller, and can help with turning them into valid XHTML. The results can't be guaranteed to be really good code regarding semantics and structure, but the files become much easier to work with.
- Current Home
- Original Home
- Project
On Linux
$ tidy -help
bash: tidy: command not found
$ cd bulk/run/
$ mkdir tidy && cd tidy
$ wget http://tidy.sourceforge.net/cf/tidy_linux_x86.tgz
$ md5sum --check
476326c3d44292108111841a42bd27f6 *tidy_linux_x86.tgz
tidy_linux_x86.tgz: OK
$ tar -xzf tidy_linux_x86.tgz
$ ed
a
#!/usr/bin/env sh
${HOME}/bulk/run/tidy/bin/tidy "$@"
.
w /home/tobi/data/commands/tidy
54
q
$ chmod 700 ~/data/commands/tidy
$ tidy -v
HTML Tidy for Linux/x86 released on 1st November 2003
$
On Windows
tidy.bat
@echo off
\path\to\tidy.exe -config /path/to/tidyrc.txt -f /log/errors/here/tidyerrs.txt %1 %2 %3 %4 %5 %6 %7 %8 %9
Settings
word-2000: yes
clean: yes
doctype: strict
bare: yes
drop-font-tags: yes
drop-proprietary-attributes: yes
enclose-block-text: yes
escape-cdata: yes
logical-emphasis: yes
output-xhtml: yes



