More Setup
The tools listed in this chapter are less basic/crucial than
those in the previous chapter, and are optional for many
users.
Ruby
Ruby is a very nice object-oriented programming language from Japan.
Some scripts in this howto are written in Ruby, so I recommend
installing it.
Alternatively, you could translate the scripts to your favourite
language.
home
books
Make sure that you have version 1.8 or later; check with
$ ruby -v
and install Ruby if the version is older or the command is not found.
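If a script should check the interpreter version itself, the comparison could be sketched as follows. RUBY_VERSION is Ruby's built-in version constant; the helper name version_at_least? is made up for this example.

```ruby
# Compare dotted version strings numerically, so that "1.10" counts
# as newer than "1.8" (a plain string comparison would get this wrong).
# version_at_least? is a hypothetical helper, not part of Ruby.
def version_at_least?(current, required)
  cur = current.split('.').map { |part| part.to_i }
  req = required.split('.').map { |part| part.to_i }
  (cur <=> req) >= 0
end

unless version_at_least?(RUBY_VERSION, '1.8')
  abort "Ruby 1.8 or later is required, found #{RUBY_VERSION}"
end
```

Putting such a check at the top of a script gives a clear message instead of an obscure failure further down.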
On Linux
The latest stable version of Ruby (1.8.1) is available from
various places; some are listed below.
The first one is a redirector, the last one is the original location
which should not be used if possible.
The MD5 check sum is
5d52c7d0e6a6eb6e3bc68d77e794898e.
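The check sum can be compared with a few lines of Ruby, using Digest::MD5 from the standard library. This is only a sketch: md5_matches? is a made-up helper name, and the archive file name in the usage comment is an assumption about what you downloaded.

```ruby
require 'digest/md5'

# Return true if the MD5 digest of the file at +path+ equals +expected+.
# md5_matches? is a hypothetical helper name.
def md5_matches?(path, expected)
  # Read in binary mode so that Windows does not translate line ends.
  actual = File.open(path, 'rb') { |f| Digest::MD5.hexdigest(f.read) }
  actual == expected.strip.downcase
end

# Usage (the file name is an assumption):
# md5_matches?('ruby-1.8.1.tar.gz', '5d52c7d0e6a6eb6e3bc68d77e794898e')
```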
After downloading and unpacking the archive, read the section
How to compile and install in the README.
If you're on Mac OS X, check Rich Kilmer's
Building Ruby 1.8.1 on Panther.
Here's how I installed Ruby:
First I installed readline-devel
(I don't know if this was necessary since readline was installed
already).
Then I did the following (output of some commands is omitted):
irb(main):001:0> puts 6
6
=> nil
irb(main):002:0> puts 6
6
=> nil
With the Ruby that came with my distro, readline doesn't work:
pressing up results in ^[[A.
With the Ruby I installed, IRB works (although I have
to hit escape before pressing up).
On Windows
Run the latest rubyversion.exe installer,
restart,
then test with
>ruby -v
Cross-OS Tool Calls
Unfortunately, Ruby doesn't yet fully support Windows.
The following file can help make system calls more portable;
all you need to do is require it in your scripts.
cross_os_calls.rb
# cross_os_calls.rb:15:in ``': No such file or directory -
# tidy -v (Errno::ENOENT)
######################################################################
# workaround
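require 'rbconfig'

# True on native Windows and Cygwin builds of Ruby.
# The pattern names the Windows platforms explicitly because
# a bare /win/ would also match "darwin" (Mac OS X).
# (Newer Ruby versions call this constant RbConfig.)
def windows?
  Config::CONFIG["arch"] =~ /mswin|mingw|cygwin/ ? true : false
end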
alias oldSystem system
def system(command)
  if windows?
    require 'Win32API'
    Win32API.new("crtdll", "system", ['P'], 'L').Call(command)
  else
    oldSystem command
  end
end
alias oldBacktick `
def `(command)
  if windows?
    require 'Win32API'
    popen = Win32API.new("crtdll", "_popen", ['P','P'], 'L')
    pclose = Win32API.new("crtdll", "_pclose", ['L'], 'L')
    fread = Win32API.new("crtdll", "fread", ['P','L','L','L'], 'L')
    feof = Win32API.new("crtdll", "feof", ['L'], 'L')
    saved_stdout = $stdout.clone
    psBuffer = " " * 128
    rBuffer = ""
    f = popen.Call(command,"r")
    while feof.Call( f )==0
      l = fread.Call( psBuffer,1,128,f )
      rBuffer += psBuffer[0...l]
    end
    pclose.Call f
    $stdout.reopen(saved_stdout)
    rBuffer
  else
    oldBacktick command
  end
end
TempDir =
  if ENV['TMP']
    ENV['TMP']
  elsif windows?
    `echo %temp%`.strip
  else
    '/tmp'
  end
DirSep =
  if File::ALT_SEPARATOR
    File::ALT_SEPARATOR
  # work around cygwin returning '/'
  elsif windows?
    '\\'
  else
    File::SEPARATOR
  end
# puts TempDir
# puts DirSep
######################################################################
# tests
# ... pass in 1.6.5 and 1.8.0 (rubyinstaller.sf.net) on Windows ME
# puts `tidy -v`
# if Config::CONFIG["arch"] =~ /win/
# temp_path_command = 'echo %temp%'
# p `#{temp_path_command}`.strip
# end
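For comparison, Ruby's standard library offers Dir.tmpdir (after require 'tmpdir') and File.join for roughly the same purposes as TempDir and DirSep. The sketch below is an alternative, not part of cross_os_calls.rb, and temp_path is a made-up helper name.

```ruby
require 'tmpdir'

# Build a path inside the system's temporary directory.
# File.join always uses '/', which Ruby's own file methods accept on
# Windows as well; external commands may still expect backslashes.
# temp_path is a hypothetical helper name.
def temp_path(basename)
  File.join(Dir.tmpdir, basename)
end

puts temp_path("jing#{$$}")
```

If the path is handed to a Windows command that insists on backslashes, a separator constant such as DirSep above is still useful.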
Jing
The normative schemas of many XML standards
will be written in RNG,
and Jing is an RNG validator
written by one of the main creators of RNG, James
Clark.
home
man
See readme.html
in the toplevel dir of the package and
doc/jing.html.
Jing doesn't (yet) support stdin so I use a simple Ruby script
to fake it for now.
Make sure you have the required version of the
JRE.
I recommend the latest version,
which currently is 1.4.
java -version
After downloading and unzipping
jing-version.zip,
save the following Ruby script.
On Windows, add the suffix .rb to the file name
so that you get \any\directory\jing.rb;
on Linux, put it into a directory which is on the system path and do
$ chmod 700 jing
jing
#!/usr/bin/env ruby
# jing - faking stdin for Jing
Jing_Jar =
'/path/to/jing-version/bin/jing.jar'
$:<<'/path/to/ruby/shared/'
require 'cross_os_calls.rb'
files=ARGV.grep(/^[^-]/)
argument_string=ARGV.join(' ')
tempfile=TempDir+DirSep+'jing'+$$.to_s+Time.now.to_f.to_s
JING='java -jar '+Jing_Jar+' '
case files.length
when 1 then
  # If only one file arg is given, it is the RNG;
  # stdin will be the XML doc to validate.
  if stdin = $stdin.read
    tf = File.new(tempfile,'w')
    tf.flock(File::LOCK_EX)
    tf.write stdin; tf.close
    command = JING+argument_string+' '+tempfile
    system command
    File.delete tempfile
  else
    puts "If you supply only one file arg (the RNG)\n"+
         'you must supply stdin (the XML).'
  end
else
  # If there are zero or more than one file args,
  # pass all args to Jing.
  command = JING+argument_string
  system command
end
Adjust the two paths in the script.
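The core trick of the script, spooling stdin into a temporary file, could also be sketched with Tempfile from the standard library, which picks a unique name and cleans up after itself. This is an alternative sketch, not what the script above does; with_spooled_input is a made-up helper name.

```ruby
require 'tempfile'

# Copy everything readable from +input+ into a temporary file,
# yield the file's path, and remove the file afterwards.
# with_spooled_input is a hypothetical helper name.
def with_spooled_input(input)
  tf = Tempfile.new('jing')
  begin
    tf.write(input.read)
    tf.close                      # make sure the data is on disk
    yield tf.path
  ensure
    tf.close unless tf.closed?
    tf.unlink                     # delete the file even after an error
  end
end

# Usage (schema.rng is a placeholder):
# with_spooled_input($stdin) do |path|
#   system "java -jar #{Jing_Jar} schema.rng #{path}"
# end
```

The ensure clause removes the temporary file even if the validator call raises an exception.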
On Windows put the following batch script into a directory which is on
the system path:
jing.bat
@echo off
ruby /path/to/jing.rb %1 %2 %3 %4 %5 %6 %7 %8 %9
To test it go to
jing-version/doc/xhtml/
and do
jing xhtml-strict.rng index.html
on the command line.
Change index.html to be invalid
and run jing again:
you should see errors.
Change it back to its original state and validate again:
there should be no output.
To validate XHTML
documents
you can do
:%w !jing /path/to/xhtml-strict.rng
in Vim.
If the doc has a doctype declaration referencing an online DTD,
Jing will fetch it from the web, which takes a while.
You could comment it out if it's not needed for entity declarations or
attribute value defaults,
or you can exclude it by sending just the root element with contents,
e.g.
:6,$w !jing /path/to/xhtml-strict.rng
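If counting lines is inconvenient, the doctype declaration could also be filtered out before the document reaches Jing. The following is a minimal sketch; strip_doctype is a made-up helper name, and the pattern only handles declarations without an internal subset.

```ruby
# Remove a DOCTYPE declaration that has no internal subset, e.g.
# <!DOCTYPE html PUBLIC "..." "...">. Declarations containing [...]
# are left alone because this simple pattern cannot parse them.
# strip_doctype is a hypothetical helper name.
def strip_doctype(xml)
  xml.sub(/<!DOCTYPE[^>\[]*>\s*/m, '')
end
```

In a filter script the call would be something like print strip_doctype($stdin.read), with the output piped on to jing. As noted above, this only makes sense if the DTD is not needed for entity declarations or attribute value defaults.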
As schema you can use
jing-version/doc/xhtml/xhtml-strict.rng.
Although the name suggests otherwise it is based on
XHTML 1.1 not on 1.0 Strict,
thus excludes attribute lang etc;
for details see
XHTML 1.1
Appendix A.
As an ad hoc solution for validating SVG
you can download the SVG 1.1 RNG
via
wget -q -nd -A rng -l 1 -r http://www.w3.org/Graphics/SVG/1.1/rng/
wget is available for many OSs including Windows;
check the wget home page
(or the alternative wget home page).
Recently they added a zip file; check the directory.
If the files still contain the following line
<!DOCTYPE grammar SYSTEM "../relaxng.dtd">
delete it from all files via the following command (on Windows,
replace the apostrophes with double quotes):
ruby -ni.bak -e 'print if not /^<!DOCTYPE/' *.rng
or change the path to a real system identifier (URI
or canonical URL).
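The same clean-up can be written as a small script, which sidesteps the quoting differences between shells. This is a sketch; drop_doctype_lines is a made-up helper name.

```ruby
require 'fileutils'

# Delete all lines starting with <!DOCTYPE from the file at +path+,
# keeping the original as path + '.bak' (like the -i.bak switch).
# drop_doctype_lines is a hypothetical helper name.
def drop_doctype_lines(path)
  FileUtils.cp(path, path + '.bak')
  kept = File.readlines(path).reject { |line| line =~ /^<!DOCTYPE/ }
  File.open(path, 'w') { |f| f.write(kept.join) }
end
```

To process the whole directory, call it for each file, for example Dir.glob('*.rng').each { |path| drop_doctype_lines(path) }.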
XMLStarlet
From the web site:
XMLStarlet is a set of command line utilities (tools) which can
be used to transform, query, validate, and edit XML
documents and files using simple set of shell commands in similar way
it is done for plain text files using UNIX grep,
sed, awk,
diff, patch,
join, etc commands.
It's fast and promising.
home
doc
On Linux
Here's the script that I use to install XMLStarlet:
install_xmlstar
cat > $command << EOF
#!/usr/bin/env sh
# may get overwritten
${run}/bin/xml "\$@"
EOF
chmod 700 $command
# fi
xmlstar --version
On Windows
Installation is very simple.
After downloading and unzipping XMLStarlet
(xmlstarlet-version-win32.zip),
I added the directory containing xml.exe
to the system path.
This makes the system path longer and requires a restart.
The alternative, a batch file wrapper, supports only up to nine arguments,
which is often not enough when using XMLStarlet.
I think that xml is a confusingly generic name for
a command so I renamed it to xmlstar
by renaming xml.exe
to
xmlstar.exe.
Try it out
Whenever you filter your data through a tool it can get corrupted.
If something went wrong you can use u to undo the
filtering.
XMLStarlet can be used to remove all objects matching an
XPath,
e.g. all style attributes from an XHTML document.
Paste the following into Vim:
foo
Then do
:%!xmlstar ed --delete //@style
You should get something like this:
foo
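To see what such a deletion amounts to, here is a rough Ruby equivalent using REXML (part of the standard library in Ruby 1.8, a bundled gem in recent versions). It is a sketch of the same idea and not how XMLStarlet works internally; strip_style_attributes is a made-up helper name and the input markup is invented for the example.

```ruby
require 'rexml/document'

# Remove every style attribute from an XML document given as a string.
# strip_style_attributes is a hypothetical helper name.
def strip_style_attributes(xml)
  doc = REXML::Document.new(xml)
  # Select all elements that carry a style attribute ...
  REXML::XPath.each(doc, '//*[@style]') do |element|
    # ... and drop that attribute from each of them.
    element.delete_attribute('style')
  end
  doc.to_s
end

# The markup below is made up for illustration.
puts strip_style_attributes('<p style="color: red">foo</p>')
```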
Tidy
Sometimes I receive HTML files
generated by Microsoft Word;
often they are very bloated.
Tidy can make them around five times smaller
and can help turn them into valid XHTML.
The results can't be guaranteed to be
really good code regarding
semantics and structure,
but the files become much easier to work with.
Current Home
Original Home
Project
On Linux
Here's how I installed Tidy:
On Windows
A tidy.bat could look like this: (two lines)
tidy.bat
@echo off
\path\to\tidy.exe -config /path/to/tidyrc.txt -f /log/errors/here/tidyerrs.txt %1 %2 %3 %4 %5 %6 %7 %8 %9
(put it in a directory which is on the system path)
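Since a batch file passes on at most nine arguments, a small Ruby wrapper is an alternative where that limit hurts. The sketch below only builds the argument list; tidy_command is a made-up helper name, both default paths are the placeholders from the batch file, and it assumes that tidy is on the system path.

```ruby
# Build the argument list for a tidy call and append any number of
# user arguments. Both default paths are placeholders to adjust.
# tidy_command is a hypothetical helper name.
def tidy_command(args, config = '/path/to/tidyrc.txt',
                 errors = '/log/errors/here/tidyerrs.txt')
  ['tidy', '-config', config, '-f', errors] + args
end

# Usage: hand the list to system as separate arguments, which avoids
# shell quoting problems:
# system(*tidy_command(ARGV))
```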
Settings
Sample tidyrc.txt:
word-2000: yes
clean: yes
doctype: strict
bare: yes
drop-font-tags: yes
drop-proprietary-attributes: yes
enclose-block-text: yes
escape-cdata: yes
logical-emphasis: yes
output-xhtml: yes