More Setup
The tools listed in this chapter are less essential than those in the previous chapter, and many users can do without them.
Ruby<indexterm> <primary>Ruby</primary> </indexterm>
Ruby is a very nice object-oriented programming language from Japan. Some scripts in this howto are written in Ruby, so I recommend installing it. Alternatively, you could translate the scripts into your favourite language.
Make sure that you have the latest version, or at least 1.8:
$ ruby -v
Otherwise, install it.
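If you'd rather check the version from inside a script (for example at the top of one of the scripts in this howto), something like the following should work. It's a minimal sketch; the file name check_ruby_version.rb and the method version_at_least? are my own, hypothetical names, not part of Ruby.

```ruby
# check_ruby_version.rb (hypothetical helper, not one of this howto's scripts):
# reports whether the running Ruby is at least version 1.8.
MINIMUM_VERSION = [1, 8]

# Compare version numbers numerically, component by component.
# A plain string comparison would wrongly rank "1.10" below "1.8".
def version_at_least?(version, minimum)
  parts = version.split('.').map { |part| part.to_i }
  (parts <=> minimum) >= 0
end

if version_at_least?(RUBY_VERSION, MINIMUM_VERSION)
  puts "Ruby #{RUBY_VERSION} is recent enough."
else
  puts "Ruby #{RUBY_VERSION} is older than 1.8; please upgrade."
end
```

Array comparison with <=> treats [1, 8, 1] as greater than [1, 8], so any 1.8.x release passes.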
On Linux
The latest stable version of Ruby (1.8.1) is available from various places; some are listed below. The first one is a redirector; the last one is the original location, which should not be used if possible. The MD5 checksum is 5d52c7d0e6a6eb6e3bc68d77e794898e. After downloading and unpacking the archive, read the section "How to compile and install" in the README. If you're on Mac OS X, check Rich Kilmer's Building Ruby 1.8.1 on Panther.
Here's how I installed Ruby: first I installed readline-devel (I don't know whether this was necessary, since readline was already installed). Then I did the following (the output of some commands is omitted):
puts 6
6
=> nil
irb(main):002:0> puts 6
6
=> nil
With the Ruby that came with my distro, readline doesn't work in IRB: pressing the up arrow produces ^[[A instead of recalling the previous line. With the Ruby I installed, IRB works (although I have to press Escape before pressing the up arrow).
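To compare a downloaded archive against the MD5 checksum above, a small script like this one should do; it only uses Ruby's bundled digest/md5 library. It's a sketch: the script name check_md5.rb and the method md5_matches? are hypothetical, and you pass the path of the archive you actually downloaded.

```ruby
# check_md5.rb (hypothetical helper): verifies a downloaded archive against
# the published MD5 checksum before you unpack it.
require 'digest/md5'

RUBY_1_8_1_MD5 = '5d52c7d0e6a6eb6e3bc68d77e794898e'

# Read the file in binary mode so line endings are not translated on Windows.
def md5_matches?(path, expected)
  actual = File.open(path, 'rb') { |f| Digest::MD5.hexdigest(f.read) }
  actual == expected.downcase
end

if __FILE__ == $0
  if ARGV.empty?
    warn 'usage: ruby check_md5.rb path/to/ruby-archive'
  elsif md5_matches?(ARGV[0], RUBY_1_8_1_MD5)
    puts 'Checksum OK'
  else
    puts 'Checksum MISMATCH: do not install this archive'
  end
end
```

Usage: ruby check_md5.rb followed by the archive's file name.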
On Windows
Run the latest rubyversion.exe installer, restart, then test with
>ruby -v
Cross-<acronym>OS</acronym> Tool Calls
Unfortunately, Ruby doesn't yet fully support Windows. The following file can help make system calls more portable; all you need to do is require it in your scripts.
<filename>cross_os_calls.rb</filename><indexterm> <primary><filename>cross_os_calls.rb</filename></primary> </indexterm>
# cross_os_calls.rb
#
# Error seen on Windows without the workaround below:
#   cross_os_calls.rb:15:in ``': No such file or directory -
#   tidy -v (Errno::ENOENT)
######################################################################
# workaround
require 'rbconfig'

# RbConfig is the current name; older Rubies use Config.
RUBY_CONFIG = defined?(RbConfig) ? RbConfig::CONFIG : Config::CONFIG

# True on Windows builds (and Cygwin). A bare /win/ would also match
# "darwin" on Mac OS X, so the Windows platform names are listed explicitly.
def windows?
  RUBY_CONFIG["arch"] =~ /mswin|mingw|cygwin|bccwin/ ? true : false
end

alias oldSystem system
def system(command)
  if windows?
    require 'Win32API'
    # Call the C runtime's system() directly.
    Win32API.new("crtdll", "system", ['P'], 'L').Call(command)
  else
    oldSystem command
  end
end

alias oldBacktick `
def `(command)
  if windows?
    require 'Win32API'
    popen  = Win32API.new("crtdll", "_popen",  ['P','P'], 'L')
    pclose = Win32API.new("crtdll", "_pclose", ['L'], 'L')
    fread  = Win32API.new("crtdll", "fread",   ['P','L','L','L'], 'L')
    feof   = Win32API.new("crtdll", "feof",    ['L'], 'L')
    saved_stdout = $stdout.clone
    psBuffer = " " * 128
    rBuffer = ""
    # Read the command's output in chunks of up to 128 bytes until EOF.
    f = popen.Call(command, "r")
    while feof.Call(f) == 0
      l = fread.Call(psBuffer, 1, 128, f)
      rBuffer += psBuffer[0...l]
    end
    pclose.Call f
    $stdout.reopen(saved_stdout)
    rBuffer
  else
    oldBacktick command
  end
end

TempDir = if ENV['TMP']
            ENV['TMP']
          elsif windows?
            `echo %temp%`.strip
          else
            '/tmp'
          end

DirSep = if File::ALT_SEPARATOR
           File::ALT_SEPARATOR
         # work around cygwin returning '/'
         elsif windows?
           '\\'
         else
           File::SEPARATOR
         end

# puts TempDir
# puts DirSep
######################################################################
# tests
# ... pass in 1.6.5 and 1.8.0 (rubyinstaller.sf.net) on Windows ME
# puts `tidy -v`
# if windows?
#   temp_path_command = 'echo %temp%'
#   p `#{temp_path_command}`.strip
# end
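On current Ruby versions, the standard library already covers most of what cross_os_calls.rb does by hand, so here is a sketch of that alternative (a different technique, not what the file above does): Dir.tmpdir (from tmpdir) finds the temp directory, File.join inserts the separator, and IO.popen captures a command's output. The method names windows_host?, temp_path and capture are hypothetical, and I haven't tested this on Windows ME.

```ruby
# Standard-library alternative to cross_os_calls.rb (a sketch, assuming a
# reasonably recent Ruby).
require 'rbconfig'
require 'tmpdir'

# Match the Windows platform names explicitly; /win/ alone also matches "darwin".
def windows_host?
  !!(RbConfig::CONFIG['host_os'] =~ /mswin|mingw|cygwin|bccwin/)
end

# A unique temp file name like the one the jing script builds by hand
# (the prefix is just an example).
def temp_path(prefix)
  File.join(Dir.tmpdir, "#{prefix}#{Process.pid}#{Time.now.to_f}")
end

# Run a command and return its standard output as a string.
def capture(command)
  IO.popen(command) { |io| io.read }
end

puts windows_host?
puts temp_path('jing')
puts capture('echo hello')
```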
Jing<indexterm> <primary>Jing</primary> </indexterm>
The normative schemas of many XML standards will be written in RNG (RELAX NG), and Jing is an RNG validator written by one of the main creators of RNG, James Clark.
See readme.html in the top-level directory of the package and doc/jing.html.
Jing doesn't (yet) read from standard input, so for now I use a simple Ruby script to fake it.
Make sure you have the required version of the JRE. I recommend the latest version, currently 1.4. Check with:
java -version
After downloading jing-version.zip and unzipping it, save the following Ruby script. On Windows, add the suffix rb to the file name so that you get \any\directory\jing.rb; on Linux, put it into a directory which is on the system path and do:
$ chmod 700 jing
<filename>jing</filename>
#!/usr/bin/env ruby
# jing - faking stdin for Jing
Jing_Jar = '/path/to/jing-version/bin/jing.jar'
$: << '/path/to/ruby/shared/'  # directory containing cross_os_calls.rb
require 'cross_os_calls.rb'

files = ARGV.grep(/^[^-]/)  # arguments that aren't options
argument_string = ARGV.join(' ')
tempfile = TempDir + DirSep + 'jing' + $$.to_s + Time.now.to_f.to_s
JING = 'java -jar ' + Jing_Jar + ' '

case files.length
when 1
  # If there's only one file arg given, it is the RNG;
  # stdin will be the XML doc to validate.
  # $stdin.read returns "" (never nil) for empty input, and it would wait
  # for keyboard input if stdin is a terminal, so check both cases.
  stdin = $stdin.tty? ? '' : $stdin.read
  if stdin.empty?
    puts "If you supply only one file arg (the RNG)\n" +
         'you must supply stdin (the XML).'
  else
    File.open(tempfile, 'w') do |tf|
      tf.flock(File::LOCK_EX)
      tf.write stdin
    end
    command = JING + argument_string + ' ' + tempfile
    system command
    File.delete tempfile
  end
else
  # If there are zero or more than one file args,
  # pass all args to Jing.
  command = JING + argument_string
  system command
end
Adjust the two paths in the script.
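The core of the stdin trick can also be written with Ruby's Tempfile class, which picks a unique file name in the temp directory and deletes the file for you. This is a sketch of that variant, not a replacement for the script above: JING_JAR is the same placeholder path, the method names jing_command and with_temp_xml are hypothetical, and passing an argument array to system avoids shell quoting problems.

```ruby
# Tempfile-based variant of the stdin trick (a sketch).
require 'tempfile'

JING_JAR = '/path/to/jing-version/bin/jing.jar' # placeholder; adjust

# Build the Jing command as an argument array: the user's args plus an
# optional extra file (the temp copy of stdin).
def jing_command(args, extra_file = nil)
  ['java', '-jar', JING_JAR] + args + [extra_file].compact
end

# Write the XML text to a temporary file, yield its path, then delete it.
def with_temp_xml(xml)
  file = Tempfile.new(['jing', '.xml'])
  file.write(xml)
  file.close
  yield file.path
ensure
  file.close! if file # close! also unlinks the file
end

# One file arg (the RNG) plus piped stdin (the XML): validate the stdin copy.
if __FILE__ == $0 && ARGV.length == 1 && !$stdin.tty?
  with_temp_xml($stdin.read) { |path| system(*jing_command(ARGV, path)) }
end
```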
On Windows, put the following batch script into a directory which is on the system path:
<filename>jing.bat</filename><indexterm> <primary><filename>jing.bat</filename></primary> </indexterm>
@echo off
ruby /path/to/jing.rb %1 %2 %3 %4 %5 %6 %7 %8 %9
To test it, go to jing-version/doc/xhtml/ and run the following on the command line:
jing xhtml-strict.rng index.html
Make index.html invalid and run jing again: you should see errors. Change it back to its original state and validate again: there should be no output.
To validate the XHTML document you're editing in Vim, you can do
:%w !jing /path/to/xhtml-strict.rng
If the document has a doctype declaration referencing an online DTD, Jing will fetch the DTD from the web, which takes a while. You could comment the declaration out if it's not needed for entity declarations or attribute value defaults, or you can exclude it by sending just the root element and its contents, e.g. if the root element starts on line 6:
:6,$w !jing /path/to/xhtml-strict.rng
As the schema you can use jing-version/doc/xhtml/xhtml-strict.rng. Although the name suggests otherwise, it is based on XHTML 1.1, not on XHTML 1.0 Strict, and thus excludes the lang attribute etc.; for details see XHTML 1.1 Appendix A.
As an ad hoc solution for validating SVG, you can download the SVG 1.1 RNG files via
wget -q -nd -A rng -l 1 -r http://www.w3.org/Graphics/SVG/1.1/rng/
wget is available for many OSs including Windows; check the wget home page (or the alternative wget home page). Recently they added a zip file; check the directory. If the files still contain the line
<!DOCTYPE grammar SYSTEM "../relaxng.dtd">
delete it from all files with the following command (on Windows, replace the single quotes with double quotes):
ruby -ni.bak -e 'print unless /^<!DOCTYPE/' *.rng
or change the path to a real system identifier (URI or canonical URL).
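If you prefer a commented script to the one-liner, the following does the same job: it drops every line starting with <!DOCTYPE and keeps a .bak copy of each original file. It's a sketch; the script name strip_doctype.rb and the method strip_doctype are hypothetical.

```ruby
# strip_doctype.rb (hypothetical name): removes lines starting with <!DOCTYPE
# from the files named on the command line, keeping a .bak copy of each.
def strip_doctype(text)
  text.lines.reject { |line| line =~ /^<!DOCTYPE/ }.join
end

if __FILE__ == $0
  ARGV.each do |path|
    original = File.read(path)
    File.open(path + '.bak', 'w') { |f| f.write(original) }
    File.open(path, 'w') { |f| f.write(strip_doctype(original)) }
  end
end
```

Usage: ruby strip_doctype.rb *.rng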
XMLStarlet<indexterm> <primary>XMLStarlet</primary> </indexterm>
From the web site:
XMLStarlet is a set of command line utilities (tools) which can be used to transform, query, validate, and edit XML documents and files using simple set of shell commands in similar way it is done for plain text files using UNIX grep, sed, awk, diff, patch, join, etc commands.
It's fast and promising.
On Linux
Here's the script that I use to install XMLStarlet:
<filename>install_xmlstar</filename>
$command << EOF
#!/usr/bin/env sh
# may get overwritten
${run}/bin/xml "\$@"
EOF
chmod 700 $command
# fi
xmlstar --version
On Windows
Installation is very simple. After downloading and unzipping XMLStarlet (xmlstarlet-version-win32.zip), I added the directory containing xml.exe to the system path. This makes the system path longer and requires a restart, but I prefer it to a batch file wrapper, because batch files support only up to nine arguments, which often is not enough when using XMLStarlet. I think that xml is a confusingly generic name for a command, so I renamed xml.exe to xmlstar.exe.
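To confirm that the renamed command is really reachable through the system path, a small PATH lookup like the following can help. It's a sketch: find_on_path is a hypothetical helper, and the PATHEXT handling assumes the Windows convention of a semicolon-separated list of executable extensions such as .EXE.

```ruby
# Looks for an executable on the system path (a sketch). On Windows it also
# tries each extension listed in PATHEXT, e.g. xmlstar -> xmlstar.EXE.
def find_on_path(name, path = ENV['PATH'], pathext = ENV['PATHEXT'])
  extensions = pathext ? [''] + pathext.split(';') : ['']
  path.to_s.split(File::PATH_SEPARATOR).each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, name + ext)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

puts find_on_path('xmlstar') || 'xmlstar is not on the system path'
```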
Try it out
Whenever you filter your data through a tool, it can get corrupted; if something goes wrong, press u in Vim to undo the filtering. XMLStarlet can remove all nodes matching an XPath expression, e.g. all style attributes from an XHTML document. Paste the following into Vim:
foo

foo

Then do
:%!xmlstar ed --delete //@style
You should get something like this:
foo

foo

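For comparison, the same edit can be done in Ruby with REXML, the XML library that ships with Ruby. This is a swapped-in technique for illustration, not how XMLStarlet works internally; the method name delete_style_attributes and the script name delete_style.rb are hypothetical, and REXML is much slower than XMLStarlet on large files.

```ruby
# delete_style.rb (hypothetical name): removes the style attribute from
# every element, like `xmlstar ed --delete //@style`.
require 'rexml/document'

def delete_style_attributes(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, '//*[@style]') do |element|
    element.delete_attribute('style')
  end
  doc.to_s
end

# Print the cleaned version of each file named on the command line.
if __FILE__ == $0
  ARGV.each { |path| puts delete_style_attributes(File.read(path)) }
end
```

Usage: ruby delete_style.rb page.html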
Tidy<indexterm> <primary>Tidy</primary> </indexterm>
Sometimes I receive HTML files generated by Microsoft Word; often they are very bloated. Tidy can make them around five times smaller and can help with turning them into valid XHTML. The resulting code isn't guaranteed to be good in terms of semantics and structure, but the files become much easier to work with.
On Linux
Here's how I installed Tidy:
On Windows
A tidy.bat could look like this (two lines); put it in a directory which is on the system path:
<filename>tidy.bat</filename><indexterm> <primary><filename>tidy.bat</filename></primary> </indexterm>
@echo off
\path\to\tidy.exe -config /path/to/tidyrc.txt -f /log/errors/here/tidyerrs.txt %1 %2 %3 %4 %5 %6 %7 %8 %9
Settings
Sample tidyrc.txt:
word-2000: yes
clean: yes
doctype: strict
bare: yes
drop-font-tags: yes
drop-proprietary-attributes: yes
enclose-block-text: yes
escape-cdata: yes
logical-emphasis: yes
output-xhtml: yes
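Tidy also accepts each configuration option on the command line as --name value, which is handy for trying a setting once without editing tidyrc.txt. The following sketch converts config-file lines into that form; tidyrc_to_args is a hypothetical helper, and it simply skips blank lines and lines without a colon rather than parsing Tidy's full config syntax.

```ruby
# Turns tidyrc-style "name: value" lines into Tidy's command-line form,
# "--name value" (a sketch; only simple one-line settings are handled).
def tidyrc_to_args(text)
  args = []
  text.each_line do |line|
    line = line.strip
    next if line.empty? || !line.include?(':')
    name, value = line.split(':', 2).map { |part| part.strip }
    args << "--#{name}" << value
  end
  args
end

sample = "output-xhtml: yes\ndoctype: strict\n"
p tidyrc_to_args(sample)
```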