Vim as XML Editor: More Tasks
Sometimes you need to find out facts about the buffer with which you're working. Here are some examples: (each command line is to be entered as one line)
- How many indexterm elements are there?
  :%w !xmlstar sel -t -v "count(//indexterm)"
- What's the title of the first section which has more than one nested section?
  :%w !xmlstar sel -t -v "//section[count(.//section)>1]/title"
- What's the title of the first section which is nested deeper than one level?
  :%w !xmlstar sel -t -v "//section[count(ancestor::section)>1]/title"
- How many programlistings are there which don't have a role attribute?
  :%w !xmlstar sel -t -v "count(//programlisting[not(@role)])"
- How many XHTML div elements are in the document?
  :%w !xmlstar sel -N "xh=http://www.w3.org/1999/xhtml" -t -v "count(//xh:div)"
- How many words are there in the document?
  :%w !xmlstar sel -t -v "/" | ruby -e 'puts($stdin.read.scan(/\S+/).length)'
  You might need to exchange the apostrophes around the Ruby code for quotes on Windows. To count the words inside the (first) chapter with title "Foo" replace "/" with "//chapter[title='Foo']". If wc is available on your system you can try | wc -w instead of | ruby -e[...].
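For comparison, the same kind of question can be asked outside of Vim. The following is a minimal Python sketch, assuming only the standard library's xml.etree.ElementTree. The function name buffer_facts and the sample chapter are made up for illustration and are not part of xmlstar or of the setup described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, only for illustration.
SAMPLE = """<chapter>
  <title>Setup</title>
  <indexterm><primary>setup</primary></indexterm>
  <programlisting role="shell">make install</programlisting>
  <programlisting>echo done</programlisting>
</chapter>"""


def buffer_facts(xml_text):
    """Answer a few of the questions above for an XML string."""
    root = ET.fromstring(xml_text)
    # Like count(//indexterm)
    indexterms = sum(1 for _ in root.iter("indexterm"))
    # Like count(//programlisting[not(@role)])
    no_role = sum(
        1 for element in root.iter("programlisting")
        if "role" not in element.attrib
    )
    # Like the string value of "/" piped through a word counter:
    # concatenate all text nodes, then split on whitespace.
    words = len("".join(root.itertext()).split())
    return {
        "indexterms": indexterms,
        "programlistings_without_role": no_role,
        "words": words,
    }


if __name__ == "__main__":
    print(buffer_facts(SAMPLE))
```

ElementTree only supports a small XPath subset, which is why the role test is written as a Python condition here rather than as a predicate.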
To check whether the file on disk is well-formed:
:!xmllint --noout %
To validate the buffer:
:%w !xmllint --valid --noout -
This means you can validate the document with all changes,
without having to save it first.
If you set up a shell script or batch file just for validation, e.g.
xmllintval.bat
@echo off
xmllint --valid --noout %1 %2 %3 %4 %5
you can enter
:%w !xmllintval -
To validate the file on disk:
:!xmllint --valid --noout %
or
:!xmllintval %
xmllint also allows you to validate documents which
don't have a document type declaration.
Let's say you have a DBX chapter file
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
<title>Setup</title>
...
which is included in the main book file via an entity reference
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book
PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"https://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY setup SYSTEM "setup.dbx">
...
]>
...
&setup;
...
This means the chapter file can't have a doctype declaration.
XInclude is a solution for that problem but it's not yet as widely
supported as the entity reference
mechanism which is part of the
XML standard itself.
Being able to validate documents which have no document type
declaration is generally useful, and factors like the growing
popularity of RNG, namespaces and version/profile attributes, and
the hopefully decreasing dependence on DTDs to modify the
document's information set will probably mean that fewer documents
will have document type declarations.
So let's say you edit the chapter in Vim
and want to validate the buffer.
:%w !xmllint --valid --noout -
will fail because there is no FPI
included and no DTD referenced.
So you can try the following (one line):
:%w !xmllint --dtdvalid
"https://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" --noout -
If you set up the catalog as described in the previous chapter then
this works offline.
(It won't work if the chapter contains references to entities
that are declared in the main (book) file.
In this case you probably would feed the main file to the validator.)
The line doesn't have to be changed when used on a different machine,
which means that you can add it to your cross-platform vimrc:
nmap <Leader>d4 :%w !xmllint --dtdvalid
\ "https://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
\ --noout -<CR>
Alternatively you can pass the FPI to xmllint (one line)
:%w !xmllint --dtdvalidfpi
"-//OASIS//DTD DocBook XML V4.2//EN" --noout -
if it's listed in a catalog.
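The choice between --valid and --dtdvalid could also be made by a small wrapper. The following Python sketch assumes the standard library's xml.dom.minidom to detect a document type declaration. The helper name xmllint_args is hypothetical, and the sketch only builds the argument list; it does not run xmllint.

```python
from xml.dom import minidom

# DTD location taken from the command lines above.
DOCBOOK_DTD = "https://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"


def xmllint_args(xml_text, dtd=DOCBOOK_DTD):
    """Build an xmllint argument list for validating xml_text via stdin.

    Hypothetical helper: uses --valid if the document carries its own
    document type declaration, otherwise --dtdvalid with the given DTD.
    """
    doc = minidom.parseString(xml_text.encode("utf-8"))
    if doc.doctype is not None:
        return ["xmllint", "--valid", "--noout", "-"]
    return ["xmllint", "--dtdvalid", dtd, "--noout", "-"]


if __name__ == "__main__":
    chapter = "<chapter><title>Setup</title></chapter>"
    print(" ".join(xmllint_args(chapter)))
```

Like the Vim mapping, this fails for chapters that reference entities declared elsewhere, since the parser can't resolve them.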
[mapleader]d4 works great for DBX chapters which are external
entities referenced from the main book file as long as the following
requirements are met: In addition to being an external parsed entity
(can't contain a doctype declaration) the file must also be an
XML document. In other words: Since the file can't
contain entity declarations it also can't contain entity references.
Making sure that modules of a DBX document are
well-formed XML documents that can stand alone is
generally a good idea since it simplifies collaboration and exchange,
reuse, and last but not least editing and validation.
To validate the buffer with RXP:
:%w !rxp -V -N -s -x
If you have set up rxpval
you can enter
:%w !rxpval
This will not work if there are
relative references to external entities
and the current directory is not that of the file being validated
(this also goes for stuff like :%w !xmllintval -).
When RXP or xmllint eat stdin they use the current directory
as base directory
when resolving relative references.
So if you want to validate a buffer which has relative references to
external entities, you can do
:cd /directory/of/the/file/
to change the working directory, and
:pwd
to check it.
A more convenient way to change the working directory
is to put
map <Leader>cd :exe 'cd ' . expand ("%:p:h")<CR>
into your vimrc, then press [mapleader]cd
before writing the buffer to the validator via
:%w !command
....
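Why the working directory matters can be illustrated with a small Python sketch. It is a simplification under stated assumptions: real parsers resolve system identifiers as URIs, while this sketch treats them as plain POSIX paths. The function name resolve_system_id and both directories are hypothetical.

```python
import posixpath


def resolve_system_id(system_id, base_dir):
    """Resolve a relative system identifier against a base directory."""
    if posixpath.isabs(system_id):
        return system_id
    return posixpath.normpath(posixpath.join(base_dir, system_id))


if __name__ == "__main__":
    book_dir = "/home/me/book"  # hypothetical directory of the edited file
    other_dir = "/tmp"          # hypothetical working directory of Vim

    # When the validator reads stdin, the working directory is the base:
    print(resolve_system_id("setup.dbx", other_dir))
    # After changing into the file's directory the reference resolves
    # to the file that was meant:
    print(resolve_system_id("setup.dbx", book_dir))
```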
To validate the file on disk instead of the buffer:
:!rxp -V -N -s -x %
or
:!rxpval %
RXP works with XHTML documents,
but with DBX documents I
experienced problems which hopefully will be resolved with future
releases.
There are various command line options; check the RXP man page.
To recall a command you entered earlier:
- Type : to go to Vim's command line,
- optionally type the start of the command,
- then repeat [up] until you see the command you were looking for.
Pretty-printing is not a trivial task. Relevant details of your document might be changed, so beware. I mainly use pretty-printing when viewing documents with extremely long lines or no indentation, or when editing XML generated by tools, but I almost never use a pretty-printing tool to format the code of documents I author.
Warning
Again: Your data can get corrupted whenever you filter the buffer.
This goes for search'n'replace, external tools, etc.
Use u to undo.
You can filter part of the buffer, e.g. a visual selection, through
xmllint's pretty-printer by entering
!xmllint --format -
xmllint will insert an XML prolog;
if you didn't filter the whole buffer, this probably isn't desired.
You can map this filter command to some shorter key sequence,
and also include some commands to delete the prolog.
Caution
Namespace prefixes not declared inside the fragment are stripped.
To filter the whole buffer:
:%!xmllint --format -
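The idea of pretty-printing a fragment and then deleting the prolog can be sketched in Python. This is a minimal illustration which assumes the standard library's xml.dom.minidom instead of xmllint, so the layout of the result will differ from xmllint's output. The function name pretty_fragment is hypothetical.

```python
from xml.dom import minidom


def pretty_fragment(xml_text, indent="  "):
    """Pretty-print a well-formed fragment and drop the XML declaration."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    lines = doc.toprettyxml(indent=indent).splitlines()
    # toprettyxml() starts with an XML declaration, which is not wanted
    # when only a part of a document was filtered.
    if lines and lines[0].startswith("<?xml"):
        lines = lines[1:]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fragment = "<simplelist><member>a</member><member>b</member></simplelist>"
    print(pretty_fragment(fragment), end="")
```

As with any filter, whitespace in the result may differ from the input, so the warning above applies here too.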
Example
<?xml version="1.0"?>
<chapter><title>Ökopläne</title><simplelist><member role="überfällig">übermäßige
Ölförderung stoppen</member></simplelist></chapter>
The code is laid out in a way which makes it hard to work with.
This
:%!xmllint --format --encode UTF-8 -
should bring
<?xml version="1.0" encoding="UTF-8"?>
<chapter>
  <title>Ökopläne</title>
  <simplelist>
    <member role="&#xFC;berf&#xE4;llig">übermäßige
Ölförderung stoppen</member>
  </simplelist>
</chapter>
which looks better, but xmllint currently doesn't resolve the NCRs
of "special characters" (probably those outside the US-ASCII range)
inside attribute values, which can make editing harder.
You could also try tidy for pretty-printing.
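If the numeric character references get in the way, they can be turned back into literal characters afterwards. The following Python sketch is one possible approach, not a feature of xmllint. The function name resolve_ncrs is hypothetical. It replaces references anywhere in the text, not only in attribute values, and it deliberately leaves references inside the US-ASCII range alone, since resolving something like &#x26; would break well-formedness.

```python
import re

# Matches hexadecimal (&#xFC;) and decimal (&#252;) character references.
NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def resolve_ncrs(text):
    """Replace NCRs for characters outside US-ASCII with the characters."""
    def replace(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 128:
            # Keep ASCII references such as &#x26; or &#60; untouched.
            return match.group(0)
        return chr(code_point)

    return NCR.sub(replace, text)


if __name__ == "__main__":
    print(resolve_ncrs('<member role="&#xFC;berf&#xE4;llig">'))
```

The result has to be saved in an encoding that can represent the resolved characters, such as UTF-8.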
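To delete all comments from the buffer (the quotes keep a Unix shell from interpreting the parentheses):
:%!xmlstar ed -d "//comment()"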
HTML
:%!tidy
If there are irrecoverable errors, e.g. unknown elements,
everything will be deleted.
You can undo the filtering via u.
To filter only the lines from the cursor to the end of the paragraph, enter
!}tidy
but don't forget that depending on the doctype flag
Tidy might insert a doctype declaration if there is none.
To remove all <u> start and end tags from the buffer:
:%s,</\?u>,,g
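The substitute pattern above corresponds to the regular expression </?u> in most other regex dialects. As a hedged illustration, here is the same replacement in Python; the function name strip_u_tags is made up for this sketch.

```python
import re


def strip_u_tags(text):
    """Remove <u> and </u> tags but keep the text between them."""
    return re.sub(r"</?u>", "", text)


if __name__ == "__main__":
    print(strip_u_tags("<p><u>foo</u> bar</p>"))
```

Like the Vim command, this only matches tags without attributes. A start tag such as <u class="x"> is left untouched.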