More Tasks
Questions
Sometimes you need to find out facts about the buffer with which
you're working.
Here are some examples: (each command line is to be entered as one line)
How many indexterm elements are there?
What's the title of the first section
which has more than one nested section?
What's the title of the first section
which is nested deeper than one level?
How many programlistings are there which don't have
a role attribute?
How many XHTML div
elements are in the document?
How many words are there in the document?
You might need to exchange the apostrophes around the Ruby code for
quotes on Windows.
To count the words inside
the (first) chapter with title Foo
replace "/"
with
"//chapter[title='Foo']".
If wc is available on your system you can try
| wc -w
instead of
| ruby -e [...].
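Questions like the ones above translate naturally into XPath expressions. The following is a minimal sketch, not necessarily the commands originally intended: it assumes xmlstarlet is installed under the name xmlstar and that its sel subcommand reads stdin when no file is given. The helper names xq and count_words are hypothetical, and each expression is only one possible reading of the question.

```shell
# Hypothetical helper: print the value of one XPath expression.
# Any further arguments (e.g. a file name) are passed through to xmlstar.
xq() {
    xpath=$1
    shift
    xmlstar sel -t -v "$xpath" "$@"
}

# Hypothetical helper: count the words in the string value of a node.
# Defaults to the document root "/"; the document is read from stdin.
count_words() {
    xq "normalize-space(${1:-/})" | wc -w | tr -d ' '
}

# Possible expressions (assumptions, one reading of each question):
#   xq 'count(//indexterm)' book.xml
#   xq '(//section[count(section) > 1])[1]/title' book.xml
#   xq '(//section[count(ancestor::section) > 1])[1]/title' book.xml
#   xq 'count(//programlisting[not(@role)])' book.xml
#   count_words "//chapter[title='Foo']" < book.xml
# Namespaced elements need a prefix bound with -N before -t, e.g.
#   xmlstar sel -N x=http://www.w3.org/1999/xhtml -t -v 'count(//x:div)' page.html
```

From Vim, the same expressions could presumably be applied to the buffer by writing it to the tool, for example :%w !xmlstar sel -t -v "count(//indexterm)".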
Validation
If you just want to check for well-formedness, enter
:!xmllint --noout %
Most often you will want to validate the document.
The following command writes the buffer to xmllint:
:%w !xmllint --valid --noout -
This means you can validate the document with all changes,
without having to save it first.
If you set up a shell script or batch file just for validation, e.g.
xmllintval.bat
@echo off
xmllint --valid --noout %1 %2 %3 %4 %5
then you can simply do
:%w !xmllintval -
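On Unix-like systems, a shell counterpart of the batch file might look like the sketch below. It is written as a function here for brevity; the name xmllintval is reused from the batch file, and nothing beyond the two xmllint options already shown is assumed.

```shell
# Shell counterpart of xmllintval.bat.
# "$@" forwards every argument unchanged, not just the first five.
xmllintval() {
    xmllint --valid --noout "$@"
}
```

Vim runs filter commands in a non-interactive shell, so to call this from Vim the function body would have to be saved as an executable script named xmllintval somewhere on the PATH.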
If you want to validate the file on disk instead,
pass the file's path (% expands to it)
to the validator:
:!xmllint --valid --noout %
or
:!xmllintval %
xmllint also allows you to validate documents which
don't have a document type declaration.
Let's say you have a DBX chapter file (a chapter titled "Setup")
which is included in the main book file via an entity reference
(&setup;) whose entity is declared in the book's document type declaration.
This means the chapter file can't have a doctype declaration.
XInclude is a solution for that problem but it's not yet as widely
supported as the entity reference
mechanism which is part of the
XML standard itself.
Being able to validate documents which have no
document type declaration
is generally useful,
and factors like the growing popularity of RNG, namespaces
and version/profile attributes, and
the hopefully decreasing dependence on DTDs to modify the
document's information set will probably mean that fewer documents
will have document type declarations.
So let's say you edit the chapter in Vim
and want to validate the buffer.
:%w !xmllint --valid --noout -
will fail because there is no FPI
included and no DTD referenced.
So you can try the following (one line):
:%w !xmllint --dtdvalid
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" --noout -
If you set up the catalog as described in the previous chapter then
this works offline.
(It won't work if the chapter contains references to entities
that are declared in the main (book) file.
In this case you probably would feed the main file to the validator.)
The line doesn't have to be changed when used on a different machine,
which means that you can add it to your cross-platform
vimrc:
map <Leader>d4 :%w !xmllint --dtdvalid
\ "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
\ --noout -
Alternatively you can pass the
FPI
to xmllint (one line)
:%w !xmllint --dtdvalidfpi
"-//OASIS//DTD DocBook XML V4.2//EN" --noout -
if it's listed in a catalog.
The mapleader d 4 mapping
works well for DBX chapters which are external
entities referenced from the main book file, as long as the following
requirements are met: in addition to being an external parsed entity
(which can't contain a doctype declaration), the file must also be a
well-formed XML document in its own right. In other words: since the file can't
contain entity declarations, it also can't contain references to entities
other than the predefined ones.
Making sure that modules of a DBX document are
well-formed XML documents that can stand alone is
generally a good idea since it simplifies collaboration and exchange,
reuse, and last but not least editing and validation.
To validate the buffer with RXP, type
:%w !rxp -V -N -s -x
If you have set up rxpval
you can enter
:%w !rxpval
This will not work if there are
relative references to external entities
and the current directory is not that of the file being validated
(the same goes for commands like :%w !xmllintval -).
When RXP or xmllint read from stdin, they use the current directory
as base directory
when resolving relative references.
So if you want to validate a buffer which has relative references to
external entities, you can do
:cd /directory/of/the/file/
to change the working directory, and
:pwd
to check it.
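The same base-directory issue shows up outside Vim whenever a document is piped to the validator. A minimal sketch of the workaround, assuming a hypothetical helper name validate_in_place and the xmllint options used earlier:

```shell
# Hypothetical helper: validate a file via stdin, but from the file's own
# directory, so relative references to external entities resolve.
validate_in_place() {
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$(dirname "$1")" || exit 1
        xmllint --valid --noout - < "$(basename "$1")"
    )
}
```

Calling validate_in_place /path/to/chapter.xml would then behave like the :cd step followed by writing the buffer to the validator.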
A more convenient way to change the working directory
is to put
map <Leader>cd :exe 'cd ' . expand("%:p:h")
into your vimrc, then press
mapleader c d
before writing the buffer to the validator via
:%w !command....
If you want to validate the file, then it's not necessary to change
the working directory.
You can simply pass the file path, eg
:!rxp -V -N -s -x %
or
:!rxpval %
RXP works with XHTML documents,
but with DBX documents I
experienced problems which hopefully will be resolved with future
releases.
There are various command line options; check
the RXP
man page.
The commands are long enough to be tedious to type each time.
After you've entered a specific one once,
you can look for it in the command history:
Type : to go to Vim's command line,
optionally type the start of the command,
then press Up repeatedly until you see
the command you were looking for.
Now you can edit it, or enter it unmodified.
Another way of saving typing is to map commands to a very short key
sequence in the vimrc.
Pretty-printing
Pretty-printing is not a trivial task.
Relevant details of your document might be changed, so beware.
I mainly use pretty-printing when viewing documents
with extremely long lines or no indentation,
or when editing XML generated by tools,
but I nearly never use a pretty-printing tool to
format the code of documents I author.
Again: Your data can get corrupted whenever you filter the buffer.
This goes for search'n'replace, external tools, etc.
Use u to undo.
Select a well-formed fragment (one root element),
then filter it through
xmllint's pretty-printer
by entering
!xmllint --format -
xmllint will insert an XML prolog;
if you didn't filter the whole buffer, this probably isn't desired.
You can map this filter command to some shorter key sequence,
and also include some commands to delete the prolog.
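One way to drop the prolog is to do it in the filter itself rather than with Vim commands afterwards. The sketch below assumes xmllint writes the XML declaration on a line of its own as the first line of output; the function name format_fragment is hypothetical.

```shell
# Hypothetical filter: pretty-print stdin, then delete the first line
# if (and only if) it is an XML declaration.
format_fragment() {
    xmllint --format - | sed '1{/^<?xml /d;}'
}
```

Saved as a script on the PATH, it could be used in place of xmllint --format - in the filter command.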
Namespace prefixes
not declared inside the fragment are stripped.
To pretty-print the whole buffer, do
:%!xmllint --format -
Example
Let's say you receive a file which looks like this:
Ökopläne
übermäßige
Ölförderung stoppen
The code is laid out in a way which makes it hard to work with.
This
:%!xmllint --format --encode UTF-8 -
should produce
Ökopläne
übermäßige
Ölförderung stoppen
which looks better, but
xmllint currently doesn't resolve the
NCRs of "special characters"
(probably those outside the US-ASCII range)
inside attribute values
which can make editing harder.
You could also try tidy for
pretty-printing.
Cleaning up
To delete all comments, do (the quotes keep a Unix shell from interpreting the parentheses)
:%!xmlstar ed -d "//comment()"
HTML
If you have to deal with tag soup
you might want to try turning it into XHTML so that
you can enjoy all the advantages of XML.
To filter
the whole buffer through Tidy, do
:%!tidy
If there are irrecoverable errors, eg unknown elements,
everything will be deleted.
You can undo
the filtering via u.
To filter a block, do
!}tidy
but don't forget that depending on the doctype flag
Tidy might insert a doctype declaration if there is none.
If there's unwanted stuff left,
strip tags
like this:
,,g
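A tag-stripping filter of this kind can be sketched in shell with sed. The pattern <[^>]*> is an assumption about what counts as a tag: it presumes that no tag spans a line break and that no literal > occurs inside an attribute value. The function name strip_tags is hypothetical.

```shell
# Hypothetical filter: delete everything that looks like a tag.
# Commas serve as the delimiter so the pattern needs no escaping.
strip_tags() {
    sed 's,<[^>]*>,,g'
}
```

Under the same assumptions, the corresponding substitute command in Vim would be :%s,<[^>]*>,,g.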