xp – Select nodes with XPath

XPath expression

Select nodes in an XML source with an XPath [1] expression.

List all attributes of an XML file:

xp "//@*" file.xml
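
To make it concrete what //@* selects, here is a rough sketch using Python's standard-library xml.etree.ElementTree. This is not how xp works: xp is built on lxml, and ElementTree's limited XPath subset has no attribute axis. The sketch instead walks every element and collects its attributes. The sample document and the function name all_attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, for illustration only.
SAMPLE = """<catalog version="2">
  <book id="b1" lang="en"><title>XPath</title></book>
  <book id="b2"><title>XSLT</title></book>
</catalog>"""


def all_attributes(xml_text):
    """Rough equivalent of //@* : every attribute of every element.

    Returns (element tag, attribute name, value) tuples.
    """
    root = ET.fromstring(xml_text)
    hits = []
    # iter() visits the root element and all its descendants in document order.
    for elem in root.iter():
        for name, value in elem.attrib.items():
            hits.append((elem.tag, name, value))
    return hits


for tag, name, value in all_attributes(SAMPLE):
    print(f"{tag}/@{name} = {value!r}")
```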

List the latest Python PEPs:

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp "//item/title/text()"
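
The same selection can be approximated with the standard library's ElementTree, whose findall() understands a small XPath subset that includes .//item/title. This is an illustrative sketch, not xp's implementation; the inline feed is a hypothetical stand-in for peps.rss.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS 2.0 feed standing in for peps.rss.
RSS = """<rss version="2.0"><channel>
  <title>Newest PEPs</title>
  <item><title>PEP A: First</title><link>https://example.org/a</link></item>
  <item><title>PEP B: Second</title><link>https://example.org/b</link></item>
</channel></rss>"""


def item_titles(xml_text):
    """Approximates //item/title/text(): the text of each <title> child of an <item>."""
    root = ET.fromstring(xml_text)
    # The channel's own <title> is not a child of <item>, so it is not selected.
    # text() yields no node for an empty <title>, so skip elements whose .text is None.
    return [t.text for t in root.findall(".//item/title") if t.text is not None]


print(item_titles(RSS))  # ['PEP A: First', 'PEP B: Second']
```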

List the latest Python PEPs with their link:

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp "//item/*[name()='title' or name()='link']/text()"
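
The predicate keeps the title and link children of each item, in document order. A standard-library sketch of the same idea, again with a hypothetical inline feed and a hypothetical function name, loops over each item's children and keeps those two element names:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed; RSS 2.0 elements are not in a namespace,
# so ElementTree tags equal the names that name() would return.
RSS = """<rss version="2.0"><channel>
  <item><title>PEP A</title><description>...</description><link>https://example.org/a</link></item>
  <item><title>PEP B</title><link>https://example.org/b</link></item>
</channel></rss>"""


def item_titles_and_links(xml_text):
    """Approximates //item/*[name()='title' or name()='link']/text()."""
    root = ET.fromstring(xml_text)
    out = []
    for item in root.iter("item"):
        # Children are visited in document order, so each title precedes its link here.
        for child in item:
            if child.tag in ("title", "link") and child.text is not None:
                out.append(child.text)
    return out


print(item_titles_and_links(RSS))
```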

Options

xp can be used with the following command-line options:

$ xp --help

usage: xp [-h] [-V] [-e] [-d DEFAULT_NS_PREFIX] [-r] [-p] [-m] [-f | -F] [-q]
          xpath_expr [xml_source [xml_source ...]]

Select nodes in an XML source with an XPath expression.

positional arguments:
  xpath_expr            XPath expression
  xml_source            XML source (file, <stdin>, http://...)

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -e, --exslt           add EXSLT XML namespace prefixes
  -d DEFAULT_NS_PREFIX, --default-prefix DEFAULT_NS_PREFIX
                        set the prefix for the default namespace in XPath
                        [default: 'd']
  -r, --result-xpath    print the XPath expression of the result element (or
                        its parent)
  -p, --pretty-element  pretty print the result element
  -m, --method          use ElementTree.xpath method instead of XPath class
  -f, -l, --files-with-hits
                        only the names of files with a non-false and non-NaN
                        result are written to standard output
  -F, -L, --files-without-hits
                        only the names of files with a false or NaN result, or
                        without any results are written to standard output
  -q, --quiet           don't print the XML namespace list
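
The -f and -F options classify each file by whether its result is "non-false and non-NaN". XPath 1.0's boolean() conversion follows the same rule: a number is true unless it is zero or NaN, a node-set is true if it is non-empty, and a string is true if it is non-empty. The sketch below applies that rule to the Python types lxml returns for XPath results (bool, float, str, or a list of nodes). It is an assumption about how xp decides, not its actual code, and is_hit and files_with_hits are hypothetical names.

```python
import math


def is_hit(result):
    """Hypothetical -f/-F classifier modelled on XPath 1.0 boolean() rules."""
    # Python's bool(float('nan')) is True, but XPath treats NaN as false,
    # so NaN is checked explicitly before falling back to truthiness.
    if isinstance(result, float) and math.isnan(result):
        return False
    # bool: as is; float: nonzero; str: non-empty; list (node-set): non-empty.
    return bool(result)


def files_with_hits(results_by_file):
    """File names that -f would print."""
    return [name for name, result in results_by_file.items() if is_hit(result)]


def files_without_hits(results_by_file):
    """File names that -F would print (false, NaN, or empty results)."""
    return [name for name, result in results_by_file.items() if not is_hit(result)]


results = {"a.xml": 3.0, "b.xml": float("nan"), "c.xml": [], "d.xml": ["<node>"], "e.xml": ""}
print(files_with_hits(results))     # ['a.xml', 'd.xml']
print(files_without_hits(results))  # ['b.xml', 'c.xml', 'e.xml']
```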

Namespaces in XML

List all the XML namespaces [2] (prefix, URI) of the document element:

xp 'namespace::*' file.xml

Print the default namespace of the document element, if it has one:

xp 'namespace::*[name()=""]' file.xml
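
For comparison, the standard library can report namespace declarations while parsing: ElementTree.iterparse() emits a "start-ns" event with a (prefix, URI) tuple for each declaration, using the empty prefix '' for the default namespace. The sketch keeps only the declarations that arrive before the first element starts, which are those on the document element. Unlike the XPath namespace axis, it does not report the implicit xml prefix. The sample document and the function name root_namespaces are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace and a prefixed one on the root,
# plus a declaration on a child element that should not be reported.
DOC = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:media="http://search.yahoo.com/mrss/">
  <entry xmlns:x="urn:example:x"><title>Hello</title></entry>
</feed>"""


def root_namespaces(xml_text):
    """Return the (prefix, URI) pairs declared on the document element."""
    declared = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # 'start-ns' events for an element arrive before its 'start' event,
    # so everything seen before the first 'start' belongs to the root.
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break
        declared.append(payload)
    return declared


namespaces = dict(root_namespaces(DOC))
print(namespaces)
# Like namespace::*[name()=""]: the default namespace, or None if there is none.
print(namespaces.get(""))
```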

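The default XML namespace in an XML document has no prefix (lxml represents it with the prefix None). In XPath 1.0, however, an unprefixed name in an expression only matches nodes that are in no namespace. To select nodes in an XML namespace, XPath needs prefixed names (qualified names). Therefore xp uses d as the prefix for the default XML namespace.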

List the five most recent Python Insider posts:

xp "descendant::d:entry[position()<=5]/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

-d <prefix>, --default-prefix <prefix>

You can change the prefix for the default namespace with the --default-prefix option:

xp -d p "descendant::p:entry[position()<=5]/p:title/text()" \
http://feeds.feedburner.com/PythonInsider
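
The prefix is only a local alias bound to the document's default namespace URI; it never has to appear in the document itself. The standard-library sketch below shows the same idea with ElementTree, whose find methods accept a prefix-to-URI mapping. The feed, the function name recent_titles, and its parameters are hypothetical; position()<=5 is approximated by slicing the first five matches.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Hypothetical Atom feed using a default namespace (no prefix in the document).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Post 1</title></entry>
  <entry><title>Post 2</title></entry>
  <entry><title>Post 3</title></entry>
</feed>"""


def recent_titles(xml_text, prefix="d", limit=5):
    """Approximates descendant::d:entry[position()<=5]/d:title/text()."""
    root = ET.fromstring(xml_text)
    ns = {prefix: ATOM}  # bind the chosen prefix to the default namespace URI
    entries = root.findall(f".//{prefix}:entry", ns)[:limit]
    return [e.findtext(f"{prefix}:title", namespaces=ns) for e in entries]


print(recent_titles(FEED))              # ['Post 1', 'Post 2', 'Post 3']
print(recent_titles(FEED, prefix="p"))  # same result, like xp -d p
# Without a prefix, nothing matches: the entries are in the Atom namespace.
print(ET.fromstring(FEED).findall(".//entry"))  # []
```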

Extensions to XSLT

-e, --exslt

lxml supports the EXSLT [3] extensions through libxslt (libxslt 1.1.25 or higher is required). Use the --exslt command-line option to make the EXSLT namespace prefixes, such as date and re, available in the XPath expression.

Find Python Insider posts published in or after 2015 with EXSLT (date prefix):

xp -e "//d:entry[date:year(d:published) >= '2015']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
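
In XPath 1.0, >= converts both operands to numbers, so comparing the year with the string '2015' works as a numeric comparison. The standard-library sketch below mimics the filter by reading the year from the leading YYYY-MM-DD part of each published timestamp. The feed and the function name titles_since are hypothetical, and the sketch assumes timestamps start with a full ISO 8601 date.

```python
import xml.etree.ElementTree as ET
from datetime import date

NS = {"d": "http://www.w3.org/2005/Atom"}

# Hypothetical feed with two entries, published in different years.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Old post</title><published>2014-11-20T09:00:00.000-08:00</published></entry>
  <entry><title>New post</title><published>2015-02-03T10:30:00.000-08:00</published></entry>
</feed>"""


def titles_since(xml_text, year):
    """Approximates //d:entry[date:year(d:published) >= year]/d:title/text()."""
    root = ET.fromstring(xml_text)
    titles = []
    for entry in root.findall(".//d:entry", NS):
        published = entry.findtext("d:published", namespaces=NS)
        # Like date:year(), take the year from the date part of the timestamp.
        if published and date.fromisoformat(published[:10]).year >= year:
            titles.append(entry.findtext("d:title", namespaces=NS))
    return titles


print(titles_since(FEED, 2015))  # ['New post']
```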

List the Python Insider posts updated in December:

xp -e "//d:entry[date:month-name(d:updated) = 'December']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
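
EXSLT's date:month-name() returns the full English month name of a date. A minimal standard-library sketch of that conversion follows; the month names are hard-coded because calendar.month_name follows the current locale. The function name month_name is hypothetical, and the sketch assumes the input starts with a YYYY-MM-DD date.

```python
from datetime import date

# English month names, as returned by EXSLT date:month-name().
MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def month_name(timestamp):
    """Approximates date:month-name() for an ISO 8601 date or date-time string."""
    return MONTH_NAMES[date.fromisoformat(timestamp[:10]).month - 1]


updated = ["2014-12-05T10:00:00.000-05:00", "2015-01-12T08:15:00.000-05:00"]
print([month_name(u) for u in updated])  # ['December', 'January']
```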

-q, --quiet

With the --quiet command-line option, xp does not print the list of XML namespaces.

Use regular expressions (re prefix) to find Python PEPs with “remove” or “specification” in the title (case-insensitive):

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp -e '//item/title[re:match(text(), "(remove|specification)", "i")]' -q
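
The "i" flag makes the match case-insensitive, which corresponds to re.IGNORECASE in Python's re module. The standard-library sketch below filters item titles the same way; it uses re.search because the words may appear anywhere in the title. The feed titles and the function name matching_titles are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed; titles are made up for illustration.
RSS = """<rss version="2.0"><channel>
  <item><title>Remove the spam module</title></item>
  <item><title>Add an eggs type</title></item>
  <item><title>Packaging SPECIFICATION update</title></item>
</channel></rss>"""

# Same pattern as the xp example; IGNORECASE plays the role of the "i" flag.
PATTERN = re.compile(r"(remove|specification)", re.IGNORECASE)


def matching_titles(xml_text):
    """Approximates //item/title[re:match(text(), "(remove|specification)", "i")]."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(".//item/title")
            if t.text and PATTERN.search(t.text)]


print(matching_titles(RSS))
```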

Pretty print element result

-p, --pretty-element

A result element node can be pretty printed with the --pretty-element command-line option.

Warning

The --pretty-element option removes all white space text nodes before applying the XPath expression. Therefore the results never contain white space text nodes.

Pretty print the latest Python PEP:

curl -s https://www.python.org/dev/peps/peps.rss/ | xp "//item[1]" -p
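
The sketch below mimics that behaviour with the standard library: it first drops whitespace-only text and tail strings, then selects the first item and re-indents it with ElementTree.indent() (Python 3.9 or later). It is an illustration, not xp's implementation; the feed and the function names are hypothetical. Note that //item[1] selects every item that is the first item child of its parent, which is the same as the first item when the feed has a single channel.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed with indentation, i.e. white space text nodes.
RSS = """<rss version="2.0">
  <channel>
    <item>
      <title>First</title>
      <link>https://example.org/1</link>
    </item>
    <item><title>Second</title></item>
  </channel>
</rss>"""


def strip_blank_text(elem):
    """Remove whitespace-only text and tail strings from elem and its descendants."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


def pretty_first_item(xml_text):
    root = ET.fromstring(xml_text)
    strip_blank_text(root)          # done before selecting, as with -p
    item = root.find(".//item")     # first <item>; equals //item[1] for one channel
    ET.indent(item)                 # adds fresh indentation (Python 3.9+)
    return ET.tostring(item, encoding="unicode")


print(pretty_first_item(RSS))
```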

xpath method

-m, --method

xp uses the lxml.etree.XPath class by default. You can choose the lxml.etree.ElementTree.xpath method with the --method command-line option. The results should be identical, but error reporting may differ.

Footnotes

[1] XML Path Language (XPath) 1.0
[2] Namespaces in XML 1.0
[3] Extensions to XSLT (EXSLT)