xp – Select nodes with XPath

XPath expression

Select nodes in an XML source with an XPath [1] expression.

List all attributes of an XML file:

xp "//@*" file.xml

List the latest Python PEPs:

curl -s https://peps.python.org/peps.rss | xp "//item/title/text()"

List the latest Python PEPs with their link:

curl -s https://peps.python.org/peps.rss | \
xp "//item/*[name()='title' or name()='link']/text()"

Options

xp can be used with the following command-line options:

$ xp --help

usage: xp [-h] [-V] [-l | -L] [-d DEFAULT_NS_PREFIX] [-e] [-q] [-p] [-r] [-m] xpath_expr [xml_source ...]

Select nodes in an XML source with an XPath expression.

positional arguments:
  xpath_expr            XPath expression
  xml_source            XML source (file, <stdin>, http://...)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m, --method          use ElementTree.xpath method instead of XPath class

file hit options:
  output filenames to standard output

  -l, -f, --files-with-hits
                        only the names of files with a non-false and non-NaN result are written to standard output
  -L, -F, --files-without-hits
                        only the names of files with a false or NaN result, or without any results are written to
                        standard output

namespace options:
  -d DEFAULT_NS_PREFIX, --default-prefix DEFAULT_NS_PREFIX
                        set the prefix for the default namespace in XPath [default: 'd']
  -e, --exslt           add EXSLT XML namespaces
  -q, --quiet           don't print XML source namespaces

output options:
  -p, --pretty-element  pretty print the result element
  -r, --result-xpath    print the XPath expression of the result element (or its parent)
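
The file hit options in the help text above split results into hits (a non-false, non-NaN result) and non-hits (a false or NaN result, or no results at all). As a rough sketch of that rule, and not xp's actual code, a hypothetical helper is_hit could classify the Python values an XPath evaluation returns. How strings are treated is not stated in the help text, so the last branch is an assumption.

```python
import math


def is_hit(result):
    """Classify an XPath result as a hit (hypothetical helper, assumed semantics)."""
    if isinstance(result, bool):
        # XPath boolean: only true counts as a hit
        return result
    if isinstance(result, float):
        # XPath number: NaN is not a hit, any other number is
        return not math.isnan(result)
    if isinstance(result, (list, tuple)):
        # Node-set: an empty result means "without any results"
        return len(result) > 0
    # Assumption: any other returned value (e.g. a string) counts as a hit
    return result is not None


for value in (True, False, float("nan"), 0.0, [], ["node"]):
    print(repr(value), "->", "hit" if is_hit(value) else "no hit")
```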

Pretty print result element

-p, --pretty-element

A result element node can be pretty printed with the --pretty-element command-line option.

Warning

The --pretty-element option removes all whitespace-only text nodes before applying the XPath expression, so the results never contain whitespace-only text nodes.

Pretty print the latest Python PEP:

curl -s https://peps.python.org/peps.rss | xp "//item[1]" -p
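
The effect described in the warning can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree rather than lxml, and the sample document and the strip_blank_text helper are made up for illustration. It is not how xp itself removes the nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: indentation between elements creates whitespace text nodes
XML = "<item>\n  <title>PEP 1</title>\n  <link>https://example.org/1</link>\n</item>"


def strip_blank_text(element):
    """Drop whitespace-only text and tail strings from the tree, in place."""
    for node in element.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None


root = ET.fromstring(XML)
before = list(root.itertext())  # includes the indentation strings
strip_blank_text(root)
after = list(root.itertext())   # only the meaningful text remains

print(before)
print(after)
```

After stripping, a query for text nodes can only return the title and link text, which is why whitespace never shows up in results when --pretty-element is used.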

Namespaces in XML

List all the XML namespaces [2] (prefix, URI) of the document element:

xp 'namespace::*' file.xml

Print the default namespace of the document element, if it has one:

xp 'namespace::*[name()=""]' file.xml

The default XML namespace of an XML document has no prefix (None). XPath, however, can only select nodes in an XML namespace through prefixed names (qualified names), so xp assigns the prefix d to the default XML namespace.

List the five most recent Python Insider posts:

xp "descendant::d:entry[position()<=5]/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

-d <prefix>, --default-prefix <prefix>

You can change the prefix for the default namespace with the --default-prefix option:

xp -d p "descendant::p:entry[position()<=5]/p:title/text()" \
http://feeds.feedburner.com/PythonInsider
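
Why a prefix is needed, and why its name does not matter, can be shown with a minimal sketch. It uses the standard library's xml.etree.ElementTree instead of lxml, and the inline Atom feed and the titles helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed that declares a default namespace (no prefix)
ATOM = "http://www.w3.org/2005/Atom"
FEED = f"""<feed xmlns="{ATOM}">
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = ET.fromstring(FEED)

# Unprefixed names only match elements in no namespace, so nothing is found
unprefixed = root.findall("entry/title")


def titles(prefix):
    """Select entry titles by binding the default namespace to a chosen prefix."""
    namespaces = {prefix: ATOM}
    path = f"{prefix}:entry/{prefix}:title"
    return [title.text for title in root.findall(path, namespaces)]


print(unprefixed)
print(titles("d"))
print(titles("p"))
```

Both prefixes select the same nodes, because only the namespace URI the prefix is bound to is significant.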

Extensions to XSLT

-e, --exslt

lxml supports the EXSLT [3] extensions through libxslt (requires libxslt 1.1.25 or higher). Add EXSLT namespaces with the --exslt command-line option.

Find Python Insider posts published in or after 2015 with EXSLT (date prefix):

xp -e "//d:entry[date:year(d:published) >= '2015']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

List Python Insider posts updated in December:

xp -e "//d:entry[date:month-name(d:updated) = 'December']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

-q, --quiet

With the --quiet command-line option, xp does not print the list of XML namespaces.

Use the power of regular expressions (re prefix). Find Python PEPs with a four-digit number in the title (case-insensitive match):

curl -s https://peps.python.org/peps.rss | \
xp -e '//item/title[re:match(text(), "pep [0-9]{4}:", "i")]' -q
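
The filter applied by the re:match predicate can be approximated with Python's re module. This is only a sketch of the matching rule: the sample titles are made up, and the "i" flag is assumed to correspond to re.IGNORECASE.

```python
import re

# Hypothetical sample titles, not taken from the real feed
titles = [
    "PEP 8: Style Guide",
    "PEP 3107: Function Annotations",
    "pep 0484: Type Hints",
]

# Same pattern as in the XPath expression; "i" becomes re.IGNORECASE
pattern = re.compile(r"pep [0-9]{4}:", re.IGNORECASE)

# A title is kept when the pattern matches anywhere in its text
matching = [title for title in titles if pattern.search(title)]
print(matching)
```

The first title is dropped because its number has only one digit. The other two match regardless of letter case.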

xpath method

-m, --method

xp uses the lxml.etree.XPath class by default. You can choose the lxml.etree.ElementTree.xpath method instead with the --method command-line option. The results should be the same, but error reporting can differ.
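
The two evaluation styles can be compared directly in lxml. The following is a minimal sketch that assumes lxml is installed. The inline document and the deliberately broken expression are made up, and the exact exception types and messages may vary between lxml versions.

```python
from lxml import etree

# Hypothetical sample document
root = etree.fromstring("<rss><item><title>PEP 1</title></item></rss>")
tree = etree.ElementTree(root)

expr = "//item/title/text()"

# XPath class: compile the expression once, then call it on a tree
by_class = etree.XPath(expr)(tree)

# xpath method: evaluate the expression string directly on the tree
by_method = tree.xpath(expr)

print(by_class == by_method)

# A malformed expression is reported differently by the two styles
broken = "//item["
attempts = (
    ("XPath class", lambda: etree.XPath(broken)(tree)),
    ("xpath method", lambda: tree.xpath(broken)),
)
for label, run in attempts:
    try:
        run()
    except etree.XPathError as exc:
        print(f"{label}: {type(exc).__name__}: {exc}")
```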

Footnotes