xp – Select nodes with XPath

xp is a tool to fine-tune your XPath [1] expressions. You can also use xp to search for XML files matching an XPath expression.

Examples

Select nodes in an XML source with an XPath expression.

List all attributes of an XML file:

xp "//@*" file.xml

List the latest Python PEPs:

curl -s https://peps.python.org/peps.rss | xp "//item/title/text()"

List the latest Python PEPs with their link:

curl -s https://peps.python.org/peps.rss | \
   xp "//item/*[name()='title' or name()='link']/text()"

Options

xp can be used with the following command-line options:

$ xp --help

usage: xp [-h] [-V] [-l | -L] [-d DEFAULT_NS_PREFIX] [-e] [-q] [-c] [-p] [-r] [-m] xpath_expr [xml_source ...]

Select nodes in an XML source with an XPath expression.

positional arguments:
  xpath_expr            XPath expression
  xml_source            XML source (file, <stdin>, http://...)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m, --method          use ElementTree.xpath method instead of XPath class

file hit options:
  output filenames to standard output

  -l, -f, --files-with-hits
                        only names of files with a result that is not false and not NaN
                        are written to standard output
  -L, -F, --files-without-hits
                        only names of files with a false or NaN result, or without a result,
                        are written to standard output

namespace options:
  -d DEFAULT_NS_PREFIX, --default-prefix DEFAULT_NS_PREFIX
                        set the prefix for the default namespace in XPath [default: 'd']
  -e, --exslt           add EXSLT XML namespaces
  -q, --quiet           don't print XML source namespaces

output options:
  -c, --count           only a count of the result nodes is printed
  -p, --pretty-element  pretty print the result element
  -r, --result-xpath    also print the XPath expression of the result element (or its parent)

Searching XML files

xp can print the names of files that match an XPath expression. A matching result (hit) is any result that is neither false nor NaN (not a number). xp can also print the names of files that do not match an XPath expression: a false or NaN result, or no result at all, counts as a non-match.
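The hit rule can be sketched in Python with lxml, the library xp is built on. This is an illustration under assumptions, not xp's actual code: the helper name is_hit and the sample document are hypothetical, and treating string results as hits is a guess.

```python
import math

from lxml import etree


def is_hit(result):
    """Hypothetical helper: True when an XPath result counts as a hit."""
    if isinstance(result, bool):
        return result                    # false is a non-match
    if isinstance(result, float):
        return not math.isnan(result)    # NaN is a non-match
    if isinstance(result, list):
        return len(result) > 0           # an empty node-set is "no result"
    return True                          # strings: assumed to be hits


# Hypothetical sample document.
doc = etree.fromstring("<books><book><price>20</price></book></books>")

print(is_hit(doc.xpath("//book")))             # node-set with one element
print(is_hit(doc.xpath("//magazine")))         # empty node-set
print(is_hit(doc.xpath("number(//title)")))    # NaN
print(is_hit(doc.xpath("count(//book) > 5")))  # false
```

Only the first expression is reported as a hit; the other three are non-matches.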

Matching XML files

-l, -f, --files-with-hits

The --files-with-hits command-line option only prints the names of files with an XPath result that is not false and not NaN (not a number). This is similar to grep --files-with-matches using XPath instead of regular expressions.

Find XML files with HTTP URLs:

xp -l "//mpeg7:MediaUri[starts-with(., 'http://')]" *.xml

Find XML files where all book prices are below € 25,-:

xp -el "math:max(//book/price[@currency='€'])<25" *.xml

Non-matching XML files

-L, -F, --files-without-hits

The --files-without-hits command-line option only prints the names of files without any XPath results, or with a false or NaN result. This is similar to grep --files-without-match using XPath instead of regular expressions.

XML files without a person with the family name ‘Bauwens’:

xp -L "//mpeg7:FamilyName[text()='Bauwens']" *.xml

Namespaces in XML

List all the XML namespaces [2] (prefix, URI) of the document element:

xp 'namespace::*' file.xml

Print the default namespace of the document element, if it has one:

xp 'namespace::*[name()=""]' file.xml
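For readers who want to try the same expressions in lxml directly, here is a minimal sketch. The sample document is hypothetical. lxml returns namespace nodes as (prefix, URI) tuples, with None as the prefix of the default namespace.

```python
from lxml import etree

# Hypothetical document with a default and a prefixed namespace.
root = etree.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"/>'
)

# All namespaces in scope on the document element.
for prefix, uri in root.xpath("namespace::*"):
    print(prefix, uri)

# Only the default namespace (its name is the empty string).
default_ns = root.xpath('namespace::*[name()=""]')
print(default_ns)
```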

Default prefix

-d <prefix>, --default-prefix <prefix>

The default XML namespace in an XML document has no prefix (None). To select nodes in an XML namespace, XPath needs prefixed names (qualified names). Therefore xp uses d as the prefix for the default XML namespace.
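The need for a prefix can be seen in a small lxml sketch. The sample feed and the way the prefix mapping is built are assumptions for illustration; xp's own implementation may differ.

```python
from lxml import etree

# Hypothetical Atom-like document that uses a default namespace.
root = etree.fromstring(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>Hello</title></entry>"
    "</feed>"
)

# lxml reports the default namespace under the key None.
print(root.nsmap)

# Map the default namespace to the prefix "d" so XPath can address it.
namespaces = {
    ("d" if prefix is None else prefix): uri
    for prefix, uri in root.nsmap.items()
}

titles = root.xpath("//d:entry/d:title/text()", namespaces=namespaces)
unprefixed = root.xpath("//entry/title/text()")

print(titles)      # the prefixed names select the title text
print(unprefixed)  # unprefixed names select nothing
```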

List the five most recent Python Insider posts:

curl -s https://feeds.feedburner.com/PythonInsider | \
   xp "descendant::d:entry[position()<=5]/d:title/text()"

You can change the prefix for the default namespace with the --default-prefix option:

curl -s https://feeds.feedburner.com/PythonInsider | \
   xp -d p "descendant::p:entry[position()<=5]/p:title/text()"

Extensions to XSLT

-e, --exslt

lxml supports the EXSLT [3] extensions through libxslt (requires libxslt 1.1.25 or higher). Add EXSLT namespaces with the --exslt command-line option.
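In lxml itself, an EXSLT function becomes available once its namespace is bound to a prefix. The sketch below binds only the regular-expressions namespace to the prefix re. The sample document is hypothetical, and which prefixes xp registers beyond those used in this README is not shown here.

```python
from lxml import etree

# EXSLT regular expressions namespace, bound to the prefix "re".
EXSLT_RE = "http://exslt.org/regular-expressions"

# Hypothetical RSS-like sample document.
root = etree.fromstring(
    "<rss>"
    "<item><title>PEP 8: Style Guide</title></item>"
    "<item><title>pep 3000: Python 3000</title></item>"
    "</rss>"
)

find = etree.XPath(
    '//item/title[re:test(text(), "pep [0-9]{4}:", "i")]/text()',
    namespaces={"re": EXSLT_RE},
)

# Only the title with a four-digit number matches.
print(find(root))
```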

Find Python Insider posts published in or after 2015 with EXSLT (date prefix):

curl -s https://feeds.feedburner.com/PythonInsider | \
   xp -e "//d:entry[date:year(d:published) >= '2015']/d:title/text()"

Python Insider posts updated in December:

curl -s https://feeds.feedburner.com/PythonInsider | \
   xp -e "//d:entry[date:month-name(d:updated) = 'December']/d:title/text()"

Do not list namespaces

-q, --quiet

With the --quiet command-line option xp will not print the list with XML namespaces.

Find Python PEPs with four digits in the title (case-insensitive) using the power of regular expressions (EXSLT re prefix):

curl -s https://peps.python.org/peps.rss | \
   xp -eq '//item/title[re:match(text(), "pep [0-9]{4}:", "i")]'

Output options

xp can show the XPath expression of the result elements and/or pretty print the result elements. Or you can just count the number of result nodes.

Node count

-c, --count

Count the number of result nodes with the --count command-line option. This is similar to grep --count using XPath instead of regular expressions.

Only count the number of series titles:

xp --count "//d:Title[@type='parentSeriesTitle']" file1.xml file2.xml file3.xml

Pretty print result element

-p, --pretty-element

A result element node can be pretty printed with the --pretty-element command-line option.

Note

The --pretty-element option removes all white space text nodes before applying the XPath expression. Therefore there will be no white space text nodes in the results.
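The effect described in this note can be reproduced in lxml with a parser that drops blank text nodes. This is one possible approach, shown as a sketch; the README does not say how xp implements it, and the sample document is hypothetical.

```python
from lxml import etree

# Hypothetical source with indentation between the elements.
SOURCE = "<item>\n  <title>PEP 1</title>\n  <link>https://peps.python.org/</link>\n</item>"

# Parser that discards whitespace-only text nodes between elements.
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(SOURCE, parser)

# No whitespace-only text nodes remain in the result.
texts = root.xpath("//text()")
print(texts)

# Serialize the element with fresh indentation.
print(etree.tostring(root, pretty_print=True, encoding="unicode"))
```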

Pretty print the latest Python PEP:

curl -s https://peps.python.org/peps.rss | xp -p "//item[1]"

Other options

xpath method

-m, --method

xp uses the lxml.etree.XPath class by default. You can choose the lxml.etree.ElementTree.xpath method with the --method command-line option. The results should be the same but error reporting can be different.
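The two lxml APIs can be compared side by side. The sketch below uses a hypothetical sample document and a hypothetical helper, error_name, that only reports which exception class each API raises for a malformed expression.

```python
from lxml import etree

# Hypothetical sample document.
tree = etree.ElementTree(etree.fromstring("<root><a/><a/><b/></root>"))

# XPath class: compile once, then call with a tree or element.
compiled = etree.XPath("count(//a)")
via_class = compiled(tree)

# xpath method: evaluate the expression directly on the tree.
via_method = tree.xpath("count(//a)")

print(via_class, via_method)


def error_name(run):
    """Hypothetical helper: name of the XPath error raised by run(), if any."""
    try:
        run()
    except etree.XPathError as exc:
        return type(exc).__name__
    return None


bad = "//a["  # malformed: unclosed predicate
print(error_name(lambda: etree.XPath(bad)))
print(error_name(lambda: tree.xpath(bad)))
```

Both calls return the same count for a valid expression. The names printed for the malformed expression show how the reported error can differ between the two.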

Footnotes