xp – Select nodes with XPath

XPath expression

Select nodes in an XML source with an XPath [1] expression.

List all attributes of an XML file:

xp "//@*" file.xml

List the latest Python PEPs:

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp "//item/title/text()"

List the latest Python PEPs with their link:

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp "//item/*[name()='title' or name()='link']/text()"

Options

xp can be used with the following command-line options:

$ xp --help

usage: xp [-h] [-V] [-e] [-d DEFAULT_NS_PREFIX] [-r] [-p] [-m] [-f | -F] [-q]
          xpath_expr [xml_source [xml_source ...]]

Select nodes in an XML source with an XPath expression.

positional arguments:
  xpath_expr            XPath expression
  xml_source            XML source (file, <stdin>, http://...)

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -e, --exslt           add EXSLT XML namespace prefixes
  -d DEFAULT_NS_PREFIX, --default-prefix DEFAULT_NS_PREFIX
                        set the prefix for the default namespace in XPath
                        [default: 'd']
  -r, --result-xpath    print the XPath expression of the result element (or
                        its parent)
  -p, --pretty-element  pretty print the result element
  -m, --method          use ElementTree.xpath method instead of XPath class
  -f, -l, --files-with-hits
                        only the names of files with a non-false and non-NaN
                        result are written to standard output
  -F, -L, --files-without-hits
                        only the names of files with a false or NaN result, or
                        without any results are written to standard output
  -q, --quiet           don't print the XML namespace list

Print result’s XPath

-r, --result-xpath

Print the XPath expression of each result element with the --result-xpath option. Each expression is an absolute location path.

xp --result-xpath "//title" file.xml

If an XPath result is a text or attribute node, xp prints the XPath expression of its parent element.

List the XPath expressions of all elements with attributes:

xp -r "//@*" file.xml
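The behaviour described above can be approximated directly with lxml. This is a minimal sketch, not xp's own code: the sample document and the helper name `result_path` are made up for illustration, and it assumes lxml is installed.

```python
from lxml import etree

# Made-up document for illustration.
root = etree.fromstring(
    "<library>"
    "<book id='b1'><title>One</title></book>"
    "<book id='b2'><title>Two</title></book>"
    "</library>"
)
tree = root.getroottree()


def result_path(result):
    """Hypothetical helper: absolute path of an element result, or of the
    parent element when the result is a text or attribute node."""
    element = result if etree.iselement(result) else result.getparent()
    return tree.getpath(element)


element_paths = [result_path(r) for r in root.xpath("//title")]
attribute_paths = [result_path(r) for r in root.xpath("//@id")]

print(element_paths)    # ['/library/book[1]/title', '/library/book[2]/title']
print(attribute_paths)  # ['/library/book[1]', '/library/book[2]']
```

Attribute and text results come back from lxml as "smart strings", which is why `getparent()` is available on them.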

Namespaces in XML

List all the XML namespaces [2] (prefix, URI) of the document element:

xp 'namespace::*' file.xml

Print the default namespace of the document element, if it has one:

xp 'namespace::*[name()=""]' file.xml

The default namespace of an XML document has no prefix (None in lxml). XPath 1.0, however, can only select nodes in a namespace through prefixed (qualified) names. Therefore xp binds the prefix d to the default XML namespace.

List the five most recent Python Insider posts:

xp "descendant::d:entry[position()<=5]/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
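The need for a prefix is a general XPath rule rather than something specific to xp. The sketch below shows it with Python's standard-library `xml.etree.ElementTree`, which is used here only for illustration (xp itself is built on lxml). The Atom snippet is made up.

```python
import xml.etree.ElementTree as ET

# Made-up Atom document; the default namespace is declared without a prefix.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First post</title></entry>
  <entry><title>Second post</title></entry>
</feed>"""

root = ET.fromstring(ATOM)

# An unprefixed name only matches elements that are in no namespace,
# so nothing is found here.
unprefixed = root.findall("entry/title")

# Binding a prefix (here 'd', the prefix xp uses) to the namespace URI
# makes the same elements addressable.
ns = {"d": "http://www.w3.org/2005/Atom"}
prefixed = [title.text for title in root.findall("d:entry/d:title", ns)]

print(unprefixed)  # []
print(prefixed)    # ['First post', 'Second post']
```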

-d <prefix>, --default-prefix <prefix>

You can change the prefix for the default namespace with the --default-prefix option:

xp -d p "descendant::p:entry[position()<=5]/p:title/text()" \
http://feeds.feedburner.com/PythonInsider

Extensions to XSLT

-e, --exslt

lxml supports the EXSLT [3] extensions through libxslt (requires libxslt 1.1.25 or higher). Add EXSLT namespaces with the --exslt command-line option.

Find Python Insider posts published in or after 2015 with EXSLT (date prefix):

xp -e "//d:entry[date:year(d:published) >= '2015']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

Python Insider posts updated in December:

xp -e "//d:entry[date:month-name(d:updated) = 'December']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider

-q, --quiet

The --quiet command-line option suppresses the list of XML namespaces that xp otherwise prints.

Regular expressions are available through the re prefix. Find Python PEPs with “remove” or “specification” in the title (case-insensitive):

curl -s https://www.python.org/dev/peps/peps.rss/ | \
xp -e '//item/title[re:match(text(), "(remove|specification)", "i")]' -q
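For readers who want to try an EXSLT regular expression outside xp, the following sketch uses lxml directly. It assumes lxml is installed, binds the `re` prefix by hand (xp's --exslt option does this for you), and uses `re:test`, the boolean variant of the EXSLT regular-expression functions, instead of `re:match`. The RSS snippet and PEP titles are made up.

```python
from lxml import etree

# Made-up RSS fragment for illustration.
RSS = (
    "<rss><channel>"
    "<item><title>PEP 9001: Remove the frobnicator</title></item>"
    "<item><title>PEP 9002: Add a new widget</title></item>"
    "</channel></rss>"
)
root = etree.fromstring(RSS)

# Namespace URI of the EXSLT regular-expressions module.
EXSLT_RE = "http://exslt.org/regular-expressions"

# Compile the expression with the 're' prefix bound to that namespace.
find_titles = etree.XPath(
    '//item/title[re:test(., "(remove|specification)", "i")]/text()',
    namespaces={"re": EXSLT_RE},
)

titles = [str(title) for title in find_titles(root)]
print(titles)  # ['PEP 9001: Remove the frobnicator']
```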

Pretty print element result

-p, --pretty-element

A result element node can be pretty printed with the --pretty-element command-line option.

Warning

The --pretty-element option removes all white space text nodes before applying the XPath expression. Therefore there will be no white space text nodes in the results.

Pretty print the latest Python PEP:

curl -s https://www.python.org/dev/peps/peps.rss/ | xp "//item[1]" -p

Print file names

-f, -l, --files-with-hits

The --files-with-hits command-line option only prints the names of files with an XPath result that is not false and not NaN (not a number).

Find XML files with HTTP URLs:

xp "//mpeg7:MediaUri[starts-with(., 'http://')]" *.xml -f

XML files in which every book price is below €25:

xp -el "math:max(//book/price[@currency='€'])<25" *.xml

-F, -L, --files-without-hits

The --files-without-hits command-line option only prints the names of files without any XPath results, or with a false or NaN result.

XML files without a person with the family name ‘Bauwens’:

xp "//mpeg7:FamilyName[text()='Bauwens']" *.xml -F
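The hit rule shared by both options can be written down as a small predicate. This is a sketch of the rule as worded above, not xp's source: the helper name `is_hit` and the file names are hypothetical, results are modelled as the Python values lxml returns for XPath (bool, float, string, list), and treating string results as hits is an assumption.

```python
import math


def is_hit(result):
    """Hypothetical helper: does an XPath result count as a hit?"""
    if isinstance(result, bool):
        return result                  # boolean result: only True is a hit
    if isinstance(result, float):
        return not math.isnan(result)  # numeric result: NaN is not a hit
    if isinstance(result, list):
        return len(result) > 0         # node-set: empty means no results
    return True                        # other results: assumed to be hits


# Made-up results per file, standing in for evaluated XPath expressions.
results = {
    "a.xml": ["<node>"],     # non-empty node-set
    "b.xml": [],             # no results
    "c.xml": float("nan"),   # e.g. a maximum over an empty node-set
    "d.xml": False,
    "e.xml": 12.5,
}

with_hits = [name for name, result in results.items() if is_hit(result)]
without_hits = [name for name, result in results.items() if not is_hit(result)]

print(with_hits)     # ['a.xml', 'e.xml']
print(without_hits)  # ['b.xml', 'c.xml', 'd.xml']
```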

xpath method

-m, --method

xp uses the lxml.etree.XPath class by default. You can choose the lxml.etree.ElementTree.xpath method instead with the --method command-line option. The results should be the same, but error reporting can differ.
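The two evaluation routes look like this in lxml itself. The sketch assumes lxml is installed and uses a made-up document. It also evaluates a deliberately broken expression through both routes and records the exception type each one raises, without asserting which type that is.

```python
from lxml import etree

# Made-up document for illustration.
tree = etree.ElementTree(etree.fromstring("<root><a>1</a><a>2</a></root>"))

# Default route: compile the expression with the XPath class, then call it.
count_a = etree.XPath("count(//a)")
via_class = count_a(tree)

# --method route: evaluate the expression with the xpath() method.
via_method = tree.xpath("count(//a)")

print(via_class, via_method)  # 2.0 2.0

# A deliberately broken expression, to compare error reporting.
BROKEN = "//a["
routes = {
    "XPath class": lambda: etree.XPath(BROKEN)(tree),
    "xpath method": lambda: tree.xpath(BROKEN),
}
error_types = {}
for label, run in routes.items():
    try:
        run()
    except etree.XPathError as error:
        # Both routes raise a subclass of XPathError; the subclass may differ.
        error_types[label] = type(error).__name__

print(error_types)
```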

Footnotes

[1]	XML Path Language (XPath) 1.0

[2]	Namespaces in XML 1.0

[3]	Extensions to XSLT (EXSLT)
