xp – Select nodes with XPath
XPath expression
Select nodes in an XML source with an XPath [1] expression.
List all attributes of an XML file:
xp "//@*" file.xml
List the latest Python PEPs:
curl -s https://peps.python.org/peps.rss | xp "//item/title/text()"
List the latest Python PEPs with their link:
curl -s https://peps.python.org/peps.rss | \
xp "//item/*[name()='title' or name()='link']/text()"
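As a rough illustration of what these expressions select, the following Python sketch evaluates the last one with lxml, the library xp uses for XPath. The RSS snippet and its example.invalid links are made up for the example, and the code is not xp's actual implementation.

```python
from lxml import etree

# Hypothetical, trimmed-down RSS document standing in for peps.rss.
RSS = b"""<rss version="2.0"><channel>
<item><title>PEP 1: First</title><link>https://example.invalid/pep-1</link></item>
<item><title>PEP 2: Second</title><link>https://example.invalid/pep-2</link></item>
</channel></rss>"""

root = etree.fromstring(RSS)

# Same expression as in the shell example: the text of every title and link.
select = etree.XPath("//item/*[name()='title' or name()='link']/text()")
results = [str(text) for text in select(root)]

for line in results:
    print(line)
```

Results come back in document order, so each title is followed by its link.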
Options
xp can be used with the following command-line options:
$ xp --help
usage: xp [-h] [-V] [-l | -L] [-d DEFAULT_NS_PREFIX] [-e] [-q] [-p] [-r] [-m] xpath_expr [xml_source ...]

Select nodes in an XML source with an XPath expression.

positional arguments:
  xpath_expr            XPath expression
  xml_source            XML source (file, <stdin>, http://...)

options:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -m, --method          use ElementTree.xpath method instead of XPath class

file hit options:
  output filenames to standard output

  -l, -f, --files-with-hits
                        only the names of files with a non-false and non-NaN result are written to standard output
  -L, -F, --files-without-hits
                        only the names of files with a false or NaN result, or without any results are written to
                        standard output

namespace options:
  -d DEFAULT_NS_PREFIX, --default-prefix DEFAULT_NS_PREFIX
                        set the prefix for the default namespace in XPath [default: 'd']
  -e, --exslt           add EXSLT XML namespaces
  -q, --quiet           don't print XML source namespaces

output options:
  -p, --pretty-element  pretty print the result element
  -r, --result-xpath    print the XPath expression of the result element (or its parent)
Print result’s XPath
- -r, --result-xpath
Print the XPath expression of each result element with the --result-xpath option.
Each printed expression is an absolute location path.
xp --result-xpath "//title" file.xml
If an XPath result is a text or attribute node, xp prints the XPath expression of its parent element.
List the XPath expressions of all elements with attributes:
xp -r "//@*" file.xml
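The behaviour described above can be approximated in Python with lxml's getpath method. This is a sketch under assumptions, not xp's source: the helper name result_path and the sample document are hypothetical.

```python
from lxml import etree

# Hypothetical sample document.
XML = (b"<feed><entry id='a'><title>One</title></entry>"
       b"<entry id='b'><title>Two</title></entry></feed>")
tree = etree.ElementTree(etree.fromstring(XML))


def result_path(tree, result):
    """Return an absolute path for an element result, or for the
    parent element of a text or attribute result."""
    if etree.iselement(result):
        return tree.getpath(result)
    # lxml returns text and attribute results as "smart strings"
    # that remember the element they belong to.
    return tree.getpath(result.getparent())


element_paths = [result_path(tree, r) for r in tree.xpath("//title")]
attribute_paths = [result_path(tree, r) for r in tree.xpath("//@*")]

print(element_paths)
print(attribute_paths)
```

For the attribute query, the printed paths point at the entry elements that carry the attributes.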
Pretty print result element
- -p, --pretty-element
A result element node can be pretty printed with the --pretty-element command-line option.
Warning
The --pretty-element option removes all whitespace-only text nodes before applying the XPath expression, so the results will never contain whitespace-only text nodes.
Pretty print the latest Python PEP:
curl -s https://peps.python.org/peps.rss | xp "//item[1]" -p
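One plausible way to get this effect with lxml is a parser created with remove_blank_text=True followed by pretty-printed serialization. Whether xp does exactly this is an assumption; the sample document is made up.

```python
from lxml import etree

# Hypothetical input with uneven whitespace between the tags.
XML = (b"<rss><channel><item>\n      <title>PEP 1</title>"
       b"<link>https://example.invalid/1</link>\n</item></channel></rss>")

# Drop whitespace-only text nodes while parsing (assumed mechanism).
parser = etree.XMLParser(remove_blank_text=True)
root = etree.fromstring(XML, parser)

# No whitespace-only text nodes are left for XPath to find.
blank_nodes = root.xpath("//text()[normalize-space() = '']")

item = root.xpath("//item[1]")[0]
pretty = etree.tostring(item, pretty_print=True, encoding="unicode")
print(pretty)
```

Because the blank nodes are removed before the expression runs, an expression such as //text() sees a different document than it would without the option, which is what the warning is about.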
Namespaces in XML
List all the XML namespaces [2] (prefix, URI) of the document element:
xp 'namespace::*' file.xml
Print the default namespace of the document element, if it has one:
xp 'namespace::*[name()=""]' file.xml
The default XML namespace in an XML document has no prefix (None).
To select nodes in an XML namespace, XPath needs prefixed names (qualified names).
Therefore xp uses d as the prefix for the default XML namespace.
List the five most recent Python Insider posts:
xp "descendant::d:entry[position()<=5]/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
- -d <prefix>, --default-prefix <prefix>
You can change the prefix for the default namespace with the --default-prefix option:
xp -d p "descendant::p:entry[position()<=5]/p:title/text()" \
http://feeds.feedburner.com/PythonInsider
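The prefix substitution can be sketched in Python with lxml's nsmap, which stores the default namespace under the key None. The helper xpath_namespaces and the tiny Atom feed are hypothetical, and xp's own code may differ.

```python
from lxml import etree

ATOM = "http://www.w3.org/2005/Atom"
# Hypothetical feed that only declares a default namespace.
FEED = f'<feed xmlns="{ATOM}"><entry><title>First post</title></entry></feed>'
root = etree.fromstring(FEED.encode())


def xpath_namespaces(element, default_prefix="d"):
    """Copy the element's namespace map, giving the default
    namespace (key None) a usable prefix."""
    return {
        (default_prefix if prefix is None else prefix): uri
        for prefix, uri in element.nsmap.items()
    }


ns_d = xpath_namespaces(root)                      # like the default 'd'
ns_p = xpath_namespaces(root, default_prefix="p")  # like -d p

titles_d = root.xpath("//d:entry/d:title/text()", namespaces=ns_d)
titles_p = root.xpath("//p:entry/p:title/text()", namespaces=ns_p)
unprefixed = root.xpath("//entry")  # matches nothing: names need a prefix

print(titles_d, titles_p, unprefixed)
```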
Extensions to XSLT
- -e, --exslt
lxml supports the EXSLT [3] extensions through libxslt (requires libxslt 1.1.25 or higher). Add EXSLT namespaces with the --exslt command-line option.
Find Python Insider posts published in or after 2015 with EXSLT (date prefix):
xp -e "//d:entry[date:year(d:published) >= '2015']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
Python Insider posts updated in December:
xp -e "//d:entry[date:month-name(d:updated) = 'December']/d:title/text()" \
http://feeds.feedburner.com/PythonInsider
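In lxml terms, adding an EXSLT namespace amounts to passing its URI in the namespaces mapping. The sketch below does that for the date prefix with a made-up two-entry feed; it assumes an lxml build linked against a libxslt version that provides the EXSLT date functions.

```python
from lxml import etree

NAMESPACES = {
    "d": "http://www.w3.org/2005/Atom",          # default namespace of the feed
    "date": "http://exslt.org/dates-and-times",  # EXSLT dates and times
}

# Hypothetical feed with one entry before 2015 and one after.
FEED = b"""<feed xmlns="http://www.w3.org/2005/Atom">
<entry><title>Old post</title><published>2014-06-01T10:00:00Z</published></entry>
<entry><title>New post</title><published>2016-12-24T08:30:00Z</published></entry>
</feed>"""

root = etree.fromstring(FEED)
recent = root.xpath(
    "//d:entry[date:year(d:published) >= '2015']/d:title/text()",
    namespaces=NAMESPACES,
)
print(recent)
```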
- -q, --quiet
The --quiet command-line option suppresses the list of XML namespaces that xp otherwise prints.
Use the power of regular expressions (re prefix).
Find Python PEPs with four digits in the title (case-insensitive):
curl -s https://peps.python.org/peps.rss | \
xp -e '//item/title[re:match(text(), "pep [0-9]{4}:", "i")]' -q
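The same regular-expression predicate can be tried directly in Python. This sketch binds the re prefix to the EXSLT regular expressions namespace, which lxml supports in XPath; the item titles are invented for the example.

```python
from lxml import etree

NAMESPACES = {"re": "http://exslt.org/regular-expressions"}

# Hypothetical items: only the second title contains a four-digit PEP number.
RSS = b"""<rss><channel>
<item><title>PEP 8: Style Guide</title></item>
<item><title>PEP 3156: Asynchronous IO</title></item>
</channel></rss>"""

root = etree.fromstring(RSS)
matches = root.xpath(
    '//item/title[re:match(text(), "pep [0-9]{4}:", "i")]',
    namespaces=NAMESPACES,
)
matched_titles = [title.text for title in matches]
print(matched_titles)
```

The "i" flag makes the lowercase pattern match the uppercase "PEP" in the titles.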
Print file names
- -l, -f, --files-with-hits
The --files-with-hits command-line option only prints the names of files with an XPath result that is not false and not NaN (not a number).
This is similar to grep --files-with-matches, using XPath instead of regular expressions.
Find XML files with HTTP URLs:
xp "//mpeg7:MediaUri[starts-with(., 'http://')]" *.xml -l
Find XML files where all the book prices are below € 25:
xp -el "math:max(//book/price[@currency='€'])<25" *.xml
- -L, -F, --files-without-hits
The --files-without-hits command-line option only prints the names of files without any XPath results, or with a false or NaN result.
This is similar to grep --files-without-match, using XPath instead of regular expressions.
XML files without a person with the family name ‘Bauwens’:
xp "//mpeg7:FamilyName[text()='Bauwens']" *.xml -L
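The hit rule for both options can be written down as a small predicate. The sketch below is an interpretation of the description above, not xp's code: is_hit and the in-memory "files" are hypothetical, treating any other result type (such as a string) as a hit is an assumption, and the math prefix needs an lxml build with EXSLT math support.

```python
import math
from lxml import etree

NAMESPACES = {"math": "http://exslt.org/math"}


def is_hit(result):
    """False for a false, NaN or empty result; True otherwise."""
    if isinstance(result, bool):
        return result
    if isinstance(result, float):
        return not math.isnan(result)
    if isinstance(result, list):
        return len(result) > 0
    return True  # assumption: other results, such as strings, count as hits


# Hypothetical documents keyed by file name.
FILES = {
    "cheap.xml": "<books><book><price currency='€'>10</price></book>"
                 "<book><price currency='€'>20</price></book></books>",
    "pricey.xml": "<books><book><price currency='€'>40</price></book></books>",
    "empty.xml": "<books/>",
}

EXPR = "math:max(//book/price[@currency='€'])<25"
with_hits, without_hits = [], []
for name, xml in FILES.items():
    result = etree.fromstring(xml).xpath(EXPR, namespaces=NAMESPACES)
    (with_hits if is_hit(result) else without_hits).append(name)

print("with hits:", with_hits)
print("without hits:", without_hits)
```

For the file without books, math:max yields NaN and the comparison is false, so the file lands in the list without hits.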
xpath method
- -m, --method
xp uses the lxml.etree.XPath class by default. You can choose the lxml.etree.ElementTree.xpath method with the --method command-line option.
The results should be the same, but error reporting can be different.
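The two lxml entry points can be compared side by side in Python. In this sketch the sample document and the helper error_of are hypothetical; the deliberately broken expression "//title[" is there only to show where error reporting can diverge.

```python
from lxml import etree

root = etree.fromstring(b"<doc><title>A</title><title>B</title></doc>")
tree = root.getroottree()

# Default approach: compile the expression with the XPath class, then call it.
find_titles = etree.XPath("//title/text()")
via_class = [str(t) for t in find_titles(tree)]

# --method approach: evaluate through the ElementTree.xpath method.
via_method = [str(t) for t in tree.xpath("//title/text()")]


def error_of(run):
    """Run a callable and return the XPath error it raises, if any."""
    try:
        run()
    except etree.XPathError as exc:
        return exc
    return None


# The class reports the problem when compiling, the method when evaluating.
class_error = error_of(lambda: etree.XPath("//title["))
method_error = error_of(lambda: tree.xpath("//title["))

print(via_class == via_method)
print(type(class_error).__name__, "/", type(method_error).__name__)
```

Both calls fail with a subclass of lxml's XPathError, but the exception type and message need not be the same.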
Footnotes