Last updated: April 30, 2021 (afternoon)

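Overview

XPath is a language for finding information in an XML document.
XPath navigates an XML document through its elements and attributes.
Speed-wise, it tends to be faster than BeautifulSoup and slower than regular expressions.

Regex, forever the GOAT.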

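Terminology

Before using XPath, you need to know what the terms used to describe the DOM mean.

Many of them mirror the terms used for trees in data structures.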

<?xml version="1.0" encoding="ISO-8859-1"?>
<bookstore>
    <book>
      <title lang="en">Harry Potter</title>
      <author>J K. Rowling</author> 
      <year>2005</year>
      <price>29.99</price>
    </book>
</bookstore>

Node

Everything from a start tag to its matching end tag belongs to the node.

For example:

`<author>J K. Rowling</author>`

Atomic value

A plain value with no parent or children.

For example:

29.99

Item

Either an atomic value or a node.

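Parent and Children

A node's parent is the node directly above it; its children are the nodes directly below it.

For example: the parent of the title node is the book node.
For example: the book node has the child node title.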

<book>
      <title lang="en">Harry Potter</title>
      ......
</book>


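Ancestor and Descendant

These extend the parent and child relationships across any number of levels.

For example: bookstore has the descendant node title.
For example: title has the ancestor node bookstore.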

<bookstore>
    <book>
      <title>Harry Potter</title>
      ......
    </book>
</bookstore>

Sibling

Nodes that share the same parent.
(Personally, I still prefer calling them "brother nodes".)

For example: the title and year nodes are siblings.

<book>
  <title>Harry Potter</title>
  <year>2005</year>
</book>

Attribute

A name/value pair declared inside a node's start tag.

For example: the title node has a lang attribute whose value is en.

`<title lang="en">Harry Potter</title>`
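To try this vocabulary from Python without a third-party library, the standard `xml.etree.ElementTree` module is enough; it supports only a limited subset of XPath, but that subset covers children, descendants, attributes and the `..` parent step. A minimal sketch, assuming the sample document from the start of this section (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

# The sample document from the start of the terminology section
XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)     # the bookstore element node
book = root.find("book")      # a child node of bookstore
title = book.find("title")    # a child of book and a descendant of bookstore

print(title.get("lang"))                      # attribute value: en
print(root.find("book/price").text)           # atomic value: 29.99
print(root.find("book/title/..").tag)         # parent of title via "..": book
print([child.tag for child in book])          # title and its siblings: ['title', 'author', 'year', 'price']
print([el.tag for el in root.iter("title")])  # title descendants of bookstore: ['title']
```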

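Axis syntax

The full axis syntax is rather verbose.
The abbreviated syntax is much shorter, but it cannot express everything: axes such as descendant, ancestor, following and preceding have no abbreviation, so queries that need them must be written out in full.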

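Axis	Name	Description	Abbreviation
child	Children	Nodes exactly one level deeper than the context node and contained in it	Default, none needed
attribute	Attributes	The attribute nodes of the context node	@
descendant	Descendants	All nodes at any depth below the context node and contained in it	None
descendant-or-self	Self and descendants	The context node plus all of its descendants	`//` (short for `/descendant-or-self::node()/`)
parent	Parent	The node one level shallower than the context node that contains it	..
ancestor	Ancestors	All nodes shallower than the context node that contain it	None
ancestor-or-self	Self and ancestors	The context node plus all of its ancestors	None
following	Following nodes	All nodes after the context node in document order, excluding its descendants (and attribute/namespace nodes)	None
preceding	Preceding nodes	All nodes before the context node in document order, excluding its ancestors (and attribute/namespace nodes)	None
following-sibling	Following siblings	All later siblings, not just the next one	None
preceding-sibling	Preceding siblings	All earlier siblings, not just the previous one	None
self	Self	The context node itself	.
namespace	Namespace	The namespace nodes of the context node	None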
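The following/preceding and sibling axes are the easiest to misread. ElementTree does not support them, so the sketch below computes them by hand; it only considers element nodes (real XPath also includes text and comment nodes on these axes), and the helper names `following`, `preceding`, `following_sibling` and so on are hypothetical, not a library API.

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(XML)
order = list(root.iter())                                # element nodes in document order
parent = {child: p for p in root.iter() for child in p}  # child -> parent map

def ancestors(el):
    # Walk up the parent map until the root element is reached
    result = []
    while el in parent:
        el = parent[el]
        result.append(el)
    return result

def descendants(el):
    return [d for d in el.iter() if d is not el]

def following(el):
    # Everything after el in document order, minus el's own descendants
    desc = descendants(el)
    return [n for n in order[order.index(el) + 1:] if n not in desc]

def preceding(el):
    # Everything before el in document order, minus el's ancestors
    anc = ancestors(el)
    return [n for n in order[:order.index(el)] if n not in anc]

def following_sibling(el):
    siblings = list(parent[el])
    return siblings[siblings.index(el) + 1:]  # all later siblings, not just the next one

def preceding_sibling(el):
    siblings = list(parent[el])
    return siblings[:siblings.index(el)]

book = root.find("book")
title, year = book.find("title"), book.find("year")
print([n.tag for n in following(book)])           # [] (everything after book is inside it)
print([n.tag for n in preceding(year)])           # ['title', 'author'] (book and bookstore are ancestors)
print([n.tag for n in following_sibling(title)])  # ['author', 'year', 'price']
```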

<bookstore>
    <book>
      <title lang="eng">Harry Potter</title>
      <price>29.99</price>
    </book>
    <book>
      <title lang="eng">Learning XML</title>
      <price>39.95</price>
    </book>
    <book>
      <title lang="cn">Learning XPath</title>
      <price>23.33</price>
    </book>
</bookstore>

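Example: select the title nodes with the full axis syntax

`/child::bookstore/child::book/child::title`

Example: select the title nodes with the abbreviated syntax

`/bookstore/book/title`

Example: select the price nodes with the full axis syntax

`/descendant-or-self::price`

Example: select the price nodes with the abbreviated syntax

`//price`

Strictly speaking, `//price` expands to `/descendant-or-self::node()/child::price`; for this document it selects the same three price nodes as the full-syntax version.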
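The abbreviated paths above can be tried with Python's standard `xml.etree.ElementTree`, which understands the abbreviated forms (but not the full `child::`/`descendant-or-self::` axis syntax). A sketch, assuming the three-book sample above; because ElementTree evaluates paths relative to an element, `/bookstore/book/title` becomes `book/title` on the `bookstore` element and `//price` becomes `.//price`:

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
    <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
    <book><title lang="eng">Learning XML</title><price>39.95</price></book>
    <book><title lang="cn">Learning XPath</title><price>23.33</price></book>
</bookstore>"""

bookstore = ET.fromstring(XML)

# /bookstore/book/title  ->  book/title, relative to the bookstore element
titles = [t.text for t in bookstore.findall("book/title")]
# //price  ->  .//price (any depth below the current element)
prices = [p.text for p in bookstore.findall(".//price")]
# Attribute predicate, like //title[@lang="cn"]
cn_title = bookstore.find(".//title[@lang='cn']").text

print(titles)    # ['Harry Potter', 'Learning XML', 'Learning XPath']
print(prices)    # ['29.99', '39.95', '23.33']
print(cn_title)  # Learning XPath
```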

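Node tests

Test	Matches	Example	Abbreviation
comment()	Comment nodes	Matches the comment node `<!-- comment -->`	None
text()	Text nodes	Gets `hello` out of `<k>hello</k>`	None
processing-instruction()	XML processing instructions	For `<?php echo $a; ?>`, `processing-instruction('php')` matches and returns it	None
node()	Any node of any type (element, text, comment, ...)	`//node()[@lang="cn"]` finds nodes whose `lang` attribute is `cn`	None (`*` is narrower: it matches element nodes only)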
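To see why `*` and `node()` differ, take the children of `<k>hello<b>x</b>world</k>`: `*` matches only the `b` element, `text()` matches the two text nodes, and `node()` matches all three. ElementTree stores text in `.text` and `.tail` attributes instead of separate nodes (and its default parser drops comments, so `comment()` is not modeled), so this sketch rebuilds the XPath view of the children with a hypothetical `child_nodes` helper:

```python
import xml.etree.ElementTree as ET

def child_nodes(el):
    """Child nodes in document order: text nodes as str, element nodes as Element."""
    nodes = [el.text] if el.text else []
    for child in el:
        nodes.append(child)
        if child.tail:  # text that follows a child element
            nodes.append(child.tail)
    return nodes

k = ET.fromstring("<k>hello<b>x</b>world</k>")
star = list(k)                                             # child::*
texts = [n for n in child_nodes(k) if isinstance(n, str)]  # child::text()
nodes = child_nodes(k)                                     # child::node()

print([e.tag for e in star])  # ['b']
print(texts)                  # ['hello', 'world']
print(len(nodes))             # 3
```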

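Operators

Operator	Description	Example	Result
`+`, `-`, `*`, `div`	Addition, subtraction, multiplication, division	`6 + 4`	The result of the calculation (here 10)
`=`, `!=`	Equal, not equal	`price=9.80`	`true` or `false` depending on the comparison
`<`, `>`, `<=`, `>=`	Less than, greater than, less than or equal, greater than or equal	`price<=9.80`	`true` or `false` depending on the comparison
`or`	Logical or	`price=9.80 or price=9.70`	`true` if price is 9.80 or 9.70, otherwise `false`
`and`	Logical and	`price>9.00 and price<9.90`	`true` if price is strictly between 9.00 and 9.90 (e.g. 9.80), otherwise `false`
`mod`	Remainder of a division	`5 mod 2`	1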
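Two details trip people up when coming from Python: `div` is always floating-point division (XPath 1.0 numbers are IEEE 754 doubles), and `mod` truncates toward zero, so its result takes the sign of the dividend (`-5 mod 2` is -1), unlike Python's `%`. A small sketch with hypothetical helper names:

```python
import math

def xpath_div(a: float, b: float) -> float:
    # Floating-point division, e.g. 6 div 4 = 1.5. This sketch assumes b != 0;
    # XPath itself returns Infinity or NaN for division by zero instead of raising.
    return float(a) / float(b)

def xpath_mod(a: float, b: float) -> float:
    # Truncating remainder (same as C's fmod): the sign follows the dividend
    return math.fmod(a, b)

print(xpath_div(6, 4))   # 1.5
print(xpath_mod(5, 2))   # 1.0
print(xpath_mod(-5, 2))  # -1.0
print(-5 % 2)            # 1 (Python's % follows the sign of the divisor instead)
```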

Common functions

<bookstore>
    <book>
      <title lang="eng">Harry Potter</title>
      <price>29.99</price>
    </book>
    <book>
      <title lang="eng">Learning XML</title>
      <price>39.95</price>
    </book>
    <book>
      <title lang="cn">Learning XPath</title>
      <price>23.33</price>
    </book>
    <ticket>
        <spend>10</spend>
    </ticket>
</bookstore>

Type conversion functions

string(), number(), boolean(): convert a value to a string, a number, or a boolean respectively.
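Two XPath 1.0 conversion rules often surprise people: number() returns NaN for any string that is not a plain decimal number (so "1e5" becomes NaN, since XPath 1.0 numbers have no exponent syntax), and boolean() of a string is true whenever the string is non-empty, so boolean("false") is true. The sketch below mimics these rules; `xpath_number` and `xpath_boolean` are hypothetical names, and a Python list stands in for a node-set.

```python
import math
import re

# Optional XML whitespace, optional minus sign, digits with an optional fraction, optional whitespace
_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")

def xpath_number(s: str) -> float:
    return float(s) if _NUMBER.fullmatch(s) else math.nan

def xpath_boolean(value) -> bool:
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))  # 0 and NaN are false
    if isinstance(value, str):
        return len(value) > 0                         # any non-empty string is true
    return len(value) > 0                             # node-set (list): true if non-empty

print(xpath_number(" 29.99 "))  # 29.99
print(xpath_number("1e5"))      # nan
print(xpath_boolean("false"))   # True
print(xpath_boolean([]))        # False
```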

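String functions

Concatenating strings

concat(string1, string2, ...)
Concatenates two or more strings

Example: concat(/bookstore/book[1]/price/text(), "元")
Result: 29.99元 (元 is the Chinese currency unit, yuan)

string-join((str1, str2, ...), sep)
Joins several strings with the given separator, similar to Python's 'sep'.join([str1, str2, ...]). This is an XPath 2.0 function; XPath 1.0 engines such as lxml and browsers' document.evaluate do not provide it.

Example: string-join(/bookstore/book/title/text(), ',')
Result: Harry Potter,Learning XML,Learning XPath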

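Extracting substrings

substring(str, start, length)
Returns the substring that starts at position start (counting from 1) and is length characters long; if length is omitted, returns everything from start to the end

Example: substring(/bookstore/book[1]/title/text(), 2, 2)
Result: ar

substring-before(str, after_str)
Returns the part of str before the first occurrence of after_str, similar to str[:str.find(after_str)], except that XPath returns an empty string when after_str does not occur

Example: substring-before(/bookstore/book[1]/title/text(), "ry")
Result: Har

substring-after(str, before_str)
Returns the part of str after the first occurrence of before_str, similar to str[str.find(before_str) + len(before_str):], again returning an empty string when before_str does not occur

Example: substring-after(/bookstore/book[1]/title/text(), "ry ")
Result: Potter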
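XPath string positions start at 1, and the -before/-after functions return an empty string when the search string is missing, whereas Python's str.find returns -1. A sketch of the three functions under those rules; the Python names are hypothetical, and only integer arguments are handled (XPath 1.0 also rounds non-integer start and length values, which is omitted here).

```python
from typing import Optional

def substring(s: str, start: int, length: Optional[int] = None) -> str:
    # Keep characters whose 1-based position p satisfies start <= p < start + length
    end = len(s) + 1 if length is None else start + length
    return "".join(ch for pos, ch in enumerate(s, start=1) if start <= pos < end)

def substring_before(s: str, sub: str) -> str:
    idx = s.find(sub)
    return s[:idx] if idx != -1 else ""

def substring_after(s: str, sub: str) -> str:
    idx = s.find(sub)
    return s[idx + len(sub):] if idx != -1 else ""

title = "Harry Potter"
print(substring(title, 2, 2))                # ar
print(substring_before(title, "ry"))         # Har
print(substring_after(title, "ry "))         # Potter
print(repr(substring_after(title, "xyz")))   # ''
```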

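Replacing characters

translate(str, origin_str, replace_str)
Replaces characters one at a time, not substrings: each character of str that appears in origin_str is replaced by the character at the same position in replace_str, or deleted if replace_str has no character at that position (if a character appears more than once in origin_str, its first position counts). This is closer to Python's str.translate than to str.replace

Example: translate(/bookstore/book[1]/title/text(), "Potter", "After")
Result: Hay Afttr (P→A, o→f, t→t, e→r, and r is deleted because "After" has only five characters)

normalize-space(str)
Removes leading and trailing whitespace like str.strip(), and also collapses every internal run of whitespace into a single space

Example: normalize-space(' The XML ')
Result: The XML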
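The character-by-character behavior of translate() is easiest to see in code. A sketch of both functions in Python, where `translate` and `normalize_space` are hypothetical helper names and XPath whitespace is taken to be space, tab, CR and LF:

```python
import re

def translate(s: str, origin: str, replace: str) -> str:
    table = {}
    for i, ch in enumerate(origin):
        if ord(ch) in table:
            continue  # a repeated character keeps its first mapping
        # Map to the character at the same position, or delete it if replace is shorter
        table[ord(ch)] = replace[i] if i < len(replace) else None
    return s.translate(table)

def normalize_space(s: str) -> str:
    # Collapse every run of XPath whitespace to one space, then trim the ends
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

print(translate("Harry Potter", "Potter", "After"))  # Hay Afttr
print(translate("bar", "abc", "ABC"))                # BAr
print(normalize_space("  The   XML \n"))             # The XML
```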

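Other

string-length(str)
Returns the length of the string in characters

Example: string-length('Beatles')
Result: 7

contains(str, sub_str)
Returns true if str contains sub_str, otherwise false

Example: contains(/bookstore/book[1]/title/text(), 'Harry')
Result: true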
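string-length() counts characters, not bytes, which matters for non-ASCII text such as Chinese; contains() corresponds to Python's `in` operator. A quick sketch with hypothetical helper names:

```python
def string_length(s: str) -> int:
    return len(s)  # number of characters (code points), not bytes

def contains(s: str, sub: str) -> bool:
    return sub in s

print(string_length("Beatles"))          # 7
print(string_length("学习XPath"))        # 7
print(len("学习XPath".encode("utf-8")))  # 11 (bytes in UTF-8)
print(contains("Harry Potter", "Harry")) # True
```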

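Numeric functions

Function	Description	Example
sum()	Sum	`sum(/bookstore/book[position()>1]/price/text())` Result: 63.28
ceiling()	Round up	`ceiling(/bookstore/book[1]/price/text())` Result: 30
floor()	Round down	`floor(/bookstore/book[1]/price/text())` Result: 29
round()	Round to the nearest integer; a value exactly halfway between two integers rounds toward positive infinity, so round(2.5) is 3 but round(-2.5) is -2	`round(/bookstore/book[1]/price/text())` Result: 30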
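The same calculations in Python, as a sketch over the bookstore sample. Note that Python's built-in round() rounds halves to even, so the hypothetical `xpath_round` helper below uses floor(x + 0.5) to get XPath's "halves toward positive infinity" behavior; it is simplified and does not handle NaN/Infinity (math.floor raises on them) or XPath's negative-zero results.

```python
import math
import xml.etree.ElementTree as ET

XML = """<bookstore>
    <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
    <book><title lang="eng">Learning XML</title><price>39.95</price></book>
    <book><title lang="cn">Learning XPath</title><price>23.33</price></book>
</bookstore>"""

def xpath_round(x: float) -> float:
    # Nearest integer; exact halves go toward positive infinity
    return float(math.floor(x + 0.5))

prices = [float(p.text) for p in ET.fromstring(XML).findall("book/price")]

total = sum(prices[1:])                      # sum(/bookstore/book[position()>1]/price/text())
print(round(total, 2))                       # 63.28
print(math.ceil(prices[0]))                  # 30
print(math.floor(prices[0]))                 # 29
print(xpath_round(prices[0]))                # 30.0
print(xpath_round(2.5), xpath_round(-2.5))   # 3.0 -2.0
print(round(2.5))                            # 2 (Python rounds halves to even)
```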

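Node property functions

name(), local-name(), namespace-uri()
name() returns the node's qualified name including any prefix, local-name() returns the name without the prefix, and namespace-uri() returns the URI of the node's namespace (an empty string if it has none).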
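ElementTree writes a namespaced name in `{uri}local` form and discards the prefix, so from Python you can recover what local-name() and namespace-uri() return but not the prefixed result of name(). A sketch, where the `bk` prefix, the `urn:example:books` URI and the `split_tag` helper are all made up for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Tuple

XML = '<bk:book xmlns:bk="urn:example:books"><bk:title>Harry Potter</bk:title></bk:book>'

def split_tag(tag: str) -> Tuple[str, str]:
    """Split ElementTree's '{uri}local' tag into (namespace-uri, local-name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # no namespace: namespace-uri() would be the empty string

book = ET.fromstring(XML)
print(book.tag)             # {urn:example:books}book  (name() would give bk:book)
print(split_tag(book.tag))  # ('urn:example:books', 'book')
```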

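Context functions

position(), last()
position() returns the position of the context node within the node-set being processed (starting at 1); last() returns the number of nodes in that node-set. For example, /bookstore/book[last()] selects the last book, and /bookstore/book[position()>1] selects every book except the first.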
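ElementTree accepts numeric and `last()` predicates but has no position() function, so a Python slice stands in for `position()>1`. A sketch over the bookstore sample:

```python
import xml.etree.ElementTree as ET

XML = """<bookstore>
    <book><title lang="eng">Harry Potter</title><price>29.99</price></book>
    <book><title lang="eng">Learning XML</title><price>39.95</price></book>
    <book><title lang="cn">Learning XPath</title><price>23.33</price></book>
</bookstore>"""

bookstore = ET.fromstring(XML)

second = bookstore.find("book[2]/title").text       # like book[position()=2]
last = bookstore.find("book[last()]/title").text    # like book[position()=last()]
# No position() in ElementTree: slicing plays the role of book[position()>1]
rest = [b.find("title").text for b in bookstore.findall("book")[1:]]

print(second)  # Learning XML
print(last)    # Learning XPath
print(rest)    # ['Learning XML', 'Learning XPath']
```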


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!
