Parsing XML (Extensible Markup Language) with Python
XML (Extensible Markup Language)
from xml.etree import ElementTree as ET
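- Import the ElementTree module from the xml.etree package; it can be given an alias (here, ET)
Getting the root tag (content)
From a local file
- Parse: parse
- Open (get the root element): getroot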
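from xml.etree import ElementTree as ET

# Parse the local file, then get the root element
root = ET.parse('files/my.xml').getroot()
# Print the tag and text of each direct child of the root
for i in root:
    print(i.tag, i.text)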
From the network
- Open: ET.XML (builds the root element directly from the XML text)
from xml.etree import ElementTree as ET

# content is the XML text already fetched from the network (not defined in this snippet)
root = ET.XML(content)
for i in root:
    print(i.tag, i.text)
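Child elements have a name (tag), attributes (attrib), and content (text)
Use find to locate objects (the outer object first, then the inner one)
Finding
Once you hold an object you can loop over it (and repeat this at every level)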
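To make the three parts of an element concrete, here is a minimal sketch. The sample document and the variable names are made up for illustration and are not the files used in these notes.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document, parsed from a string instead of a file.
sample = '<data><country name="Singapore">rich</country></data>'
root = ET.fromstring(sample)

child = root[0]        # first child element of the root
print(child.tag)       # name of the element
print(child.attrib)    # attributes, as a dict
print(child.text)      # text content
```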
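from xml.etree import ElementTree as ET

# Parse the file and get the root element
tree = ET.parse('files/xo.xml')
root = tree.getroot()
# Outer object first, then the inner one
country = root.find('country')
country_year = country.find('year')
country_year.text = '2060'
# Write the modified tree to a new file
tree.write('files/xo2.xml', encoding='utf-8', xml_declaration=False)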
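- Once you hold an object you can search inside it again and read its name (tag), attributes (attrib), and text
Find the first matching object among the direct children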
- object.find()
Find all matching objects among the direct children
- object.findall()
Find all matching objects at every depth (the object itself and all of its descendants)
- object.iter()
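The difference between the three lookups is easiest to see side by side. The following is a small sketch on a made-up in-memory document (the sample data is an assumption, not one of the files above).

```python
from xml.etree import ElementTree as ET

# Hypothetical sample: one <year> is nested deeper than the others.
sample = """<data>
    <country name="A"><year>2001</year></country>
    <country name="B"><year>2002</year><region><year>1999</year></region></country>
</data>"""
root = ET.fromstring(sample)

first = root.find('country')          # first direct child named country
all_direct = root.findall('country')  # every direct child named country
direct_years = root.findall('year')   # empty list: year is not a direct child of root
all_years = [y.text for y in root.iter('year')]  # iter() searches every level

print(first.get('name'), len(all_direct), direct_years, all_years)
```

Note that find and findall only look at direct children of the object they are called on, which is why the notes chain them (outer object first, then inner).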
Modifying
Narrow down to the innermost object, then modify its content
a = object.find('country').find('year')
a.text = '123'
Add or modify an attribute of the innermost object
country_year.set('age','22')
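As a quick illustration that set both overwrites and adds, here is a sketch on a made-up document; the source attribute and the sample values are hypothetical.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample with an existing age attribute on <year>.
root = ET.fromstring('<data><country><year age="1">2008</year></country></data>')
country_year = root.find('country').find('year')

country_year.set('age', '22')       # attribute exists: its value is overwritten
country_year.set('source', 'demo')  # attribute is new: it is added
print(country_year.attrib)
```

Attribute values are stored as strings, so numbers should be passed as '22' rather than 22 if the tree is going to be written out.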
Deleting
Delete a node
- parent.remove(child)
- del parent[index] (del applied to a variable that holds the child only removes the variable name, not the node)
Deletion is performed on the parent object (the element that directly contains the node)
from xml.etree import ElementTree as ET

# Parse the file (from local disk)
tree = ET.parse('files/xo.xml')
# Get the root object
root = tree.getroot()

# Goal: delete gdppc under the country whose name is Singapore
# Collect all country objects
country_all = root.findall('country')
# Loop over them and pick the one that matches Singapore
for i in country_all:
    if i.get('name') == 'Singapore':
        gdppc_object = i.find('gdppc')
        i.remove(gdppc_object)
        # del gdppc_object would only remove the variable name, not the node

# Write to a new file
tree.write('files/xo2.xml', encoding='utf-8', xml_declaration=False)
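The two deletion forms behave differently, which the following in-memory sketch tries to show. The sample data is made up; only remove and index-based del actually change the tree.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><gdppc>59900</gdppc><year>2011</year></country>'
    '<country name="Panama"><gdppc>13600</gdppc></country>'
    '</data>'
)
singapore, panama = root.findall('country')

gdppc = singapore.find('gdppc')
del gdppc                                          # only unbinds the local name
still_there = singapore.find('gdppc') is not None  # the node is still in the tree

singapore.remove(singapore.find('gdppc'))          # detaches the node from its parent
del panama[0]                                      # deletes the first child by index

print(still_there, ET.tostring(root, encoding='unicode'))
```

In both working forms the call goes through the parent, which matches the rule that deletion applies to the upper-level object.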
Creating tags
Adding tags
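With object inheritance: create a standalone element from the Element class, then attach it with append
ET.Element('a', {'a': 'b'})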
from xml.etree import ElementTree as ET

# Build a new file node by node
root = ET.Element('root')
son1 = ET.Element('son1', {'name': '儿1'})
son2 = ET.Element('son2', {'name': '儿2'})
grandson1 = ET.Element('grandson1', {'son11': '孙1'})
grandson2 = ET.Element('grandson2', {'son22': '孙2'})

# Attach the grandsons to son1, then the sons to root
son1.append(grandson1)
son1.append(grandson2)
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('files/son.xml', encoding='utf-8', xml_declaration=False)
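Without object inheritance: create the element from an existing element with makeelement; it is still detached until appended
root.makeelement('a', {'a': 'b'})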
from xml.etree import ElementTree as ET

# Build a new file node by node
root = ET.Element('root')
son1 = root.makeelement('son1', {'name': '儿1'})
son2 = root.makeelement('son2', {'name': '儿2'})
grandson1 = root.makeelement('grandson1', {'son11': '孙1'})
grandson2 = root.makeelement('grandson2', {'son22': '孙2'})

# Attach the grandsons to son1, then the sons to root
son1.append(grandson1)
son1.append(grandson2)
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('files/son.xml', encoding='utf-8', xml_declaration=False)
Linear approach: ET.SubElement(parent, tag, {attribute: value}) creates the element and attaches it to the parent in one step
from xml.etree import ElementTree as ET

# Create new nodes with SubElement
root = ET.Element('root')
son1 = ET.SubElement(root, 'son1', attrib={'name': '儿1'})
son2 = ET.SubElement(root, 'son2', attrib={'name': '儿2'})
grandson1 = ET.SubElement(son1, 'grandson', {'name': '孙1'})
grandson2 = ET.SubElement(son1, 'grandson', {'name': '孙2'})

# Wrap the root in a tree and write it out
tree = ET.ElementTree(root)
tree.write('files/son2.xml', encoding='utf-8', xml_declaration=False)
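To check that create-then-append and SubElement lead to the same tree, the sketch below builds a small document both ways and compares the serialized text. The tag names and attribute values here are shortened, hypothetical stand-ins for the ones above.

```python
from xml.etree import ElementTree as ET

# Approach 1: create detached elements, then append them.
root_a = ET.Element('root')
son_a = ET.Element('son1', {'name': 's1'})
root_a.append(son_a)
son_a.append(ET.Element('grandson', {'name': 'g1'}))

# Approach 2: SubElement creates and attaches in one call.
root_b = ET.Element('root')
son_b = ET.SubElement(root_b, 'son1', {'name': 's1'})
ET.SubElement(son_b, 'grandson', {'name': 'g1'})

xml_a = ET.tostring(root_a, encoding='unicode')
xml_b = ET.tostring(root_b, encoding='unicode')
print(xml_a == xml_b)
```

SubElement saves the separate append calls, so it tends to be the shorter choice when the parent is already known at creation time.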
Saving
Save as a new file
tree = ET.ElementTree(root)
tree.write('files/son.xml', encoding='utf-8', xml_declaration=False)
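Short tags (self-closing form for empty elements) are controlled by short_empty_elements, which is True by default
xml_declaration=False leaves out the <?xml ...?> declaration line at the top of the file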
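The effect of the write options can be inspected without touching disk. This is a sketch that writes into an in-memory buffer; the dump helper and the sample tree are hypothetical names introduced only for this example.

```python
import io
from xml.etree import ElementTree as ET

# Hypothetical tree with one empty element.
root = ET.Element('root')
ET.SubElement(root, 'empty')
tree = ET.ElementTree(root)

def dump(**options):
    """Write the tree to an in-memory buffer and return the text."""
    buffer = io.BytesIO()
    tree.write(buffer, encoding='utf-8', **options)
    return buffer.getvalue().decode('utf-8')

no_decl = dump(xml_declaration=False)    # no <?xml ...?> line
with_decl = dump(xml_declaration=True)   # starts with the declaration
long_form = dump(xml_declaration=False, short_empty_elements=False)

print(no_decl)
print(with_decl)
print(long_form)
```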
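Additional notes
'<![CDATA[你好呀]]>'  # used by WeChat Official Accounts
The CDATA wrapper is parsed automatically; object.text returns only the inner content
- It keeps special characters from being mistaken for markup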
import xml.etree.ElementTree as ET

content = """<xml>
    <ToUserName><![CDATA[gh_7f083739789a]]></ToUserName>
    <FromUserName><![CDATA[oia2TjuEGTNoeX76QEjQNrcURxG8]]></FromUserName>
    <CreateTime>1395658920</CreateTime>
    <MsgType><![CDATA[event]]></MsgType>
    <Event><![CDATA[TEMPLATESENDJOBFINISH]]></Event>
    <MsgID>200163836</MsgID>
    <Status><![CDATA[success]]></Status>
</xml>"""

# Dict that stores the WeChat key/value pairs
weixin_info_dict = {}
# Get the root tag
root = ET.XML(content)
# Loop over the children of the root
for i in root:
    weixin_info_dict[i.tag] = i.text
print(weixin_info_dict)
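The point about special characters can be checked with a short sketch. The Content tag and its text are made up for illustration. One thing worth knowing is that ElementTree does not keep the CDATA wrapper when serializing; it writes escaped text instead.

```python
from xml.etree import ElementTree as ET

# Hypothetical message whose text contains characters that are special in XML.
raw = '<xml><Content><![CDATA[1 < 2 & 3 > 2]]></Content></xml>'
root = ET.XML(raw)

text = root.find('Content').text          # CDATA wrapper is stripped on parse
serialized = ET.tostring(root, encoding='unicode')  # written back as escaped text

print(text)
print(serialized)
```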
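Modifying one particular element among same-named elements
- First use findall to collect the same-named elements as objects (match on name)
- Loop over the findall result and use get to check the attribute (match on attribute)
- Then use find() to get the target object and modify it directly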
from xml.etree import ElementTree as ET

# Parse the file
tree = ET.parse('files/xo.xml')
root = tree.getroot()

# Collect the same-named elements
data = root.findall('country')
# Check the attribute of each one
for child in data:
    if child.get('name') == 'Singapore':
        # Match found: take its year child as the object to modify
        sg_data = child.find('year')
        sg_data.text = '2030'

tree.write('files/xo1.xml', encoding='utf-8', xml_declaration=False)
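As an alternative to the findall loop, ElementTree's limited XPath support allows the name and attribute checks to be written as one path. This is a different technique from the one in the notes, shown here on a made-up in-memory document.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample with two same-named country elements.
root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><year>2008</year></country>'
    '<country name="Singapore"><year>2011</year></country>'
    '</data>'
)

# [@name='Singapore'] filters on the attribute; /year steps into the child.
sg_year = root.find("country[@name='Singapore']/year")
if sg_year is not None:   # find returns None when nothing matches
    sg_year.text = '2030'

years = [c.find('year').text for c in root.findall('country')]
print(years)
```

The explicit loop is still the better fit when the condition is more involved than a simple attribute comparison.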