A Deep Dive into Several Ways of Parsing XML in Python
 
Source: 编程派, published 2016-7-21
 

When it comes to XML parsing, Python lives up to its "batteries included" principle. The standard library ships with a large number of packages and tools for working with XML, so many that newcomers to Python can find it hard to choose among them.

This article looks in depth at several ways of parsing XML files with Python and uses the ElementTree module, which the author recommends, to demonstrate concrete usage and scenarios. The Python version used throughout is 2.7.

What is XML?

XML stands for Extensible Markup Language, and the markup is the key part. You create content and then mark it up with delimiting tags, so that each word, phrase, or block becomes identifiable, classifiable information.

Markup languages evolved from early proprietary corporate and government formats into the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and eventually into XML. XML has the following characteristics:

XML is designed to transport data, not to display it.

XML tags are not predefined. You define your own tags.

XML is designed to be self-descriptive.

XML is a W3C Recommendation.

Today XML plays a role on the Web no less important than HTML, which has always been the Web's cornerstone. XML is everywhere. It is one of the most commonly used tools for transferring data between applications, and it is increasingly popular for storing and describing information. Learning how to parse XML files is therefore very important for Web development.

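Which Python packages can parse XML?

The Python standard library provides six packages that can be used to process XML.

xml.dom

xml.dom implements the DOM API defined by the W3C. If you are used to the DOM API, or someone requires you to use it, this is the package to choose. Note, however, that the package contains several different modules whose performance differs.

A DOM parser must load the tree built from the XML file into memory before any processing begins, so its memory usage depends entirely on the size of the input.

xml.dom.minidom

xml.dom.minidom is a minimal implementation of the DOM API. It is much simpler than the full DOM, and the package is much smaller as well. Readers who are not familiar with DOM should consider the xml.etree.ElementTree module instead. According to the author of lxml, minidom is inconvenient to use, inefficient, and prone to problems.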
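
To give a feel for the DOM style described above, here is a minimal sketch using xml.dom.minidom. It is written for Python 3 (the rest of the article uses 2.7), and the XML_TEXT sample and the branch_names helper are illustrative names modelled on the demo document used later, not part of the original article.

```python
from xml.dom import minidom

# Illustrative sample data (an assumption), modelled on the demo document
# used later in this article.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"/>
</doc>"""


def branch_names(xml_text):
    """Return the name attribute of every <branch> element via the DOM API."""
    # The whole document is parsed into an in-memory DOM tree first.
    dom = minidom.parseString(xml_text)
    # getElementsByTagName searches the entire tree below the document node.
    return [node.getAttribute("name")
            for node in dom.getElementsByTagName("branch")]


names = branch_names(XML_TEXT)
print(names)
```

Note that the complete tree exists in memory before the first attribute is read, which is the cost the DOM paragraphs above describe.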

xml.dom.pulldom

Unlike the other modules, xml.dom.pulldom provides a "pull parser". The basic idea is to pull events from an XML stream and then process them. Like SAX it uses an event-driven processing model, but with a pull parser the user must explicitly pull events from the XML stream and loop over them until processing finishes or an error occurs.

Pull parsing is a relatively recent trend in XML processing. Previously popular XML parsing frameworks such as SAX and DOM are push-based, meaning that control over the parsing work rests with the parser.
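
The pull model can be sketched as follows. This is a Python 3 example under stated assumptions: the XML_TEXT sample and the branch_names helper are made up for illustration, and only the START_ELEMENT event is handled.

```python
from xml.dom import pulldom

# Illustrative sample data (an assumption, not from the original article).
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"/>
</doc>"""


def branch_names(xml_text):
    """Collect the name attribute of each <branch> by pulling events."""
    names = []
    events = pulldom.parseString(xml_text)
    # The caller drives the loop: every iteration pulls the next event
    # from the stream, instead of the parser calling back into our code.
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "branch":
            names.append(node.getAttribute("name"))
    return names


names = branch_names(XML_TEXT)
print(names)
```

Because the loop belongs to the caller, it can stop early with a plain break, something that is harder to express in a callback-driven parser.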

xml.sax

The xml.sax module implements the SAX API, which trades convenience for speed and a small memory footprint. SAX stands for Simple API for XML; it is not a standard issued by the W3C. It is event-driven and does not need to read the whole document at once: reading the document and parsing it are the same process. "Event-driven" refers to a style of program execution based on callbacks.
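
As a rough illustration of the callback mechanism, the following Python 3 sketch registers a handler with xml.sax. The BranchHandler class and the XML_TEXT sample are hypothetical names chosen for this example.

```python
import xml.sax

# Illustrative sample data (an assumption, not from the original article).
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"/>
</doc>"""


class BranchHandler(xml.sax.ContentHandler):
    """Records the name attribute of every <branch> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # The parser calls this method each time it meets an opening tag.
        if name == "branch":
            self.names.append(attrs.get("name"))


handler = BranchHandler()
# parseString pushes events into the handler as it reads the input.
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
print(handler.names)
```

No tree is built here; any state the program needs, such as the names list, has to be kept by the handler itself.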

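xml.parsers.expat

xml.parsers.expat provides a direct, low-level API to the expat parser, which is written in C. The expat interface resembles SAX in that it is also based on event callbacks, but it is not standardized and applies only to the expat library.

expat is a stream-oriented parser. You register callback (handler) functions with the parser and then start feeding it the document. When the parser recognizes a particular construct in the document, it calls the handler registered for that construct (if you have registered one). The document is fed to the parser in chunks and is loaded into memory piece by piece, which is why expat can parse very large files.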
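
A minimal Python 3 sketch of the handler registration described above might look like this. The XML_TEXT sample and the branch_names helper are assumptions made for illustration; a real program processing a large file would call Parse repeatedly with successive chunks rather than once with the whole text.

```python
import xml.parsers.expat

# Illustrative sample data (an assumption, not from the original article).
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"/>
</doc>"""


def branch_names(xml_text):
    """Collect the name attribute of each <branch> using expat callbacks."""
    names = []

    def start_element(name, attrs):
        # attrs is a plain dict mapping attribute names to values.
        if name == "branch":
            names.append(attrs.get("name"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element  # register the handler
    # The second argument marks this as the final chunk of input.
    parser.Parse(xml_text, True)
    return names


names = branch_names(XML_TEXT)
print(names)
```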

xml.etree.ElementTree (ET for short)

The xml.etree.ElementTree module provides a lightweight, Pythonic API and also has an efficient C implementation, xml.etree.cElementTree. Compared with DOM, ET is faster and its API is more direct and convenient. Compared with SAX, the ET.iterparse function likewise offers on-demand parsing without reading the entire document into memory at once. ET's performance is roughly on par with the SAX module, but its API is higher level and easier to use.

The author's advice is to make ET your first choice for XML parsing in Python, unless you have a specific requirement that calls for another module.

These XML parsing APIs did not originate in Python; Python borrowed them from other languages or brought them in directly. expat, for example, is a library written in C for parsing XML documents. SAX was originally developed in Java by David Megginson. DOM offers a platform- and language-independent way to access and modify the content and structure of a document, and it can be used from any programming language.

Below, we use the ElementTree module as an example to show how to parse XML in Python.

Parsing XML with ElementTree

The Python standard library provides two implementations of ET. One is the pure-Python xml.etree.ElementTree; the other is the faster C implementation, xml.etree.cElementTree. Always prefer the C implementation, because it is much faster and uses much less memory. If the Python version you are using might lack the accelerator module that cElementTree needs, you can import the module like this:

try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

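This is a common way to import an API that has more than one implementation. In practice, importing the first module directly will most likely work without problems. Note that from Python 3.3 onward this import pattern is no longer needed, because the ElementTree module automatically prefers the C accelerator and falls back to the Python implementation if the accelerator is unavailable. Users of Python 3.3+ therefore only need import xml.etree.ElementTree.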

Parsing an XML document into a tree

Let's start with the basics. XML is a structured, hierarchical data format, and the data structure that represents it best is a tree. ET provides two objects for this: ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree. Interactions with the whole document (reading, writing, finding elements of interest) are usually done at the ElementTree level. Interactions with a single XML element and its children are done at the Element level. The examples below show the main usage.

We use the following XML document as sample data:

<?xml version="1.0"?>
<doc>
    <branch name="codingpy.com" hash="1cdf045c">
        text,source
    </branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    <branch name="invalid">
    </branch>
</doc>

Next, we load and parse the document:

>>> import xml.etree.ElementTree as ET
>>> tree = ET.ElementTree(file='doc1.xml')

Then we fetch the root element:

>>> tree.getroot()
<Element 'doc' at 0x11eb780>

As mentioned earlier, the root element is an Element object. Let's look at its tag and attributes:

>>> root = tree.getroot()
>>> root.tag, root.attrib
('doc', {})

Indeed, the root element has no attributes. Like any other Element object, the root supports iteration over its direct children:

>>> for child_of_root in root:
...     print child_of_root.tag, child_of_root.attrib
...
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
branch {'name': 'invalid'}

We can also access a specific child by index:

>>> root[0].tag, root[0].text
('branch', '\n        text,source\n    ')

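Finding elements of interest

The examples above make it clear that we could reach every element in the tree with simple recursion (for each element, recursively visit all of its children). Because this is such a common task, however, ET provides some convenient helpers for it.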
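
For reference, the manual recursion mentioned above could be sketched like this in Python 3. The walk generator and the inline XML_TEXT copy of the sample document are illustrative names, not part of the original article.

```python
import xml.etree.ElementTree as ET

# Inline copy of the article's sample document (illustrative variable name).
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">xml,sgml</sub-branch>
    </branch>
    <branch name="invalid"/>
</doc>"""


def walk(elem, depth=0):
    """Yield (depth, tag) for elem and, recursively, for every descendant."""
    yield depth, elem.tag
    for child in elem:  # iterating an Element yields its direct children
        yield from walk(child, depth + 1)


root = ET.fromstring(XML_TEXT)
visited = list(walk(root))
for depth, tag in visited:
    print("  " * depth + tag)
```

The traversal order is the same depth-first document order that ET's built-in helpers produce; the sketch only adds the depth bookkeeping.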

Element objects have an iter method that performs a depth-first traversal (DFS) starting at a given element and covering every element below it. ElementTree objects have the same method. Here is the simplest way to visit every element in the XML document:

>>> for elem in tree.iter():
...     print elem.tag, elem.attrib
...
doc {}
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
sub-branch {'name': 'subrelease01'}
branch {'name': 'invalid'}

From here we can traverse the tree however we like, visiting every element and picking out the attributes we care about. ET can make this even easier and faster, though: the iter method accepts a tag name and then iterates only over the elements with that tag:

>>> for elem in tree.iter(tag='branch'):
...     print elem.tag, elem.attrib
...
branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}
branch {'name': 'invalid'}

Finding elements with XPath

Using XPath to find elements of interest is more convenient still (ET supports a limited subset of XPath). Element objects have several find methods that take an XPath path as an argument: find returns the first matching subelement, findall returns all matching subelements as a list, and iterfind returns an iterator over all matching elements. ElementTree objects have these methods too; their searches start from the root node.

Here is an example that uses XPath to find elements:

>>> for elem in tree.iterfind('branch/sub-branch'):
...     print elem.tag, elem.attrib
...
sub-branch {'name': 'subrelease01'}

The code above returns every element with the tag sub-branch under a branch element. Next, we look for branch elements that have a particular name attribute:

>>> for elem in tree.iterfind('branch[@name="release01"]'):
...     print elem.tag, elem.attrib
...
branch {'hash': 'f200013e', 'name': 'release01'}
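
The sessions above only exercise iterfind. As a complement, here is a minimal Python 3 sketch of find and findall, run against an inline copy of the sample document; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Inline copy of the article's sample document (illustrative variable name).
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">xml,sgml</sub-branch>
    </branch>
    <branch name="invalid"/>
</doc>"""

root = ET.fromstring(XML_TEXT)

first = root.find("branch")              # first matching direct child
all_branches = root.findall("branch")    # every matching direct child, as a list
nested = root.findall(".//sub-branch")   # './/' searches at any depth
missing = root.find("no-such-tag")       # no match: find returns None

print(first.get("name"), len(all_branches))
print([elem.get("name") for elem in nested], missing)
```

Since find returns None when nothing matches, code that chains attribute access onto its result should check for None first.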

Building XML documents

ET makes it easy to build an XML document and write it to a file. The write method of the ElementTree object does exactly that.

There are two main scenarios. In the first, you read an XML document, modify it, and write the changes back out. In the second, you create a new XML document from scratch.

To modify a document, you manipulate its Element objects. Consider the following example:

>>> root = tree.getroot()
>>> del root[2]
>>> root[0].set('foo', 'bar')
>>> for subelem in root:
...     print subelem.tag, subelem.attrib
...
branch {'foo': 'bar', 'hash': '1cdf045c', 'name': 'codingpy.com'}
branch {'hash': 'f200013e', 'name': 'release01'}

In the code above, we deleted the third child of the root element and added a new attribute to the first child. The tree can then be written back to a file. The final XML document looks like this:

>>> import sys
>>> tree.write(sys.stdout)
<doc>
    <branch foo="bar" hash="1cdf045c" name="codingpy.com">
        text,source
    </branch>
    <branch hash="f200013e" name="release01">
        <sub-branch name="subrelease01">
            xml,sgml
        </sub-branch>
    </branch>
    </doc>

Note that the order of the attributes differs from the original document. This is because ET stores attributes in a dictionary, and in Python 2.7 a dictionary is an unordered data structure. XML itself does not care about attribute order either.

Building a complete document from scratch is just as easy. The ET module provides a SubElement factory function that simplifies creating elements:

>>> a = ET.Element('elem')
>>> c = ET.SubElement(a, 'child1')
>>> c.text = "some text"
>>> d = ET.SubElement(a, 'child2')
>>> b = ET.Element('elem_b')
>>> root = ET.Element('root')
>>> root.extend((a, b))
>>> tree = ET.ElementTree(root)
>>> tree.write(sys.stdout)
<root><elem><child1>some text</child1><child2 /></elem><elem_b /></root>
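
The session above writes to sys.stdout. Saving to a real file works the same way; the Python 3 sketch below writes a small tree to a temporary location and reads it back. The temporary directory, the out.xml file name, and the variable names are assumptions made for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a small document from scratch, as in the session above.
root = ET.Element("root")
child = ET.SubElement(root, "child1")
child.text = "some text"
ET.SubElement(root, "child2")

tree = ET.ElementTree(root)

# Hypothetical output location: a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.xml")
# xml_declaration=True adds the <?xml ...?> header to the file.
tree.write(path, encoding="utf-8", xml_declaration=True)

# Parse the file again to confirm that the structure survived the round trip.
reloaded = ET.parse(path).getroot()
print(reloaded.tag, [elem.tag for elem in reloaded])
```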

Parsing XML streams with iterparse

XML documents are often quite large, and reading a whole document straight into memory can cause problems during parsing. This is one of the reasons the SAX API is recommended over DOM.

As noted above, ET can load an XML document into an in-memory tree and then process it. But with a large file, won't that run into the same heavy memory consumption as DOM? Yes, it does. To solve this problem, ET provides a special SAX-like tool, iterparse, which parses XML incrementally.

Next, the author shows how to use iterparse and compares it with the standard tree-parsing approach. We use an automatically generated XML document; here is how it begins:

<?xml version="1.0" standalone="yes"?>
<site>
  <regions>
    <africa>
      <item id="item0">
        <location>United States</location> <!-- Counting locations -->
        <quantity>1</quantity>
        <name>duteous nine eighteen </name>
        <payment>Creditcard</payment>
        <description>
          <parlist>
[...]

Let's count how many location elements in the document have the text value Zimbabwe. Here is the standard approach using ET.parse:

import sys
import xml.etree.ElementTree as ET

tree = ET.parse(sys.argv[2])

count = 0
for elem in tree.iter(tag='location'):
    if elem.text == 'Zimbabwe':
        count += 1

print count

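The code above loads every element into memory and then examines them one by one. When parsing an XML document of about 100MB, the Python process running this script peaks at about 560MB of memory and takes 2.9 seconds in total.

Note that we do not actually need to load the whole tree into memory. All we need is to detect location elements whose text has the desired value; all other data can be discarded. This is where iterparse comes in: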

import sys
import xml.etree.ElementTree as ET

count = 0
for event, elem in ET.iterparse(sys.argv[2]):
    if event == 'end':
        if elem.tag == 'location' and elem.text == 'Zimbabwe':
            count += 1
    elem.clear()  # discard the element

print count

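The for loop above iterates over iterparse events. It first checks that the event is end, then checks whether the element's tag is location and whether its text matches the target value. Calling elem.clear() is essential: iterparse still builds a tree, only incrementally. Discarding the elements we no longer need effectively discards the whole tree and releases the memory allocated for it.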
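
To try the same technique without a 100MB input file, the following Python 3 sketch runs iterparse over a small in-memory document. The XML_BYTES sample and the count_locations helper are made up for illustration and are much smaller than the generated document the article measures.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for the large generated document (illustrative data only).
XML_BYTES = b"""<site><regions>
<africa>
  <item id="item0"><location>Zimbabwe</location></item>
  <item id="item1"><location>United States</location></item>
</africa>
<asia>
  <item id="item2"><location>Zimbabwe</location></item>
</asia>
</regions></site>"""


def count_locations(source, target):
    """Count <location> elements whose text equals target, incrementally."""
    count = 0
    # Only 'end' events are requested, so each element is complete
    # (its text is fully read) by the time it reaches the loop body.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "location" and elem.text == target:
            count += 1
        elem.clear()  # drop the element's text, attributes and children
    return count


zimbabwe_count = count_locations(io.BytesIO(XML_BYTES), "Zimbabwe")
print(zimbabwe_count)
```

Passing events=("end",) is equivalent to the default used in the article's script; it is spelled out here only to make the choice of event visible.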

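When the same file is parsed with this script, memory usage peaks at only 7MB and the running time is 2.5 seconds. The speedup comes from traversing the tree only once, while it is being built. The standard parse approach builds the whole tree first and then traverses it a second time to find the elements of interest.

iterparse performs comparably to SAX, but its API is more useful: iterparse builds the tree incrementally for you, whereas with SAX you have to build the tree yourself.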

   