When it comes to parsing XML, Python lives up to its "batteries included" principle. The standard library ships so many packages and tools for processing XML that newcomers to Python often have no idea which one to choose.
This article takes an in-depth look at several ways to parse XML files with Python and uses the ElementTree module, which I recommend, to demonstrate concrete usage and scenarios. The Python version used throughout is 2.7.
What is XML?
XML stands for Extensible Markup Language, and the markup is the key part. You create content and then mark it up with delimiting tags, so that every word, phrase, or block becomes identifiable, classifiable information.

Markup languages evolved from early proprietary corporate and government formats into the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and eventually into XML. XML has the following characteristics:
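XML is designed to transport data, not to display it.
XML tags are not predefined. You have to define your own tags.
XML is designed to be self-descriptive.
XML is a W3C Recommendation.
Today XML plays a role on the Web no smaller than that of HTML, which has long been the Web's cornerstone. XML is everywhere. It is one of the most commonly used tools for exchanging data between applications, and it is increasingly popular for storing and describing information. Learning how to parse XML files is therefore very important for Web development.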

Which Python packages can parse XML?
The Python standard library provides six packages that can be used to process XML.
xml.dom
xml.dom implements the DOM API defined by the W3C. If you are used to the DOM API, or someone requires you to use it, this is the package to choose. Note, however, that the package contains several different modules whose performance differs.

A DOM parser must build the tree representation of the XML file in memory before any processing can start, so its memory usage depends entirely on the size of the input.
xml.dom.minidom
xml.dom.minidom is a minimal implementation of the DOM API. It is much simpler than the full DOM, and the package is much smaller too. If you are not familiar with the DOM, consider using the xml.etree.ElementTree module instead. According to the author of lxml, minidom is inconvenient to use, inefficient, and error-prone.
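To give a feel for the DOM style, here is a minimal sketch that lists the name attribute of every branch element with minidom. It is written for Python 3, and the names XML_TEXT and branch_names, as well as the trimmed-down sample document, are made up for this illustration.

```python
from xml.dom import minidom

# Hypothetical sample data, loosely modeled on the demo document used later on.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"></branch>
    <branch name="invalid"></branch>
</doc>"""


def branch_names(xml_text):
    """Build the whole DOM tree in memory, then query it by tag name."""
    dom = minidom.parseString(xml_text)
    return [node.getAttribute("name")
            for node in dom.getElementsByTagName("branch")]


names = branch_names(XML_TEXT)
print(names)
```

Even for this small task, the DOM interface relies on verbose method names such as getElementsByTagName and getAttribute.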
xml.dom.pulldom
Unlike the other modules, xml.dom.pulldom provides a "pull parser". The basic idea is to pull events from an XML stream and then process them. Like SAX, it uses an event-driven processing model; the difference is that with a pull parser the caller explicitly pulls events from the XML stream and loops over them until processing finishes or an error occurs.
Pull parsing is a recent trend in XML processing. Popular XML parsing frameworks that came before it, such as SAX and DOM, are push-based, which means that control over the parsing process rests with the parser.
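A rough sketch of the pull style, assuming Python 3 and a made-up sample document (XML_TEXT and pull_branch_names are names invented for this example): the caller drives the loop and decides which events to act on.

```python
from xml.dom import pulldom

# Hypothetical sample data for illustration only.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"></branch>
</doc>"""


def pull_branch_names(xml_text):
    """Pull events from the stream and keep only the ones we care about."""
    names = []
    for event, node in pulldom.parseString(xml_text):
        # The caller, not the parser, decides what to do with each event.
        if event == pulldom.START_ELEMENT and node.tagName == "branch":
            names.append(node.getAttribute("name"))
    return names


pulled = pull_branch_names(XML_TEXT)
print(pulled)
```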
xml.sax

The xml.sax module implements the SAX API, trading convenience for speed and low memory usage. SAX stands for Simple API for XML; it is not an official W3C standard. It is event-driven and does not need to read the whole document at once: reading the document is the parsing process. "Event-driven" here refers to a way of running a program that is based on callbacks.
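The callback mechanism can be sketched as follows, assuming Python 3. BranchHandler and XML_TEXT are hypothetical names for this example; ContentHandler, startElement, and xml.sax.parseString are part of the standard library.

```python
import xml.sax

# Hypothetical sample data for illustration only.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"></branch>
</doc>"""


class BranchHandler(xml.sax.ContentHandler):
    """Collects the name attribute of every <branch> start tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # The parser calls this method for each opening tag it reads.
        if name == "branch":
            self.names.append(attrs.get("name"))


handler = BranchHandler()
xml.sax.parseString(XML_TEXT.encode("utf-8"), handler)
print(handler.names)
```

Note that the parser is in control here: it pushes events into the handler, and any state we need has to be kept on the handler object ourselves.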
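xml.parsers.expat
xml.parsers.expat provides a direct, low-level interface to the expat parser, which is written in C. Like SAX, the expat interface is based on event callbacks, but it is not standardized and applies only to the expat library.
expat is a stream-oriented parser. You register callback (handler) functions with the parser and then start feeding it the document. When the parser recognizes a particular construct in the document, it calls the handler for that construct (if you registered one). The document is fed to the parser in chunks and loaded into memory piece by piece, which is why expat can parse very large files.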
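The register-then-feed workflow might look like the sketch below, assuming Python 3. The function name expat_branch_names, the sample data, and the deliberately tiny chunk_size are made up for this illustration; a real program would read much larger chunks from a file.

```python
import xml.parsers.expat

# Hypothetical sample data for illustration only.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e"></branch>
</doc>"""


def expat_branch_names(xml_text, chunk_size=16):
    """Feed the document to expat piece by piece and collect branch names."""
    names = []

    def start_element(name, attrs):
        # Called by expat for every opening tag; attrs is a plain dict.
        if name == "branch":
            names.append(attrs.get("name"))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element

    # Feed the input in small chunks to mimic reading a large file.
    for start in range(0, len(xml_text), chunk_size):
        parser.Parse(xml_text[start:start + chunk_size], False)
    parser.Parse("", True)  # signal the end of the document
    return names


expat_names = expat_branch_names(XML_TEXT)
print(expat_names)
```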
xml.etree.ElementTree (ET for short)

The xml.etree.ElementTree module provides a lightweight, Pythonic API, along with an efficient C implementation, xml.etree.cElementTree. Compared with DOM, ET is faster and its API is more direct and convenient. Compared with SAX, the ET.iterparse function also offers on-demand parsing and does not read the whole document into memory at once. ET performs roughly on par with the SAX module, but its API is higher-level and easier to use.
My advice is to make ET your first choice when parsing XML with Python, unless you have special requirements that call for another module.
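These XML parsing APIs were not invented by Python; Python borrowed them from other languages or imported them directly. expat, for example, is a library written in C for parsing XML documents. SAX was originally developed in Java by David Megginson, and DOM provides a platform- and language-independent way to access and modify the content and structure of a document, so it can be used from any programming language.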
Below, we use the ElementTree module as an example to show how to parse XML in Python.
Parsing XML with ElementTree
The Python standard library provides two implementations of ET: xml.etree.ElementTree, written in pure Python, and the faster C implementation xml.etree.cElementTree. Always prefer the C implementation, because it is much faster and uses much less memory. In case your Python build does not include the accelerator module that cElementTree needs, you can import it like this:
    try:
        import xml.etree.cElementTree as ET
    except ImportError:
        import xml.etree.ElementTree as ET
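This is a common way to import an API that has more than one implementation. Most likely, importing the first module directly will just work. Note that from Python 3.3 onward this import trick is no longer necessary, because the ElementTree module automatically uses the C accelerator when it is available and falls back to the Python implementation otherwise (xml.etree.cElementTree was deprecated in 3.3 and removed in Python 3.9). If you are on Python 3.3+, all you need is import xml.etree.ElementTree.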
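Parsing an XML document into a tree
Let's start with the basics. XML is a structured, hierarchical data format, and the data structure that represents it best is a tree. ET provides two objects for this: ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree. Interactions with the whole document (reading, writing, finding elements of interest) usually happen at the ElementTree level, while work on a single XML element and its children happens at the Element level. The examples below show the main usage.
We use the following XML document as demo data: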
    <?xml version="1.0"?>
    <doc>
        <branch name="codingpy.com" hash="1cdf045c">
            text,source
        </branch>
        <branch name="release01" hash="f200013e">
            <sub-branch name="subrelease01">
                xml,sgml
            </sub-branch>
        </branch>
        <branch name="invalid">
        </branch>
    </doc>
Next, we load the document and parse it:
    >>> import xml.etree.ElementTree as ET
    >>> tree = ET.ElementTree(file='doc1.xml')
Then we fetch the root element:
    >>> tree.getroot()
    <Element 'doc' at 0x11eb780>
As mentioned earlier, the root element is an Element object. Let's see what attributes it has:
    >>> root = tree.getroot()
    >>> root.tag, root.attrib
    ('doc', {})
That's right, the root element has no attributes. Like any other Element object, it can be iterated over to visit its direct children:
    >>> for child_of_root in root:
    ...     print child_of_root.tag, child_of_root.attrib
    ...
    branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
    branch {'hash': 'f200013e', 'name': 'release01'}
    branch {'name': 'invalid'}
We can also access a specific child by index:
    >>> root[0].tag, root[0].text
    ('branch', '\n        text,source\n    ')
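Finding elements of interest
The examples above make it clear that we could reach every element in the tree with a simple recursive procedure (for each element, recursively visit all of its children). Because this is such a common task, however, ET provides some convenient helpers.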
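For reference, the hand-written recursive approach could look like this sketch. It assumes Python 3 and parses a compact copy of the demo document from a string; walk and XML_TEXT are names invented for this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the demo document, inlined so the example is self-contained.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">xml,sgml</sub-branch>
    </branch>
    <branch name="invalid"></branch>
</doc>"""


def walk(elem, depth=0):
    """Yield (depth, tag) for elem and, recursively, for all its descendants."""
    yield depth, elem.tag
    for child in elem:
        for item in walk(child, depth + 1):
            yield item


visited = list(walk(ET.fromstring(XML_TEXT)))
for depth, tag in visited:
    print("  " * depth + tag)
```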
Element objects have an iter method that performs a depth-first traversal (DFS) over an element and all the sub-elements below it. ElementTree objects have this method as well. Here is the simplest way to visit every element in an XML document:
    >>> for elem in tree.iter():
    ...     print elem.tag, elem.attrib
    ...
    doc {}
    branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
    branch {'hash': 'f200013e', 'name': 'release01'}
    sub-branch {'name': 'subrelease01'}
    branch {'name': 'invalid'}
From here we can traverse the tree however we like: walk over all elements and pick out the attributes we care about. But ET makes this even simpler and faster. The iter method accepts a tag name and then iterates only over the elements with that tag:
    >>> for elem in tree.iter(tag='branch'):
    ...     print elem.tag, elem.attrib
    ...
    branch {'hash': '1cdf045c', 'name': 'codingpy.com'}
    branch {'hash': 'f200013e', 'name': 'release01'}
    branch {'name': 'invalid'}
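Finding elements with XPath
Finding elements of interest is even more convenient with XPath (ET supports a subset of the XPath syntax). Element objects have several find methods that accept an XPath path as an argument: find returns the first matching sub-element, findall returns all matching sub-elements as a list, and iterfind returns an iterator over all matching elements. ElementTree objects have these methods too; their searches start at the root node.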
Here is an example of finding elements with XPath:
    >>> for elem in tree.iterfind('branch/sub-branch'):
    ...     print elem.tag, elem.attrib
    ...
    sub-branch {'name': 'subrelease01'}
The code above returns every element tagged sub-branch that sits directly under a branch element. Next, we find all branch elements whose name attribute has a given value:
    >>> for elem in tree.iterfind('branch[@name="release01"]'):
    ...     print elem.tag, elem.attrib
    ...
    branch {'hash': 'f200013e', 'name': 'release01'}
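The other two methods, find and findall, work the same way. The sketch below assumes Python 3 and a compact copy of the demo document parsed from a string; the variable names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the demo document, inlined so the example is self-contained.
XML_TEXT = """<doc>
    <branch name="codingpy.com" hash="1cdf045c">text,source</branch>
    <branch name="release01" hash="f200013e">
        <sub-branch name="subrelease01">xml,sgml</sub-branch>
    </branch>
    <branch name="invalid"></branch>
</doc>"""

root = ET.fromstring(XML_TEXT)

# find: the first direct child that matches, or None if nothing matches.
first_branch = root.find("branch")

# findall: a list of all matches.
all_branches = root.findall("branch")

# ".//" searches the whole subtree rather than direct children only.
nested = root.findall(".//sub-branch")

# "[@hash]" keeps only elements that carry a hash attribute.
with_hash = root.findall("branch[@hash]")

print(first_branch.get("name"), len(all_branches), len(nested), len(with_hash))
```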
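Building XML documents
With ET it is easy to build an XML document and save it to a file. The write method of the ElementTree object does exactly that.
There are two main scenarios. In the first, you read an existing XML document, modify it, and write the changes back out. In the second, you create a new XML document from scratch.
To modify a document, you manipulate its Element objects. Consider the following example: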
    >>> root = tree.getroot()
    >>> del root[2]
    >>> root[0].set('foo', 'bar')
    >>> for subelem in root:
    ...     print subelem.tag, subelem.attrib
    ...
    branch {'foo': 'bar', 'hash': '1cdf045c', 'name': 'codingpy.com'}
    branch {'hash': 'f200013e', 'name': 'release01'}
In the code above, we deleted the third child of the root element and added a new attribute to the first child. The tree can then be written back to a file. The resulting XML document should look like this:
    >>> import sys
    >>> tree.write(sys.stdout)
    <doc>
        <branch foo="bar" hash="1cdf045c" name="codingpy.com">
            text,source
        </branch>
        <branch hash="f200013e" name="release01">
            <sub-branch name="subrelease01">
                xml,sgml
            </sub-branch>
        </branch>
        </doc>
Note that the order of the attributes differs from the original document. This is because ET stores attributes in a dictionary, and a dictionary (in Python 2.7) is an unordered data structure. XML does not care about attribute order anyway.
Building a complete document from scratch is just as easy. The ET module provides a SubElement factory function that makes creating elements very simple:
    >>> a = ET.Element('elem')
    >>> c = ET.SubElement(a, 'child1')
    >>> c.text = "some text"
    >>> d = ET.SubElement(a, 'child2')
    >>> b = ET.Element('elem_b')
    >>> root = ET.Element('root')
    >>> root.extend((a, b))
    >>> tree = ET.ElementTree(root)
    >>> tree.write(sys.stdout)
    <root><elem><child1>some text</child1><child2 /></elem><elem_b /></root>
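As a quick sanity check on a document built this way, one option is to serialize it with ET.tostring and parse the result again. This is only a sketch for Python 3; the kind attribute and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Build a small tree from scratch.
root = ET.Element("root")
elem = ET.SubElement(root, "elem")
child = ET.SubElement(elem, "child1")
child.text = "some text"
# Attributes can be passed as a dict when the element is created.
ET.SubElement(root, "elem_b", attrib={"kind": "demo"})

# Serialize to bytes, then parse the bytes back into a new tree.
xml_bytes = ET.tostring(root, encoding="utf-8")
reparsed = ET.fromstring(xml_bytes)

print(reparsed.find("elem/child1").text)
print(reparsed.find("elem_b").get("kind"))
```

To save to disk instead, the write method also accepts a file name, for example tree.write('out.xml', encoding='utf-8', xml_declaration=True), where out.xml is a placeholder path.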
Parsing XML streams with iterparse
XML documents are often quite large, and reading a whole document straight into memory can cause problems during parsing. This is one of the reasons the SAX API is recommended over DOM.
As mentioned above, ET can load an XML document as an in-memory tree and then process it. But won't that run into the same heavy memory consumption as DOM when parsing large files? Yes, it does. To solve this problem, ET provides a special SAX-like tool, iterparse, which parses XML incrementally.
Next, I'll show how to use iterparse and compare it with standard tree parsing. We use an automatically generated XML document; here is the beginning of it:
    <?xml version="1.0" standalone="yes"?>
    <site>
        <regions>
            <africa>
                <item id="item0">
                    <location>United States</location>    <!-- Counting locations -->
                    <quantity>1</quantity>
                    <name>duteous nine eighteen </name>
                    <payment>Creditcard</payment>
                    <description>
                        <parlist>
    [...]
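Let's count how many location elements in the document have the text value Zimbabwe. Here is the standard approach using ET.parse:
    import sys
    import xml.etree.cElementTree as ET

    # Build the complete tree in memory first.
    tree = ET.parse(sys.argv[2])

    count = 0
    for elem in tree.iter(tag='location'):
        if elem.text == 'Zimbabwe':
            count += 1
    print count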
The code above loads all elements into memory and then examines them one by one. When parsing an XML document of about 100MB, the Python process running the script peaked at about 560MB of memory and took 2.9 seconds in total.
Note that we don't actually need to load the whole tree into memory. We only need to detect location elements whose text is the target value; all other data can be discarded. This is where iterparse comes in:
    import sys
    import xml.etree.cElementTree as ET

    count = 0
    for event, elem in ET.iterparse(sys.argv[2]):
        if event == 'end':
            if elem.tag == 'location' and elem.text == 'Zimbabwe':
                count += 1
        elem.clear()  # discard the element

    print count
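The for loop iterates over iterparse events. It first checks that the event is end, then checks whether the element's tag is location and whether its text matches the target value. Calling elem.clear() is crucial: iterparse still builds a tree, just incrementally. Discarding elements that are no longer needed amounts to discarding the whole tree and frees the memory allocated for it.
When this script parses the same file, peak memory usage is only 7MB and the running time is 2.5 seconds. The speedup comes from traversing the tree only once, while it is being built. The standard parse approach first builds the whole tree and then traverses it again to find the elements of interest.
iterparse performs comparably to SAX, but its API is far more useful: iterparse builds the tree incrementally, whereas with SAX you have to build the tree yourself.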
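The 100MB test file is not part of this article, so here is a small self-contained sketch of the same counting technique for Python 3. It runs iterparse over a synthetic in-memory document; make_sample, count_locations, and the generated data are assumptions made for this example and say nothing about the timings quoted above.

```python
import io
import xml.etree.ElementTree as ET


def make_sample(n_items):
    """Generate a tiny synthetic document shaped like the one shown above."""
    parts = ["<site><regions><africa>"]
    for i in range(n_items):
        # Every fourth item is located in Zimbabwe (arbitrary choice).
        location = "Zimbabwe" if i % 4 == 0 else "United States"
        parts.append('<item id="item%d"><location>%s</location></item>'
                     % (i, location))
    parts.append("</africa></regions></site>")
    return "".join(parts).encode("utf-8")


def count_locations(stream, target):
    """Count <location> elements whose text equals target, clearing as we go."""
    count = 0
    # Request only 'end' events, so every element is complete when we see it.
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "location" and elem.text == target:
            count += 1
        elem.clear()  # drop the element's text, attributes and children
    return count


count = count_locations(io.BytesIO(make_sample(10)), "Zimbabwe")
print(count)
```

Because iterparse accepts any file-like object, the same function should work unchanged on an open file or a network stream.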