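It is no secret that most of the demand for big data comes from the explosion of Internet technology. The idea that a public-facing application could have millions of users was unheard of 10 to 20 years ago. Today, even an ordinary website may have millions of users, and if those users are active, they can generate millions of data items every day. Ironically, the infrastructure and systems that create big data can also work in the other direction and provide better ways to integrate and use that data. Usefully, InfoSphere BigInsights supports the management and execution of data jobs through a simple REST API. Through the Jaql interface, we can run queries and get information directly from the Hadoop cluster. This article focuses on how these systems work together to provide a rich basis for capturing data, along with an interface for getting the information back out again.
Applications that use REST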
REST is a simple, easy-to-use structure for interacting with specific services and applications. It has its roots in many technologies, including XML-RPC and SOAP, and of course HTTP, which is now the ubiquitous network transport of choice.
InfoSphere BigInsights comes with Jaql and a corresponding deployable interface that lets users access it through REST. To run the Jaql interface, you first need to install the sample Jaql application, which you can do through the InfoSphere BigInsights console.
Go to http://servername:8080 or, if you are on the same machine, http://localhost:8080, and open the console. Click the Applications tab, as shown in Figure 1.

Figure 1. Applications tab
Now click the Manage link at the top left of the page, as shown in Figure 2, then select the Ad hoc Jaql query application and click Deploy.

Figure 2. Manage link
After the application is deployed, you need to find its endpoint. In REST, the endpoint is the URL of the application. InfoSphere BigInsights creates a unique application reference when the application is deployed.
To find the endpoint, use a REST call to get the list of configured applications. You can use any REST client, including a browser. The examples below use the command-line tool curl. First, get the application list from the URL http://servername:8080/data/controller/catalog/applications:
    $ curl -O http://192.168.0.20:8080/data/controller/catalog/applications
This creates a file named applications that contains details of all the configured and active applications. Viewing the file shows XML that contains the application definitions. Look for the application that contains Jaql, as shown in Listing 1.
Listing 1. Applications file
    <row>
      <column>3d420497-e1a6-411f-9644-c40db1c290b6</column>
      <column>Ad hoc Jaql query</column>
      <column>The Ad hoc Jaql Query application runs a custom query entered in the UI to analyze data.</column>
      <column>samples</column>
      <column>images/appicons/3d420497-e1a6-411f-9644-c40db1c290b6.png</column>
      <column>catalogStore/archive/3d420497-e1a6-411f-9644-c40db1c290b6.zip</column>
      <column>Query,SQL</column>
      <column>DEPLOYED</column>
    </row>
The first <column> block shows the unique application ID, which later requests will use.
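As a rough illustration, the lookup of the application ID can be automated. The sketch below is not part of the BigInsights tooling: findApplicationId is a hypothetical helper, and it assumes that every <row> in the response carries the ID in its first <column> and the application name in its second, as in Listing 1.

```javascript
// Hypothetical helper: find the ID of an application by name in the XML
// returned by /data/controller/catalog/applications.
// Assumption: first <column> = application ID, second <column> = name.
function findApplicationId(xml, appName) {
  const rows = xml.match(/<row>[\s\S]*?<\/row>/g) || [];
  for (const row of rows) {
    const columns = [];
    const columnPattern = /<column>([\s\S]*?)<\/column>/g;
    let match;
    while ((match = columnPattern.exec(row)) !== null) {
      columns.push(match[1].trim());
    }
    if (columns[1] === appName) {
      return columns[0];
    }
  }
  return null; // no application with that name
}

// Shortened sample row based on Listing 1.
const sampleXml =
  "<row><column>3d420497-e1a6-411f-9644-c40db1c290b6</column>" +
  "<column>Ad hoc Jaql query</column><column>samples</column>" +
  "<column>DEPLOYED</column></row>";

console.log(findApplicationId(sampleXml, "Ad hoc Jaql query"));
```

A regular expression is enough for this flat row and column layout; a real XML parser would be the safer choice if the response format is more varied than the sample suggests.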
To confirm that you have the right application, view its details through the REST URL http://servername:8080/data/controller/catalog/applications/applicationID, as shown in Listing 2.
Listing 2. Getting detailed information about an application
    $ curl -O http://servername:8080/data/controller/catalog/applications/3d420497-e1a6-411f-9644-c40db1c290b6
The command in Listing 2 produces a more detailed XML description, as shown in Listing 3.
Listing 3. Detailed XML description
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <application-template xmlns="http://biginsights.ibm.com/application">
      <name>Ad hoc Jaql query</name>
      <description>The Ad hoc Jaql Query application runs a custom query entered in the UI to analyze data.</description>
      <properties>
        <property uitype="textfield" paramtype="TEXTAREA" name="script" label="Jaql query" isRequired="true" isOutputPath="false" isInputPath="false" description=" Ad hoc Jaql query"/>
      </properties>
      <assets>
        <asset type="WORKFLOW" id="Ad hoc Jaql query"/>
      </assets>
      <localContextPath>catalogStore/archive/3d420497-e1a6-411f-9644-c40db1c290b6.zip</localContextPath>
      <imagePath>images/appicons/3d420497-e1a6-411f-9644-c40db1c290b6.png</imagePath>
      <appId>3d420497-e1a6-411f-9644-c40db1c290b6</appId>
      <creator>samples</creator>
      <dfsPath>/user/applications/3d420497-e1a6-411f-9644-c40db1c290b6/workflow</dfsPath>
      <categories>Query,SQL</categories>
      <runtimeDependencies>/biginsights/oozie/sharedLibraries/jaql</runtimeDependencies>
    </application-template>
With these details, you can submit an application request to the system that runs the Jaql query. The relevant information is the property list. The output shows that there is only one property: the Jaql query text. We will hijack this property to run arbitrary Jaql queries.
To run a query, we construct an XML file that contains the property information. Its basic structure is shown in Listing 4.
Listing 4. Basic structure of the XML file that contains the property information
    <runconfig>
      <name>Hello Jaql</name>
      <appid>3d420497-e1a6-411f-9644-c40db1c290b6</appid>
      <properties>
        <property>
          <name>script</name>
          <value paramtype='TEXTAREA'>'Hello World';</value>
        </property>
      </properties>
    </runconfig>
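The <appid> is the application ID that was determined in the earlier step, when the list of configured applications was retrieved. The script <value> is the Jaql script that you want to execute.
To submit a job, you must send this XML, URL-encoded, as a parameter value to a different REST endpoint. The encoded XML is supplied in the runconfig parameter, as shown in Listing 5.
Listing 5. Submitting a job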
    $ curl -o t.out "http://192.168.0.20:8080/data/controller/ApplicationManagement?actiontype=run_application&runconfig=$jaqlxml"
To encode the XML, use one of the many URL-encoding tools or functions, such as the urlencode() function in PHP or the encodeURIComponent() function in JavaScript.
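To make the encoding step concrete, here is a minimal sketch that builds the run configuration from Listing 4 and URL-encodes it for the runconfig parameter. The helper names (escapeXml, buildRunConfig, buildSubmitUrl) are illustrative only. Escaping the script text as XML is an added precaution that the original listings do not include; it assumes the server parses the run configuration with a standard XML parser.

```javascript
// Escape characters that are special inside XML text content.
// (An added precaution; not part of the original listings.)
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the run configuration XML with the structure shown in Listing 4.
function buildRunConfig(appId, jobName, script) {
  return (
    "<runconfig><name>" + escapeXml(jobName) + "</name>" +
    "<appid>" + appId + "</appid>" +
    "<properties><property><name>script</name>" +
    "<value paramtype='TEXTAREA'>" + escapeXml(script) + "</value>" +
    "</property></properties></runconfig>"
  );
}

// Build the submission URL; the XML is URL-encoded as a single parameter value.
function buildSubmitUrl(baseUrl, appId, jobName, script) {
  return (
    baseUrl +
    "/data/controller/ApplicationManagement" +
    "?actiontype=run_application&runconfig=" +
    encodeURIComponent(buildRunConfig(appId, jobName, script))
  );
}

console.log(
  buildSubmitUrl(
    "http://192.168.0.20:8080",
    "3d420497-e1a6-411f-9644-c40db1c290b6",
    "Hello Jaql",
    "'Hello World';"
  )
);
```

The resulting URL can be passed to curl in place of the $jaqlxml variable used in Listing 5.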
The command in Listing 5 writes its response to the file t.out, which contains the execution ID and a status as JSON values, as shown in Listing 6.
Listing 6. Information written to the file
    {
      "result": {
        "oozie_id": "0000003-131017053452866-oozie-biad-W",
        "status": "OK"
      }
    }
If the status is anything other than "OK", there is a problem with the submitted job. Two common problems are an invalid application ID and XML that is not well formed or not correctly encoded.
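A small check along the following lines can catch a failed submission early. It is only a sketch: extractJobId is a hypothetical helper, and because the exact shape of an error response is not documented here, anything that is not valid JSON with a result status of "OK" and an oozie_id is treated as a failure.

```javascript
// Hypothetical helper: return the Oozie job ID from a submission response,
// or throw if the response does not report status "OK".
function extractJobId(responseText) {
  let parsed;
  try {
    parsed = JSON.parse(responseText);
  } catch (err) {
    throw new Error("Response is not valid JSON: " + err.message);
  }
  const result = parsed && parsed.result;
  if (!result || result.status !== "OK" || !result.oozie_id) {
    throw new Error("Job submission failed: " + responseText);
  }
  return result.oozie_id;
}

// Response text as shown in Listing 6.
const submitResponse =
  '{ "result":{ "oozie_id":"0000003-131017053452866-oozie-biad-W", "status":"OK" } }';

console.log(extractJobId(submitResponse));
```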
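The returned oozie_id is a job identifier that can be used to get the output from the job after it completes. To get the job details, access the REST endpoint http://<oozieHost>:<ooziePort>/oozie/v1/job/<oozieid>?show=info. For example, to get the status of the job that was just submitted, use the command in Listing 7.
Listing 7. Getting the status of the job that was just submitted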
    $ curl -o status.out "http://192.168.0.20:8280/oozie/v1/job/0000003-131017053452866-oozie-biad-W?show=info"
This command generates the file status.out, which contains a JSON representation of the executed job, as shown in Listing 8.
Listing 8. The generated status.out file
    {
      "actions": [
        {
          "retries": 0,
          "externalStatus": "SUCCEEDED",
          "externalId": "job_201310170523_0004",
          "status": "OK",
          "trackerUri": "bivm:9001",
          "toString": "Action name[jaql1] status[OK]",
          "errorCode": null,
          "endTime": "Thu, 17 Oct 2013 12:22:58 GMT",
          "id": "0000003-131017053452866-oozie-biad-W@jaql1",
          "startTime": "Thu, 17 Oct 2013 12:22:38 GMT",
          "consoleUrl": "http://bivm:50030/jobdetails.jsp?jobid=job_201310170523_0004",
          "transition": "end",
          "stats": null,
          "name": "jaql1",
          "data": "#\n#Thu Oct 17 08:22:58 EDT 2013\nhadoopJobs=\n",
          "errorMessage": null,
          "conf": "<jaql xmlns=\"uri:oozie:jaql-action:0.1\">\r\n <job-tracker>bivm:9001</job-tracker>\r\n <name-node>hdfs://bivm:9000</name-node>\r\n <configuration>\r\n <property>\r\n <name>mapred.compress.map.output</name>\r\n <value>true</value>\r\n </property>\r\n <property>\r\n <name>mapred.job.queue.name</name>\r\n <value>default</value>\r\n </property>\r\n </configuration>\r\n <script>adhoc.jaql</script>\r\n <eval>setOptions( { conf:{ \"hadoop.job.ugi\":\"biadmin,\" }} );\r\n\t\t\t\tsetOptions( { conf:{ \"user.name\":\"biadmin\" }} );\r\n \t 'Hello World';;</eval>\r\n</jaql>",
          "externalChildIDs": null,
          "cred": "null",
          "type": "jaql"
        }
      ],
      "appPath": "hdfs://bivm:9000/user/applications/3d420497-e1a6-411f-9644-c40db1c290b6/workflow",
      "appName": "jaql-adhoc",
      "externalId": null,
      "status": "SUCCEEDED",
      "lastModTime": "Thu, 17 Oct 2013 12:22:59 GMT",
      "createdTime": "Thu, 17 Oct 2013 12:22:38 GMT",
      "toString": "Workflow id[0000003-131017053452866-oozie-biad-W] status[SUCCEEDED]",
      "group": null,
      "run": 0,
      "endTime": "Thu, 17 Oct 2013 12:22:59 GMT",
      "user": "biadmin",
      "id": "0000003-131017053452866-oozie-biad-W",
      "startTime": "Thu, 17 Oct 2013 12:22:38 GMT",
      "consoleUrl": "http://bivm:8280/oozie?job=0000003-131017053452866-oozie-biad-W",
      "acl": null,
      "progress": 1,
      "conf": "<configuration>\r\n <property>\r\n <name>oozie.libpath</name>\r\n <value>/biginsights/oozie/sharedLibraries/jaql</value>\r\n </property>\r\n <property>\r\n <name>oozie.wf.application.path</name>\r\n <value>hdfs://bivm:9000/user/applications/3d420497-e1a6-411f-9644-c40db1c290b6/workflow</value>\r\n </property>\r\n <property>\r\n <name>jobTracker</name>\r\n <value>bivm:9001</value>\r\n </property>\r\n <property>\r\n <name>script</name>\r\n <value>'Hello World';</value>\r\n </property>\r\n <property>\r\n <name>queueName</name>\r\n <value>default</value>\r\n </property>\r\n <property>\r\n <name>nameNode</name>\r\n <value>hdfs://bivm:9000</value>\r\n </property>\r\n <property>\r\n <name>user.name</name>\r\n <value>biadmin</value>\r\n </property>\r\n</configuration>",
      "parentId": null
    }
The key part of the status.out file is the top-level status line, which shows that the job completed successfully. The URL in consoleUrl is also useful because it leads to more information about the job.
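Because the status document is large, it can help to reduce it to the few fields discussed above. The following sketch assumes the JSON layout of Listing 8. summarizeJob is a hypothetical helper, and the list of terminal states is taken from the standard Oozie workflow job statuses rather than from this article.

```javascript
// Oozie workflow states after which the job will not change again.
const TERMINAL_STATUSES = ["SUCCEEDED", "KILLED", "FAILED"];

// Hypothetical helper: keep only the fields that matter for monitoring.
function summarizeJob(info) {
  return {
    id: info.id,
    status: info.status,
    finished: TERMINAL_STATUSES.includes(info.status),
    consoleUrl: info.consoleUrl,
    actions: (info.actions || []).map((action) => ({
      name: action.name,
      status: action.status,
      externalStatus: action.externalStatus,
    })),
  };
}

// Trimmed-down version of the document in Listing 8.
const jobInfo = {
  actions: [{ name: "jaql1", status: "OK", externalStatus: "SUCCEEDED" }],
  status: "SUCCEEDED",
  id: "0000003-131017053452866-oozie-biad-W",
  consoleUrl: "http://bivm:8280/oozie?job=0000003-131017053452866-oozie-biad-W",
};

console.log(summarizeJob(jobInfo));
```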
The final part of remote access is the WebHDFS interface, which provides access over HTTP to data stored in Hadoop. This feature is installed by default in InfoSphere BigInsights and is used by it internally. To test that it is working, access the following URL: http://servername:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=biadmin.
The user.name parameter is required, and it must match a valid user in the Hadoop installation. The op parameter (GETHOMEDIRECTORY) should return the user's home directory. The information is returned as a JSON object: {"Path":"\/user\/biadmin"}.
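The WebHDFS URLs used in this article follow one pattern, which the sketch below captures. webHdfsUrl is a hypothetical helper, and port 14000 is simply the port used in the examples here; other installations may differ.

```javascript
// Hypothetical helper: build a WebHDFS URL for a given operation.
// The port (14000) and URL layout are taken from the examples in this article.
function webHdfsUrl(host, path, op, userName) {
  // Encode each path segment separately so the slashes are preserved.
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  return (
    "http://" + host + ":14000/webhdfs/v1" + encodedPath +
    "?user.name=" + encodeURIComponent(userName) +
    "&op=" + op
  );
}

// Home directory check: no path, GETHOMEDIRECTORY operation.
console.log(webHdfsUrl("servername", "", "GETHOMEDIRECTORY", "biadmin"));

// The response is JSON; "\/" is an escaped slash, so parsing yields a plain path.
const homeResponse = '{"Path":"\\/user\\/biadmin"}';
console.log(JSON.parse(homeResponse).Path);
```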
To actually download a file from HDFS, use the OPEN operation. For example, Listing 9 downloads the file chicago.csv from the chicago directory of the biadmin user.
Listing 9. Downloading a file from HDFS
    $ curl -o chicago.csv "http://192.168.0.20:14000/webhdfs/v1/user/biadmin/chicago/chicago.csv?user.name=biadmin&op=OPEN"
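With this basic set of REST and HTTP interfaces, there is a straightforward sequence for running arbitrary Jaql queries against the data:
1. Submit the job to the Jaql application service through REST, using XML.
2. Check the job status through REST.
3. Download the generated file through REST, using WebHDFS.
Before trying this sequence, let's take a quick look at processing data with Jaql.
Loading and reading data with Jaql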
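Jaql works by reading data directly from a source, processing and parsing the content, and then writing the information back out. Jaql is a query processor: it accesses the data (by whatever means are available), runs queries on that information, and returns data. Jaql is really a small language designed for data processing, but it also includes support for reading from and writing to HDFS.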
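Although in practice Jaql can read and write many kinds of data store, the best performance is achieved when Jaql can read data from the store in parallel. Jaql actually receives information from the I/O layer about whether the data is read serially or in parallel. This ability makes it well suited to processing information from HDFS, particularly when the information is widely distributed across a large cluster.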
At the most basic level, you read and write data in Jaql with the read() and write() functions. The easiest way to try Jaql at this level is to use jaqlshell (/opt/ibm/biginsights/jaql/bin/jaqlshell), which provides an interactive interface where you can execute statements at the Jaql prompt, as shown in Listing 10.
Listing 10. Jaql prompt
    [biadmin@bivm ~]$ /opt/ibm/biginsights/jaql/bin/jaqlshell
    jaql>
For example, to read a local file, use read('file:///chicago.csv');. To read a file from HDFS, use read(hdfs('chicago.csv'));.
By default, Jaql expects to read Hadoop sequence files, but it also includes specific parsers for other file formats, including CSV and JSON. This flexible model has an obvious advantage: you can read and write data in different file formats across different targets. For example, to read a delimited file, use the del() function: read(del('chicago.csv'));. The function reads in the data, identifies the content, and places it in an internal JSON structure, as shown in Listing 11.
Listing 11. Data placed in an internal JSON structure
    [
      [ "03/12/2011 12:20:44 AM", "28", "2", "11", "29.32" ],
      [ "03/12/2011 12:20:44 AM", "29", "8", "56", "19.940000000000001" ],
      ...
    ]
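This format is useful, but you can go a step further and assign field names to the individual fields in the file. In the long run, field names make the data more malleable, because they let us write queries and code based on fields rather than implied column numbers. Listing 12 shows how to assign field names.
Listing 12. Assigning field names to the individual fields in the file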
    read(del('chicago/chicago.csv', { schema: schema {
        logdate: string,
        region: long,
        buscount: long,
        logreads: long,
        speed: double }}));
The advantage of text or JSON files is that they can be read and written line by line, which is well suited to HDFS and Hadoop. Note the different types used here. The original date strings in the input (taken from the Chicago Traffic Tracker) are not in JSON date format, so they cannot be treated as a date type without further processing. If you need that level of detail, Jaql provides a full parser for handling this data.
The output then becomes an array of records, as shown in Listing 13.
Listing 13. Array of records
    ...
    { "logdate":"02/11/2013 09:51:23 AM", "region":26, "buscount":54, "logreads":910, "speed":26.59 },
    { "logdate":"02/11/2013 09:51:23 AM", "region":27, "buscount":27, "logreads":336, "speed":30.0 },
    ...
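To show what the schema does to the data, the sketch below imitates the conversion in plain JavaScript. It illustrates the idea only and is not how Jaql implements del(): toRecord is a hypothetical function, and the schema array simply mirrors the field names and types from Listing 12.

```javascript
// Field names and types, mirroring the schema in Listing 12.
const schema = [
  ["logdate", "string"],
  ["region", "long"],
  ["buscount", "long"],
  ["logreads", "long"],
  ["speed", "double"],
];

// Turn one row of strings (as in Listing 11) into a typed record (as in Listing 13).
function toRecord(row, fields) {
  const record = {};
  fields.forEach(([name, type], index) => {
    const raw = row[index];
    if (type === "long") {
      record[name] = parseInt(raw, 10);
    } else if (type === "double") {
      record[name] = parseFloat(raw);
    } else {
      record[name] = raw; // strings are kept unchanged
    }
  });
  return record;
}

const rows = [["03/12/2011 12:20:44 AM", "28", "2", "11", "29.32"]];
console.log(rows.map((row) => toRecord(row, schema)));
```

Note that logdate stays a string, which matches the remark above that the date strings need further processing before they can be used as dates.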
To process the information in Jaql, you can assign it to a variable, as shown in Listing 14.
Listing 14. Assigning the information to a variable
    x = read(del('chicago/chicago.csv', { schema: schema {
        logdate: string,
        region: long,
        buscount: long,
        logreads: long,
        speed: double }}));
Jaql's flexibility in this respect is very useful outside Hadoop too. For example, Jaql can read data from the local file system and write it to HDFS. It can also be used to convert text or CSV files to JSON programmatically. In our web application, we can use this flexibility to load data from HDFS, process it, and write out a JSON file that the web interface can use more easily to provide and display the results of the query.
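You can then reference the data and convert it internally. Use x -> write(seq('chicago.seq')); to write it out as a sequence file, or use the jsonText() function to convert it explicitly to the corresponding JSON in a file: x -> write(jsonText('chicago.json'));. See the Jaql documentation for more complex examples of reading and writing binary, sequence, JSON, and other formats (see Resources).
When writing queries from the web interface, we will use the JSON output format, so that the query writes a JSON file that can later be accessed through WebHDFS.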
Executing Jaql queries
Once the data is in Jaql (read either explicitly or implicitly), you can run transformations and queries on the data structure. A transformation can be simple, or it can be a full SQL-like query based on the parsed data structure.
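We will skip the basic transformations because they are less useful to us than executing SQL statements on the data. The fields defined when the data was processed become the fields that you can select and query, and the variable (x in the example above) is the table. We can therefore execute a basic query that selects a specific field, as shown in Listing 15.
Listing 15. Executing a basic query that selects a specific field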
    jaql> SELECT region FROM x;
    [
      ...
      { "region":3 },
      { "region":4 },
      { "region":5 }
    ]
For a more complex query that involves functions and grouping, see Listing 16.
Listing 16. Executing a complex query that involves functions and grouping
    jaql> SELECT region,avg(speed) FROM x GROUP BY region;
    [
      ...
      { "region":26, "#1":29.466060606060772 },
      { "region":27, "#1":28.768625178975057 },
      { "region":28, "#1":21.32377419211978 },
      { "region":29, "#1":19.889688925634466 }
    ]
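For readers more at home in JavaScript than SQL, the grouping query in Listing 16 is roughly equivalent to the sketch below. It only illustrates what the query computes, not how Jaql or Hadoop evaluates it. averageSpeedByRegion is a hypothetical function, and the result column is named avgSpeed here instead of the "#1" label that Jaql generates.

```javascript
// Compute the average speed per region, like
// SELECT region, avg(speed) FROM x GROUP BY region.
function averageSpeedByRegion(records) {
  const totals = new Map();
  for (const { region, speed } of records) {
    const entry = totals.get(region) || { sum: 0, count: 0 };
    entry.sum += speed;
    entry.count += 1;
    totals.set(region, entry);
  }
  // One output record per region, in order of first appearance.
  return Array.from(totals, ([region, entry]) => ({
    region: region,
    avgSpeed: entry.sum / entry.count,
  }));
}

// Made-up sample records with the field names from Listing 13.
const sampleRecords = [
  { region: 26, speed: 20 },
  { region: 27, speed: 30 },
  { region: 26, speed: 30 },
];

console.log(averageSpeedByRegion(sampleRecords));
```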
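If you have loaded several files into internal structures, you can perform joins to combine the data.
The output can be assigned to a variable, and the results of the query can then be written to a file. We can put all of these operations in one script, as shown in Listing 17.
Listing 17. Assigning the output to a variable and writing the query results to a file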
    x = read(del('chicago/chicago.csv', { schema: schema {
        logdate: string,
        region: long,
        buscount: long,
        logreads: long,
        speed: double }}));
    y = SELECT region,avg(speed) FROM x GROUP BY region;
    y -> write(jsonText('output.json'));
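The script in Listing 17 performs three steps: it reads the source file, executes the query, and writes the information out to a JSON file.
We now have the basic structure for submitting a job, checking its status, writing a query, and reading the generated output file.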
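Building a web interface to access the big data store
With all the pieces in place, we can write a basic HTML interface to the InfoSphere BigInsights server that executes an arbitrary Jaql script and writes out the data to be viewed in the page. The HTML for this interface is shown in Listing 18.
Listing 18. HTML for the interface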
    <html>
    <head>
    <title>JAQL Query for Chicago Traffic Tracker</title>
    <script src="jquery.js"></script>
    <script src="work.js"></script>
    </head>
    <body>
    <h1>JAQL Query Executor</h1>
    <div><textarea id="query" name="query" rows="10" cols="80"></textarea></div>
    <div>Output Filename:<input id="filename" name="filename"/></div>
    <a href="#" onclick="runquery();">Run Query</a>
    <div id="status"></div>
    <div id="result"></div>
    </body>
    </html>
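We will use jQuery to perform AJAX-style data loading.
The basic structure provides two input boxes: one for the Jaql query text and another for the name of the file in which the output is saved. We don't want to parse the file name out of the script, so we get it separately to make sure that we load the right data. A simple link executes the query, and the status and result DIVs hold information about the current activity and the result file.
Figure 3 shows the page itself, filled in with a script, as the application appears before Run Query is clicked.

Figure 3. The application before Run Query is clicked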
¸ÃÓ¦ÓóÌÐò±³ºóµÄ JavaScript ±»·ÖΪËĸöÖ÷Òª³ÌÐò¿é¡£
µÚÒ»¿é¶¨ÒåÓ¦ÓóÌÐòËùʹÓõÄһЩȫ¾Ö±äÁ¿£º×÷Òµ ID£¨Ò»µ©ÒѾÌá½»ÁË×÷Òµ£©¡¢ÎļþÃû³Æ£¨ÓÃÓÚ¼ÓÔØ½á¹û£©£¬ÒÔ¼°Ò»¸öÓÃÓÚ±£´æ¼ä¸ô¶¨Ê±Æ÷¶ÔÏóµÄ±äÁ¿£¬Ôڵȴý×÷ÒµÍê³ÉµÄʱºò½«»áʹÓÃËü¡£
±¾½ÚµÄÖ÷º¯ÊýÊÇ runquery() º¯Êý£¬ÔÚµ¥»÷ Run Query ʱ±»µ÷Óá£runquery()
º¯Êý¿ÉÒÔ¸üÐÂ״̬²¢µ÷Óà submitquery() º¯Êý£¬Ëü½«»áʵ¼ÊÖ´ÐÐһЩ¹¤×÷£¬ÈçÇåµ¥ 19 Ëùʾ¡£
Çåµ¥ 19. ¸üÐÂ״̬²¢µ÷Óà submitquery() º¯Êý
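    // Shared state: Oozie job ID, polling timer handle, and output file name.
    var jobid;
    var checkinterval;
    var filename;

    // Called when Run Query is clicked: show the query being run, then submit it.
    function runquery() {
        $('#status').text('Executing remote query: ' + $('#query').val());
        submitquery();
    }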
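The second block contains the definition of the submitquery() function. This function reads the content of the TEXTAREA, which contains the Jaql script to execute, embeds the script in the XML for the run configuration, and then submits the job to the application server through the jQuery ajax() function.
If the submission fails, we report an error. If it succeeds, we extract the Oozie job ID from the returned JSON structure, run the checkjobstatus() function to get the current status, and create an interval timer that calls the same checkjobstatus() function every 5 seconds. Remember that the job is submitted through a REST request that contains the XML job specification, and that the specification must be escaped with encodeURIComponent(), as shown in Listing 20.
Listing 20. submitquery() function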
    function submitquery() {
        // XML run configuration; the Jaql script goes inside <value>.
        var startfrag = "<runconfig><name>Jaql Remote Query</name>" +
            "<appid>3d420497-e1a6-411f-9644-c40db1c290b6</appid>" +
            "<properties><property><name>script</name>" +
            "<value paramtype='TEXTAREA'>";
        var endfrag = "</value></property></properties></runconfig>";
        var jobspec = startfrag + $('#query').val() + endfrag;

        // The XML must be URL-encoded before it is passed as a parameter.
        var remoteurl = "http://192.168.0.20:8080/data/controller/ApplicationManagement" +
            "?actiontype=run_application&runconfig=" + encodeURIComponent(jobspec);

        $.ajax(remoteurl, {
            error: function() {
                $('#status').html("Error submitting job");
            },
            success: function(data) {
                jobid = data.result.oozie_id;
                $('#status').html("Job Submitted:" + data.result.status + " ID:" + jobid);
                // Check once now, then every 5 seconds.
                checkjobstatus();
                checkinterval = setInterval(checkjobstatus, 5000);
            }
        });
    }
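The checkjobstatus() function uses the REST interface to get the job status, using the job ID extracted in the previous step. Because this function is called every 5 seconds, it must be self-contained. That is, it must submit the REST request, update the status, and, if the job is reported as successful, stop the interval timer and call getoutputfile() to retrieve the query output generated by the Jaql script, as shown in Listing 21.
Listing 21. checkjobstatus() function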
    function checkjobstatus() {
        var checkurl = "http://192.168.0.20:8280/oozie/v1/job/" + jobid + "?show=info";

        $.ajax(checkurl, {
            error: function() {
                $('#status').html("Error getting status");
            },
            success: function(data) {
                var jobstatus = data.status;
                if (jobstatus) {
                    $('#status').html("Current Job Status:" + jobstatus);
                    if (jobstatus == 'SUCCEEDED') {
                        // Stop polling; the timer was created with setInterval().
                        clearInterval(checkinterval);
                        getoutputfile();
                    }
                } else {
                    $('#status').html("Current Job Status:Unknown");
                }
            }
        });
    }
After the script has executed successfully, the last function uses WebHDFS to access the file that the script generated, as shown in Listing 22.
Listing 22. getoutputfile() function
    function getoutputfile() {
        filename = $('#filename').val();
        // The output file is read from the biadmin user's home directory in HDFS.
        var fileurl = "http://192.168.0.20:14000/webhdfs/v1/user/biadmin/" +
            filename + "?user.name=biadmin&op=OPEN";

        $.ajax(fileurl, {
            error: function() {
                $('#status').html("Error getting result file");
            },
            success: function(data) {
                $('#result').html(data);
            }
        });
    }
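The sequence of execution is fairly straightforward:
1. Enter the query.
2. Click Run Query.
3. The job is submitted to the application server.
4. The job status is checked through REST.
5. Step 4 is repeated until the job status is "succeeded".
6. The generated output file is downloaded and displayed.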
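The steps above can be sketched as a single polling loop. The code below is a simplified illustration, not the jQuery code from the listings: the steps object with its submit, checkStatus, and download functions is a hypothetical stand-in for the three REST calls, and stopping on KILLED or FAILED is an addition based on the standard Oozie workflow states.

```javascript
// Decide what to do after a status check (steps 4 and 5).
function nextStep(status) {
  if (status === "SUCCEEDED") return "download";
  if (status === "KILLED" || status === "FAILED") return "fail";
  return "wait";
}

// Run steps 3 to 6. `steps` is a hypothetical object that supplies the REST calls.
async function runQuery(steps, pollMs) {
  const jobId = await steps.submit();
  for (;;) {
    const status = await steps.checkStatus(jobId);
    const action = nextStep(status);
    if (action === "download") {
      return steps.download();
    }
    if (action === "fail") {
      throw new Error("Job " + jobId + " ended with status " + status);
    }
    // Still running: wait before checking again.
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
}

// Stubbed usage: the job reports RUNNING twice, then SUCCEEDED.
const statuses = ["RUNNING", "RUNNING", "SUCCEEDED"];
const stubSteps = {
  submit: async () => "0000003-131017053452866-oozie-biad-W",
  checkStatus: async () => statuses.shift(),
  download: async () => '[{"region":26}]',
};

runQuery(stubSteps, 10).then((output) => console.log(output));
```

In the real page, the stubs would be replaced by the AJAX calls from Listings 20 to 22, with a 5000 ms polling interval.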
Assuming there are no problems with your Jaql script, you should get output similar to Figure 4.

Figure 4. Output
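Conclusion
InfoSphere BigInsights comes with an impressive set of applications that can be configured to run any script you need. By using the standard REST interfaces to these systems, we can build a completely web-based interface to Hadoop and the underlying data processing functions without writing complex code or developing MapReduce functions. Jaql gives us the flexibility to run arbitrary queries on the data. Even if the data is not already in a format that can be processed directly, we can easily convert it into a usable structure and query it through an SQL-like interface. With a little JavaScript and jQuery magic, the whole interface is small and compact enough to run anywhere.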