Getting Started with Data Analysis Using hive+python
 
Source: life    Published: 2015-9-1
 

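Why use hive+python to analyze data

An analogy helps.

Before databases existed, people wrote programs that manipulated the file system directly. That is comparable to writing MapReduce jobs by hand to analyze data.

Once databases arrived, nobody worked with the file system directly any more (unless there was some other need); people used SQL together with a general-purpose language (PHP, Java, Python) to work with data. That is comparable to hive + python.

hive + python covers most needs. The exception is unstructured data, where you are back in the stone age and have no choice but to write MapReduce.

So why not hive+java, hive+c, hive+...?

Because:

Python is simply very pleasant to use: it is a scripting language, needs no compilation, has powerful machine learning libraries, and is well suited to scientific computing (which is exactly what data analysis is!).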

Analyzing data with hive+python

The division of labor between hive and python: Hive SQL serves as the data source for Python, Python's output serves as the map output, and Hive's aggregate functions serve as the reduce step.
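The division of labor can be pictured with a small local simulation. This is only a sketch, not how Hive runs things internally: the `rows` list stands in for the output of a Hive SELECT, `mapper` for the Python script, and `aggregate` for a Hive `count(*) ... group by`. All three names and the sample rows are illustrative assumptions.

```python
from collections import Counter

# Hypothetical rows, standing in for what a Hive SELECT would stream to Python
# (one tab-separated line per row).
rows = [
    "user_1\tfood1\t2014-06-07 09:00",
    "user_1\tfood1\t2014-06-07 09:02",
    "user_2\tfood2\t2014-06-07 09:00",
]


def mapper(lines):
    """Python's part: read tab-separated lines, emit (user_id, food_type) keys."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # skip malformed rows
        user_id, food_type, _datetime = fields
        yield user_id, food_type


def aggregate(pairs):
    """Hive's part: stand-in for count(*) ... group by user_id, food_type."""
    return Counter(pairs)


counts = aggregate(mapper(rows))
for (user_id, food_type), n in sorted(counts.items()):
    print(user_id, food_type, n)
```

In the real setup only the middle stage is written in Python; the first and last stages are expressed in Hive SQL.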

The following example walks through the approach: count how many of each kind of food each person ate on a given date.

Create the table user_foods (the user food table)

hive> create table user_foods (user_id string, food_type string, datetime string
) partitioned by(dt string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS TEXTFILE;

# partitioned by(dt string): partition the table by date
# ROW FORMAT DELIMITED ...: rows are separated by \n and fields by \t.

Because the statistics are computed per day, the table is partitioned by dt (date) so that each analysis reads less data.

After the Hive table is created, a directory with the same name as the table is created under /hive/ on HDFS.

Importing data

Create a partition

hive> ALTER TABLE user_foods ADD PARTITION(dt='2014-06-07');

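After the partition is created, a new directory named dt=2014-06-07 appears under /hive/user_foods/ on HDFS.

Create test data

Create a file such as data.txt and add the test data (fields are separated by a tab character):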

user_1	food1	2014-06-07 09:00
user_1	food1	2014-06-07 09:02
user_1	food2	2014-06-07 09:00
user_2	food2	2014-06-07 09:00
user_2	food23	2014-06-07 09:00
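Because the table is declared with FIELDS TERMINATED BY '\t', the fields in data.txt must be separated by real tab characters, which are easy to lose when copying from a web page. One way to avoid that is to generate the file from a script. The sketch below writes the same five rows; the helper name `write_test_data` is illustrative, and the output path is up to you.

```python
# The five test rows from the article: (user_id, food_type, datetime).
ROWS = [
    ("user_1", "food1", "2014-06-07 09:00"),
    ("user_1", "food1", "2014-06-07 09:02"),
    ("user_1", "food2", "2014-06-07 09:00"),
    ("user_2", "food2", "2014-06-07 09:00"),
    ("user_2", "food23", "2014-06-07 09:00"),
]


def write_test_data(path):
    """Write ROWS to `path` as tab-separated text, one row per line."""
    with open(path, "w", newline="\n") as f:
        for row in ROWS:
            f.write("\t".join(row) + "\n")


if __name__ == "__main__":
    write_test_data("data.txt")
```

Note that the datetime field contains a space, so splitting on arbitrary whitespace would yield four fields instead of three; only the tab is a field separator.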

Load the data

hive> LOAD DATA LOCAL INPATH '/Users/life/Desktop/data.txt' 
OVERWRITE INTO TABLE user_foods PARTITION(dt='2014-06-07');

After the load succeeds, run select * from user_foods to check the data.

Or use:

hive> select * from user_foods where user_id='user_1';

This query launches a MapReduce job.

Analysis using hive alone

"Count how many of each kind of food each person ate on a given date" is simple enough that it can be done without Python:

hive> select user_id, food_type, count(*) from user_foods where dt='2014-06-07' group by user_id, food_type;

Result (for the test data above):

user_1	food1	2
user_1	food2	1
user_2	food2	1
user_2	food23	1

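Combining hive with python

If the data needs cleaning or further processing, a custom map step is required, and Python is a good fit for implementing it.

For example, suppose food2 and food23 should be treated as the same kind of food. Python can do this cleaning; the script (m.py) is as follows: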

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys

if __name__ == "__main__":
    # Parse each line of input
    for line in sys.stdin:
        # Skip blank lines
        if not line or not line.strip():
            continue

        # Use try so that one malformed line does not abort the whole job
        try:
            userId, foodType, dt = line.strip().split("\t")
        except ValueError:
            continue

        # Clean the data: skip rows with empty fields
        if userId == '' or foodType == '':
            continue

        # Clean the data: treat food23 as food2
        if foodType == "food23":
            foodType = "food2"

        # Output, tab-separated; this is the map output
        print(userId + "\t" + foodType)

Then use HQL together with the Python script to run the analysis:

1. Add the Python script, which is equivalent to placing it in the distributed cache

2. Run the query, using transform and using

hive> add file /Users/life/Desktop/m.py;
hive> select user_id, food_type, count(*) from (
select transform (user_id, food_type, datetime) using 'python m.py' as (user_id, food_type)
from user_foods where dt='2014-06-07'
) tmp group by user_id, food_type;

Result (for the test data above):

user_1	food1	2
user_1	food2	1
user_2	food2	2

Tips for debugging the Python script

1. First make sure the script has no syntax errors; run python m.py to check

2. Make sure the code writes nothing else to standard output

3. Test the script with the test data, for example:

$> cat data.txt | python m.py
user_1 food1
user_1 food1
user_1 food2
user_2 food2
user_2 food2
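The same check can be scripted so that it is repeatable. The sketch below assumes the cleaning logic is factored into a function, here called `clean` (a hypothetical name, not part of the original m.py), so it can be fed lines from a list instead of stdin.

```python
def clean(lines):
    """Yield cleaned 'user_id<TAB>food_type' lines, mirroring the logic of m.py."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.strip().split("\t")
        if len(fields) != 3:
            continue  # skip rows that do not have exactly three fields
        user_id, food_type, _ = fields
        if user_id == "" or food_type == "":
            continue  # skip rows with empty fields
        if food_type == "food23":
            food_type = "food2"  # treat food23 as food2
        yield user_id + "\t" + food_type


sample = [
    "user_1\tfood1\t2014-06-07 09:00\n",
    "user_2\tfood23\t2014-06-07 09:00\n",
    "\n",                          # blank line, should be skipped
    "broken line without tabs\n",  # malformed, should be skipped
]

result = list(clean(sample))
assert result == ["user_1\tfood1", "user_2\tfood2"]
print("\n".join(result))
```

Including a blank line and a malformed line in the sample exercises the edge cases that are most likely to break the script once it runs inside Hive.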

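If steps 1, 2, and 3 all check out but hive+python still fails, possible causes include:

1. The Python script does not handle the data robustly: some edge case was not considered, and Python raises an exception

2. Others you will have to work out for yourself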
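One way to make the script more robust while still seeing what went wrong is to send diagnostics to standard error rather than standard output, since Hive reads the script's standard output as rows. The sketch below is an illustration, not part of the original script; the function name `process` and the skipped-line counter are assumptions, and where standard error ends up (typically the task logs) depends on the cluster setup.

```python
import io


def process(lines, out, err):
    """Write cleaned rows to `out`; report skipped lines to `err` only."""
    skipped = 0
    for lineno, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3 or not fields[0] or not fields[1]:
            skipped += 1
            err.write("skipping line %d: %r\n" % (lineno, line))
            continue
        user_id, food_type, _ = fields
        if food_type == "food23":
            food_type = "food2"  # treat food23 as food2
        out.write(user_id + "\t" + food_type + "\n")
    return skipped


# Demo with in-memory streams. In a real script you would instead call
# process(sys.stdin, sys.stdout, sys.stderr) after importing sys.
out, err = io.StringIO(), io.StringIO()
skipped = process(
    ["user_2\tfood23\t2014-06-07 09:00\n", "bad line\n"], out, err
)
print(out.getvalue(), end="")
print("skipped:", skipped)
```

Keeping the row output and the diagnostics on separate streams also satisfies tip 2 above: nothing other than rows reaches standard output.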

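Other notes

In the example above the Python script plays the role of the map step. You could also write a reduce.py to aggregate the map output instead of using Hive's aggregate functions.

That is only worth doing when Hive itself can no longer meet your needs.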

   