Äú¿ÉÒÔ¾èÖú£¬Ö§³ÖÎÒÃǵĹ«ÒæÊÂÒµ¡£

1Ôª 10Ôª 50Ôª





ÈÏÖ¤Â룺  ÑéÖ¤Âë,¿´²»Çå³þ?Çëµã»÷Ë¢ÐÂÑéÖ¤Âë ±ØÌî



  ÇóÖª ÎÄÕ ÎÄ¿â Lib ÊÓÆµ iPerson ¿Î³Ì ÈÏÖ¤ ×Éѯ ¹¤¾ß ½²×ù Modeler   Code  
»áÔ±   
 
   
 
 
     
   
 ¶©ÔÄ
  ¾èÖú
Spark SQLÉî¶ÈÀí½â
 
×÷Õß ÕŰü·å  »ðÁú¹ûÈí¼þ  ·¢²¼ÓÚ 2014-07-17
  11034  次浏览      31
 

Catalyst

CatalystÊÇÓëSpark½âñîµÄÒ»¸ö¶ÀÁ¢¿â£¬ÊÇÒ»¸öimpl-freeµÄÖ´Ðмƻ®µÄÉú³ÉºÍÓÅ»¯¿ò¼Ü¡£

ĿǰÓëSpark Core»¹ÊÇñîºÏµÄ£¬¶Ô´ËuserÓʼþ×éÀïÓÐÈ˶ԴËÌá³öÒÉÎÊ£¬¼ûmail¡£

ÒÔÏÂÊÇCatalyst½ÏÔçʱºòµÄ¼Ü¹¹Í¼£¬Õ¹Ê¾µÄÊÇ´úÂë½á¹¹ºÍ´¦ÀíÁ÷³Ì¡£

Catalyst¶¨Î»

ÆäËûϵͳÈç¹ûÏë»ùÓÚSpark×öһЩÀàsql¡¢±ê×¼sqlÉõÖÁÆäËû²éѯÓïÑԵIJéѯ£¬ÐèÒª»ùÓÚCatalystÌṩµÄ½âÎöÆ÷¡¢Ö´Ðмƻ®Ê÷½á¹¹¡¢Âß¼­Ö´Ðмƻ®µÄ´¦Àí¹æÔòÌåϵµÈÀàÌåϵÀ´ÊµÏÖÖ´Ðмƻ®µÄ½âÎö¡¢Éú³É¡¢ÓÅ»¯¡¢Ó³É乤×÷¡£

¶ÔÓ¦ÉÏͼÖУ¬Ö÷ÒªÊÇ×ó²àµÄTreeNodelib¼°ÖмäÈý´Îת»¯¹ý³ÌÖÐÉæ¼°µ½µÄÀà½á¹¹¶¼ÊÇCatalystÌṩµÄ¡£ÖÁÓÚÓÒ²àÎïÀíÖ´Ðмƻ®Ó³ÉäÉú³É¹ý³Ì£¬ÎïÀíÖ´Ðмƻ®»ùÓڳɱ¾µÄÓÅ»¯Ä£ÐÍ£¬¾ßÌåÎïÀíËã×ÓµÄÖ´Ðж¼ÓÉϵͳ×Ô¼ºÊµÏÖ¡£

CatalystÏÖ×´

ÔÚ½âÎöÆ÷·½ÃæÌṩµÄÊÇÒ»¸ö¼òµ¥µÄscalaдµÄsql parser£¬Ö§³ÖÓïÒåÓÐÏÞ£¬¶øÇÒÓ¦¸ÃÊDZê×¼sqlµÄ¡£

ÔÚ¹æÔò·½Ã棬ÌṩµÄÓÅ»¯¹æÔòÊDZȽϻù´¡µÄ(ºÍPig/Hive±ÈûÓÐÄÇô·á¸»)£¬²»¹ýһЩÓÅ»¯¹æÔòÆäʵÊÇ񻃾¼°µ½¾ßÌåÎïÀíËã×ӵģ¬ËùÒÔ²¿·Ö¹æÔòÐèÒªÔÚϵͳ·½ÄÇ×Ô¼ºÖƶ¨ºÍʵÏÖ(Èçspark-sqlÀïµÄSparkStrategy)¡£

CatalystÒ²ÓÐ×Ô¼ºµÄÒ»Ì×Êý¾ÝÀàÐÍ¡£

ÏÂÃæ½éÉÜCatalystÀXÌ×ÖØÒªµÄÀà½á¹¹¡£

TreeNodeÌåϵ

TreeNodeÊÇCatalystÖ´Ðмƻ®±íʾµÄÊý¾Ý½á¹¹£¬ÊÇÒ»¸öÊ÷½á¹¹£¬¾ß±¸Ò»Ð©scala collectionµÄ²Ù×÷ÄÜÁ¦ºÍÊ÷±éÀúÄÜÁ¦¡£Õâ¿ÃÊ÷Ò»Ö±ÔÚÄÚ´æÀïά»¤£¬²»»ádumpµ½´ÅÅÌÒÔijÖÖ¸ñʽµÄÎļþ´æÔÚ£¬ÇÒÎÞÂÛÔÚÓ³ÉäÂß¼­Ö´Ðмƻ®½×¶Î»¹ÊÇÓÅ»¯Âß¼­Ö´Ðмƻ®½×¶Î£¬Ê÷µÄÐÞ¸ÄÊÇÒÔÌæ»»ÒÑÓнڵãµÄ·½Ê½½øÐеġ£

TreeNode£¬ÄÚ²¿´øÒ»¸öchildren: Seq[BaseType]±íʾº¢×ӽڵ㣬¾ß±¸foreach¡¢map¡¢collectµÈÕë¶Ô½Úµã²Ù×÷µÄ·½·¨£¬ÒÔ¼°transformDown(ĬÈÏ£¬Ç°Ðò±éÀú)¡¢transformUpÕâÑùµÄ±éÀúÊ÷ÉϽڵ㣬¶ÔÆ¥Åä½Úµãʵʩ±ä»¯µÄ·½·¨¡£

ÌṩUnaryNode,BinaryNode, LeafNodeÈýÖÖtrait£¬¼´·ÇÒ¶×Ó½ÚµãÔÊÐíÓÐÒ»¸ö»òÁ½¸ö×ӽڵ㡣

TreeNodeÌṩµÄÊÇ·¶ÐÍ¡£

TreeNodeÓÐÁ½¸ö×ÓÀà¼Ì³ÐÌåϵ£¬QueryPlanºÍExpression¡£QueryPlanÏÂÃæÊÇÂß¼­ºÍÎïÀíÖ´Ðмƻ®Á½¸öÌåϵ£¬Ç°ÕßÔÚCatalystÀïÓÐÏêϸʵÏÖ£¬ºóÕßÐèÒªÔÚϵͳ×Ô¼ºÊµÏÖ¡£ExpressionÊDZí´ïʽÌåϵ£¬ºóÃæÕ½ڶ¼»áÕ¹¿ª½éÉÜ¡£

TreeµÄtransformationʵÏÖ£º

´«ÈëPartialFunction[TreeType,TreeType]£¬Èç¹ûÓë²Ù×÷·ûÆ¥Å䣬Ôò½Úµã»á±»½á¹ûÌæ»»µô£¬·ñÔò½Úµã²»»á±ä¶¯¡£Õû¸ö¹ý³ÌÊǶÔchildrenµÝ¹éÖ´Ðеġ£

Ö´Ðмƻ®±íʾģÐÍ

Âß¼­Ö´Ðмƻ®

QueryPlan¼Ì³Ð×ÔTreeNode£¬ÄÚ²¿´øÒ»¸öoutput: Seq[Attribute],¾ß±¸transformExpressionDown¡¢transformExpressionUp·½·¨¡£

ÔÚCatalystÖУ¬QueryPlanµÄÖ÷Òª×ÓÀàÌåϵÊÇLogicalPlan£¬¼´Âß¼­Ö´Ðмƻ®±íʾ¡£ÆäÎïÀíÖ´Ðмƻ®±íʾÓÉʹÓ÷½ÊµÏÖ(spark-sqlÏîÄ¿ÖÐ)¡£

LogicalPlan¼Ì³Ð×ÔQueryPlan£¬ÄÚ²¿´øÒ»¸öreference:Set[Attribute]£¬Ö÷Òª·½·¨Îªresolve(name:String): Option[NamedeExpression]£¬ÓÃÓÚ·ÖÎöÉú³É¶ÔÓ¦µÄNamedExpression¡£

LogicalPlanÓÐÐí¶à¾ßÌå×ÓÀ࣬Ҳ·ÖΪUnaryNode, BinaryNode, LeafNodeÈýÀ࣬¾ßÌåÔÚorg.apache.spark.sql.catalyst.plans.logical·¾¶Ï¡£

Âß¼­Ö´Ðмƻ®ÊµÏÖ

LeafNodeÖ÷Òª×ÓÀàÊÇCommandÌåϵ£º

¸÷commandµÄÓïÒå¿ÉÒÔ´Ó×ÓÀàÃû×Ö¿´³ö£¬´ú±íµÄÊÇϵͳ¿ÉÒÔÖ´ÐеÄnon-queryÃüÁÈçDDL¡£

UnaryNodeµÄ×ÓÀࣺ

BinaryNodeµÄ×ÓÀࣺ

ÎïÀíÖ´Ðмƻ®

ÁíÒ»·½Ã棬ÎïÀíÖ´Ðмƻ®½ÚµãÔÚ¾ßÌåϵͳÀïʵÏÖ£¬±ÈÈçspark-sql¹¤³ÌÀïµÄSparkPlan¼Ì³ÐÌåϵ¡£

ÎïÀíÖ´Ðмƻ®ÊµÏÖ

ÿ¸ö×ÓÀ඼ҪʵÏÖexecute()·½·¨£¬´óÖÂÓÐÒÔÏÂʵÏÖ×ÓÀà(²»È«)¡£

Ìáµ½ÎïÀíÖ´Ðмƻ®£¬»¹ÒªÌáÒ»ÏÂCatalystÌṩµÄ·ÖÇø±íʾģÐÍ¡£

Ö´Ðмƻ®Ó³Éä

Catalyst»¹ÌṩÁËÒ»¸öQueryPlanner[Physical <: TreeNode[PhysicalPlan]]³éÏóÀ࣬ÐèÒª×ÓÀàÖÆ¶¨Ò»Åústrategies: Seq[Strategy]£¬Æäapply·½·¨Ò²ÊÇÀàËÆ¸ù¾ÝÖÆ¶¨µÄ¾ßÌå²ßÂÔÀ´°ÑÂß¼­Ö´Ðмƻ®Ëã×ÓÓ³Éä³ÉÎïÀíÖ´Ðмƻ®Ëã×Ó¡£ÓÉÓÚÎïÀíÖ´Ðмƻ®µÄ½ÚµãÊÇÔÚ¾ßÌåϵͳÀïʵÏֵģ¬ËùÒÔQueryPlanner¼°ÀïÃæµÄstrategiesÒ²ÐèÒªÔÚ¾ßÌåϵͳÀïʵÏÖ¡£

ÔÚspark-sqlÏîÄ¿ÖУ¬SparkStrategies¼Ì³ÐÁËQueryPlanner[SparkPlan]£¬ÄÚ²¿Öƶ¨ÁËLeftSemiJoin, HashJoin,PartialAggregation, BroadcastNestedLoopJoin, CartesianProductµÈ¼¸ÖÖ²ßÂÔ£¬Ã¿ÖÖ²ßÂÔ½ÓÊܵͼÊÇÒ»¸öLogicalPlan£¬Éú³ÉµÄÊÇSeq[SparkPlan]£¬Ã¿¸öSparkPlanÀí½âΪ¾ßÌåRDDµÄËã×Ó²Ù×÷¡£

±ÈÈçÔÚBasicOperatorsÕâ¸öStrategyÀÒÔmatch-caseÆ¥ÅäµÄ·½Ê½´¦ÀíÁ˺ܶà»ù±¾Ëã×Ó£¨¿ÉÒÔÒ»¶ÔÒ»Ö±½ÓÓ³Éä³ÉRDDËã×Ó£©£¬ÈçÏ£º

ExpressionÌåϵ

Expression£¬¼´±í´ïʽ£¬Ö¸²»ÐèÒªÖ´ÐÐÒýÇæ¼ÆË㣬¶ø¿ÉÒÔÖ±½Ó¼ÆËã»ò´¦ÀíµÄ½Úµã£¬°üÀ¨Cast²Ù×÷£¬Projection²Ù×÷£¬ËÄÔòÔËË㣬Âß¼­²Ù×÷·ûÔËËãµÈ¡£

¾ßÌå¿ÉÒԲο¼org.apache.spark.sql.expressionspackageϵÄÀà¡£

RulesÌåϵ

·²ÊÇÐèÒª´¦ÀíÖ´Ðмƻ®Ê÷(Analyze¹ý³Ì£¬Optimize¹ý³Ì£¬SparkStrategy¹ý³Ì)£¬ÊµÊ©¹æÔòÆ¥ÅäºÍ½Úµã´¦ÀíµÄ£¬¶¼ÐèÒª¼Ì³ÐRuleExecutor[TreeType]³éÏóÀà¡£

RuleExecutorÄÚ²¿ÌṩÁËÒ»¸öSeq[Batch]£¬ÀïÃæ¶¨ÒåµÄÊǸÃRuleExecutorµÄ´¦Àí²½Ö衣ÿ¸öBatch´ú±í×ÅÒ»Ì×¹æÔò£¬Å䱸һ¸ö²ßÂÔ£¬¸Ã²ßÂÔ˵Ã÷Á˵ü´ú´ÎÊý(Ò»´Î»¹ÊǶà´Î)¡£

Rule[TreeType <: TreeNode[_]]ÊÇÒ»¸ö³éÏóÀ࣬×ÓÀàÐèÒª¸´Ð´apply(plan: TreeType)·½·¨À´Öƶ¨´¦ÀíÂß¼­¡£

RuleExecutorµÄapply(plan: TreeType): TreeType·½·¨»á°´ÕÕbatches˳ÐòºÍbatchÄÚµÄRules˳Ðò£¬¶Ô´«ÈëµÄplanÀïµÄ½Úµãµü´ú´¦Àí£¬´¦ÀíÂß¼­ÎªÓɾßÌåRule×ÓÀàʵÏÖ¡£

HiveÏà¹Ø

HiveÖ§³Ö·½Ê½

Spark SQL¶ÔhiveµÄÖ§³ÖÊǵ¥¶ÀµÄspark-hiveÏîÄ¿£¬¶ÔHiveµÄÖ§³Ö°üÀ¨HQL²éѯ¡¢hive metaStoreÐÅÏ¢¡¢hive SerDes¡¢hive UDFs/UDAFs/ UDTFs£¬ÀàËÆShark¡£

Ö»ÓÐÔÚHiveContextÏÂͨ¹ýhive api»ñµÃµÄÊý¾Ý¼¯£¬²Å¿ÉÒÔʹÓÃhql½øÐвéѯ£¬ÆähqlµÄ½âÎöÒÀÀµµÄÊÇorg.apache.hadoop.hive.ql.parse.ParseDriverÀàµÄparse·½·¨£¬Éú³ÉHive AST¡£

ʵ¼ÊÉÏsqlºÍhql£¬²¢²»ÊÇÒ»ÆðÖ§³ÖµÄ¡£¿ÉÒÔÀí½âΪhqlÊǶÀÁ¢Ö§³ÖµÄ£¬Äܱ»hql²éѯµÄÊý¾Ý¼¯±ØÐë¶ÁÈ¡×Ôhive api¡£ÏÂͼÖеÄparquet¡¢jsonµÈÆäËûÎļþÖ§³ÖÖ»·¢ÉúÔÚsql»·¾³ÏÂ(SQLContext)¡£

Hive on Spark

Hive¹Ù·½Ìá³öÁËHive onSparkµÄJIRA¡£Shark½áÊøÖ®ºó£¬²ð·ÖΪÁ½¸ö·½Ïò£º

´ÓÕâÀï¿´£¬¶ÔHiveµÄ¼æÈÝÖ§³Ö½«×ªÒƵ½Hive on SparkÉÏ£¬Ö®Ç°SharkµÄ¾­Ñ齫ÔÚHiveÉçÇøµÄÕâ¸öÖ§³ÖÉÏÌåÏÖ¡£ÎÒÀí½â£¬Ä¿Ç°SparkSQLÀïµÄÄÇÖÖHiveÖ§³Ö·½Ê½£¬Ö»ÊÇΪÁËÔÚSpark»·¾³Ï¼¯³É²Ù×ÝHiveÊý¾Ý£¬ËüµÄhqlÖ´ÐÐÊǵ÷ÓÃHive¿Í»§¶ËDriver£¬ÅÜÔÚhadoop MRÉϵ쬱¾Éí²»ÊÇHive on SparkµÄʵÏÖ£¬Ö»ÊÇΪÁËʹÓÃRDD¼ä½Ó²Ù×÷HiveÊý¾Ý¼¯¡£

ËùÒÔÈç¹ûÏëÒª°ÑÏÖÓÐHiveÈÎÎñÇ¨ÒÆµ½SparkÉÏ£¬Ó¦¸ÃʹÓÃShark»òÕߵȴýHive on Spark¡£

Spark SQLÀïµÄHiveÖ§³Ö²»ÊÇhive on sparkµÄʵÏÖ£¬¶ø¸üÏñÒ»¸ö¶ÁдHiveÊý¾ÝµÄ¿Í»§¶Ë¡£ÇÒÆähqlÖ§³ÖÖ»°üº¬hiveÊý¾Ý£¬Óësql»·¾³ÊÇ»¥Ïà¶ÀÁ¢µÄ¡£

ÒÔÉÏÁ½½ÚÊÇSpark SQL Hive¡¢Shark¡¢Hive on SparkµÄÇø±ðºÍÀí½â¡£

SQL Core

Spark SQLµÄºËÐÄÊǰÑÒÑÓеÄRDD£¬´øÉÏSchemaÐÅÏ¢£¬È»ºó×¢²á³ÉÀàËÆsqlÀïµÄ¡±Table¡±£¬¶ÔÆä½øÐÐsql²éѯ¡£ÕâÀïÃæÖ÷Òª·ÖÁ½²¿·Ö£¬Ò»ÊÇÉú³ÉSchemaRD£¬¶þÊÇÖ´Ðвéѯ¡£

Éú³ÉSchemaRDD

Èç¹ûÊÇspark-hiveÏîÄ¿£¬ÄÇô¶ÁÈ¡metadataÐÅÏ¢×÷ΪSchema¡¢¶ÁÈ¡hdfsÉÏÊý¾ÝµÄ¹ý³Ì½»¸øHiveÍê³É£¬È»ºó¸ù¾ÝÕâÁ©²¿·ÖÉú³ÉSchemaRDD£¬ÔÚHiveContextϽøÐÐhql()²éѯ¡£

¶ÔÓÚSpark SQLÀ´Ëµ£¬

Êý¾Ý·½Ã棬RDD¿ÉÒÔÀ´×ÔÈκÎÒÑÓеÄRDD£¬Ò²¿ÉÒÔÀ´×ÔÖ§³ÖµÄµÚÈý·½¸ñʽ£¬Èçjson file¡¢parquet file¡£

SQLContextÏ»á°Ñ´øcase classµÄRDDÒþʽת»¯ÎªSchemaRDD

ExsitingRddµ¥ÀýÀï»á·´Éä³öcase classµÄattributes£¬²¢°ÑRDDµÄÊý¾Ýת»¯³ÉCatalystµÄGenericRow£¬×îºó·µ»ØRDD[Row]£¬¼´Ò»¸öSchemaRDD¡£ÕâÀïµÄ¾ßÌåת»¯Âß¼­¿ÉÒԲο¼ExsitingRddµÄproductToRowRddºÍconvertToCatalyst·½·¨¡£

Ö®ºó¿ÉÒÔ½øÐÐSchemaRDDÌṩµÄ×¢²átable²Ù×÷¡¢Õë¶ÔSchema¸´Ð´µÄ²¿·ÖRDDת»¯²Ù×÷¡¢DSL²Ù×÷¡¢saveAs²Ù×÷µÈµÈ¡£

RowºÍGenericRowÊÇCatalystÀïµÄÐбíʾģÐÍ

RowÓÃSeq[Any]À´±íʾvalues£¬GenericRowÊÇRowµÄ×ÓÀ࣬ÓÃÊý×é±íʾvalues¡£RowÖ§³ÖÊý¾ÝÀàÐͰüÀ¨Int, Long, Double, Float, Boolean, Short, Byte, String¡£Ö§³Ö°´ÐòÊý(ordinal)¶Áȡijһ¸öÁеÄÖµ¡£¶ÁȡǰÐèÒª×öisNullAt(i: Int)µÄÅжϡ£

¸÷×Ô¶¼ÓÐMutableÀ࣬ÌṩsetXXX(i: int, value: Any)ÐÞ¸ÄijÐòÊýÉϵÄÖµ¡£

²ã´Î½á¹¹

ÏÂͼ´óÖ¶ԱÈÁËPig£¬Spark SQL£¬SharkÔÚʵÏÖ²ã´ÎÉϵÄÇø±ð£¬½ö×ö²Î¿¼¡£

²éѯÁ÷³Ì

SQLContextÀï¶ÔsqlµÄÒ»¸ö½âÎöºÍÖ´ÐÐÁ÷³Ì£º

1.µÚÒ»²½parseSql(sql: String)£¬simple sql parser×ö´Ê·¨Óï·¨½âÎö£¬Éú³ÉLogicalPlan¡£

2.µÚ¶þ²½analyzer(logicalPlan)£¬°Ñ×öÍê´Ê·¨Óï·¨½âÎöµÄÖ´Ðмƻ®½øÐгõ²½·ÖÎöºÍÓ³É䣬

ĿǰSQLContextÄÚµÄAnalyzerÓÉCatalystÌṩ£¬¶¨ÒåÈçÏ£º

new Analyzer(catalog, EmptyFunctionRegistry, caseSensitive =true)

catalogΪSimpleCatalog£¬catalogÊÇÓÃÀ´×¢²átableºÍ²éѯrelationµÄ¡£

¶øÕâÀïµÄFunctionRegistry²»Ö§³ÖlookupFunction·½·¨£¬ËùÒÔ¸Ãanalyzer²»Ö§³ÖFunction×¢²á£¬¼´UDF¡£

AnalyzerÄÚ¶¨ÒåÁ˼¸Åú¹æÔò£º

3.´ÓµÚ¶þ²½µÃµ½µÄÊdzõ²½µÄlogicalPlan£¬½ÓÏÂÀ´µÚÈý²½ÊÇoptimizer(plan)¡£

OptimizerÀïÃæÒ²ÊǶ¨ÒåÁ˼¸Åú¹æÔò£¬»á°´Ðò¶ÔÖ´Ðмƻ®½øÐÐÓÅ»¯²Ù×÷¡£

4.ÓÅ»¯ºóµÄÖ´Ðмƻ®£¬»¹Òª¶ª¸øSparkPlanner´¦Àí£¬ÀïÃæ¶¨ÒåÁËһЩ²ßÂÔ£¬Ä¿µÄÊǸù¾ÝÂß¼­Ö´Ðмƻ®Ê÷Éú³É×îºó¿ÉÒÔÖ´ÐеÄÎïÀíÖ´Ðмƻ®Ê÷£¬¼´µÃµ½SparkPlan¡£

5.ÔÚ×îÖÕÕæÕýÖ´ÐÐÎïÀíÖ´Ðмƻ®Ç°£¬×îºó»¹Òª½øÐÐÁ½´Î¹æÔò£¬SQLContextÀﶨÒåÕâ¸ö¹ý³Ì½ÐprepareForExecution£¬Õâ¸ö²½ÖèÊǶîÍâÔö¼ÓµÄ£¬Ö±½Ónew RuleExecutor[SparkPlan]½øÐеġ£

6.×îºóµ÷ÓÃSparkPlanµÄexecute()Ö´ÐмÆËã¡£Õâ¸öexecute()ÔÚÿÖÖSparkPlanµÄʵÏÖÀﶨÒ壬һ°ã¶¼»áµÝ¹éµ÷ÓÃchildrenµÄexecute()·½·¨£¬ËùÒԻᴥ·¢Õû¿ÃTreeµÄ¼ÆËã¡£

ÆäËûÌØÐÔ

ÄÚ´æÁд洢

SQLContextÏÂcache/uncache tableµÄʱºò»áµ÷ÓÃÁд洢ģ¿é¡£

¸ÃÄ£¿é½è¼ø×ÔShark£¬Ä¿µÄÊǵ±°Ñ±íÊý¾ÝcacheÔÚÄÚ´æµÄʱºò×öÐÐתÁвÙ×÷£¬ÒÔ±ãѹËõ¡£

ʵÏÖÀà

InMemoryColumnarTableScanÀàÊÇSparkPlan LeafNodeµÄʵÏÖ£¬¼´ÊÇÒ»¸öÎïÀíÖ´Ðмƻ®¡£´«ÈëÒ»¸öSparkPlan(È·ÈÏÁ˵ÄÎïÀíÖ´ÐмÆ)ºÍÒ»¸öÊôÐÔÐòÁУ¬ÄÚ²¿°üº¬Ò»¸öÐÐתÁС¢´¥·¢¼ÆËã²¢cacheµÄ¹ý³Ì(ÇÒÊÇlazyµÄ)¡£

ColumnBuilderÕë¶Ô²»Í¬µÄÊý¾ÝÀàÐÍ(boolean, byte, double, float, int, long, short, string)Óɲ»Í¬µÄ×ÓÀà°ÑÊý¾Ýдµ½ByteBufferÀ¼´°ü×°RowµÄÿ¸öfield£¬Éú³ÉColumns¡£ÓëÆä¶ÔÓ¦µÄColumnAccessorÊÇ·ÃÎÊcolumn£¬½«Æäת»ØRow¡£

CompressibleColumnBuilderºÍCompressibleColumnAccessorÊÇ´øÑ¹ËõµÄÐÐÁÐת»»builder£¬ÆäByteBufferÄÚ²¿´æ´¢½á¹¹ÈçÏÂ

CompressionScheme×ÓÀàÊDz»Í¬µÄѹËõʵÏÖ

¶¼ÊÇscalaʵÏֵģ¬Î´½èÖúµÚÈý·½¿â¡£²»Í¬µÄʵÏÖ£¬Ö¸¶¨ÁËÖ§³ÖµÄcolumn dataÀàÐÍ¡£ÔÚbuild()µÄʱºò£¬»á±È½ÏÿÖÖѹËõ£¬Ñ¡ÔñѹËõÂÊ×îСµÄ£¨ÈôÈÔ´óÓÚ0.8¾Í²»Ñ¹ËõÁË£©¡£

ÕâÀïµÄ¹ÀËãÂß¼­£¬À´×Ô×ÓÀàʵÏÖµÄgatherCompressibilityStats·½·¨¡£

CacheÂß¼­

cache֮ǰ£¬ÐèÒªÏȰѱ¾´ÎcacheµÄtableµÄÎïÀíÖ´Ðмƻ®Éú³É³öÀ´¡£

ÔÚcacheÕâ¸ö¹ý³ÌÀInMemoryColumnarTableScan²¢Ã»Óд¥·¢Ö´ÐУ¬µ«ÊÇÉú³ÉÁËÒÔInMemoryColumnarTableScanΪÎïÀíÖ´Ðмƻ®µÄSparkLogicalPlan£¬²¢´æ³ÉtableµÄplan¡£

ÆäʵÔÚcacheµÄʱºò£¬Ê×ÏÈÈ¥catalogÀïѰÕÒÕâ¸ötableµÄÐÅÏ¢ºÍtableµÄÖ´Ðмƻ®£¬È»ºó»á½øÐÐÖ´ÐУ¨Ö´Ðе½ÎïÀíÖ´Ðмƻ®Éú³É£©£¬È»ºó°ÑÕâ¸ötableÔÙ·Å»ØcatalogÀïά»¤ÆðÀ´£¬Õâ¸öʱºòµÄÖ´Ðмƻ®ÒѾ­ÊÇ×îÖÕÒªÖ´ÐеÄÎïÀíÖ´Ðмƻ®ÁË¡£µ«ÊÇ´ËʱColumnerÄ£¿éÏà¹ØµÄת»»µÈ²Ù×÷¶¼ÊÇûÓд¥·¢µÄ¡£

ÕæÕýµÄ´¥·¢»¹ÊÇÔÚexecute()µÄʱºò£¬Í¬ÆäËûSparkPlanµÄexecute()·½·¨´¥·¢³¡¾°ÊÇÒ»ÑùµÄ¡£

UncacheÂß¼­

UncacheTableµÄʱºò£¬³ýÁËɾ³ýcatalogÀïµÄtableÐÅÏ¢Ö®Í⣬»¹µ÷ÓÃÁËInMemoryColumnarTableScanµÄcacheColumnBuffers·½·¨£¬µÃµ½RDD¼¯ºÏ£¬²¢½øÐÐÁËunpersist()²Ù×÷¡£cacheColumnBuffersÖ÷Òª×öÁ˰ÑRDDÿ¸öpartitionÀïµÄROWµÄÿ¸öField´æµ½ÁËColumnBuilderÄÚ¡£

UDF£¨Ôݲ»Ö§³Ö£©

ÈçÇ°Ãæ¶ÔSQLContextÀïAnalyzerµÄ·ÖÎö£¬ÆäFunctionRegistryûÓÐʵÏÖlookupFunction¡£

ÔÚspark-hiveÏîÄ¿ÀHiveContextÀïÊÇʵÏÖÁËFunctionRegistryÕâ¸ötraitµÄ£¬ÆäʵÏÖΪHiveFunctionRegistry£¬ÊµÏÖÂß¼­¼ûorg.apache.spark.sql.hive.hiveUdfs

JSONÖ§³Ö

SQLContextÏ£¬Ôö¼ÓÁËjsonFileµÄ¶ÁÈ¡·½·¨£¬¶øÇÒĿǰ¿´£¬´úÂëÀïʵÏÖµÄÊÇhadoop textfileµÄ¶ÁÈ¡£¬Ò²¾ÍÊÇÕâ·ÝjsonÎļþÓ¦¸ÃÊÇÔÚHDFSÉϵġ£¾ßÌåÕâ·ÝjsonÎļþµÄÔØÈ룬InputFormatÊÇTextInputFormat£¬key classÊÇLongWritable£¬value classÊÇText£¬×îºóµÃµ½µÄÊÇvalue²¿·ÖµÄÄǶÎStringÄÚÈÝ£¬¼´RDD[String]¡£

¶ÁÈ¡jsonÎļþÖ®ºó£¬×ª»»³ÉSchemaRDD¡£JsonRDD.inferSchema(RDD[String])ÀïÓÐÏêϸµÄ½âÎöjsonºÍÓ³Éä³öschemaµÄ¹ý³Ì£¬×îºóµÃµ½¸ÃjsonµÄLogicalPlan¡£

JsonµÄ½âÎöʹÓõÄÊÇFasterXML/jackson-databind¿â£¬GitHubµØÖ·£¬wiki

°ÑÊý¾ÝÓ³Éä³ÉMap[String, Any]

JsonµÄÖ§³Ö·á¸»ÁËSpark SQLÊý¾Ý½ÓÈ볡¾°¡£

JDBCÖ§³Ö

Jdbc support branchis under going

SQL92

Spark SQLĿǰµÄSQLÓï·¨Ö§³ÖÇé¿ö¼ûSqlParserÀࡣĿ±êÊÇÖ§³ÖSQL92£¿£¿

1.»ù±¾Ó¦ÓÃÉÏ£¬sql server ºÍoracle¶¼×ñÑ­sql 92Óï·¨±ê×¼¡£

2.ʵ¼ÊÓ¦ÓÃÖдó¼Ò¶¼»á³¬³öÒÔÉϱê×¼£¬Ê¹Óø÷¼ÒÊý¾Ý¿â³§É̶¼ÌṩµÄ·á¸»µÄ×Ô¶¨Òå±ê×¼º¯Êý¿âºÍÓï·¨¡£

3.΢Èísql serverµÄsql À©Õ¹½ÐT-SQL(Transcate SQL).

4.Oracle µÄsql À©Õ¹½ÐPL-SQL.

×ܽá

ÒÔÉÏÕûÀíÁ˶ÔSpark SQL¸÷¸öÄ£¿éµÄʵÏÖÇé¿ö£¬´úÂë½á¹¹£¬Ö´ÐÐÁ÷³ÌÒÔ¼°×Ô¼º¶ÔSpark SQLµÄÀí½â¡£

   
11034 ´Îä¯ÀÀ       31
Ïà¹ØÎÄÕÂ

»ùÓÚEAµÄÊý¾Ý¿â½¨Ä£
Êý¾ÝÁ÷½¨Ä££¨EAÖ¸ÄÏ£©
¡°Êý¾Ýºþ¡±£º¸ÅÄî¡¢ÌØÕ÷¡¢¼Ü¹¹Óë°¸Àý
ÔÚÏßÉ̳ÇÊý¾Ý¿âϵͳÉè¼Æ ˼·+Ч¹û
 
Ïà¹ØÎĵµ

GreenplumÊý¾Ý¿â»ù´¡Åàѵ
MySQL5.1ÐÔÄÜÓÅ»¯·½°¸
ijµçÉÌÊý¾ÝÖÐ̨¼Ü¹¹Êµ¼ù
MySQL¸ßÀ©Õ¹¼Ü¹¹Éè¼Æ
Ïà¹Ø¿Î³Ì

Êý¾ÝÖÎÀí¡¢Êý¾Ý¼Ü¹¹¼°Êý¾Ý±ê×¼
MongoDBʵս¿Î³Ì
²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿âÉè¼ÆÓëÓÅ»¯
PostgreSQLÊý¾Ý¿âʵսÅàѵ
×îл¼Æ»®
ǶÈëʽÈí¼þ¼Ü¹¹Éè¼Æ 12-11[±±¾©]
LLM´óÄ£ÐÍÓëÖÇÄÜÌ忪·¢ÊµÕ½ 12-18[±±¾©]
ǶÈëʽÈí¼þ²âÊÔ 12-25[±±¾©]
AIÔ­ÉúÓ¦ÓõÄ΢·þÎñ¼Ü¹¹ 1-9[±±¾©]
AI´óÄ£Ðͱàд¸ßÖÊÁ¿´úÂë 1-14[±±¾©]
ÐèÇó·ÖÎöÓë¹ÜÀí 1-22[±±¾©]

MySQLË÷Òý±³ºóµÄÊý¾Ý½á¹¹
MySQLÐÔÄܵ÷ÓÅÓë¼Ü¹¹Éè¼Æ
SQL ServerÊý¾Ý¿â±¸·ÝÓë»Ö¸´
ÈÃÊý¾Ý¿â·ÉÆðÀ´ 10´óDB2ÓÅ»¯
oracleµÄÁÙʱ±í¿Õ¼äдÂú´ÅÅÌ
Êý¾Ý¿âµÄ¿çƽ̨Éè¼Æ


²¢·¢¡¢´óÈÝÁ¿¡¢¸ßÐÔÄÜÊý¾Ý¿â
¸ß¼¶Êý¾Ý¿â¼Ü¹¹Éè¼ÆÊ¦
HadoopÔ­ÀíÓëʵ¼ù
Oracle Êý¾Ý²Ö¿â
Êý¾Ý²Ö¿âºÍÊý¾ÝÍÚ¾ò
OracleÊý¾Ý¿â¿ª·¢Óë¹ÜÀí


GE Çø¿éÁ´¼¼ÊõÓëʵÏÖÅàѵ
º½Ìì¿Æ¹¤Ä³×Ó¹«Ë¾ Nodejs¸ß¼¶Ó¦Óÿª·¢
ÖÐÊ¢Òæ»ª ׿Խ¹ÜÀíÕß±ØÐë¾ß±¸µÄÎåÏîÄÜÁ¦
ijÐÅÏ¢¼¼Êõ¹«Ë¾ PythonÅàѵ
ij²©²ÊITϵͳ³§ÉÌ Ò×ÓÃÐÔ²âÊÔÓëÆÀ¹À
ÖйúÓÊ´¢ÒøÐÐ ²âÊÔ³ÉÊì¶ÈÄ£Ðͼ¯³É(TMMI)
ÖÐÎïÔº ²úÆ·¾­ÀíÓë²úÆ·¹ÜÀí