Preface
Storing large numbers of small files on HDFS is very costly for the NameNode's memory: every file is tracked by a metadata record that the NameNode must load at startup, so the more files there are, the larger the NameNode's overhead. One option is to compress the small files into a single archive before uploading to HDFS; then only one file's metadata is needed, which greatly reduces the NameNode's memory footprint. For MapReduce computation, Hadoop ships with the following compression formats:
DEFLATE
gzip
bzip2
LZO
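Before involving Hadoop at all, the packing idea itself — many small files in, one compressed stream out — can be sketched with the JDK's java.util.zip, used here only as a stand-in for Hadoop's GzipCodec; the class name and file contents below are invented for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class PackSmallFiles {

    // Concatenate many small payloads and gzip them into a single byte stream.
    static byte[] pack(String[] smallFiles) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            for (String content : smallFiles) {
                gz.write(content.getBytes(StandardCharsets.UTF_8));
            }
        }
        return buf.toByteArray();
    }

    // Decompress the single archive back into one text blob.
    static String unpack(byte[] packed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String[] smallFiles = {
            "SG 253654006139495 253654006164392 619850464\n",
            "SG 253654006139495 253654006164392 499850464\n"
        };
        byte[] packed = pack(smallFiles);
        System.out.println(unpack(packed).equals(String.join("", smallFiles))); // prints "true"
    }
}
```

The result is a single .gz object on HDFS — one metadata entry on the NameNode instead of hundreds.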
The cost of running MapReduce over compressed files is the time spent on decompression, which is worth weighing in specific applications. But in the massive-small-files scenario, compressing the small files buys us locality: if hundreds or thousands of small files compress down to a single block, that block necessarily sits on one DataNode; the computation receives a single InputSplit and runs locally, with no cross-network data transfer. (Note that gzip is not a splittable format in Hadoop, so each .gz file is always processed as one InputSplit by one map task.)
If instead the small files were uploaded to HDFS directly, hundreds or thousands of small blocks would be scattered across different DataNodes, and the computation might first have to "move data" before it could run. With only a few files, the network-transfer cost is hardly noticeable next to the NameNode memory overhead, but it becomes very visible once the small files reach a certain scale. Below, we compress the small files in gzip format, upload them to HDFS, and run a MapReduce job over them. One class implements both the map task and the reduce task, as shown here: (Original author: 时延军)
package org.shirdrn.kodz.inaction.hadoop.smallfiles.compression;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class GzipFilesMaxCostComputation {

    public static class GzipFilesMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

        private final static LongWritable costValue = new LongWritable(0);
        private Text code = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // a line, such as 'SG 253654006139495 253654006164392 619850464'
            String line = value.toString();
            String[] array = line.split("\\s");
            if (array.length == 4) {
                String countryCode = array[0];
                String strCost = array[3];
                long cost = 0L;
                try {
                    cost = Long.parseLong(strCost);
                } catch (NumberFormatException e) {
                    cost = 0L;
                }
                if (cost != 0) {
                    code.set(countryCode);
                    costValue.set(cost);
                    context.write(code, costValue);
                }
            }
        }
    }

    public static class GzipFilesReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long max = 0L;
            Iterator<LongWritable> iter = values.iterator();
            while (iter.hasNext()) {
                LongWritable current = iter.next();
                if (current.get() > max) {
                    max = current.get();
                }
            }
            context.write(key, new LongWritable(max));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: gzipmaxcost <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "gzip maxcost");

        job.getConfiguration().setBoolean("mapred.output.compress", true);
        job.getConfiguration().setClass("mapred.output.compression.codec",
                GzipCodec.class, CompressionCodec.class);

        job.setJarByClass(GzipFilesMaxCostComputation.class);
        job.setMapperClass(GzipFilesMapper.class);
        job.setCombinerClass(GzipFilesReducer.class);
        job.setReducerClass(GzipFilesReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        job.setNumReduceTasks(1);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        int exitFlag = job.waitForCompletion(true) ? 0 : 1;
        System.exit(exitFlag);
    }
}
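Because the mapper and reducer above only do string parsing and a running maximum, their core logic can be rehearsed without a cluster. Below is a plain-Java sketch of the same parse-then-max steps; the class and method names are illustrative and not part of the original job:

```java
import java.util.HashMap;
import java.util.Map;

public class MaxCostRehearsal {

    // Map side: keep (countryCode, cost) for valid 4-field lines with nonzero cost.
    // Reduce side: track the maximum cost seen per country code.
    static Map<String, Long> maxCostPerCountry(String[] lines) {
        Map<String, Long> max = new HashMap<>();
        for (String line : lines) {
            String[] array = line.split("\\s");
            if (array.length != 4) {
                continue; // malformed record, skipped just like in the mapper
            }
            long cost;
            try {
                cost = Long.parseLong(array[3]);
            } catch (NumberFormatException e) {
                cost = 0L;
            }
            if (cost != 0) {
                max.merge(array[0], cost, Math::max);
            }
        }
        return max;
    }

    public static void main(String[] args) {
        String[] lines = {
            "SG 253654006139495 253654006164392 619850464",
            "SG 253654006139495 253654006164392 719850464",
            "AD 253654006139495 253654006164392 100"
        };
        System.out.println(maxCostPerCountry(lines).get("SG")); // prints 719850464
    }
}
```

Running the real job additionally exercises the combiner, which reuses GzipFilesReducer: taking a max of maxes is associative, so the same class is safe in both roles.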
The program above computes a simple per-key maximum, and reads gzip-compressed input. In addition, if a large volume of map output has to be copied to the reducers, consider specifying compression options when configuring the Job; the output-compression settings in the code above show the pattern.
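For completeness, enabling compression of the intermediate map output (the data shuffled to reducers) would look like the following with the Hadoop 1.x mapred.* property names used in this article's code. Treat it as a configuration sketch, not a tested addition to the job above:

```java
// Sketch only: compress intermediate map output before the shuffle
// (Hadoop 1.x property names, analogous to the mapred.output.* keys above).
job.getConfiguration().setBoolean("mapred.compress.map.output", true);
job.getConfiguration().setClass("mapred.map.output.compression.codec",
        GzipCodec.class, CompressionCodec.class);
```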
Next, let's walk through running the program.
Prepare the data
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ du -sh ../dataset/gzipfiles/*
147M    ../dataset/gzipfiles/data_10m.gz
43M     ../dataset/gzipfiles/data_50000_1.gz
16M     ../dataset/gzipfiles/data_50000_2.gz
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -mkdir /user/xiaoxiang/datasets/gzipfiles
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -copyFromLocal ../dataset/gzipfiles/* /user/xiaoxiang/datasets/gzipfiles
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -ls /user/xiaoxiang/datasets/gzipfiles
Found 3 items
-rw-r--r-- 3 xiaoxiang supergroup 153719349 2013-03-24 12:56 /user/xiaoxiang/datasets/gzipfiles/data_10m.gz
-rw-r--r-- 3 xiaoxiang supergroup  44476101 2013-03-24 12:56 /user/xiaoxiang/datasets/gzipfiles/data_50000_1.gz
-rw-r--r-- 3 xiaoxiang supergroup  15935178 2013-03-24 12:56 /user/xiaoxiang/datasets/gzipfiles/data_50000_2.gz
Run the program
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop jar gzip-compression.jar org.shirdrn.kodz.inaction.hadoop.smallfiles.compression.GzipFilesMaxCostComputation /user/xiaoxiang/datasets/gzipfiles /user/xiaoxiang/output/smallfiles/gzip
13/03/24 13:06:28 INFO input.FileInputFormat: Total input paths to process : 3
13/03/24 13:06:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/24 13:06:28 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/24 13:06:28 INFO mapred.JobClient: Running job: job_201303111631_0039
13/03/24 13:06:29 INFO mapred.JobClient:  map 0% reduce 0%
13/03/24 13:06:55 INFO mapred.JobClient:  map 33% reduce 0%
13/03/24 13:07:04 INFO mapred.JobClient:  map 66% reduce 11%
13/03/24 13:07:13 INFO mapred.JobClient:  map 66% reduce 22%
13/03/24 13:07:25 INFO mapred.JobClient:  map 100% reduce 22%
13/03/24 13:07:31 INFO mapred.JobClient:  map 100% reduce 100%
13/03/24 13:07:36 INFO mapred.JobClient: Job complete: job_201303111631_0039
13/03/24 13:07:36 INFO mapred.JobClient: Counters: 29
13/03/24 13:07:36 INFO mapred.JobClient:   Job Counters
13/03/24 13:07:36 INFO mapred.JobClient:     Launched reduce tasks=1
13/03/24 13:07:36 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=78231
13/03/24 13:07:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/24 13:07:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/03/24 13:07:36 INFO mapred.JobClient:     Launched map tasks=3
13/03/24 13:07:36 INFO mapred.JobClient:     Data-local map tasks=3
13/03/24 13:07:36 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=34413
13/03/24 13:07:36 INFO mapred.JobClient:   File Output Format Counters
13/03/24 13:07:36 INFO mapred.JobClient:     Bytes Written=1337
13/03/24 13:07:36 INFO mapred.JobClient:   FileSystemCounters
13/03/24 13:07:36 INFO mapred.JobClient:     FILE_BYTES_READ=288127
13/03/24 13:07:36 INFO mapred.JobClient:     HDFS_BYTES_READ=214131026
13/03/24 13:07:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=385721
13/03/24 13:07:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1337
13/03/24 13:07:36 INFO mapred.JobClient:   File Input Format Counters
13/03/24 13:07:36 INFO mapred.JobClient:     Bytes Read=214130628
13/03/24 13:07:36 INFO mapred.JobClient:   Map-Reduce Framework
13/03/24 13:07:36 INFO mapred.JobClient:     Map output materialized bytes=9105
13/03/24 13:07:36 INFO mapred.JobClient:     Map input records=14080003
13/03/24 13:07:36 INFO mapred.JobClient:     Reduce shuffle bytes=6070
13/03/24 13:07:36 INFO mapred.JobClient:     Spilled Records=22834
13/03/24 13:07:36 INFO mapred.JobClient:     Map output bytes=154878493
13/03/24 13:07:36 INFO mapred.JobClient:     CPU time spent (ms)=90200
13/03/24 13:07:36 INFO mapred.JobClient:     Total committed heap usage (bytes)=688193536
13/03/24 13:07:36 INFO mapred.JobClient:     Combine input records=14092911
13/03/24 13:07:36 INFO mapred.JobClient:     SPLIT_RAW_BYTES=398
13/03/24 13:07:36 INFO mapred.JobClient:     Reduce input records=699
13/03/24 13:07:36 INFO mapred.JobClient:     Reduce input groups=233
13/03/24 13:07:36 INFO mapred.JobClient:     Combine output records=13747
13/03/24 13:07:36 INFO mapred.JobClient:     Physical memory (bytes) snapshot=765448192
13/03/24 13:07:36 INFO mapred.JobClient:     Reduce output records=233
13/03/24 13:07:36 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2211237888
13/03/24 13:07:36 INFO mapred.JobClient:     Map output records=14079863
Results
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -ls /user/xiaoxiang/output/smallfiles/gzip
Found 3 items
-rw-r--r-- 3 xiaoxiang supergroup    0 2013-03-24 13:07 /user/xiaoxiang/output/smallfiles/gzip/_SUCCESS
drwxr-xr-x - xiaoxiang supergroup    0 2013-03-24 13:06 /user/xiaoxiang/output/smallfiles/gzip/_logs
-rw-r--r-- 3 xiaoxiang supergroup 1337 2013-03-24 13:07 /user/xiaoxiang/output/smallfiles/gzip/part-r-00000.gz
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -copyToLocal /user/xiaoxiang/output/smallfiles/gzip/part-r-00000.gz ./
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ gunzip -c ./part-r-00000.gz
AD 999974516
AE 999938630
AF 999996180
AG 999991085
AI 999989595
AL 999998489
AM 999978568
AO 999989628
AQ 999995031
AR 999999563
AS 999935982
AT 999999909
AU 999937089
AW 999965784
AZ 999996557
BA 999994828
BB 999992177
BD 999992272
BE 999925057
BF 999999220
BG 999971528
BH 999994900
BI 999982573
BJ 999977886
BM 999991925
BN 999986630
BO 999995482
BR 999989947
BS 999983475
BT 999992685
BW 999984222
BY 999998496
BZ 999997173
CA 999991096
CC 999969761
CD 999978139
CF 999995342
CG 999957938
CH 999997524
CI 999998864
CK 999968719
CL 999967083
CM 999998369
CN 999975367
CO 999999167
CR 999980097
CU 999976352
CV 999990543
CW 999996327
CX 999987579
CY 999982925
CZ 999993908
DE 999985416
DJ 999997438
DK 999963312
DM 999941706
DO 999992176
DZ 999973610
EC 999971018
EE 999960984
EG 999980522
ER 999980425
ES 999949155
ET 999987033
FI 999989788
FJ 999990686
FK 999977799
FM 999994183
FO 999988472
FR 999988342
GA 999982099
GB 999970658
GD 999996318
GE 999991970
GF 999982024
GH 999941039
GI 999995295
GL 999948726
GM 999984872
GN 999992209
GP 999996090
GQ 999988635
GR 999999672
GT 999981025
GU 999975956
GW 999962551
GY 999999881
HK 999970084
HN 999972628
HR 999986688
HT 999970913
HU 999997568
ID 999994762
IE 999996686
IL 999982184
IM 999987831
IN 999973935
IO 999984611
IQ 999990126
IR 999986780
IS 999973585
IT 999997239
JM 999986629
JO 999982595
JP 999985598
KE 999996012
KG 999991556
KH 999975644
KI 999994328
KM 999989895
KN 999991068
KP 999967939
KR 999992162
KW 999924295
KY 999985907
KZ 999992835
LA 999989151
LB 999989233
LC 999994793
LI 999986863
LK 999989876
LR 999984906
LS 999957706
LT 999999688
LU 999999823
LV 999981633
LY 999992365
MA 999993880
MC 999978886
MD 999997483
MG 999996602
MH 999989668
MK 999983468
ML 999990079
MM 999989010
MN 999969051
MO 999978283
MP 999995848
MQ 999913110
MR 999982303
MS 999997548
MT 999982604
MU 999988632
MV 999975914
MW 999991903
MX 999978066
MY 999995010
MZ 999981189
NA 999976735
NC 999961053
NE 999990091
NF 999989399
NG 999985037
NI 999965733
NL 999988890
NO 999993122
NP 999972410
NR 999956464
NU 999987046
NZ 999998214
OM 999967428
PA 999944775
PE 999998598
PF 999959978
PG 999987347
PH 999981534
PK 999954268
PL 999996619
PM 999998975
PR 999978127
PT 999993404
PW 999991278
PY 999993590
QA 999995061
RE 999998518
RO 999994148
RS 999999923
RU 999995809
RW 999980184
SA 999973822
SB 999972832
SC 999991021
SD 999963744
SE 999972256
SG 999977637
SH 999999068
SI 999980580
SK 999998152
SL 999999269
SM 999941188
SN 999990278
SO 999978960
SR 999997483
ST 999980447
SV 999999945
SX 999938671
SY 999990666
SZ 999992537
TC 999969904
TD 999999303
TG 999977640
TH 999979255
TJ 999983666
TK 999971131
TM 999958998
TN 999979170
TO 999959971
TP 999986796
TR 999996679
TT 999984435
TV 999974536
TW 999975092
TZ 999992734
UA 999972948
UG 999980070
UM 999998377
US 999918442
UY 999989662
UZ 999982762
VA 999987372
VC 999991495
VE 999997971
VG 999954576
VI 999990063
VN 999974393
VU 999976113
WF 999961299
WS 999970242
YE 999984650
YT 999994707
ZA 999998692
ZM 999993331
ZW 999943540