Linux: splitting a file by the value of one column

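When it comes to splitting files on Linux, the first choices are split, head, tail, cut... combined with pipes, and if those fall short, awk. awk can actually solve this one perfectly well too, but never mind, let's just use perl...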

Example:
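A pileup file [size=40G]: to speed up mapping, the 10 chromosomes were cat'ed into a single text file. Now I want to compute the coverage of each chromosome separately, so the pileup lines of each chr have to be filtered out into their own file.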

all.pileup:

chr pos base cov….

A01 1 C 0 @ @ @

A01 2 C 0 @ @ @

A01 3 T 0 @ @ @

A01 4 A 3 @G,. @GIE @~A~

A01 5 T 3 @,,. @HIC @~A~

A01 6 T 3 @,,. @HIC @~A~

…

A10 …

Solution:

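split can only cut a file by a fixed number of lines or a fixed size. I have only ever used it for transferring large files; it cannot handle this problem...
Use grep to find the position [line number] where each of A01-A10 first appears in all.pileup, then head -n line.end all.pileup | tail -n (line.end-line.start+1) > A0x.pileup. Ten of them... too tedious...
awk '{if($1=="Ax") print $0;}' all.pileup > Ax... Run one chromosome at a time like this and the file gets read 10 times, right? And what about ferns?? [They have a huge number of chromosomes, with a maximum of 1260.] OMG, I don't even want to think about it, forget it..
perl, with IO::File: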

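#!/usr/bin/perl
# Usage: perl split_by_chr.pl all.pileup   (the script name is arbitrary)
use strict;
use warnings;
use IO::File;

my %FH;    # chromosome name => [ output filehandle, number of lines written ]
while (<>) {
    chomp;
    my $name = (split /\t/)[0];    # column 1 is the chromosome name
    # Open <name>.pileup the first time this chromosome is seen
    unless (exists $FH{$name}) {
        $FH{$name}[0] = IO::File->new("> $name.pileup")
            or die "Cannot open $name.pileup: $!";
    }
    $FH{$name}[0]->print($_ . "\n");
    $FH{$name}[1] += 1;
}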
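Now it works no matter how many chromosomes there are, as long as the number of output files open at once stays within the per-process limit (check with ulimit -n).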
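For comparison, the same single-pass idea can be sketched in awk. This is a minimal sketch, not a replacement for the script above: it assumes the input is tab-separated with the chromosome name in column 1, and the function name split_by_col1 is hypothetical.

```shell
#!/bin/sh
# split_by_col1 is a hypothetical helper name, not part of any tool.
# It writes every line of the input file to <column 1>.pileup in the
# current directory, reading the input only once.
split_by_col1() {
    awk -F'\t' '
        NR == 1 || $1 != prev {        # column 1 changed: a new block starts
            if (out != "") close(out)  # release the previous output file
            out = $1 ".pileup"
            if (!($1 in seen)) {       # first block for this name: start empty
                printf "" > out
                close(out)
                seen[$1] = 1
            }
            prev = $1
        }
        { print >> out }               # append the current line
    ' "$1"
}
```

Usage: split_by_col1 all.pileup. Only one output file is open at any time because the previous one is closed whenever column 1 changes, so the number of chromosomes does not run into the open-file limit. Appending with >> keeps the result correct even if the lines of one chromosome are not contiguous.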
For more about IO::File, see its documentation, or type perldoc IO::File in your terminal.

