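The purpose of this post is to show, with a concrete example, how *nix command-line tools can simplify some common tasks. If you like, the code you need in the end comes to only 3 lines: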
query.awk:
{if ($1==id) {print $2}}
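and merge.awk (main block only; the BEGIN block that sets query_command, score_file, na_string, FS and OFS is in the full version further down):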
{
 if ( ((cmd = query_command " id=" $1 " " score_file) | getline score) <= 0) {score = na_string}; close(cmd)
 print $0, score
}
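awk is a text-processing language on *nix, typically used to process simple data files. Its name is made up of the initials of its three creators (Aho, Weinberger and Kernighan). The key concepts are records and fields: by default each line is a record, and each record is split into fields by whitespace (spaces, tabs, etc.). In a script, $1, $2, ... refer to the fields of the current record and $0 to the whole record.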
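To see records and fields concretely, the sketch below pipes two made-up lines into awk and prints, for each record, its record number (NR), its field count (NF) and its second field ($2). The input is invented purely for illustration; note that a tab separates fields just as a run of spaces does.

```shell
#!/bin/sh
# Two made-up records: the first uses (runs of) spaces, the second a tab.
out=$(printf 'alpha  beta gamma\none\ttwo\n' | awk '{ print NR, NF, $2 }')
printf '%s\n' "$out"
# prints:
# 1 3 beta
# 2 2 two
```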
For an introduction to awk, see the awk examples at
http://www-900.ibm.com/developerWorks/cn/linux/shell/index.shtml, starting with the first article:
http://www-900.ibm.com/developerWorks/cn/linux/shell/awk/awk-1/index.shtml
The situation:
[*]A CSV (comma-separated values) file exported from Excel, recording each student's ID number and name. For example:
record.csv
18851007,bohr
18790314,einstein
12345678,galilette
...
[*]A file recording student IDs and scores, separated by a tab. For example:
score.dat
12345678   100
18790314   59
...
Goal: merge the scores into the record file.
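The difficulties:
[*]The records in score.dat and record.csv are in different orders: record.csv, for instance, is sorted by the stroke count of the surname, while score.dat is in the order the assignments were graded.
[*]record.csv lists every student, but not every student hands in the assignment; this time, for example, bohr did not.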
Solution:
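Write two awk scripts: query.awk looks up the score for a given student ID, and merge.awk goes through record.csv in order, calling query.awk for each ID to get the score (substituting N/A if the student did not hand anything in).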
# pseudo code for query.awk:
for each line in data file,
   if field_1 is equal to the given ID, then print field_2
next
#! /bin/awk -f
# query.awk
## Notice: this script prints nothing if the ID is not found
# variables supplied on the command line: id
# (e.g. ./query.awk id=12345678 ./score.dat)
{
    if ($1 == id) {
        print $2
    }
}
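The command-line variable id is what makes query.awk reusable. The sketch below runs the same program body against a throwaway copy of score.dat (its two lines mirror the sample above; the temporary directory is an assumption of this sketch, not part of the original setup), once for an ID that is present and once for bohr's ID, which is missing.

```shell
#!/bin/sh
# Throwaway copy of score.dat (tab-separated), mirroring the sample above.
tmp=$(mktemp -d)
printf '12345678\t100\n18790314\t59\n' > "$tmp/score.dat"

# Same program body as query.awk. An operand of the form id=VALUE is a
# variable assignment: awk performs it just before it starts reading the
# file operand that follows, so id is already set in the main block.
prog='{ if ($1 == id) { print $2 } }'
found=$(awk "$prog" id=18790314 "$tmp/score.dat")
missing=$(awk "$prog" id=18851007 "$tmp/score.dat")

printf 'found=[%s] missing=[%s]\n' "$found" "$missing"
# prints: found=[59] missing=[]
rm -rf "$tmp"
```

Passing the value with -v instead (awk -v id=18790314 -f query.awk score.dat) also works, and additionally makes id visible inside a BEGIN block; an id=... operand, as used in the post, is only assigned after BEGIN has run.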
# pseudo code for merge.awk:
for each line in record,
 call query.awk with id=field_1
 if returned value is blank, 
   then set score = N/A
 else
   set score = value returned by query
 end if
 append score to current line, and print to stdout
next
#! /bin/awk -f
# merge.awk:
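BEGIN {
    query_command = "./query.awk"
    score_file = "./score.dat"
    na_string = "N/A"
    # input field separator: record.csv is comma-separated
    FS = ","
    # output field separator: keep the merged lines comma-separated too
    OFS = ","
}
{
    # e.g. "./query.awk id=18790314 ./score.dat", run through the shell
    cmd = query_command " id=" $1 " " score_file
    # getline returns 1 if it read a line, 0 at end of input, -1 on error;
    # query.awk prints nothing for a missing ID, so 0 means "no score"
    if ((cmd | getline score) <= 0) {
        score = na_string
    }
    # close the pipe so each query process is released and the same
    # command can be run again if an ID appears twice in record.csv
    close(cmd)
    print $0, score
}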
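merge.awk gets each score by piping a command into getline (command | getline score). The sketch below isolates that mechanism. It uses echo and true as stand-ins for query.awk (an assumption made so the example runs anywhere), shows the return value in the found and not-found cases, and shows why close() matters when the same command string is run twice.

```shell
#!/bin/sh
out=$(awk 'BEGIN {
    # Stand-in for a successful query: the command prints one line.
    cmd = "echo 59"
    r1 = (cmd | getline score)          # 1: a line was read into score
    # Without close(), the pipe stays open and is already exhausted,
    # so reading again from the same command string hits end of input.
    r2 = (cmd | getline again)          # 0: nothing left to read
    close(cmd)
    # After close(), the same string starts a fresh process.
    r3 = (cmd | getline again)          # 1 again
    close(cmd)
    # Stand-in for a missing ID: the command prints nothing.
    r4 = ("true" | getline none)        # 0, so merge.awk would use N/A
    close("true")
    print r1, score, r2, r3, r4
}')
printf '%s\n' "$out"
# prints: 1 59 0 1 0
```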
Finally, run it (both scripts must be executable, e.g. after chmod +x query.awk merge.awk):
galilette@socrate:~$ ls
merge.awk   query.awk   record.csv   score.dat
galilette@socrate:~$ ./merge.awk record.csv
18851007,bohr,N/A
18790314,einstein,59
12345678,galilette,100
...
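Because merge.awk starts query.awk once per student, score.dat is re-read for every line of record.csv. A common alternative, shown here only for comparison (a different technique from the two-script design above), is a single awk run that first loads score.dat into an associative array and then walks record.csv in its original order. The sketch rebuilds the sample files in a temporary directory (the "..." rows are left out) and should reproduce the three output lines shown above.

```shell
#!/bin/sh
# Rebuild the sample files from the post (the "..." rows are left out).
tmp=$(mktemp -d)
printf '18851007,bohr\n18790314,einstein\n12345678,galilette\n' > "$tmp/record.csv"
printf '12345678\t100\n18790314\t59\n' > "$tmp/score.dat"

# One pass, one process: score.dat is read first (default whitespace FS),
# then the operand FS=, switches to comma-separated input for record.csv.
out=$(awk '
BEGIN { OFS = "," }
FILENAME == ARGV[1] { score[$1] = $2; next }        # first file: remember id -> score
{ print $0, (($1 in score) ? score[$1] : "N/A") }   # second file: keep its order
' "$tmp/score.dat" FS=, "$tmp/record.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Loading score.dat once turns each per-student process launch and file scan into a single array lookup; the trade-off is losing the separate, reusable query.awk.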