In C, how can I extract information from a TXT database file?

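I have recently been building a biological database, mainly by extracting data from an existing Swiss-Prot database. The database is a TXT file in which each entry starts with a line beginning with ID and ends with a line containing a double slash (//). There are tens of thousands of entries (sequences) in this format, and I need to extract the ones I want based on certain parameters.
For example: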
ID AAE16_ARATH Reviewed; 722 AA.
AC Q9LK39; Q8LRT1;
DT 22-FEB-2012, integrated into UniProtKB/Swiss-Prot.
DT 01-OCT-2000, sequence version 1.
DT 21-MAR-2012, entry version 59.
DE RecName: Full=Probable acyl-activating enzyme 16, chloroplastic;
DE EC=6.2.1.-;
DE Flags: Precursor;
GN Name=AAE16; OrderedLocusNames=At3g23790; ORFNames=MYM9.14;
OS Arabidopsis thaliana (Mouse-ear cress).
OC rosids; malvids; Brassicales; Brassicaceae; Camelineae; Arabidopsis.
OX NCBI_TaxID=3702;
RN [1]
RP NUCLEOTIDE SEQUENCE [MRNA].
RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC STRAIN=cv. Columbia;
RL DNA Res. 7:217-221(2000).
RX PubMed=16262711; DOI=10.1111/j.1365-313X.2005.02553.x;
RA Koo A.J., Fulda M., Browse J., Ohlrogge J.B.;
RT fatty acids.";
RL Plant J. 44:620-632(2005).
CC -!- FUNCTION: May be involved in the activation of fatty acids to

CC -----------------------------------------------------------------------
DR EMBL; AF503771; AAM28629.1; -; mRNA.
DR IPI; IPI00538918; -.
DR RefSeq; NP_189021.2; NM_113283.3.
DR PRIDE; Q9LK39; -.
DR eggNOG; COG1022; -.
DR InParanoid; Q9LK39; -.
DR Genevestigator; Q9LK39; -.
DR PRINTS; PR00154; AMPBINDING.
DR PROSITE; PS00455; AMP_BINDING; 1.
PE 2: Evidence at transcript level;
KW Chloroplast; Complete proteome; Fatty acid metabolism; Ligase;
KW Lipid metabolism; Plastid; Reference proteome; Transit peptide.
FT TRANSIT 1 47 Chloroplast (Potential).
FT CHAIN 48 722 Probable acyl-activating enzyme 16,
FT chloroplastic.
FT /FTId=PRO_0000415726.
FT CONFLICT 16 16 S -> C (in Ref. 1; AAM28629).
SQ SEQUENCE 722 AA; 81148 MW; 87F9FBCADFDBCE8F CRC64;
//

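A simple example snippet for reading the file line by line in C. fgets returns one line per call (as long as it fits in the buffer), so each field line (ID, AC, ..., //) arrives whole.
... ...
FILE *fp;
char buffer[1000];
fp = fopen("input.txt", "r");
if (fp == NULL)
{
    perror("input.txt"); // the file could not be opened
}
else
{
    while (fgets(buffer, sizeof(buffer), fp) != NULL)
    {
        ... ...
        // process the line just read from the file here
        ... ...
    }
    fclose(fp);
}
... ...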