The md5sum of the same file can differ after gzipping

Yesterday I got the latest dataset from a collaborator; some of its files were already included in the previous version. However, “md5sum -c md5sum.text” failed, which really threw me. On further checking, I found no differences at all between the decompressed files, so only the compression step changed the MD5!!!

An email from Ray pointed me in the right direction. I found the following lines on the Wikipedia page for gzip, so I can have sweet dreams tonight~

“gzip” is often also used to refer to the gzip file format, which is:

  • a 10-byte header, containing a magic number, a version number and a timestamp
  • optional extra headers, such as the original file name,
  • a body, containing a DEFLATE-compressed payload
  • an 8-byte footer, containing a CRC-32 checksum and the length of the original uncompressed data
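
The effect is easy to reproduce. The sketch below is a minimal illustration in Python, not the commands that were run on the dataset: it compresses the same bytes twice with different header timestamps and compares the MD5 values. The helper name `gzip_bytes`, the sample payload and the two timestamps are made up for the example.

```python
import gzip
import hashlib
import io


def gzip_bytes(data: bytes, mtime: int) -> bytes:
    """Gzip `data` in memory, writing `mtime` into the header timestamp."""
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()


payload = b"ACGT" * 1000  # stand-in for the real, unchanged file content

first = gzip_bytes(payload, mtime=1_000_000_000)
second = gzip_bytes(payload, mtime=1_000_000_060)  # "recompressed" a minute later

# The compressed streams differ because of the timestamp in the header ...
same_compressed_md5 = hashlib.md5(first).hexdigest() == hashlib.md5(second).hexdigest()
# ... while the decompressed content is byte-for-byte identical.
same_payload_md5 = (
    hashlib.md5(gzip.decompress(first)).hexdigest()
    == hashlib.md5(gzip.decompress(second)).hexdigest()
)

print("compressed MD5 equal:  ", same_compressed_md5)
print("decompressed MD5 equal:", same_payload_md5)
```

In practice, comparing checksums of the decompressed stream (for example `zcat file.gz | md5sum`) avoids the false alarm, and compressing with `gzip -n` leaves the original name and timestamp out of the header.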

Configuration for Synology NAS 1812+

The first thing I want to complain about is that the Synology Assistant suite for Linux is totally useless! Whether I ran it under CentOS, Fedora or even Ubuntu (the most popular Linux distribution among us), it could never find the DiskStation on the local network.

If you are lucky enough to find the NAS under Mac or Windows, hold on: you will then be misled by the error message “An error occurred during installation. The telnet service of the DiskStation is turned on for the error determination. Please configure your router to forward port 23 to the DiskStation and contact Synology online support.” Normally, in such an annoying situation, we would check our firewall and antivirus and reset the router's port forwarding…

Finally, you wake up from the jumble of unsystematic errors and the poor manual.

OK, solution:

  1. ===How to conduct a direct Ethernet Connection with the Synology system===
    Remove all Ethernet connections from the computer
    Disable wireless connections
    For Windows users, type “ipconfig /renew” at the Windows Command Prompt
    [Optional: set the IP address/subnet mask to 169.254.1.1/255.255.0.0 on the hardware LAN card]
  2. ==Synology system prep==
    Have the system powered on and in the ready state
    Disconnect the LAN cable from the Synology system for 30 seconds
  3. ==Connecting between the computer and Synology system==
    Use a straight-through Ethernet cable to connect the Synology system directly to the Ethernet port of the computer
    Wait up to five minutes for the computer to detect and assign an IP address to the Synology system
    Use the Synology Assistant to detect the Synology system, and reconfigure the IP address

Ready to use? Of course not! Synology Assistant truly deserves to be called a trouble-making assistant; for example, only the CIFS protocol can be used for remote sync. As soon as one problem is solved, a new one comes along.

===========================================================
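
Going back to the optional address in step 1: 169.254.0.0/16 is the IPv4 link-local range, which hosts fall back to when no DHCP server answers. Presumably that is what the DiskStation does on a direct cable, though this is my assumption and not something the steps state. A small Python sketch with the standard `ipaddress` module checks that the suggested address and mask fall in that range; the NAS address 169.254.77.10 is a made-up example.

```python
import ipaddress

# Address and netmask suggested for the computer's LAN card in step 1.
pc = ipaddress.ip_interface("169.254.1.1/255.255.0.0")

# Hypothetical self-assigned address of the NAS (illustration only).
nas = ipaddress.ip_address("169.254.77.10")

print(pc.network)            # the subnet implied by the netmask
print(pc.ip.is_link_local)   # whether the PC address is link-local
print(nas in pc.network)     # whether both ends share one subnet
```

If the last check were false for the address the NAS actually reports, the two machines would not see each other without a router, which is one thing to rule out before blaming Synology Assistant.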

    Tips:

    If you create new directories to share, you should label them as
    samba_share_t.
    Use: chcon -R -t samba_share_t /your/path

    Keep flash privileges after you update ftpd
    Use: setsebool -P ftp_home_dir on (turns the SELinux boolean ftp_home_dir on)

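When Zotero Takes WebDAV by the Hand

Zotero is a reference-management add-on for the Firefox browser; actually, it is more than just an add-on. Zotero, you deserve to have it! [Oh, only after watching the video did I learn how the name is actually pronounced; I had been laughed at…]

Chinese tutorial: standing on other people's shoulders really does let you see farther…

Setting up WebDAV

About WebDAV

Configuring WebDAV on CentOS

If you are using the system's default LAMP setup, it is too easy; just skim it.

If you built it yourself, I hope you will not have to recompile Apache after apxs breaks, as I did. Help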


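About the Flag in the SAM Format [Sequence Alignment/Map]

SAM stands for Sequence Alignment/Map. Its purpose is to give everyone a common format for sequence alignments, which makes downstream processing easier. Official documentation: SAM.pdf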

ILLUMINA-57021F:5:1:1361:5913#0 81 chr9 103745559 0 40M chr11 51579596 0 ATTTCCTTCTCCTGCCTGATTGCCCTGGCCAGAACTTCCA bT^bbTT`_ccac`caa^bccccccccdddddc^Yad XT:A:R NM:i:0 SM:i:0 AM:i:0 X0:i:2172 XM:i:0 XO:i:0 XG:i:0 MD:Z:40

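This is one line of a SAM alignment result provided by Fawn (I will not go into the meaning of every column here). The second column is the SAM flag, which describes, bit by bit, things such as the alignment mode and orientation of the read. The official documentation puts it this way: “Field <flag> is a bitwise flag. The meaning of predefined bits is shown in the following table:”

A grumble, feel free to ignore... [All for the sake of getting a table to display in WP, I am still not in bed... so infuriating; on this point it is far worse than Joomla...]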

[table id=1 /]

1. Flag 0x02, 0x08, 0x20, 0x40 and 0x80 are only meaningful when flag 0x01 is present.
2. If in a read pair the information on which read is the first in the pair is lost in the upstream analysis, flag 0x01 should
be present and 0x40 and 0x80 are both zero.

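What this note says is that bits 2, 4, 6, 7 and 8 are only meaningful if the first bit is 1.

The 81 in the example is 00001010001 in binary (it has fewer than 11 bits, so the leading positions are padded with 0) [conversion method here]

As the table shows:

This alignment record says that the read comes from PE (paired-end) sequencing [bit 1 is 1],

that it aligns to the reference in the reverse orientation [bit 5 is 1],

and that it is read #1 of the pair [bit 7 is 1].

In downstream processing, the flag is checked bit by bit to read off the alignment information. To test the n-th bit, compute 2^(n-1) AND Flag: a non-zero result means true, and 0 means false.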
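
The bit test can be sketched in Python as follows. The bit meanings follow the flag table of the SAM specification as I understand it; `FLAG_BITS`, `bit_is_set` and `decode_flag` are names invented for this example and are not part of any SAM library.

```python
# Bit value -> meaning, following the SAM flag table.
FLAG_BITS = {
    0x1: "read is paired",
    0x2: "mapped in a proper pair",
    0x4: "read is unmapped",
    0x8: "mate is unmapped",
    0x10: "read maps to the reverse strand",
    0x20: "mate maps to the reverse strand",
    0x40: "first read in the pair",
    0x80: "second read in the pair",
    0x100: "not the primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
}


def bit_is_set(flag: int, n: int) -> bool:
    """Test the n-th bit (1-based, counted from the right): flag AND 2^(n-1)."""
    return flag & (1 << (n - 1)) != 0


def decode_flag(flag: int) -> list:
    """Return the meaning of every bit that is set in `flag`."""
    return [meaning for bit, meaning in FLAG_BITS.items() if flag & bit]


flag = 81  # second column of the example line
print(format(flag, "011b"))  # 11-bit binary form of the flag
for meaning in decode_flag(flag):
    print("-", meaning)
```

Comparing against zero rather than against 1 matters here: `81 & 16` evaluates to 16, not 1, so a test written as `== 1` would only ever work for the first bit.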

I can't hold out any longer; off to bed...

Crazy DNA?

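From “Crazy DNA” it is not hard to guess that I am, to put it bluntly, a biology person. It is just that, after getting fed up with a life of digging in the dirt, washing tubes and running gels, I now make my living from bioinformatics, a field that wears a mathematician's vest, basks in the glow of IT, and has exploded thanks to sequencing services that get cheaper by the day.

I majored in biology as an undergraduate, but I was so interested in computers that I took the trouble to spend two more years studying computer science, which prepared me well for the bioinformatics I do now (in truth, back then I wanted to give up biology and switch to IT… I still regret it… if you have not gone into bioinformatics yet, there is still time to switch to IT, ha!).

So what does doing bioinformatics involve? My shallow understanding is this: first reduce a biological question to a mathematical model, then implement the model in a computer program and analyze it together with biological data, and finally draw a meaningful conclusion. That probably sounded too academic, so let me put it simply: doing bioinformatics analysis means facing biological phenomena where anything is possible, racking your brains like a mathematician, slogging away like a programmer, and in the end distilling it all into a single PDF. Summed up in terms of effort and reward: you do work more exhausting than a programmer's and take home less than a migrant laborer!

Here, I would like to use this blog to record my struggles in the ocean of data~

A quick explanation: nomel=reverse(‘lemon’), which is where my domain name comes from!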


23andme!!!

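Google recently notified the US Securities and Exchange Commission that it will invest USD 2.6 million in 23andMe, a Web 2.0 site that specializes in personal DNA testing services. The site was founded by Anne Wojcicki, who is the wife of Google founder Sergey Brin. However, to avoid any suspicion of “pulling strings” or “being henpecked”, Sergey Brin has distanced himself from the matter and will not be involved in this investment.

23andMe is a company with quite a pedigree. Time magazine's Best Invention of the Year award for 2008 went to 23andMe, and you should know that the earlier winners of this honor were the iPhone and YouTube (which has damn well been blocked, so I have to use a proxy to get on; I strongly despise this)!!!

23andMe is a Web 2.0 site that specializes in personal DNA testing services. As the name suggests, “23” stands for the 23 pairs of human chromosomes and “Me” stands for the individual, meaning that 23andMe aims to provide users with personalized DNA testing. A user only needs to pay USD 399, spit into a tube and mail the tube to 23andMe to have the DNA test done; the results come out 4 to 6 weeks later and can be viewed on the 23andMe website. Brin's wife, Anne Wojcicki, is a co-founder of 23andMe. 23andMe offers genetic testing and DNA scanning services, from which customers can learn about their risk of developing certain diseases as well as their ancestry. By recruiting 1,000 Parkinson's disease patients, 23andMe hopes to uncover the genes that cause Parkinson's disease. Brin revealed last September that he is more likely than other people to develop Parkinson's disease. Brin's mother, Eugenia Brin, already has Parkinson's disease. Brin discovered that he was more susceptible to Parkinson's after submitting his DNA sample to 23andMe.

Hahaha, even Google is investing in bioinformatics, which just shows how important bioinformatics is!!! It is a pity, though, that I do not work on the human medical side… sigh…

If you are interested, go and take a look: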

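https://www.23andme.com/

What follows is more or less 23andMe's own introduction and advertisement:

200,000 years ago: Homo sapiens walks the Earth.

175,000 years ago: the ancestors of modern humans are born in Africa.

1866: Gregor Mendel discovers the basic principles of genetics.

1953: Watson and Crick reveal the double-helix structure of DNA.

2003: the Human Genome Project map is completed.

2007: 23andMe introduces the first personal genome service to you. Today, unlock the secrets of your own DNA.

Welcome to 23andMe, a web-based service that helps you read and understand your DNA. All you need to do is collect saliva at home with a kit and send it to us; after that, you can use our interactive tools to rediscover your distant ancestors, your family and, most importantly, yourself.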