Existing and future NA12878 datasets

In collaboration with NCBI and CDC, we've compiled a list of the existing whole genome sequencing, targeted sequencing, and other datasets for NA12878 that are publicly available or will be made publicly available. You can see the list here (https://docs.google.com/spreadsheet/ccc?key=0ArAo1qqJJDHQdHo0U1FzQV9JYVZ...). If you know of any datasets that are missing, or can fill in missing information, please feel free to edit the spreadsheet.

We've developed methods to form consensus SNP genotype calls from 9 of the whole genome datasets and 2 exome datasets for NA12878, as I briefly described at the August meeting (http://www.slideshare.net/GenomeInABottle/nist-work-developing-genomic-r...). We've submitted a manuscript based on this work and would be happy to share our consensus SNP calls with anyone who is interested. One of the tasks of the consortium will be to refine these methods and extend them to additional types of variants, so please let us know if you have any suggestions. You're also welcome to start using these datasets to develop methods for characterizing the Reference Materials and to understand the relative importance of each dataset for characterizing future Reference Materials.

Thank you,
Justin