Chromatin Modification Data and Scripts


This folder contains the raw data and some of the scripts used in the manuscript:
"Patterns of Chromatin Modification Discriminate Different Genomic Features in Arabidopsis"
by Srivastava, Zhang, LaMarca, Cai, and Malmberg

The “ProjectData_Scripts” folder contains 4 folders, described below:

Arabidopsis_Feature_Coordinates: Contains the TAIR 10 non-overlapping feature coordinates in GFF format.
Epigenetic_Mapped_Data: Contains the epigenetic data as probabilities (TileMap-generated) for each type of modification. This is all of the modification data used in the article.
RNA-Seq_TAIR9: Contains the RNA-Seq experiment data mapped (using Bowtie) to the TAIR 9 genome.
Scripts: Contains some of the useful scripts used in this project.

Script description: This script converts modification probabilities given over a region into base-specific probabilities. It requires an epigenetic mapped data probability file as input and must be run separately for each chromosome.
Usage: perl C 20
Input: (Modification File)
Chromosome name: here it is ‘C’ (chloroplast)
Upstream/downstream region to assign the same probability: here it is ‘20’
The file “H3K4me1_chrC.grProbabilityBaseSpecific.txt” will be created.

Script description: This script checks for the presence of RNA-Seq reads in the regions annotated as intergenic in the Arabidopsis genome.
Usage: perl chr1_ExtractedIntergenic Chr1_RNASeq chrName (here “chr1”; use “chr2”, etc. for the other chromosomes)

chr1_ExtractedIntergenic: Intergenic file (in this format)
Chr1_RNASeq: RNA-Seq data file in 4-column BED format
Name of the chromosome: here it is “chr1”
Output: Chr1_Intergenic_Non_Transcribed and Chr1_Intergenic_Transcribed
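
The split into transcribed and non-transcribed intergenic regions can be illustrated with a minimal Python sketch. This is not the authors' Perl script. It assumes that an intergenic region counts as transcribed when at least one RNA-Seq read on the same chromosome overlaps it, that regions are given as (start, end) pairs, and that both regions and reads use inclusive coordinates. The function name split_intergenic and the data layout are hypothetical.

```python
def split_intergenic(regions, reads, chrom):
    """Partition intergenic regions by whether any RNA-Seq read overlaps them.

    regions: list of (start, end) tuples (assumed inclusive coordinates).
    reads:   list of (chrom, start, end, name) tuples, one per line of a
             4-column BED file (treated as inclusive here for simplicity).
    chrom:   chromosome name to consider, e.g. "chr1".
    Returns (transcribed, non_transcribed), each a list of (start, end).
    """
    # Keep only the reads that map to the requested chromosome.
    spans = [(s, e) for c, s, e, _ in reads if c == chrom]

    transcribed, non_transcribed = [], []
    for start, end in regions:
        # Two intervals overlap when each one starts before the other ends.
        hit = any(s <= end and e >= start for s, e in spans)
        if hit:
            transcribed.append((start, end))
        else:
            non_transcribed.append((start, end))
    return transcribed, non_transcribed


if __name__ == "__main__":
    regions = [(100, 200), (300, 400), (500, 600)]
    reads = [("chr1", 150, 180, "r1"), ("chr2", 310, 320, "r2")]
    print(split_intergenic(regions, reads, "chr1"))
```

The nested scan is quadratic in the worst case; for whole-chromosome data, sorting the reads and using a sweep or an interval index would be a more practical choice.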

Folder: AssignProbabilities
This script assigns the modification probabilities to genomic features using their coordinates.
Usage: perl CordFile ProbFile 1|2|3|4|.....7
CordFile: ChrCCord_TAIR10_Final (Arabidopsis chloroplast genome coordinates)
ProbFile: Base-specific probability file of a particular modification for the chloroplast
Modification number: There are 7 types of modification, numbered 1 to 7. If, for example, the modification in ProbFile is designated number 1, enter 1 here.
The script produces files with a known/unknown extension for each type of genomic feature (such as protein, RNA, etc.). “Known” means that modification probabilities are present for that particular feature; “unknown” means they are not. Run the script separately for each chromosome (for every modification, in the same directory); the output is automatically appended to the previously created files.

This script calculates the modification probability in the intergenic regions.
Usage: perl CordFile ProbFile 1|2|3|4|.....7

Statistics_From_SVM_Binary_Classifier_Out: This script calculates some statistics from binary classifier results.
Usage: perl Test Test.predict
Test: test file prepared for SVM prediction
Test.predict: SVM output file
Output: the statistics are printed to STDOUT
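
As an illustration of the kind of statistics that can be derived from a test file and its prediction file, here is a minimal Python sketch. It is not the authors' Perl script, and the statistics it reports (confusion counts, accuracy, sensitivity, specificity) are an assumption, since the exact statistics are not listed here. It also assumes that each line of the test file begins with the true label (as in LIBSVM-style input), that the prediction file has one predicted label per line, and that the positive class is labeled 1. The function names read_labels and binary_stats are hypothetical.

```python
def read_labels(lines):
    """Return the first whitespace-separated token of each non-empty line
    as an integer label (so "+1", "1" and "1.0" all become 1)."""
    return [int(float(line.split()[0])) for line in lines if line.strip()]


def binary_stats(true_labels, predicted_labels, positive=1):
    """Compute confusion counts and basic rates for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    total = tp + fp + tn + fn
    return {
        "TP": tp, "FP": fp, "TN": tn, "FN": fn,
        # Guard against empty denominators instead of raising.
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    test_lines = ["+1 1:0.5 2:0.1", "-1 1:0.2", "+1 1:0.9"]
    predict_lines = ["1", "1", "-1"]
    stats = binary_stats(read_labels(test_lines), read_labels(predict_lines))
    for name, value in stats.items():
        print(name, value)
```

In practice the two label lists would be read from the Test and Test.predict files, for example with open("Test") and open("Test.predict").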


