Chromatin Modification Data and Scripts


Overview

This repository contains the raw data and some of the scripts used in the manuscript:
"Patterns of Chromatin Modification Discriminate Different Genomic Features in Arabidopsis"
by Srivastava, Zhang, LaMarca, Cai, and Malmberg



There are four folders inside the “ProjectData_Scripts” folder:

Arabidopsis_Feature_Coordinates: TAIR10 non-overlapping feature coordinates in GFF format.
Epigenetic_Mapped_Data: Epigenetic data for each type of modification, given as probabilities generated by TileMap. This is all of the modification data used in this article.
RNA-Seq_TAIR9: RNA-Seq reads mapped to the TAIR9 genome with Bowtie.
Scripts: Some of the scripts used in this project.

Script descriptions:
ProbabilityModifier.pl: Converts modification probabilities, which are given over a region, into base-specific probabilities. It takes an epigenetic mapped-data probability file as input and must be run separately for each chromosome.
Usage: perl ProbabilityModifier.pl H3K4me1_chrC.gr C 20
Input:
H3K4me1_chrC.gr (modification file)
Chromosome name: here, ‘C’ (chloroplast)
Size of the up/downstream region assigned the same probability: here, ‘20’
Output:
The file “H3K4me1_chrC.grProbabilityBaseSpecific.txt” will be created.
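The region-to-base conversion can be sketched in Python as below. This is not ProbabilityModifier.pl itself: it assumes a two-column .gr file (probe position, probability), spreads each probe's probability over the bases within the up/down window, and keeps the larger value where two windows overlap. The function names, the file layout, and the overlap rule are all assumptions for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_gr(path: str) -> Iterator[Tuple[int, float]]:
    """Yield (position, probability) pairs from a .gr file.

    Assumption: two whitespace-separated columns; non-numeric lines are skipped.
    """
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue
            try:
                position, prob = int(fields[0]), float(fields[1])
            except ValueError:
                continue  # header or comment line
            yield position, prob


def base_specific_probabilities(
    probes: Iterable[Tuple[int, float]], window: int
) -> Dict[int, float]:
    """Assign each probe's probability to every base within +/- window of it.

    Assumption: where windows of neighbouring probes overlap, the larger
    probability is kept.
    """
    base_prob: Dict[int, float] = {}
    for position, prob in probes:
        for base in range(max(1, position - window), position + window + 1):
            if prob > base_prob.get(base, float("-inf")):
                base_prob[base] = prob
    return base_prob
```

For example, `base_specific_probabilities(read_gr("H3K4me1_chrC.gr"), 20)` would mirror the usage line above under these assumptions. Other overlap rules (averaging, or letting the later probe win) are equally plausible; the script may use a different one.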

Check_Intergenic_Regions_For_RNA_Seq_Reads.pl: Checks for the presence of RNA-Seq reads in the regions annotated as intergenic in the Arabidopsis genome.
Usage: perl Check_Intergenic_Regions_For_RNA_Seq_Reads.pl chr1_ExtractedIntergenic Chr1_RNASeq chrName (e.g. “chr1”, “chr2”, ...)

Input:
chr1_ExtractedIntergenic (intergenic region file)
Chr1_RNASeq (RNA-Seq data file in 4-column BED format)
Chromosome name: here, “chr1”
Output:
Two files: Chr1_Intergenic_Non_Transcribed and Chr1_Intergenic_Transcribed
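The split into transcribed and non-transcribed intergenic regions can be sketched in Python as below, assuming a region counts as transcribed when at least one read overlaps it by one or more bases. This is not taken from the Perl script; the function name is hypothetical, and both inputs are assumed to be in the same 1-based inclusive coordinates (BED starts are 0-based, so add 1 to each read start before calling it).

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive, same coordinate system


def split_intergenic(
    regions: List[Interval], reads: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Partition intergenic regions into (transcribed, non_transcribed).

    Assumption: a region is transcribed if any read overlaps it by >= 1 base.
    """
    reads = sorted(reads)
    starts = [s for s, _ in reads]
    # max_end[i] = largest read end among the first i + 1 reads (sorted by start)
    max_end = list(accumulate((e for _, e in reads), max))
    transcribed: List[Interval] = []
    non_transcribed: List[Interval] = []
    for start, end in regions:
        k = bisect_right(starts, end)  # reads beginning at or before region end
        if k > 0 and max_end[k - 1] >= start:
            transcribed.append((start, end))
        else:
            non_transcribed.append((start, end))
    return transcribed, non_transcribed
```

Sorting the reads once and keeping a running maximum of read ends lets each region be checked with a single binary search, instead of scanning every read per region.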

Folder: AssignProbabilities
Protein_RNA_Pseudo_TransposonCalculate_Prob.pl: Assigns the modification probabilities to genomic features using their coordinates.
Usage: perl Protein_RNA_Pseudo_TransposonCalculate_Prob.pl CordFile ProbFile 1|2|3|4|5|6|7
Input:
ChrCCord_TAIR10_Final (Arabidopsis chloroplast genome coordinates)
Probability file: base-specific probability file of a particular modification for the chloroplast
Modification number (1 to 7): there are seven types of modification; give the number assigned to the current one, e.g. 1 for the first.
Output:
Files with a known/unknown extension for each type of genomic feature (such as protein, RNA, etc.). “Known” means modification probabilities are present for that feature; “unknown” means they are not. Run this script separately for each chromosome (and for every modification, in the same directory); output is appended automatically to the previously created files.
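The known/unknown split can be sketched in Python as follows, assuming a feature's score is the mean of the base-specific probabilities that fall inside its coordinates. The averaging rule and the function names are assumptions for illustration; the script may summarize a feature's probabilities differently.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def feature_probability(
    start: int, end: int, base_prob: Dict[int, float]
) -> Optional[float]:
    """Mean probability over bases start..end (inclusive) that have a value.

    Returns None when no base in the feature has a probability.
    """
    values = [base_prob[b] for b in range(start, end + 1) if b in base_prob]
    return mean(values) if values else None


def classify_features(
    features: List[Tuple[str, int, int]], base_prob: Dict[int, float]
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Split (name, start, end) features into known (with score) and unknown."""
    known: List[Tuple[str, float]] = []
    unknown: List[str] = []
    for name, start, end in features:
        score = feature_probability(start, end, base_prob)
        if score is None:
            unknown.append(name)
        else:
            known.append((name, score))
    return known, unknown
```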
IntergenicCalculateProb.pl: Calculates the modification probability in intergenic regions.
Usage: perl IntergenicCalculateProb.pl CordFile ProbFile 1|2|3|4|5|6|7
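If the intergenic regions are taken to be the gaps between annotated features (an assumption; the README does not say how IntergenicCalculateProb.pl defines them), they can be derived and scored as in this hypothetical Python sketch, which uses 1-based inclusive coordinates and a plain mean over the bases that have a probability.

```python
from typing import Dict, List, Optional, Tuple


def intergenic_regions(
    features: List[Tuple[int, int]], chrom_length: int
) -> List[Tuple[int, int]]:
    """Return the gaps (1-based, inclusive) not covered by any feature."""
    gaps: List[Tuple[int, int]] = []
    next_free = 1  # first base not yet covered by a feature
    for start, end in sorted(features):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)  # also handles overlapping features
    if next_free <= chrom_length:
        gaps.append((next_free, chrom_length))
    return gaps


def mean_probability(
    start: int, end: int, base_prob: Dict[int, float]
) -> Optional[float]:
    """Mean probability over bases start..end that have a value, else None."""
    values = [base_prob[b] for b in range(start, end + 1) if b in base_prob]
    return sum(values) / len(values) if values else None
```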

Statistics_From_SVM_Binary_Classifier_Out: Calculates summary statistics from binary SVM classifier results.
Usage: perl CheckSVMoutByMeasurement.pl Test Test.predict
Input:
Test (Test file prepared for SVM prediction)
Test.predict (SVM output file)
Output:
Statistics printed to STDOUT
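The README does not list which statistics CheckSVMoutByMeasurement.pl reports. The Python sketch below computes common binary-classification measures, assuming LIBSVM-style files: the true label is the first field of each line in Test, and Test.predict holds one predicted label per line (with an optional “labels” header line, as written by svm-predict with -b 1). Treating 1 as the positive class, and the function names, are assumptions.

```python
import math
from typing import Dict, List, Tuple


def read_labels(test_path: str, predict_path: str) -> Tuple[List[float], List[float]]:
    """Read true labels (first field of Test) and predicted labels (Test.predict)."""
    with open(test_path) as f:
        true = [float(line.split()[0]) for line in f if line.strip()]
    with open(predict_path) as f:
        pred = [
            float(line.split()[0])
            for line in f
            if line.strip() and not line.startswith("labels")  # skip -b 1 header
        ]
    return true, pred


def binary_stats(
    true: List[float], pred: List[float], positive: float = 1.0
) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, precision and MCC for one class split."""
    if len(true) != len(pred):
        raise ValueError("Test and Test.predict have different numbers of labels")
    pairs = list(zip(true, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }
```

For example, `print(binary_stats(*read_labels("Test", "Test.predict")))` would print the measures to STDOUT, matching the output described above.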


Citation

  • “Patterns of Chromatin Modification Discriminate Different Genomic Features in Arabidopsis”
    Srivastava, Zhang, LaMarca, Cai, and Malmberg