This vignette serves as a getting started guide for the protein8k package. Through detailed explanations and example code, you as the user should be able to gain a good understanding of how this package works and the kinds of analysis you can perform with it.
While it is technically possible to hand encode your own Protein
Object in this package, it is not recommended. The best way to get
started is by reading in a Protein Data Bank file, otherwise known as a
PDB. For those that are not familiar with bi0logical macromolecules,
these files are used to store information regarding protein structures
and experimental data. They also come with a handful of metadata. To
read in one of these files, use read.pdb()
and give it a
file directory to a PDB file. A function call may look like this:
This will create a new Protein Object in your global environment.
There are other options for creating a new Protein Object, for example
createAsS4
controls whether the new object is created as an
S4 or S3 object. The default is to create an S4 object and is the
recommended option. If you are doing development that extends this
package for example, and need the object to be S3, then this gives you
that option. For more information see the help page for
read.pdb()
.
Now that you have this object you will want to learn more about what
you actually have. While looking at the structure you could interpret
what everything means. Maybe look at some documentation on what is
inside the object or what was in the PDB. A better option is to use
summary()
just like how you would with any other R
Object.
## S4 Object of class Protein
## ID Code: 1AIE Deposition Date: 1997-04-17
## Classification: P53 TETRAMERIZATION
## Title: P53 TETRAMERIZATION DOMAIN CRYSTAL STRUCTURE
## Atomic Record Contains 590 rows
The function will give the above output. You can see if it is an S3 or S4 object. It gives the ID Code of the PDB, which is a unique 4 character string for every PDB. Next is the date the PDB was deposited into the World Wide Protein Data Bank (wwPDB). You can see the classification and title of the PDB, and also the number of records in the Atomic Record.
The Protein object is constructed of a list of other lists and data
frames. Currently there are 2 elements in this list, the header or title
section, and the coordinate section in the Atomic Record. The header is
a list containing the header line and the title of the PDB. The atomic
record is a dataframe with 16 variables. See the documentation on
Protein class for more details. ?Protein-class
PDB files include a lot of data on a protein, however tables of data are not very informative. With the protein8k package it is possible to create several different visualizations of Protein Objects.
One visualization that not only looks cool, but can also provides a
good understanding of the structure of an atom is a 3D plot. A call to
the function plot3D()
will plot every record in the atomic
coordinate section from the PDB file.
This function uses the lattice package to create a 3D cloud of points. It is essentially a wrapper function for plotting the atomic record of points.
Take this example:
With the animated
parameter, you can have the protein
animated spinning around the y axis. This will temporarily create a
series of .png files in your working directory, then put together the
animation as a magick object before removing said .png files. If
plot3D()
is not assigned to a variable, it will generate
the visualization in the viewer.
The plot3D()
function can take further parameters to
improve the quality of the plot, however there are numerous
transformations to the plot that are possible and are outside of the
scope of this vignette. Please see the vignette specifically about using
plot3D()
for more information.
Another useful visualization is to apply smoothing lines to the point
in the atomic record. With the plotModels()
function, one
can create a series of visualizations that look at each plane of the
structure and apply a smoothing line.
For example:
Thank you for reading this Intro to protein8k vignette.