Title: | Perform Analysis and Create Visualizations of Proteins |
---|---|
Description: | Reads Protein Data Bank (PDB) files, performs analysis on them, and presents the results using different visualization types, including 3D. The package can also handle Virus Report data from the National Center for Biotechnology Information (NCBI) database. |
Authors: | Simon Liles |
Maintainer: | Simon Liles <[email protected]> |
License: | CC0 |
Version: | 0.0.1 |
Built: | 2025-02-05 06:12:41 UTC |
Source: | https://github.com/simonliles/protein8k |
Decode a JSON List into an R List Object.
fromJSONL(filepath, maxLines = -1)
filepath |
A character string indicating the filepath from the working directory to the desired file. |
maxLines |
An integer representing the max number of lines to read. Negative values indicate that one should read up to the end of input on the connection. |
A large list, each element containing the contents of a JSON file after conversion.
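The `maxLines` argument follows the same convention as `n` in base R's `readLines()`. The sketch below illustrates that convention only; `read_jsonl_lines` is a hypothetical helper, not the package's implementation, and the optional parsing step assumes the jsonlite package is installed.

```r
# Hypothetical helper: read up to maxLines lines of a JSON Lines file.
# A negative maxLines reads to the end of the connection, as in readLines().
read_jsonl_lines <- function(filepath, maxLines = -1) {
  con <- file(filepath, open = "r")
  on.exit(close(con))
  lines <- readLines(con, n = maxLines, warn = FALSE)
  lines[nzchar(trimws(lines))]  # drop blank lines
}

# Write a small made-up file to demonstrate.
tmp <- tempfile(fileext = ".jsonl")
writeLines(c('{"id": 1}', '{"id": 2}', '{"id": 3}'), tmp)

all_lines <- read_jsonl_lines(tmp)                # reads all three lines
first_two <- read_jsonl_lines(tmp, maxLines = 2)  # stops after two lines

# Optional: decode each line into an R list (assumes jsonlite is available).
if (requireNamespace("jsonlite", quietly = TRUE)) {
  records <- lapply(all_lines, jsonlite::fromJSON)
}
```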
Retrieve the Atomic Record from a Protein Object
getAtomicRecord(protein)
protein |
A Protein object. |
Dataframe with 16 columns:
record_type: Type of record in this section. Generally ATOM or HETATM
serial_num: The serial number for the position of the atom in the sequence
atom_name: A name to identify the atom in a structure
alt_location_id: Alternate location indicator, as defined by the PDB format
residue_name: 3 character identifier for a residue
chain_id: Chain identifier
residue_seq_num: Number representing where in the sequence a residue is.
insert_residue_code: Code for insertion of residues
x_ortho_coord: X coordinate in Ångstroms on an orthogonal plane
y_ortho_coord: Y coordinate in Ångstroms on an orthogonal plane
z_ortho_coord: Z coordinate in Ångstroms on an orthogonal plane
occupancy: Occupancy of the atom
temp_factor: Temperature factor (B-factor), reflecting the positional uncertainty of the atom.
segment_id: Segment identifier
element_symbol: Periodic symbol representing an atom.
charge: Charge of the given atom. Can be +, -, or none at all
This is an accessor function for retrieving the Atomic Record from a Protein object.
Returns a dataframe containing the atomic record. There are 16 variables in this data frame.
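Because the atomic record is an ordinary data frame, it can be filtered and summarised with base R. The sketch below uses a small made-up data frame in place of a real result from `getAtomicRecord()`. Only the column names come from the list above; the values are invented for illustration.

```r
# Mock atomic record with a subset of the documented columns (values are made up).
atoms <- data.frame(
  record_type   = c("ATOM", "ATOM", "HETATM"),
  atom_name     = c("N", "CA", "O"),
  residue_name  = c("GLU", "GLU", "HOH"),
  x_ortho_coord = c(1.0, 2.0, 6.0),
  y_ortho_coord = c(0.0, 2.0, 4.0),
  z_ortho_coord = c(3.0, 5.0, 1.0),
  stringsAsFactors = FALSE
)

# Keep only ATOM records, dropping heteroatoms such as water.
protein_atoms <- atoms[atoms$record_type == "ATOM", ]

# Geometric centre of the remaining atoms, in Ångstroms.
coord_cols <- c("x_ortho_coord", "y_ortho_coord", "z_ortho_coord")
centroid <- colMeans(protein_atoms[, coord_cols])
```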
Retrieve the title section from a Protein Object
getTitleSection(protein)
protein |
A Protein object. |
This is an accessor function for retrieving the title section from a Protein object.
Returns a list containing elements from the title section.
Formal class protein representing data from a PDB, code 1AIE, p53 tetramerization Domain Crystal Structure. This is a small and simple R object of example data for users to play with and is used in example vignettes.
p53_tetramerization
A Protein S4 object. List comprised of several sublists and dataframes
header: List of 2, Header Line and Title
header_line: List of 3, Classification, depDate, and idCode
classification: Classification of the Protein in the PDB
depDate: Date the PDB was deposited or created
idCode: 4-character identifier for the PDB. Always unique.
title: The title of the PDB.
structure: Dataframe of 16 variables
record_type: Type of record in this section. Generally ATOM or HETATM
serial_num: The serial number for the position of the atom in the sequence
atom_name: A name to identify the atom in a structure
alt_location_id: Alternate location indicator, as defined by the PDB format
residue_name: 3 character identifier for a residue
chain_id: Chain identifier
residue_seq_num: Number representing where in the sequence a residue is.
insert_residue_code: Code for insertion of residues
x_ortho_coord: X coordinate in Ångstroms on an orthogonal plane
y_ortho_coord: Y coordinate in Ångstroms on an orthogonal plane
z_ortho_coord: Z coordinate in Ångstroms on an orthogonal plane
occupancy: Occupancy of the atom
temp_factor: Temperature factor (B-factor), reflecting the positional uncertainty of the atom.
segment_id: Segment identifier
element_symbol: Periodic symbol representing an atom.
charge: Charge of the given atom. Can be +, -, or none at all
Plot the protein structure in 3D.
plot3D( protein, animated = FALSE, type = "p", groups = NULL, screen = list(x = -60, z = 0, y = 0), image_width = 480, image_height = 480 )
protein |
Protein object to be plotted. Can be either an S3 or S4 Protein object. |
animated |
logical indicating whether the object is to be animated in the viewer. Will spin the plot on the Z axis. |
type |
character vector indicating the type of cloud plot. Can include one or more of "p", "l", "h", or "b". "p" and "l" mean points and lines respectively, and "b" means both. "h" stands for histogram and draws lines from each point to the XY plane, either lower or upper bounding box face, whichever is closer. |
groups |
the name of a column from the Atomic Record of the protein. Causes the points to be colored by the different values in that group. |
screen |
A list determining the sequence of rotations to be applied to the data before plotting. Each component of the list should be named one of "x", "y" or "z" (repetitions are allowed), with values indicating the amount of rotation in degrees. |
image_width |
width of the resulting image in pixels. Currently only applies when 'animated = TRUE'. Defaults to 480 pixels. |
image_height |
height of the resulting image in pixels. Currently only applies when 'animated = TRUE'. Defaults to 480 pixels. |
This function uses lattice and magick to create the 3D plot and animate it.
Currently this function is incomplete and will change dramatically as new features and documentation are added.
An object to be plotted. If not assigned to a variable, it will plot directly in the viewer.
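The `type`, `groups`, and `screen` arguments mirror those of `lattice::cloud()`. The sketch below calls `cloud()` directly on a made-up data frame to show what those arguments control. How `plot3D()` passes them to lattice internally is an assumption, and the coordinate values are invented.

```r
library(lattice)

# Mock coordinates using the documented column names (values are made up).
atoms <- data.frame(
  x_ortho_coord  = c(1.0, 2.5, 3.0, 4.2),
  y_ortho_coord  = c(0.5, 1.5, 2.0, 3.1),
  z_ortho_coord  = c(2.0, 2.2, 3.5, 1.0),
  element_symbol = c("N", "C", "C", "O")
)

# Point cloud coloured by element, rotated -60 degrees about the x axis.
p <- cloud(
  z_ortho_coord ~ x_ortho_coord * y_ortho_coord,
  data   = atoms,
  groups = element_symbol,
  type   = "p",
  screen = list(x = -60, z = 0, y = 0)
)
# print(p) draws the plot on the active graphics device.
```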
Plot models of the protein structure using ggplot.
plotModels(protein, separate = FALSE)
protein |
Protein object to be plotted |
separate |
Logical indicating whether to plot each plane separately or as one visual. |
Create a plot of each plane and model the shape of the protein.
This function uses ggplot and grid to create 3 plots, one for each plane, of the protein model, and then create a smoothing model.
Currently this function is incomplete and will change dramatically as new features and documentation are added.
An object to be plotted. If not assigned to a variable, it will plot directly in the viewer.
Protein class used to define Protein objects of S3 and S4 types. Currently still in development; integrity checks still need to be added.
Breakdown of a Protein Object's structure:
header: List of 2, Header Line and Title
header_line: List of 3, Classification, depDate, and idCode
classification: Classification of the Protein in the PDB
depDate: Date the PDB was deposited or created
idCode: 4-character identifier for the PDB. Always unique.
title: The title of the PDB.
structure: Dataframe of 16 variables
record_type: Type of record in this section. Generally ATOM or HETATM
serial_num: The serial number for the position of the atom in the sequence
atom_name: A name to identify the atom in a structure
alt_location_id: Alternate location indicator, as defined by the PDB format
residue_name: 3 character identifier for a residue
chain_id: Chain identifier
residue_seq_num: Number representing where in the sequence a residue is.
insert_residue_code: Code for insertion of residues
x_ortho_coord: X coordinate in Ångstroms on an orthogonal plane
y_ortho_coord: Y coordinate in Ångstroms on an orthogonal plane
z_ortho_coord: Z coordinate in Ångstroms on an orthogonal plane
occupancy: Occupancy of the atom
temp_factor: Temperature factor (B-factor), reflecting the positional uncertainty of the atom.
segment_id: Segment identifier
element_symbol: Periodic symbol representing an atom.
charge: Charge of the given atom. Can be +, -, or none at all
Opens a viewer for exploratory and interactive analysis of a protein structure.
Protein3D(protein)
protein |
Protein object to use in plotting |
Does not return a value.
Read in a Protein Data Bank file
read.pdb(fileName, createAsS4 = TRUE)
fileName |
character string for location and name of file to be read. |
createAsS4 |
Logical indicating whether to create the new protein object as S4 or not. Defaults to TRUE if not specified. This argument is optional. |
A Protein object. List comprised of several sublists and dataframes
header: List of 2, Header Line and Title
header_line: List of 3, Classification, depDate, and idCode
classification: Classification of the Protein in the PDB
depDate: Date the PDB was deposited or created
idCode: 4-character identifier for the PDB. Always unique.
title: The title of the PDB.
structure: Dataframe of 16 variables
record_type: Type of record in this section. Generally ATOM or HETATM
serial_num: The serial number for the position of the atom in the sequence
atom_name: A name to identify the atom in a structure
alt_location_id: Alternate location indicator, as defined by the PDB format
residue_name: 3 character identifier for a residue
chain_id: Chain identifier
residue_seq_num: Number representing where in the sequence a residue is.
insert_residue_code: Code for insertion of residues
x_ortho_coord: X coordinate in Ångstroms on an orthogonal plane
y_ortho_coord: Y coordinate in Ångstroms on an orthogonal plane
z_ortho_coord: Z coordinate in Ångstroms on an orthogonal plane
occupancy: Occupancy of the atom
temp_factor: Temperature factor (B-factor), reflecting the positional uncertainty of the atom.
segment_id: Segment identifier
element_symbol: Periodic symbol representing an atom.
charge: Charge of the given atom. Can be +, -, or none at all
Reads a Protein Data Bank file (PDB) from the given location. The function then parses the file and creates a new object of the Protein class. This object can be either defined as an S3 or S4 object if different capabilities are required.
A new protein object as either an S3 or S4 object.
In general terms, the new object will be a list of two, a data frame containing the atomic record, and a list of header elements.
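To make the parsing step concrete, the sketch below extracts fields from a single ATOM line using the fixed column positions of the PDB format. `parse_atom_line` is a hypothetical helper written for illustration, not the code `read.pdb()` uses, and the example line is synthetic (built with `sprintf()` so that the column widths are exact).

```r
# Build a synthetic 80-column ATOM line; the values are made up.
fmt <- paste0("%-6s%5d %-4s%1s%3s %1s%4d%1s", strrep(" ", 3),
              "%8.3f%8.3f%8.3f%6.2f%6.2f", strrep(" ", 10), "%2s%2s")
line <- sprintf(fmt, "ATOM", 1L, " N", "", "GLU", "A", 326L, "",
                10, 20.5, -3.25, 1, 25, "N", "")

# Hypothetical helper: slice a few fields out by their PDB column positions.
parse_atom_line <- function(line) {
  field <- function(from, to) trimws(substr(line, from, to))
  data.frame(
    record_type     = field(1, 6),
    serial_num      = as.integer(field(7, 11)),
    atom_name       = field(13, 16),
    residue_name    = field(18, 20),
    chain_id        = field(22, 22),
    residue_seq_num = as.integer(field(23, 26)),
    x_ortho_coord   = as.numeric(field(31, 38)),
    y_ortho_coord   = as.numeric(field(39, 46)),
    z_ortho_coord   = as.numeric(field(47, 54)),
    occupancy       = as.numeric(field(55, 60)),
    temp_factor     = as.numeric(field(61, 66)),
    element_symbol  = field(77, 78),
    stringsAsFactors = FALSE
  )
}

rec <- parse_atom_line(line)
```

Applying such a helper to every ATOM and HETATM line and row-binding the results would give a data frame shaped like the atomic record described above, minus the columns omitted here for brevity.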
Function to transform a list of NCBI Virus Report metadata into a table.
report_as_dataframe(report, records = c(1:length(report)))
report |
a list derived from an NCBI Datasets virus report. |
records |
a vector of indices to pull from the report. |
A large dataframe with 23 variables containing metadata from the NCBI Virus report.
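The general idea of turning a list of records into a table can be sketched in base R as follows. `flatten_records` and the field names (`accession`, `length`, `host`) are hypothetical and chosen only for illustration; the real report has 23 variables and the package's implementation may differ.

```r
# Mock report: a list of records, each a named list (fields are made up).
report <- list(
  list(accession = "X1", length = 100L, host = "human"),
  list(accession = "X2", length = 250L, host = "bat"),
  list(accession = "X3", length = 300L, host = "human")
)

# Hypothetical helper: convert the selected records into one data frame.
flatten_records <- function(report, records = seq_along(report)) {
  rows <- lapply(report[records], function(rec) {
    as.data.frame(rec, stringsAsFactors = FALSE)  # one row per record
  })
  do.call(rbind, rows)
}

all_rows <- flatten_records(report)
subset_rows <- flatten_records(report, records = c(1, 3))
```

The sketch uses `seq_along(report)` as the default index because, unlike `1:length(report)`, it yields an empty index for an empty list.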
summary.Protein
## S4 method for signature 'Protein'
summary(object)
## S4 method for signature 'Protein,ANY'
summary(object, ...)
object |
A Protein object of either S3 or S4 type. |
... |
other objects passed to 'summary()'. Currently not supported. |
Prints a description of the protein object to the console. The lines of output are as follows.
Whether it is an S3 or S4 object type.
ID Code of the PDB and the date it was deposited in the Data Bank.
The Classification of the protein.
The title of the PDB.
The number of rows in the Atomic Record.
Does not return a value.
summary.Protein
## S3 method for class 'Protein'
summary(object, ...)
object |
A Protein object of either S3 or S4 type. |
... |
other objects passed to 'summary()'. Currently not supported. |
Prints a description of the protein object to the console. The lines of output are as follows.
Whether it is an S3 or S4 object type.
ID Code of the PDB and the date it was deposited in the Data Bank.
The Classification of the protein.
The title of the PDB.
The number of rows in the Atomic Record.
Does not return a value.
Wrapper function for writing images to the disk. This function comes from the magick package under the same name.
write_viz(image, path = "my_image", format = "png")
image |
magick image object or trellis object. |
path |
a file path starting from the working directory |
format |
file type to save the image as. Can be "png", "jpeg", "gif", "rgb", or "rgba". |
Does not return a value.