Bin a GRanges and apply a summary method (e.g. 'mean', 'median', 'sum', 'max', 'min') to chosen numeric variables of the ranges that fall in the same bin.

Usage

BinGRanges(
  gRange.gnr = NULL,
  chromSize.dtf = NULL,
  binSize.num = NULL,
  method.chr = "mean",
  variablesName.chr_vec = NULL,
  na.rm = TRUE,
  cores.num = 1,
  reduce.bln = TRUE,
  verbose.bln = FALSE
)

Arguments

gRange.gnr

<GRanges>: A GRanges to bin.

chromSize.dtf

<data.frame>: A data.frame whose first column holds the chromosome names and whose second column holds the chromosome lengths in base pairs.

binSize.num

<numerical>: Width of the bins, in base pairs.

method.chr

<character>: Name of a summary method such as 'mean', 'median', 'sum', 'max', 'min'. (Default 'mean')

variablesName.chr_vec

<character>: A character vector specifying the metadata columns of the GRanges to which the summary method is applied.

na.rm

<logical>: A logical value indicating whether 'NA' values should be stripped before the computation proceeds. (Default TRUE)

cores.num

<numerical>: The number of cores. (Default 1)

reduce.bln

<logical>: Whether duplicated bins must be reduced with the summary method. (Default TRUE)

verbose.bln

<logical>: If TRUE, show progress in the console. (Default FALSE)

Value

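A binned GRanges. As the example output shows, it carries the summarized variables and a 'bin' metadata column identifying each bin (e.g. chr1:1).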

Details

BinGRanges

Examples

GRange.gnr <- GenomicRanges::GRanges(
    seqnames = S4Vectors::Rle(c("chr1", "chr2"), c(3, 1)),
    ranges = IRanges::IRanges(
        start = c(1, 201, 251, 1),
        end = c(200, 250, 330, 100),
        names = letters[seq_len(4)]
    ),
    strand = S4Vectors::Rle(BiocGenerics::strand(c("*")), 4),
    score = c(50, NA, 100, 30)
)
GRange.gnr
#> GRanges object with 4 ranges and 1 metadata column:
#>     seqnames    ranges strand |     score
#>        <Rle> <IRanges>  <Rle> | <numeric>
#>   a     chr1     1-200      * |        50
#>   b     chr1   201-250      * |        NA
#>   c     chr1   251-330      * |       100
#>   d     chr2     1-100      * |        30
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome; no seqlengths
BinGRanges(
    gRange.gnr = GRange.gnr,
    chromSize.dtf = data.frame(c("chr1", "chr2"), c(350, 100)),
    binSize.num = 100,
    method.chr = "mean",
    variablesName.chr_vec = "score",
    na.rm = TRUE
)
#> GRanges object with 5 ranges and 2 metadata columns:
#>       seqnames    ranges strand |     score         bin
#>          <Rle> <IRanges>  <Rle> | <numeric> <character>
#>   [1]     chr1     1-100      * |        50      chr1:1
#>   [2]     chr1   101-200      * |        50      chr1:2
#>   [3]     chr1   201-300      * |       100      chr1:3
#>   [4]     chr1   301-350      * |       100      chr1:4
#>   [5]     chr2     1-100      * |        30      chr2:1
#>   -------
#>   seqinfo: 2 sequences from an unspecified genome
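
The bin coordinates and scores in the output above can be reproduced by hand. The following is a minimal base-R sketch, not the package's internal code: it assumes bins are laid end to end from position 1 and that the last bin of each chromosome is truncated at the chromosome length, which is consistent with the output shown. The names `bins.lst` and `bins.dtf` are hypothetical and used only for illustration.

```r
# Illustrative sketch only: tile each chromosome into fixed-width bins,
# truncating the last bin at the chromosome length.
chromSize.dtf <- data.frame(
    name = c("chr1", "chr2"),
    length = c(350, 100),
    stringsAsFactors = FALSE
)
binSize.num <- 100

bins.lst <- lapply(seq_len(nrow(chromSize.dtf)), function(i) {
    starts <- seq(1, chromSize.dtf$length[i], by = binSize.num)
    ends <- pmin(starts + binSize.num - 1, chromSize.dtf$length[i])
    data.frame(
        seqnames = chromSize.dtf$name[i],
        start = starts,
        end = ends,
        bin = paste0(chromSize.dtf$name[i], ":", seq_along(starts)),
        stringsAsFactors = FALSE
    )
})
bins.dtf <- do.call(rbind, bins.lst)
bins.dtf

# Bin chr1:3 (201-300) overlaps range b (score NA) and range c (score 100).
mean(c(NA, 100), na.rm = TRUE)   # 100, the score reported for chr1:3 above
mean(c(NA, 100), na.rm = FALSE)  # NA in base R when NA values are kept
```

The first part yields the five bins chr1:1 to chr1:4 and chr2:1 with the same start and end positions as the BinGRanges result. The second part shows why 'na.rm = TRUE' gives a score of 100 for the bin that overlaps a range with a missing score.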