SamBuilder for Test SAM Files and Records

Mar 31, 2025

Have you ever written Python code to manipulate a BAM file?
How can you be sure the code does what you intended?
Does your code handle all edge cases properly?

Maybe you’ve run your code with existing BAM files,
or added those BAMs to a suite of test data.
But does the data in those BAMs really flex all edge cases of your code?

How easy is it to maintain a suite of test data files?
Are you tempted to re-use a data file across multiple tests?

In this post I’m going to show you how to use fgpyo’s SamBuilder to quickly generate SAM/BAM data to assert your bioinformatics Python code does what you think it does.

Motivation for Programmatic Test Data

Why use programmatically generated test data instead of flat files?

Keeps like concerns together. Viewing a separate test data file to see the specifics of a test case violates keeping like concerns together. Modifying existing test files to suit new unit tests can cause bugs. Modular code is easier to maintain and review.
Declarative approach. Using a programmatic API to build up a test file is declarative—a reader knows exactly which field or parameter is relevant to the test.
Inline comments. SAM/BAM files don’t support inline comments, and for other file types, comments in a test file violates (1) since the comment is separate from the unit test.
Avoids coupling multiple files (potentially of different types). If your function/tool manipulates SAM records based on information from another file type, e.g. filtering a BAM according to intervals in a BED file, then having multiple test files violates (1). Instead, you can generate the test data programmatically prior to each test.

Basic Usage of `SamBuilder`

Common Test Patterns Using `SamBuilder`

There are two common test patterns that make SamBuilder valuable.

First we’re going to look at generating test data files on-the-fly. This pattern is most useful for integration testing. In bioinformatics-land, this usually means either:

Testing a standalone CLI tool, or
Testing a workflow that combines several tools (e.g. Nextflow or Snakemake)

Second, we’ll use SamBuilder to write unit tests for Python functions. This pattern builds up reads in memory without ever writing them to disk and is most useful for unit testing. I’ll use this example to demonstrate some of the most commonly used SamBuilder options.

Build the Working Environment

First, let’s build our working environment using the Conda/Mamba package manager. I’m using Mamba here, but the Conda package manager will also work. The YAML environment file and the following shell commands will create the environment sam_builder_example:

        
  name: sam_builder_example
dependencies:
  - python=3.12
  - fgpyo=0.3.0
  - samtools=1.20
  - pysam=0.22.1
  - pytest=8.2.2

view raw SamBuilder_7967f31.yml hosted with ❤ by GitHub

        
  mamba env create -f "environment.yml"
mamba activate "sam_builder_example"

view raw SamBuilder_70e4d5f.sh hosted with ❤ by GitHub

Basic Usage of `SamBuilder`

First thing’s first; we’ll look at a simple script that uses SamBuilder to generate aligned single-end reads in BAM format. Download the script and try it out.

python "SamBuilder_create_se_test_bam.py" --output "test.bam" --num_reads 5

view raw SamBuilder_b410e5f.sh hosted with ❤ by GitHub

On line 4 the script imports SamBuilder from the fgpyo library.

from fgpyo.sam.builder import SamBuilder

view raw SamBuilder_a160659.py hosted with ❤ by GitHub

On lines 18-23, a SamBuilder object is created to build “num_reads“ single-end reads, and finally the script outputs those reads to a results file.

        
  builder = SamBuilder(r1_len=10)

for offset in range(num_reads):
    builder.add_single(chrom="chr1", start=100 + offset)

builder.to_path(test_bam_file)

view raw SamBuilder_0698fbd.py hosted with ❤ by GitHub

Let’s take a closer look at the important pieces.

First Create a `SamBuilder` Object

builder = SamBuilder(r1_len=10)

view raw SamBuilder_fce9bf6.py hosted with ❤ by GitHub

This line creates a SamBuilder object with default values for all parameters except the R1 length, which is set to 10.

You can set quite a few useful values at this time:

        
    ParameterDefinition
r1_lenThe default length of read 1
r2_lenThe default length of read 2 (if paired end)
base_qualityThe default base quality
sdA sequence dictionary defining reference sequences and lengths
rgA default read group; some tools are strict about read groups
extra_headerAny extra headers for advanced usage
seedSeed for anything randomly generated
sort_orderThe output sort order which defaults to coordinate sort

	Parameter	Definition
	r1_len	The default length of read 1
	r2_len	The default length of read 2 (if paired end)
	base_quality	The default base quality
	sd	A sequence dictionary defining reference sequences and lengths
	rg	A default read group; some tools are strict about read groups
	extra_header	Any extra headers for advanced usage
	seed	Seed for anything randomly generated
	sort_order	The output sort order which defaults to coordinate sort

view raw SamBuilder_3ac53c5.tsv hosted with ❤ by GitHub

Add an Aligned Single-end Read

builder.add_single(chrom="chr1", start=100 + offset)

view raw SamBuilder_7cbcfd1.py hosted with ❤ by GitHub

Adds a single-end read to the builder (returns it too, but we’ll come to that later!) with specified mapping on chr1 starting at position 100 plus the loop index as a positional offset.

Write the Reads to a BAM file

builder.to_path(test_bam_file)

view raw SamBuilder_88e4cb5.py hosted with ❤ by GitHub

This line writes all the accumulated reads to a BAM file, with a header determined in part by the sequence dictionary and read group information that can be passed to the constructor, and with sort order determined by the constructor. Writes an index by default if coordinate-sorted.

Run the script and take a look at the output file this script generates.

Yours will be the same as mine, because we set the random seed to the same value—convenient for me writing this blog post, essential for developers using SamBuilder to generate test data!

samtools view "test.bam"

view raw SamBuilder_b3c76d2.sh hosted with ❤ by GitHub

        
    QNAMEFLAGRNAMEPOSCIGARMAPQCIGARRNEXTTLENSEQQUALRG
q00000chr11016010M*00GACAGGTACA??????????RG:Z:1
q00010chr11026010M*00AGAAGGAGTA??????????RG:Z:1
q00020chr11036010M*00TGCATCAATG??????????RG:Z:1
q00030chr11046010M*00TGGTCGTGTG??????????RG:Z:1
q00040chr11056010M*00GAACAAACGC??????????RG:Z:1

QNAME	RNAME	POS	CIGAR	MAPQ	CIGAR	SEQ	QUAL	RG
q0000	chr1	101	60	10M	*	GACAGGTACA	??????????	RG:Z:1
q0001	chr1	102	60	10M	*	AGAAGGAGTA	??????????	RG:Z:1
q0002	chr1	103	60	10M	*	TGCATCAATG	??????????	RG:Z:1
q0003	chr1	104	60	10M	*	TGGTCGTGTG	??????????	RG:Z:1
q0004	chr1	105	60	10M	*	GAACAAACGC	??????????	RG:Z:1

view raw SamBuilder_1619f7b.tsv hosted with ❤ by GitHub

Test Pattern 1: Test Data Files On-The-Fly

Let’s say we want to test the functionality of a command that performs simple read filtering based on SAM flags with samtools view. Let’s test the following command:

samtools view -F 0x4 -hb

view raw SamBuilder_b76cabb.sh hosted with ❤ by GitHub

This command removes unmapped reads from an input BAM file (-F 0x4) and outputs the retained reads in BAM format (-b) with the header (-h). Our expected input will be any valid BAM file, but ideally one with a mix of mapped and unmapped reads. Our expected output will be a BAM file with only mapped reads retained. So, we need to generate two BAM files for integration testing:

Test Input: a BAM file with mapped and unmapped reads
Expected Output: a BAM file with only the mapped reads from Test Input

We will modify SamBuilder_create_se_test_bam.py to perform this functionality. The final version of the script is available here.

Output a BAM File Containing Test Records

Essential arguments to parse will be added to the command line parser:

        
  parser.add_argument(
    "-e", "–expected", help="Output expected BAM file",
    type=str, required=True
) 
parser.add_argument(
    "-u", "–num_unmapped",
    help="Number of unmapped reads to write in test BAM file",
    type=int, required=True
)

view raw SamBuilder_5bc4bab.py hosted with ❤ by GitHub

And a create_test_bam function needs to be defined:

        
  def create_test_bam(
    test_bam_file: Path,
    num_reads: int,
    num_unmapped: int,
) -> None:

view raw SamBuilder_55997fb.py hosted with ❤ by GitHub

Finally, the new create_test_bam function in a script main() is called:

        
  create_test_bam(
    test_bam_file=Path(args.output),
    num_reads=args.num_reads,
    num_unmapped=args.num_unmapped,
)

view raw SamBuilder_b70abcf.py hosted with ❤ by GitHub

Add Unmapped Reads BAM to the Output

The expected output file will include only the mapped reads. The first num_reads that are added to builder are all mapped, because they all have chr and start specified. Let’s output those to expect_bam_file by replacing test_bam_file with expect_bam_file in the call to builder.to_path().

builder.to_path(expect_bam_file)

view raw SamBuilder_3d2a1af.py hosted with ❤ by GitHub

We can add the unmapped reads to the builder with another loop, this time adding num_unmapped reads. Note that chr and start are not specified, so SamBuilder knows to use ‘*’ in the output BAM file to indicate they are unmapped.

        
  for offset in range(num_unmapped):
    builder.add_single()

view raw SamBuilder_f1a26b8.py hosted with ❤ by GitHub

Now output all the reads to the test_bam_file.

builder.to_path(test_bam_file)

view raw SamBuilder_6e8f00a.py hosted with ❤ by GitHub

SamBuilder is particularly useful in these scenarios where some reads should be filtered out using the pattern:

Build and add all reads that should be output
Output expected reads to an expected data file
Build and add all reads that should not be output
Output all reads to an input data file

You can use this pattern for unit testing as well, such as if you have a function that filters a list of reads. Instead of outputting the reads to a file, collect them in a list, saving a copy as expected output before adding the reads that should be filtered out.

Test Simple Read Filtering

Create the input and expected data files:

python "SamBuilder_create_samtools_view_test_data.py" \
--num_reads 5 \
--num_unmapped 2 \
--output "test.bam" \
--expected "expect.bam"

view raw SamBuilder_ca980eb.sh hosted with ❤ by GitHub

Run the command:

samtools view -F 0x4 "test.bam" > "output.sam"

view raw SamBuilder_a8c56e7.sh hosted with ❤ by GitHub

Compare the expected and output files:

samtools view "expect.bam" | diff - "output.sam"

view raw SamBuilder_0d250a7.sh hosted with ❤ by GitHub

In this example, the tool behaves as expected since the difference between the files is zero. Good! In the real world, it might not. But now you have a chance to debug and rerun the test until you are certain the behavior is correct.

Test Pattern 2: Unit Test FASTQ to BAM Functions

Download SamBuilder_bam2fastq.py which is a toy script for converting a BAM to a FASTQ file. To see how this script works, let’s first create a new test BAM file with 3 single-end reads.

python "SamBuilder_create_se_test_bam.py" --output "test.bam" --num_reads 3

view raw SamBuilder_5531d55.sh hosted with ❤ by GitHub

Then convert that BAM to FASTQ with default parameters.

python "SamBuilder_bam2fastq.py" --input "test.bam" --output "test.fastq"

view raw SamBuilder_02c03cbc.sh hosted with ❤ by GitHub

Compare the BAM File and the FASTQ File

samtools view "test.bam"

view raw SamBuilder_68a2f92.sh hosted with ❤ by GitHub

        
    QNAMEFLAGRNAMEPOSCIGARMAPQCIGARRNEXTTLENSEQQUALRG
q00000chr11016010M*00GACAGGTACA??????????RG:Z:1
q00010chr11026010M*00AGAAGGAGTA??????????RG:Z:1
q00020chr11036010M*00TGCATCAATG??????????RG:Z:1

QNAME	RNAME	POS	CIGAR	MAPQ	CIGAR	SEQ	QUAL	RG
q0000	chr1	101	60	10M	*	GACAGGTACA	??????????	RG:Z:1
q0001	chr1	102	60	10M	*	AGAAGGAGTA	??????????	RG:Z:1
q0002	chr1	103	60	10M	*	TGCATCAATG	??????????	RG:Z:1

view raw SamBuilder_d134f3d.tsv hosted with ❤ by GitHub

Look at the output of SamBuilder_bam2fastq.py and notice that each of the entries in the BAM file were converted successfully to FASTQ format.

cat "test.fastq"

view raw SamBuilder_259a27c.sh hosted with ❤ by GitHub

        
  @q0000
GACAGGTACA
+
??????????
@q0001
AGAAGGAGTA
+
??????????
@q0002
TGCATCAATG
+
??????????

view raw SamBuilder_a6a6f52.fastq hosted with ❤ by GitHub

Unit Tests for `SamBuilder_bam2fastq.py`

There are three functions in addition to main() in the script. We’ll use SamBuilder to test all three. Download the the script SamBuilder_test_bam2fastq.py and we’ll walk through it together.

Testing the Function `alignment_to_fastq`

This function converts an alignment as represented by a pysam AlignedSegment into a FASTQ string representation:

        
  def alignment_to_fastq(alignment: AlignedSegment) -> str:
    """
    Converts a pysam AlignedSegment to a FASTQ string representation.
    Args:
        alignment: pysam AlignedSegment object to convert.
    Returns:
        FASTQ string representation of the input alignment.
    """
    qualities: str = "".join(map(lambda qual: chr(qual + 33), alignment.query_qualities))
    fastq_string: str = f"@{alignment.query_name}\n" + f"{alignment.query_sequence}\n" + f"+\n{qualities}\n"
    return fastq_string

view raw SamBuilder_1833e6f.py hosted with ❤ by GitHub

Building the Unit Test

To test this function we’ll need some helper functions and classes including SamBuilder and AlignedSegment.

from fgpyo.sam.builder import SamBuilder
from pysam import AlignedSegment
from bam2fastq import bam2fastq

view raw SamBuilder_8e1eb9b.py hosted with ❤ by GitHub

We then need an AlignedSegment with defined sequence bases and quality scores. We could rely on the default query name being of the format qNNNN with NNNN auto-incremented starting from 0, and the default quality score being 30, but it’s better to be explicit in a unit test. Note that this is an unmapped read because chrom and start were not specified. The function we are testing doesn’t care whether or not a read is mapped.

        
  def test_alignment_to_fastq() -> None:
    """Tests generating a FASTQ string from a pysam AlignedSegment object."""
    builder = SamBuilder(r1_len=10, base_quality=30)
    alignment: AlignedSegment = builder.add_single(name="query", bases="A" * 10)

view raw SamBuilder_9757c96.py hosted with ❤ by GitHub

Call alignment_to_fastq:

test_fastq: str = alignment_to_fastq(alignment)

view raw SamBuilder_403578e.py hosted with ❤ by GitHub

Compare the output to the expected FASTQ representation based on the known data. Note that expected_fastq contains the ASCII representation of base quality 30 (?).

expected_fastq = f"@query\n{‘A’*10}\n+\n{‘?’*10}\n"
assert test_fastq == expected_fastq

view raw SamBuilder_1b1c008.py hosted with ❤ by GitHub

Run the Test with `pytest`

Your output should look something like this:

pytest "SamBuilder_test_bam2fastq.py"

view raw SamBuilder_81545f7.sh hosted with ❤ by GitHub

        
  ==================== test session starts ====================
platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/ameynert/scratch/sam_builder_blog
collected 1 item                                             
SamBuilder_test_bam2fastq.py .                         [100%]
===================== 1 passed in 0.04s =====================

view raw SamBuilder_1c8ddcc.txt hosted with ❤ by GitHub

Testing the Function `include_alignment`

This function checks if an alignment should be included in the output according to the given flags. It would not be a function in production code, but I’ve included it here as it’s useful for demonstration.

        
  def include_alignment(
    alignment: AlignedSegment, include_supplementary: bool = False
) -> bool:
    """
    Determines whether an alignment should be included in the output.
    Args:
        alignment: pysam AlignedSegment object to check.
        include_supplementary: Whether to include supplementary reads
          in the output (default: False).
    Returns:
        True if the alignment should be included, False otherwise.
    """
    return include_supplementary or not alignment.is_supplementary

view raw SamBuilder_d2e16a6.py hosted with ❤ by GitHub

Building the Unit Test

For this unit test, we’ll create alignments that are either primary or supplementary.

We’ll need to import our function and pytest too for some advanced testing features:

import pytest
from bam2fastq import include_alignment

view raw SamBuilder_f7ffd34.py hosted with ❤ by GitHub

pytest.mark.parametrize will be used to generate the list of parameters to test over. This ensures each test gets run and that a failure of one assert does not prevent the others from being tested. We parametrize over ("is_supplementary", "include_supplementary", "expected"). A primary alignment should be included regardless of the value of include_supplementary. A supplementary alignment should only be included if include_supplementary=True

        
  @pytest.mark.parametrize(
    ("is_supplementary", "include_supplementary", "expected"),
    [
        (False, True, True),
        (False, False, True),
        (True, True, True),
        (True, False, False),
    ],
)
def test_include_alignment_primary_only(
    is_supplementary: bool, include_supplementary: bool, expected: bool
) -> None:

view raw SamBuilder_edeef7a.py hosted with ❤ by GitHub

In the body of the test function, build an AlignedSegment object and test include_alignment.

        
  builder = SamBuilder()
alignment = builder.add_single(chrom="chr2", start=200, supplementary=is_supplementary)
assert include_alignment(alignment=alignment, include_supplementary=include_supplementary) == expected

view raw SamBuilder_370ee9e6.py hosted with ❤ by GitHub

Run the Test with `pytest`

pytest "SamBuilder_test_bam2fastq.py"

view raw SamBuilder_a27866f.sh hosted with ❤ by GitHub

        
  ==================== test session starts ====================
platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/ameynert/scratch/sam_builder_blog
collected 5 items                                            
SamBuilder_test_bam2fastq.py .....                     [100%]
===================== 5 passed in 0.09s =====================

view raw SamBuilder_0b3adde.txt hosted with ❤ by GitHub

Notice there are now 5 tests that have been run–the first test from above and an additional 4 tests from the parameterized test we just wrote.

Testing the Function `bam2fastq`

The final function is the one that’s actually reading in the BAM file and writing it to FASTQ.

        
  def bam2fastq(
    bam_file: Path,
    fastq_file: Path,
    include_supplementary: bool = False,
) -> None:
    """
    Converts reads in a BAM file to FASTQ format.
    Excludes supplementary reads by default.
    Conversion to ASCII encoded quality scores adds 33 to each integer score and converts it to an
    ASCII character, then joins the converted quality scores together into a string.
    Args:
        bam_file: Path to the input BAM file.
        fastq_file: Path to the output FASTQ file.
        include_supplementary: Whether to include supplementary reads in the output (default:
            False).
    """
    with reader(bam_file) as bam, open(fastq_file, "w") as fastq:
        for alignment in bam:
            if include_alignment(
                alignment=alignment,
                include_supplementary=include_supplementary,
            ):
                fastq.write(f"{alignment_to_fastq(alignment)}")

view raw SamBuilder_b43c096.py hosted with ❤ by GitHub

Building the Unit Test

Given that we already have tests for the two helper functions, a sufficient unit test for this function is to create a BAM file with a mix of primary and supplementary alignments, call the function with include_supplementary=False, and read back in the FASTQ to see if it only includes the primary alignments.

Add imports for reading FASTQ files from pysam:

from pathlib import Path
from pysam import FastqProxyfrom pysam import FastxFile
from bam2fastq import bam2fastq

view raw SamBuilder_b8c31f3.py hosted with ❤ by GitHub

Create a BAM file with a mix of reads:

        
  def test_bam2fastq(tmp_path: Path) -> None:
    """
    Tests converting a BAM file to FASTQ format, excluding supplementary
    reads.
    """
    builder = SamBuilder(r1_len=10, base_quality=30)
    builder.add_single(name="query1", bases="A" * 10)
    builder.add_single(name="query2", bases="C" * 10)
    builder.add_single(name="query3", bases="G" * 10, supplementary=True)
    bam_file = tmp_path / "test.bam"
    builder.to_path(bam_file)

view raw SamBuilder_f700983.py hosted with ❤ by GitHub

Convert it to FASTQ:

fastq_file = tmp_path / “test.fastq”

bam2fastq(bam_file=bam_file, fastq_file=fastq_file, include_supplementary=False)

view raw SamBuilder_cc755cf.py hosted with ❤ by GitHub

Read in the FASTQ file and verify the contents are as expected:

        
  with FastxFile(fastq_file) as fh:
    query_to_fastq: dict[str, FastqProxy] = {entry.name: entry for entry in fh}

    assert len(query_to_fastq.keys()) == 2
    assert "query1" in query_to_fastq
    assert "query2" in query_to_fastq
    assert "query3" not in query_to_fastq

view raw SamBuilder_ff39cf6.py hosted with ❤ by GitHub

Run the Test with `pytest`

pytest "SamBuilder_test_bam2fastq.py"

view raw SamBuilder_b5f192b.sh hosted with ❤ by GitHub

        
  ==================== test session starts ====================
platform darwin -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
rootdir: /Users/ameynert/scratch/sam_builder_blog
collected 6 items                                            
SamBuilder_test_bam2fastq.py ......                    [100%]
===================== 6 passed in 0.04s =====================

view raw SamBuilder_d075083.txt hosted with ❤ by GitHub

Additional SamBuilder Functionality

Creating Paired-end Read Alignments

Take a look at SamBuilder_create_pe_test_bam.py. The only functional differences between this and the previously written scripts to generate single-end read data are in the call to SamBuilder, where a read length for R2 is provided, and the substitution of add_pair() for add_single(). The default strand orientation for add_pair() is F1R2 (strand1='+', strand2='-').

        
  builder = SamBuilder(r1_len=10, r2_len=10)

for offset in range(num_reads): 
    builder.add_pair(
        chrom1=”chr1″,
        start1=100 + offset,
        chrom2=”chr2″,
        start2=200 – offset
    )

for offset in range(num_unmapped): 
    builder.add_pair()

builder.to_path(test_bam_file)

view raw SamBuilder_93eee9e.py hosted with ❤ by GitHub

Create a paired-end BAM file.

python "SamBuilder_create_pe_test_bam.py" -o "test_pe.bam" -n 3 -u 2

view raw SamBuilder_442f2077.sh hosted with ❤ by GitHub

View the SAM records with samtools view.

samtools view "test_pe.bam"

view raw SamBuilder_6a209f9.sh hosted with ❤ by GitHub

        
    QNAMEFLAGRNAMEPOSCIGARMAPQCIGARRNEXTTLENSEQQUALRGMATE_CIGAR
q000097chr11016010Mchr22010GACAGGTACA??????????RG:Z:1MC:Z:10M
q000197chr11026010Mchr22000TGCATCAATG??????????RG:Z:1MC:Z:10M
q000297chr11036010Mchr21990GAACAAACGC??????????RG:Z:1MC:Z:10M
q0002145chr21996010Mchr11030CACTGGAGAC??????????RG:Z:1MC:Z:10M
q0001145chr22006010Mchr11020TGGTCGTGTG??????????RG:Z:1MC:Z:10M
q0000145chr22016010Mchr11010AGAAGGAGTA??????????RG:Z:1MC:Z:10M
q000377*00**00TGGGTTAACC??????????RG:Z:1
q0003141*00**00ATTCGCTCCA??????????RG:Z:1

view raw SamBuilder_c7a3649.tsv hosted with ❤ by GitHub

When unit testing, builder.add_pair returns the read pair as a Tuple of two AlignedSegment objects.

        
  builder = SamBuilder()
(read1, read2) = builder.add_pair()

view raw SamBuilder_7f5b1bc.py hosted with ❤ by GitHub

The `Template` Container

fgpyo has a handy Template container for all the AlignedSegment objects belonging to the same original sequence template. The Template.build() function is used to group these together with some basic validation.

        
  builder = SamBuilder()
(read1, read2) = builder.add_pair()
template = Template.build([read1, read2])

view raw SamBuilder_ae8666a.py hosted with ❤ by GitHub

Template has only 7 properties, a name plus 3 for each of R1 and R2:

r[1,2]: optional primary alignment
r[1,2]_supplementals: supplementary alignments
r[1,2]_secondaries: secondary alignments

This blog hasn’t looked at secondary alignments at all, but just to remind you a secondary alignment is an alternative mapping of the representative alignment, and a supplementary alignment is an alignment of a part of the read that was not aligned in the primary alignment.

Supplementary Alignments

The examples above with supplementary alignments were creating standalone alignments that were not associated with a given primary alignment. Template can be useful to group these together.

Here’s an example where R1 has a supplementary alignment.

        
  builder = SamBuilder(r1_len=20, r2_len=20)
(read1, read2) = builder.add_pair(
    name="query",
    chrom1="chr1",
    start1=100,
    bases1="A" * 10,
    chrom2="chr1",
    start2=200,
    bases2="C" * 20,
)
read1_supp = builder.add_single(
    name="query",
    read_num=1,
    chrom="chr2",
    start=190,
    supplementary=True
)
template = Template.build([read1, read2, read1_supp])

view raw SamBuilder_2f215d2.py hosted with ❤ by GitHub

Complex sets of pairs and supplementary alignments can be built to represent split-read and read pair evidence for testing tools that detect structural variation or genome editing.

Suggested Advanced Uses

Use cigar strings to test clipping or variant detection. Check out the cigar submodule in fgpyo for some additional handy manipulation functions.
Use the optional predicate function in builder.to_path() to filter SAM records on-the-fly.
Add header and read group information to mock the output from picky tools.
Use a custom sequence dictionary for testing code that’s designed to work on PCR constructs or transcripts.

	python "SamBuilder_create_samtools_view_test_data.py" \
	--num_reads 5 \
	--num_unmapped 2 \
	--output "test.bam" \
	--expected "expect.bam"

	expected_fastq = f"@query\n{‘A’10}\n+\n{‘?’10}\n"
	assert test_fastq == expected_fastq

SamBuilder for Test SAM Files and Records

Motivation for Programmatic Test Data

Basic Usage of SamBuilder

Common Test Patterns Using SamBuilder

Build the Working Environment

Basic Usage of SamBuilder

First Create a SamBuilder Object

Add an Aligned Single-end Read

Write the Reads to a BAM file

Test Pattern 1: Test Data Files On-The-Fly

Output a BAM File Containing Test Records

Add Unmapped Reads BAM to the Output

Test Simple Read Filtering

Test Pattern 2: Unit Test FASTQ to BAM Functions

Compare the BAM File and the FASTQ File

Unit Tests for SamBuilder_bam2fastq.py

Testing the Function alignment_to_fastq

Building the Unit Test

Run the Test with pytest

Testing the Function include_alignment

Building the Unit Test

Run the Test with pytest

Testing the Function bam2fastq

Building the Unit Test

Run the Test with pytest

Additional SamBuilder Functionality

Creating Paired-end Read Alignments

The Template Container

Supplementary Alignments

Suggested Advanced Uses

Further Reading

Basic Usage of `SamBuilder`

Common Test Patterns Using `SamBuilder`

Basic Usage of `SamBuilder`

First Create a `SamBuilder` Object

Unit Tests for `SamBuilder_bam2fastq.py`

Testing the Function `alignment_to_fastq`

Run the Test with `pytest`

Testing the Function `include_alignment`

Run the Test with `pytest`

Testing the Function `bam2fastq`

Run the Test with `pytest`

The `Template` Container