sctools.test package

Submodules

sctools.test.test_bam module

sctools.test.test_bam.bamfile(request)[source]

sctools.test.test_bam.indices(request)[source]: fixture returns indices from a SubsetAlignments object for testing

sctools.test.test_bam.make_records_from_values(tag_keys, tags_and_query_name)[source]

sctools.test.test_bam.n_nonspecific()[source]: the number of non-specific records to extract

sctools.test.test_bam.n_specific()[source]: the number of specific records to extract

sctools.test.test_bam.sa_object(request)[source]: fixture returns SubsetAlignments objects for testing

sctools.test.test_bam.tagged_bam()[source]

sctools.test.test_bam.test_chromosome_19_comes_before_21(indices)[source]: chromosome 19 comes before chromosome 21 in the test file; this ordering should be preserved in the output

sctools.test.test_bam.test_correct_number_of_indices_are_extracted(sa_object, n_specific, n_nonspecific)[source]

sctools.test.test_bam.test_get_barcode_for_alignment(tagged_bam)[source]

sctools.test.test_bam.test_get_barcode_for_alignment_raises_error_for_missing_tag(tagged_bam)[source]

sctools.test.test_bam.test_get_barcodes_from_bam(tagged_bam)[source]

sctools.test.test_bam.test_get_barcodes_from_bam_with_raise_missing_true_raises_warning_without_cr_barcode_passed(tagged_bam)[source]

sctools.test.test_bam.test_incorrect_extension_does_not_raise_when_open_mode_is_specified()[source]

sctools.test.test_bam.test_incorrect_extension_without_open_mode_raises_value_error()[source]

sctools.test.test_bam.test_indices_are_all_greater_than_zero(sa_object, n_specific, n_nonspecific)[source]

sctools.test.test_bam.test_sort_by_tags_and_queryname_sorts_correctly_from_file()[source]

sctools.test.test_bam.test_sort_by_tags_and_queryname_sorts_correctly_from_file_no_tag_keys()[source]

sctools.test.test_bam.test_sort_by_tags_and_queryname_sorts_correctly_no_tag_keys()[source]

sctools.test.test_bam.test_split_bam_raises_value_error_when_passed_bam_without_barcodes(bamfile)[source]

sctools.test.test_bam.test_split_on_tagged_bam(tagged_bam)[source]

sctools.test.test_bam.test_split_succeeds_with_raise_missing_false_and_no_cr_barcode_passed(tagged_bam)[source]

sctools.test.test_bam.test_split_with_large_chunk_size_generates_one_file(tagged_bam)[source]

sctools.test.test_bam.test_split_with_raise_missing_true_raises_warning_without_cr_barcode_passed(tagged_bam)[source]

sctools.test.test_bam.test_str_and_int_chromosomes_both_function(sa_object)[source]

sctools.test.test_bam.test_tag_sortable_record_eq_is_false_when_any_difference_exists()[source]

sctools.test.test_bam.test_tag_sortable_record_eq_is_true_for_identical_records()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_empty_query_name_is_smaller()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_empty_tag_is_smaller()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_is_false_for_equal_records()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_is_true_for_smaller_query_name()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_is_true_for_smaller_tag()[source]

sctools.test.test_bam.test_tag_sortable_record_lt_is_true_for_smaller_tag_regardless_of_query_name()[source]

sctools.test.test_bam.test_tag_sortable_record_missing_tag_value_is_empty_string()[source]

sctools.test.test_bam.test_tag_sortable_records_compare_correctly()[source]

sctools.test.test_bam.test_tag_sortable_records_raises_error_on_different_tag_lists()[source]

sctools.test.test_bam.test_tag_sortable_records_sort_correctly()[source]

sctools.test.test_bam.test_tag_sortable_records_sort_correctly_when_already_sorted()[source]

sctools.test.test_bam.test_tag_sortable_records_str()[source]

sctools.test.test_bam.test_verify_sort_on_unsorted_records_raises_error()[source]

sctools.test.test_bam.test_verify_sort_raises_no_error_on_sorted_records()[source]
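The tag-sortable record tests above describe an ordering that compares records by their tag values first and by query name second, with a missing tag treated as the empty string (so it sorts first). That ordering can be sketched with plain tuples, assuming tags are compared in the order given by `tag_keys`. `tag_sort_key` and `verify_sort` are hypothetical helpers for illustration, not the sctools TagSortableRecord API.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical sort key: (tag values in priority order, query name).
SortKey = Tuple[Tuple[str, ...], str]


def tag_sort_key(tags: Dict[str, str], query_name: str, tag_keys: Sequence[str]) -> SortKey:
    """Build a sort key; a missing tag becomes "" so it sorts before any real value."""
    return tuple(tags.get(key, "") for key in tag_keys), query_name


def verify_sort(keys: List[SortKey]) -> None:
    """Raise ValueError at the first adjacent pair that is out of order."""
    for index, (previous, current) in enumerate(zip(keys, keys[1:])):
        if current < previous:
            raise ValueError(f"records {index} and {index + 1} are out of order")


records = [
    ({"CB": "AAC", "UB": "GG"}, "read2"),
    ({"CB": "AAC"}, "read3"),  # no UB tag: compared as ""
    ({"CB": "AAA", "UB": "TT"}, "read1"),
]
keys = sorted(tag_sort_key(tags, name, ["CB", "UB"]) for tags, name in records)
verify_sort(keys)  # passes: keys are in order
print([name for _, name in keys])  # ['read1', 'read3', 'read2']
```

Because tuples compare element by element, a smaller tag wins regardless of query name, and the query name only breaks ties between identical tag values.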

sctools.test.test_bam.test_write_barcodes_to_bins(tagged_bam)[source]

sctools.test.test_barcode module

sctools.test.test_barcode.barcode_set()[source]

sctools.test.test_barcode.short_barcode_set_from_encoded()[source]

sctools.test.test_barcode.short_barcode_set_from_iterable(request)[source]

sctools.test.test_barcode.tagged_bamfile()[source]

sctools.test.test_barcode.test_barcode_diversity_is_in_range(barcode_set)[source]

sctools.test.test_barcode.test_base_frequency_sums_are_all_equal_to_barcode_set_length(barcode_set)[source]

sctools.test.test_barcode.test_correct_bam_produces_cb_tags(tagged_bamfile, truncated_whitelist_from_10x)[source]

sctools.test.test_barcode.test_correct_barcode_finds_and_corrects_1_base_errors(trivial_whitelist)[source]

sctools.test.test_barcode.test_correct_barcode_raises_keyerror_when_barcode_has_more_than_one_error(trivial_whitelist)[source]

sctools.test.test_barcode.test_correct_barcode_raises_keyerror_when_barcode_not_correct_length(trivial_whitelist)[source]
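These tests describe whitelist correction: a barcode with a single substitution is corrected, while one with two or more errors, or of the wrong length, raises KeyError. A minimal sketch of that behaviour, assuming a precomputed dictionary from every single-substitution variant to its whitelist barcode. `build_error_map`, `correct_barcode`, the `ALPHABET` constant, and the choice to drop variants that are one substitution away from two whitelist barcodes are all assumptions, not the sctools implementation.

```python
from typing import Dict, Iterable

ALPHABET = "ACGTN"  # assumed base alphabet, including N


def build_error_map(whitelist: Iterable[str]) -> Dict[str, str]:
    """Map each whitelist barcode and every 1-substitution variant to that barcode."""
    barcodes = list(whitelist)
    error_map: Dict[str, str] = {}
    ambiguous = set()
    for barcode in barcodes:
        for i, original in enumerate(barcode):
            for base in ALPHABET:
                if base == original:
                    continue
                variant = barcode[:i] + base + barcode[i + 1:]
                # A variant reachable from two different barcodes cannot be corrected.
                if variant in error_map and error_map[variant] != barcode:
                    ambiguous.add(variant)
                error_map[variant] = barcode
    for variant in ambiguous:
        del error_map[variant]
    # Exact whitelist matches always map to themselves.
    for barcode in barcodes:
        error_map[barcode] = barcode
    return error_map


def correct_barcode(error_map: Dict[str, str], barcode: str) -> str:
    """Return the whitelist barcode; KeyError for 2+ errors or the wrong length."""
    return error_map[barcode]


error_map = build_error_map(["AAAA", "CCCC"])
print(correct_barcode(error_map, "AATA"))  # AAAA
```

Precomputing the variants makes each correction a single dictionary lookup, at the cost of storing (length × 4 + 1) entries per whitelist barcode.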

sctools.test.test_barcode.test_incorrect_input_raises_errors(trivial_whitelist)[source]

sctools.test.test_barcode.test_iterable_produces_correct_barcodes(short_barcode_set_from_encoded)[source]

sctools.test.test_barcode.test_reads_barcodes_from_file(barcode_set)[source]

sctools.test.test_barcode.test_summarize_hamming_distances_gives_reasonable_results(short_barcode_set_from_iterable)[source]

sctools.test.test_barcode.trivial_whitelist()[source]

sctools.test.test_barcode.truncated_whitelist_from_10x()[source]

sctools.test.test_encodings module

sctools.test.test_encodings.encoder(request)[source]

sctools.test.test_encodings.encoder_2bit(sequence)[source]

sctools.test.test_encodings.encoder_3bit()[source]

sctools.test.test_encodings.sequence()[source]

sctools.test.test_encodings.simple_barcodes()[source]: simple barcode set with min_hamming = 1, max_hamming = 2

sctools.test.test_encodings.simple_hamming_distances(simple_barcodes)[source]

sctools.test.test_encodings.test_encoded_hamming_distance_is_accurate(simple_hamming_distances, simple_barcodes, encoder)[source]

sctools.test.test_encodings.test_three_bit_encode_decode_produces_same_string(sequence, encoder_3bit)[source]

sctools.test.test_encodings.test_three_bit_encoder_gets_correct_gc_content(sequence, encoder_3bit)[source]

sctools.test.test_encodings.test_three_bit_encodes_unknown_nucleotides_as_N(encoder_3bit)[source]

sctools.test.test_encodings.test_two_bit_encode_decode_produces_same_string_except_for_N(sequence, encoder_2bit)[source]

sctools.test.test_encodings.test_two_bit_encoder_gets_correct_gc_content(encoder_2bit)[source]

sctools.test.test_encodings.test_two_bit_throws_errors_when_asked_to_encode_unknown_nucleotide(encoder_2bit)[source]
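The encoding tests check round-tripping, GC content, and Hamming distances computed on encoded values. A minimal 2-bit sketch, assuming A, C, G, T map to 00, 01, 10, 11 and that any other character (including N) is rejected; the function names are hypothetical, and how sctools's encoders treat N is not reproduced here (the tests suggest the 3-bit encoder exists to represent it).

```python
from typing import Dict

ENCODE: Dict[str, int] = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = "ACGT"


def encode_2bit(sequence: str) -> int:
    """Pack a sequence into an int, 2 bits per base; non-ACGT raises KeyError."""
    value = 0
    for base in sequence:
        value = (value << 2) | ENCODE[base]
    return value


def decode_2bit(value: int, length: int) -> str:
    """Unpack `length` bases; the length is needed because leading A's encode as 0."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def hamming_2bit(a: int, b: int, length: int) -> int:
    """Count positions whose 2-bit codes differ (XOR is nonzero in that slot)."""
    diff = a ^ b
    distance = 0
    for _ in range(length):
        if diff & 0b11:
            distance += 1
        diff >>= 2
    return distance


def gc_count_2bit(value: int, length: int) -> int:
    """Count C (01) and G (10) codes."""
    count = 0
    for _ in range(length):
        if (value & 0b11) in (0b01, 0b10):
            count += 1
        value >>= 2
    return count


print(encode_2bit("ACGT"))  # 27
print(decode_2bit(encode_2bit("GATTACA"), 7))  # GATTACA
```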

sctools.test.test_entrypoints module

sctools.test.test_entrypoints.test_Attach10XBarcodes_entrypoint()[source]

sctools.test.test_entrypoints.test_Attach10XBarcodes_entrypoint_with_whitelist()[source]

sctools.test.test_entrypoints.test_AttachBarcodes_entrypoint_with_whitelist()[source]

sctools.test.test_entrypoints.test_count_merge()[source]

sctools.test.test_entrypoints.test_split_bam()[source]

sctools.test.test_entrypoints.test_tag_sort_bam()[source]

sctools.test.test_entrypoints.test_tag_sort_bam_dash_t_specified_multiple_times()[source]

sctools.test.test_entrypoints.test_tag_sort_bam_no_tags()[source]

sctools.test.test_entrypoints.test_verify_bam_sort()[source]

sctools.test.test_entrypoints.test_verify_bam_sort_raises_error_on_unsorted()[source]

sctools.test.test_fastq module

sctools.test.test_fastq.barcode_generator_with_corrected_cell_barcodes()[source]

sctools.test.test_fastq.bytes_fastq_record()[source]

sctools.test.test_fastq.embedded_barcode_generator()[source]

sctools.test.test_fastq.i7_files_compressions_and_modes(request)[source]: generates different compression types and modes for testing

sctools.test.test_fastq.reader_all_compressions(request)[source]: generates open fastq readers for each compression type and read mode

sctools.test.test_fastq.string_fastq_record()[source]

sctools.test.test_fastq.test_bytes_fastq_record_quality_score_parsing(bytes_fastq_record)[source]

sctools.test.test_fastq.test_corrects_barcodes(barcode_generator_with_corrected_cell_barcodes)[source]

sctools.test.test_fastq.test_embedded_barcode_generator_produces_outputs_of_expected_size(embedded_barcode_generator)[source]

sctools.test.test_fastq.test_fastq_returns_correct_filesize_for_single_and_multiple_files()[source]

sctools.test.test_fastq.test_fields_populate_properly(reader_all_compressions)[source]

sctools.test.test_fastq.test_invalid_open_mode_raises_valueerror()[source]

sctools.test.test_fastq.test_mixed_filetype_read_gets_correct_record_number()[source]

sctools.test.test_fastq.test_non_string_filename_in_iterable_raises_typeerror()[source]

sctools.test.test_fastq.test_non_string_filename_raises_typeerror()[source]

sctools.test.test_fastq.test_printing_bytes_record_generates_valid_fastq_record(bytes_fastq_record)[source]

sctools.test.test_fastq.test_printing_string_record_generates_valid_fastq_record(string_fastq_record)[source]

sctools.test.test_fastq.test_reader_properly_subsets_based_on_indices()[source]

sctools.test.test_fastq.test_reader_reads_correct_number_of_records_across_multiple_files(reader_all_compressions)[source]

sctools.test.test_fastq.test_reader_reads_first_record(reader_all_compressions)[source]

sctools.test.test_fastq.test_reader_skips_header_character_raises_value_error(i7_files_compressions_and_modes)[source]

test should skip the first name line, shifting each record up by one line. As a result, the first sequence should be found in the name field
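The shift described above follows from FASTQ's fixed four-line record layout (name, sequence, "+", quality): dropping one line moves every later line up one slot. A minimal sketch on an in-memory list of lines; `group_fastq_lines` is a hypothetical helper and does not reproduce the reader's ValueError check.

```python
from typing import List, Tuple


def group_fastq_lines(lines: List[str], skip: int = 0) -> List[Tuple[str, ...]]:
    """Group lines into 4-line records after dropping the first `skip` lines."""
    remaining = lines[skip:]
    # Only complete records are returned; a trailing partial record is dropped.
    return [tuple(remaining[i:i + 4]) for i in range(0, len(remaining) - 3, 4)]


lines = ["@read1", "ACGT", "+", "IIII", "@read2", "TTGA", "+", "IIII"]
print(group_fastq_lines(lines)[0][0])          # @read1
print(group_fastq_lines(lines, skip=1)[0][0])  # ACGT: the sequence now sits in the name slot
```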

sctools.test.test_fastq.test_reader_stores_filenames()[source]

sctools.test.test_fastq.test_string_fastq_record_quality_score_parsing(string_fastq_record)[source]

sctools.test.test_fastq.test_zipping_readers_generates_expected_output()[source]

sctools.test.test_fastq.test_zipping_readers_with_indices_generates_expected_output()[source]

sctools.test.test_gtf module

sctools.test.test_gtf.files(request)[source]: returns a filename

sctools.test.test_gtf.test_opens_file_parses_size(files)[source]

sctools.test.test_gtf.test_opens_file_populates_fields_properly(files)[source]

sctools.test.test_gtf.test_opens_file_reads_first_line(files)[source]

sctools.test.test_gtf.test_set_attribute_verify_included_in_output_string(files)[source]

sctools.test.test_metrics module

sctools.test.test_metrics.mergeable_cell_metrics()[source]

sctools.test.test_metrics.mergeable_gene_metrics()[source]

sctools.test.test_metrics.split_metrics_file(metrics_file)[source]: produces two mergeable on-disk metrics files from a single file: the first output contains the first 3/4 of the input and the second output contains the last 3/4, so that 1/2 of the metrics overlap between the two files
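The overlap arithmetic works out because the first 3/4 and the last 3/4 share the middle half: rows from n/4 up to 3n/4. A minimal sketch on in-memory rows, assuming a row count divisible by 4; `split_with_overlap` is hypothetical and does not write files the way the fixture does.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_with_overlap(rows: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Return the first 3/4 and the last 3/4 of `rows`."""
    n = len(rows)
    first = list(rows[: 3 * n // 4])
    second = list(rows[n // 4:])
    return first, second


first, second = split_with_overlap(list(range(8)))
print(first)   # [0, 1, 2, 3, 4, 5]
print(second)  # [2, 3, 4, 5, 6, 7]
# Shared rows 2..5 are 4 of the 8 rows, i.e. 1/2.
```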

sctools.test.test_metrics.test_calculate_cell_metrics_cli()[source]: test the sctools cell metrics CLI invocation

sctools.test.test_metrics.test_calculate_gene_metrics_cli()[source]: test the sctools gene metrics CLI invocation

sctools.test.test_metrics.test_cell_metrics_mean_n_genes_observed()[source]: test that the GatherCellMetrics method identifies the correct number of genes per cell, on average.

sctools.test.test_metrics.test_duplicate_records(metrics, expected_value)[source]: Duplicate records are identified by the 1024 (0x400) bit being set in the SAM flag
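Checking the duplicate bit is a single bitwise AND on the integer FLAG field. A minimal sketch using plain integers; with pysam, the same bit is exposed on an alignment as `AlignedSegment.is_duplicate`. `is_duplicate` below is a hypothetical helper name.

```python
DUPLICATE_FLAG = 0x400  # 1024: the "PCR or optical duplicate" bit in the SAM FLAG field


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set, whatever other bits are present."""
    return bool(flag & DUPLICATE_FLAG)


flags = [0, 16, 1024, 1040]  # 1040 = 1024 (duplicate) + 16 (reverse strand)
print(sum(is_duplicate(f) for f in flags))  # 2
```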

sctools.test.test_metrics.test_fragments_number_is_greater_than_molecule_number(metrics)[source]: There should always be more fragments than molecules, as the minimum definition of a molecule is a fragment covered by a single read

sctools.test.test_metrics.test_gene_metrics_n_genes()[source]: Test that GatherGeneMetrics identifies the total number of genes in the test file

sctools.test.test_metrics.test_gzip_compression(bam: str, gatherer: Callable)[source]: gzip compression should produce a .gz file which is identical when uncompressed to the uncompressed version

sctools.test.test_metrics.test_higher_order_metrics_by_gene(metrics, key, expected_value)[source]

Test metrics that depend on other metrics

This parametrized test checks a large number of higher-order metrics across all measured instances of the metric class. E.g. for cell metrics (class), each test verifies the value for each cell (instance).

Parameters

metrics (pd.DataFrame) – Output from subclass of sctools.metrics.MetricAggregator
key (str) – The column of metrics to interrogate in the parametrized test
expected_value (np.ndarray) – An array of expected values

sctools.test.test_metrics.test_merge_cell_metrics_cli(mergeable_cell_metrics)[source]: test the sctools merge cell metrics CLI invocation

sctools.test.test_metrics.test_merge_cell_metrics_does_not_correct_duplicates(mergeable_cell_metrics)[source]: test takes offset cell metrics outputs and merges them. Cell metrics merging does not check for duplication, so the merged output should be twice the length of a single input.

sctools.test.test_metrics.test_merge_gene_metrics_averages_over_multiply_detected_genes(mergeable_gene_metrics)[source]

sctools.test.test_metrics.test_merge_gene_metrics_cli(mergeable_gene_metrics)[source]: test the sctools merge gene metrics CLI invocation

sctools.test.test_metrics.test_metrics_highest_expression_class(metrics, expected_value)[source]: for gene metrics, this is the most highly expressed gene; for cell metrics, this is the most highly expressed cell.

sctools.test.test_metrics.test_metrics_highest_read_count(metrics, expected_value)[source]: Test that each metric identifies the highest read count associated with any single entity

sctools.test.test_metrics.test_metrics_n_fragments(metrics, expected_value)[source]

Test that each metric identifies the total number of fragments in the test file.

Fragments are defined as a unique combination of {cell barcode, molecule barcode, strand, position, chromosome}

sctools.test.test_metrics.test_metrics_n_molecules(metrics, expected_value)[source]

Test that each metric identifies the total number of molecules in the test file

Molecules are defined as a unique combination of {cell barcode, molecule barcode, gene}
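Both definitions reduce to counting distinct tuples of the listed fields. A minimal sketch, assuming each alignment has already been reduced to those fields; the `Alignment` record and its field names are hypothetical, not an sctools or pysam type.

```python
from typing import Iterable, NamedTuple


class Alignment(NamedTuple):
    # Hypothetical minimal record holding only the fields the definitions use.
    cell_barcode: str
    molecule_barcode: str
    gene: str
    chromosome: str
    position: int
    is_reverse: bool


def count_fragments(alignments: Iterable[Alignment]) -> int:
    """Distinct {cell barcode, molecule barcode, strand, position, chromosome}."""
    return len({(a.cell_barcode, a.molecule_barcode, a.is_reverse, a.position, a.chromosome)
                for a in alignments})


def count_molecules(alignments: Iterable[Alignment]) -> int:
    """Distinct {cell barcode, molecule barcode, gene}."""
    return len({(a.cell_barcode, a.molecule_barcode, a.gene) for a in alignments})


reads = [
    Alignment("AAAC", "GGTT", "GENE1", "chr19", 1000, False),
    Alignment("AAAC", "GGTT", "GENE1", "chr19", 1250, False),  # same molecule, new position
    Alignment("AAAC", "GGTT", "GENE1", "chr19", 1000, False),  # repeat of the first fragment
]
print(count_fragments(reads), count_molecules(reads))  # 2 1
```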

sctools.test.test_metrics.test_metrics_n_reads(metrics, expected_value)[source]: test that the metrics identify the correct number of reads

sctools.test.test_metrics.test_metrics_number_perfect_cell_barcodes(metrics, expected_value)[source]: Test that each metric correctly identifies the number of perfect cell barcodes where CB == CR

sctools.test.test_metrics.test_metrics_number_perfect_molecule_barcodes(metrics, expected_value)[source]: Test that each metric correctly identifies the number of perfect molecule barcodes where UB == UR
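A "perfect" barcode is one where the corrected tag equals the raw tag, so the count is a comparison per record. A minimal sketch over tag dictionaries, assuming a record without the corrected tag is not counted; `count_perfect_barcodes` is a hypothetical helper.

```python
from typing import Dict, Iterable


def count_perfect_barcodes(records: Iterable[Dict[str, str]], corrected: str, raw: str) -> int:
    """Count records whose corrected barcode tag equals the raw barcode tag."""
    return sum(1 for tags in records if corrected in tags and tags[corrected] == tags.get(raw))


records = [
    {"CB": "AAAC", "CR": "AAAC", "UB": "GGTT", "UR": "GGTT"},  # both barcodes perfect
    {"CB": "AAAC", "CR": "AAAG", "UB": "GGTT", "UR": "GGTT"},  # cell barcode was corrected
    {"CR": "TTTT", "UR": "CCAA"},  # no corrected tags: not counted
]
print(count_perfect_barcodes(records, "CB", "CR"))  # 1
print(count_perfect_barcodes(records, "UB", "UR"))  # 2
```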

sctools.test.test_metrics.test_reads_mapped_exonic(metrics, expected_value)[source]: Test that each metric identifies the number of reads mapped to an exon (XF=='CODING')

sctools.test.test_metrics.test_reads_mapped_intronic(metrics, expected_value)[source]: Test that each metric identifies the number of reads mapped to an intron (XF=='INTRONIC')

sctools.test.test_metrics.test_reads_mapped_uniquely(metrics, expected_value)[source]: Uniquely mapping reads will be tagged with NH==1

sctools.test.test_metrics.test_reads_mapped_utr(metrics, expected_value)[source]: Test that each metric identifies the number of reads mapped to a UTR (XF=='UTR')
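These four tests classify reads by two tags: NH==1 marks a unique mapping, and the XF value marks the annotated region. A minimal sketch over tag dictionaries; `summarize_mapping` and the counter key names are hypothetical, and only the XF values and NH rule listed above are assumed.

```python
from collections import Counter
from typing import Dict, Iterable, Union

Tags = Dict[str, Union[str, int]]

# XF values named in the tests above, mapped to illustrative counter names.
XF_CATEGORIES = {"CODING": "reads_mapped_exonic",
                 "INTRONIC": "reads_mapped_intronic",
                 "UTR": "reads_mapped_utr"}


def summarize_mapping(records: Iterable[Tags]) -> Counter:
    """Count uniquely mapped reads and reads per XF region category."""
    counts: Counter = Counter()
    for tags in records:
        if tags.get("NH") == 1:
            counts["reads_mapped_uniquely"] += 1
        category = XF_CATEGORIES.get(tags.get("XF"))
        if category is not None:
            counts[category] += 1
    return counts


records = [
    {"NH": 1, "XF": "CODING"},
    {"NH": 2, "XF": "CODING"},  # multimapper, still exonic
    {"NH": 1, "XF": "INTRONIC"},
    {"NH": 1, "XF": "UTR"},
    {"NH": 1},  # no XF tag
]
print(summarize_mapping(records)["reads_mapped_uniquely"])  # 4
```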

sctools.test.test_metrics.test_single_read_evidence(metrics, key, expected_value)[source]: We want to determine how many molecules and fragments are covered by only one read, as molecules and fragments covered by multiple reads are much less likely to be the result of error processes.
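Single-read evidence amounts to counting reads per molecule (or fragment) key and keeping the keys seen exactly once. A minimal sketch, assuming the keys have already been built as tuples such as {cell barcode, molecule barcode, gene}; `count_single_read_entities` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Iterable


def count_single_read_entities(keys: Iterable[Hashable]) -> int:
    """Count entities (molecules or fragments) supported by exactly one read."""
    reads_per_entity = Counter(keys)
    return sum(1 for n_reads in reads_per_entity.values() if n_reads == 1)


molecule_keys = [
    ("AAAC", "GGTT", "GENE1"),
    ("AAAC", "GGTT", "GENE1"),  # second read for the same molecule
    ("AAAC", "CCAA", "GENE1"),
    ("TTTG", "GGTT", "GENE2"),
]
print(count_single_read_entities(molecule_keys))  # 2
```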

sctools.test.test_metrics.test_spliced_reads(metrics, expected_value)[source]: This pipeline defines spliced reads as reads containing an N operation of any length in the CIGAR string
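Detecting a splice under this definition means scanning the CIGAR string for an N (skipped reference region) operation. A minimal sketch on CIGAR strings; with pysam, the same check can be made on `AlignedSegment.cigartuples`, where N has operation code 3. `is_spliced` is a hypothetical helper.

```python
import re

# One CIGAR operation: a length followed by an operation character.
CIGAR_OPERATION = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar: str) -> bool:
    """True if the CIGAR string contains an N operation of any length."""
    return any(op == "N" for _, op in CIGAR_OPERATION.findall(cigar))


print(is_spliced("50M1000N48M"))  # True
print(is_spliced("98M"))          # False
```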

sctools.test.test_stats module

sctools.test.test_stats.test_balanced_data_produces_entropy_1()[source]

sctools.test.test_stats.test_balanced_unnormalized_data_produces_entropy_1()[source]

sctools.test.test_stats.test_concentrated_data_produces_entropy_0()[source]

sctools.test.test_stats.test_concentrated_unnormalized_data_produces_entropy_0()[source]
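The stats tests expect an entropy of 1 for evenly spread data and 0 when all mass sits in one category, whether the input is probabilities or raw (unnormalized) counts. One definition with these properties is Shannon entropy divided by its maximum, log(k) for k categories (for k = 2 this equals entropy in bits). This is a sketch under that assumption; `normalized_entropy` is a hypothetical name, and `sctools.stats` may define its entropy differently.

```python
import math
from typing import Sequence


def normalized_entropy(values: Sequence[float]) -> float:
    """Shannon entropy of `values` scaled to [0, 1] by dividing by log(len(values))."""
    if len(values) < 2:
        raise ValueError("need at least two categories")
    total = float(sum(values))
    if total <= 0:
        raise ValueError("values must have a positive sum")
    # Raw counts are normalized to probabilities; 0 * log(0) is taken as 0.
    probabilities = [v / total for v in values if v > 0]
    entropy = sum(-p * math.log(p) for p in probabilities)
    return entropy / math.log(len(values))


print(round(normalized_entropy([10, 10, 10, 10]), 6))  # 1.0
print(round(normalized_entropy([7, 0, 0]), 6))         # 0.0
```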