1. De novo OTU picking from long amplicons with LACA
  2. LACA installation
    1. Docker image
    2. Installation from GitHub repository
  3. A demo run with LACA
    1. Example with a quick start
    2. Get familiar with LACA usage
    3. Run LACA with a demo dataset

De novo OTU picking from long amplicons with LACA

LACA is a reproducible and scalable workflow for Long Amplicon Consensus Analysis, e.g., 16S rRNA gene amplicon analysis. It uses Snakemake to manage the workflow and conda to manage the environment.

LACA installation

The full installation guide of LACA is available here.

You can install LACA either from the Docker image (the only option for macOS users) or from the GitHub repository, depending on your OS and preference.

Docker image

The easiest way to use LACA on any platform is to pull the Docker image from Docker Hub.

docker pull yanhui09/laca

LACA is built for the linux/amd64 platform; other platforms are supported through Docker.

macOS users need to use the Docker container to run LACA.
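
Which route applies can be checked from the shell. The sketch below is a hypothetical helper (`suggest_laca_install` is not part of LACA) that maps the output of `uname` to the two installation options, assuming that anything other than Linux on x86_64 should go through Docker.

```shell
# Hypothetical helper, not part of LACA: suggest an installation route
# from the OS name and CPU architecture reported by uname.
suggest_laca_install() {
    os=$1
    arch=$2
    if [ "$os" = "Linux" ] && [ "$arch" = "x86_64" ]; then
        # linux/amd64: both routes described in this guide are open
        echo "docker or github"
    else
        # Assumption: every other platform (e.g. macOS) goes through Docker
        echo "docker"
    fi
}

suggest_laca_install "$(uname -s)" "$(uname -m)"
```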

Installation from GitHub repository

1. Clone the GitHub repository and create an isolated conda environment

git clone https://github.com/yanhui09/laca.git
cd laca
mamba env create -n laca -f env.yaml 

2. Install LACA with pip

To avoid inconsistencies, we suggest installing LACA in the conda environment created above.

conda activate laca
pip install --editable .
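
The steps above rely on `git`, `mamba` and `conda` being on the `PATH`, and on `laca` being available once the environment is activated. A minimal sketch of a check for that follows; `have_tool` is a hypothetical helper name, not something LACA provides.

```shell
# Hypothetical helper: succeed if a command is available on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report each tool used in the installation steps.
for tool in git mamba conda laca; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

If `laca` is reported as missing after installation, the conda environment is probably not activated in the current shell.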

A demo run with LACA

Find a full usage guide here.

Example with a quick start

laca init -b /path/to/basecalled_fastqs -d /path/to/database    # init config file and check
laca run all                                         # start analysis

Get familiar with LACA usage

LACA is easy to use. You can start a new analysis in two steps using laca init and laca run.

Remember to activate the conda environment if LACA is installed in a conda environment.

conda activate laca
laca -h

To use the Docker image, mount your data directory (e.g., the current directory reported by pwd) to /home in the container.

docker run -it -v `pwd`:/home --privileged yanhui09/laca
laca -h

1. Initialize a config file with laca init

laca init will generate a config file in the working directory, which contains the necessary parameters to run LACA.

laca init -h

2. Start analysis with laca run

laca run will trigger the full workflow or a specific module with the resources you define. Get a dry-run overview with -n. Snakemake arguments can be appended to laca run as well.

laca run -h

Run LACA with a demo dataset

0. Make sure you have downloaded the required demo dataset from here, then enter its directory with cd.

For example, if the absolute path (“long path”) of the directory is /home/me/MAC2023-extra:

cd /home/me/MAC2023-extra

If you haven’t downloaded the data yet and have Git installed:

git clone https://github.com/yanhui09/MAC2023-extra.git
cd ./MAC2023-extra 

1. Check where you are, try laca init, and inspect the generated config.yaml file.

pwd
laca init -b ./data/ont16s -d ./database -w ./laca_output --fqs-min 50
cat ./laca_output/config.yaml
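
Before running `laca init`, it can help to confirm that the input directory actually contains reads. The sketch below assumes the reads are stored as `*.fastq.gz` files directly inside the directory passed to `-b`; `count_fastq_gz` is a hypothetical helper, not a LACA command.

```shell
# Hypothetical helper: count *.fastq.gz files directly inside a directory.
count_fastq_gz() {
    n=0
    for f in "$1"/*.fastq.gz; do
        # An unmatched glob stays literal, so test that the file exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

echo "fastq.gz files in ./data/ont16s: $(count_fastq_gz ./data/ont16s)"
```

A count of 0 usually means the command was run from the wrong directory; check the output of `pwd`.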

2. Start LACA with a dry run, then a real run

laca run all -w ./laca_output -n 
laca run kmerCon -j 4 -w ./laca_output      

LACA can generate an OTU table, a taxonomy table and a phylogenetic tree if you run the full workflow with laca run all. However, it takes time to prepare the database and complete the installation on first use.

As an example, here we only run the kmerCon module to extract consensus sequences based on k-mer frequency.
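
One way to chain a dry run and a real run is to start the real run only if the dry run succeeds. The sketch below wraps that pattern in a hypothetical helper, `dry_then_run`, which appends `-n` for the first pass. It assumes that a failed dry run exits with a non-zero status, which this guide does not state explicitly, and it uses only options that appear above (`-n`, `-j`, `-w`).

```shell
# Hypothetical helper: run a command with -n (dry run) first, then for real
# only if the dry run exits successfully.
dry_then_run() {
    "$@" -n && "$@"
}

# Only attempt the run when laca is actually installed.
if command -v laca >/dev/null 2>&1; then
    dry_then_run laca run kmerCon -j 4 -w ./laca_output
fi
```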

Take a look at these consensus sequences and use the first one for a BLAST search against the rRNA/ITS database.

head -n2 ./laca_output/kmerCon/kmerCon.fna

Expected output:

>pooled_0b000_0cand1
CACAATGGGCGCAAGCCTGATGCAGCGACGCCGCGTGCGGGATGACGGCCTTCGGGTTGTAAACCGCTTTTGACTGGGAGCAAGCCCTTCGGGGTGAGTGTACCTTTCGAATAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCAAGCGTTATCCGGAATTATTGGGCGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGTCCATCGCTTAACGGTGGATCCGCGCCGGGTACGGGCGGGCTTGAGTGCGGTAGGGGAGACTGGAATTCCCGGTGTAACGGTGGAATGTGTAGATATCGGGAAGAACACCAATGGCGAAGGCAGGTCTCTGGGCCGTCACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGGTGGATGCTGGATGTGGGGACCATTCCACGGTCTCCGTGTCGGAGCCAACGCGTTAAGCATCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGAAATTGACGGGGGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCTGGGCTTGACATGTTCCCGACAGCCGTAGAGATACGGCCTCCCTTCGGGGCGGGTTCACAGGTGGTGCATGGTCGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTCGCCCTGTGTTGCCAGCACGTCGTGGTGGGAACTCACGGGGGACCGCCGGGGTCAACTCGGAGGAAGGTGGGGATGACGTCAGATCATCATGCCCCTTACGTCCAGGGCTTCACGCATGCTACAATGGCCGGTACAACGGGATGCGACCTCGCGAGGGGGAGCGGATCCCTTAAAACCGGTCTCAGTTCGGATTGGAGTCTGCAACCCGACTCCATGAAGGCGGAGTCGCTAGTAATCGCGGATCAGCAACGCCGCGGTGAATGCGTTCCCGGGCC
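
`head -n2` returns a complete record only when the sequence sits on a single line, as it does in the output above. A slightly more general sketch follows that also copes with line-wrapped FASTA and counts the records. `first_fasta_record` and `count_fasta_records` are hypothetical helper names, and `first_consensus.fna` is an arbitrary output file name.

```shell
# Hypothetical helper: print the first FASTA record, header plus all of its
# sequence lines, and stop reading at the second header.
first_fasta_record() {
    awk '/^>/ { n++ } n > 1 { exit } n == 1 { print }' "$1"
}

# Hypothetical helper: count records by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

fna=./laca_output/kmerCon/kmerCon.fna
if [ -f "$fna" ]; then
    echo "consensus sequences: $(count_fasta_records "$fna")"
    first_fasta_record "$fna" > first_consensus.fna   # paste or upload to BLAST
fi
```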

BLAST

The BLAST result indicates that this sequence is a 16S rRNA gene fragment from Bifidobacterium with over 99% identity.

Take one ONT read and do the same BLAST search. What do you expect to see?

zcat ./data/ont16s/*.fastq.gz | head
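
`head` prints the first ten lines of the stream, which is more than one FASTQ record, while a BLAST search needs only the sequence of a single read. A minimal sketch that converts the first read of a gzipped FASTQ file to FASTA follows. `first_read_as_fasta` is a hypothetical helper, `first_read.fna` is an arbitrary output file name, and `gzip -dc` is used as a portable equivalent of `zcat`.

```shell
# Hypothetical helper: print the first read of a gzipped FASTQ file as FASTA.
# Assumes four-line FASTQ records (header, sequence, "+", qualities).
first_read_as_fasta() {
    gzip -dc "$1" | head -n 4 |
        awk 'NR == 1 { print ">" substr($0, 2) } NR == 2 { print }'
}

for fq in ./data/ont16s/*.fastq.gz; do
    [ -e "$fq" ] || continue          # skip if the glob matched nothing
    first_read_as_fasta "$fq" > first_read.fna
    break                             # one file is enough for this exercise
done
```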

© 2023 Yan Hui. Released under the CC BY-SA license