diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md new file mode 100644 index 0000000..b143cb7 --- /dev/null +++ b/.github/CONTRIBUTING.md @@ -0,0 +1,249 @@ +# Contributing to CCBR Tools + +## Proposing changes with issues + +If you want to make a change, it's a good idea to first +[open an issue](https://code-review.tidyverse.org/issues/) +and make sure someone from the team agrees that it’s needed. + +If you've decided to work on an issue, +[assign yourself to the issue](https://docs.github.com/en/issues/tracking-your-work-with-issues/assigning-issues-and-pull-requests-to-other-github-users#assigning-an-individual-issue-or-pull-request) +so others will know you're working on it. + +## Pull request process + +We use [GitHub Flow](https://docs.github.com/en/get-started/using-github/github-flow) +as our collaboration process. +Follow the steps below for detailed instructions on contributing changes to +CCBR Tools. + +![GitHub Flow diagram](./img/GitHub-Flow_bg-white.png) + +### Clone the repo + +If you are a member of [CCBR](https://github.com/CCBR), +you can clone this repository to your computer or development environment. +Otherwise, you will first need to +[fork](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo) +the repo and clone your fork. You only need to do this step once. + +```sh +git clone https://github.com/CCBR/tools +``` + +> Cloning into 'tools'...
+> remote: Enumerating objects: 1136, done.
+> remote: Counting objects: 100% (463/463), done.
+> remote: Compressing objects: 100% (357/357), done.
+> remote: Total 1136 (delta 149), reused 332 (delta 103), pack-reused 673
+> Receiving objects: 100% (1136/1136), 11.01 MiB | 9.76 MiB/s, done.
+> Resolving deltas: 100% (530/530), done.
+ +```sh +cd tools +``` + +### If this is your first time cloning the repo, you may need to install dependencies + +- Install the Python dependencies with pip + + ```sh + pip install .[dev,test] + ``` + +- Install [`pre-commit`](https://pre-commit.com/#install) if you don't already + have it. Then from the repo's root directory, run + + ```sh + pre-commit install + ``` + + This will install the repo's pre-commit hooks. + You'll only need to do this step the first time you clone the repo. + +### Create a branch + +Create a Git branch for your pull request (PR). Give the branch a descriptive +name for the changes you will make, such as `iss-10` if it is for a specific +issue. + +```sh +# create a new branch and switch to it +git branch iss-10 +git switch iss-10 +``` + +> Switched to a new branch 'iss-10' + +### Make your changes + +Edit the code, write and run tests, and update the documentation as needed. + +#### test + +Changes to the **Python package** code will also need unit tests to demonstrate +that the changes work as intended. +We write unit tests with pytest and store them in the `tests/` subdirectory. +Run the tests with `python -m pytest`. + +#### document + +If you have added a new feature or changed the API of an existing feature, +you will likely need to update the documentation in `docs/`. + +### Commit and push your changes + +If you're not sure how often you should commit or what your commits should +consist of, we recommend following the "atomic commits" principle where each +commit contains one new feature, fix, or task. +Learn more about atomic commits here: + + +First, add the files that you changed to the staging area: + +```sh +git add path/to/changed/files/ +``` + +Then make the commit. +Your commit message should follow the +[Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) +specification. +Briefly, each commit should start with one of the approved types such as +`feat`, `fix`, `docs`, etc. 
followed by a description of the commit. +Take a look at the [Conventional Commits specification](https://www.conventionalcommits.org/en/v1.0.0/#summary) +for more detailed information about how to write commit messages. + +```sh +git commit -m 'feat: create function for awesome feature' +``` + +pre-commit will enforce that your commit message and the code changes are +styled correctly and will attempt to make corrections if needed. + +> Check for added large files..............................................Passed
+> Fix End of Files.........................................................Passed
+> Trim Trailing Whitespace.................................................Failed
+> +> - hook id: trailing-whitespace
+> - exit code: 1
+> - files were modified by this hook
+>
+> Fixing path/to/changed/files/file.txt
+>
+> codespell................................................................Passed
+> style-files..........................................(no files to check)Skipped
+> readme-rmd-rendered..................................(no files to check)Skipped
+> use-tidy-description.................................(no files to check)Skipped
+ +In the example above, one of the hooks modified a file in the proposed commit, +so the pre-commit check failed. You can run `git diff` to see the changes that +pre-commit made and `git status` to see which files were modified. To proceed +with the commit, re-add the modified file(s) and re-run the commit command: + +```sh +git add path/to/changed/files/file.txt +git commit -m 'feat: create function for awesome feature' +``` + +This time, all the hooks either passed or were skipped +(e.g. hooks that only run on R code will not run if no R files were +committed). +When the pre-commit check is successful, the usual commit success message +will appear after the pre-commit messages showing that the commit was created. + +> Check for added large files..............................................Passed
+> Fix End of Files.........................................................Passed
+> Trim Trailing Whitespace.................................................Passed
+> codespell................................................................Passed
+> style-files..........................................(no files to check)Skipped
+> readme-rmd-rendered..................................(no files to check)Skipped
+> use-tidy-description.................................(no files to check)Skipped
+> Conventional Commit......................................................Passed
+> [iss-10 9ff256e] feat: create function for awesome feature
+> 1 file changed, 22 insertions(+), 3 deletions(-)
+ +Finally, push your changes to GitHub: + +```sh +git push +``` + +If this is the first time you are pushing this branch, you may have to +explicitly set the upstream branch: + +```sh +git push --set-upstream origin iss-10 +``` + +> Enumerating objects: 7, done.
+> Counting objects: 100% (7/7), done.
+> Delta compression using up to 10 threads
+> Compressing objects: 100% (4/4), done.
+> Writing objects: 100% (4/4), 648 bytes | 648.00 KiB/s, done.
+> Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
+> remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
+> remote:
+> remote: Create a pull request for 'iss-10' on GitHub by visiting:
+> remote: https://github.com/CCBR/tools/pull/new/iss-10
+> remote:
+> To https://github.com/CCBR/tools
+>
+> [new branch] iss-10 -> iss-10
+> branch 'iss-10' set up to track 'origin/iss-10'.
+ +We recommend pushing your commits often so they will be backed up on GitHub. +You can view the files in your branch on GitHub at +`https://github.com/CCBR/tools/tree/` +(replace `` with the actual name of your branch). + +### Create the PR + +Once your branch is ready, create a PR on GitHub: + + +Select the branch you just pushed: + +![Create a new PR from your branch](./img/new-PR.png) + +Edit the PR title and description. +The title should briefly describe the change. +Follow the comments in the template to fill out the body of the PR, and +you can delete the comments (everything between ``) as you go. +Be sure to fill out the checklist, checking off items as you complete them or +striking through any irrelevant items. +When you're ready, click 'Create pull request' to open it. + +![Open the PR after editing the title and description](./img/create-PR.png) + +Optionally, you can mark the PR as a draft if you're not yet ready for it to +be reviewed, then change it later when you're ready. + +### Wait for a maintainer to review your PR + +We will do our best to follow the tidyverse code review principles: +. +The reviewer may suggest that you make changes before accepting your PR in +order to improve the code quality or style. +If that's the case, continue to make changes in your branch and push them to +GitHub, and they will appear in the PR. + +Once the PR is approved, the maintainer will merge it and the issue(s) the PR +links will close automatically. +Congratulations and thank you for your contribution! + +### After your PR has been merged + +After your PR has been merged, update your local clone of the repo by +switching to the main branch and pulling the latest changes: + +```sh +git checkout main +git pull +``` + +It's a good idea to run `git pull` before creating a new branch so it will +start from the most recent commits in main. 
+ +## Helpful links for more information + +- [GitHub Flow](https://docs.github.com/en/get-started/using-github/github-flow) +- [semantic versioning guidelines](https://semver.org/) +- [changelog guidelines](https://keepachangelog.com/en/1.1.0/) +- [tidyverse code review principles](https://code-review.tidyverse.org) +- [reproducible examples](https://www.tidyverse.org/help/#reprex) diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml new file mode 100644 index 0000000..baecdb1 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -0,0 +1,45 @@ +name: Bug report +description: Report something that is broken or incorrect +labels: bug +body: + - type: markdown + attributes: + value: | + Before you post this issue, please check the documentation: + + + + - type: textarea + id: description + attributes: + label: Description of the bug + description: A clear and concise description of what the bug is. + validations: + required: true + + - type: textarea + id: command_used + attributes: + label: Command used and terminal output + description: Steps to reproduce the behaviour. Please paste the code you used + render: console + placeholder: | + $ ccbr_tools ... + + Some output where something broke + + - type: textarea + id: files + attributes: + label: Relevant files + description: | + Please drag and drop any relevant files here. Create a `.zip` archive if the extension is not allowed. + + - type: textarea + id: system + attributes: + label: System information + description: | + - Version of CCBR Tools _(eg. 1.0, 1.8.2)_ + - Python version _(eg. 3.11)_ + - Environment _(eg. local macOS, biowulf HPC)_ diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000..fb0237e --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,4 @@ +contact_links: + - name: Discussions + url: https://github.com/CCBR/Tools/discussions + about: Please ask and answer questions here. 
diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml new file mode 100644 index 0000000..ab99a5a --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.yml @@ -0,0 +1,11 @@ +name: Feature request +description: Suggest an idea for the tool +labels: enhancement +body: + - type: textarea + id: description + attributes: + label: Description of feature + description: Please describe your suggestion for a new feature. It might help to describe a problem or use case, plus any alternatives that you have considered. + validations: + required: true diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 0000000..6d6e802 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,22 @@ +## Changes + + + +## Issues + + + +## PR Checklist + +(~Strikethrough~ any points that are not applicable.) + +- [ ] This comment contains a description of changes with justifications, with any relevant issues linked. +- [ ] Write unit tests for any new features, bug fixes, or other code changes. +- [ ] Update docs if there are any API changes. +- [ ] Update `CHANGELOG.md` with a short description of any user-facing changes and reference the PR number. 
Guidelines: https://keepachangelog.com/en/1.1.0/ diff --git a/.github/workflows/auto-format.yml b/.github/workflows/auto-format.yml new file mode 100644 index 0000000..592ae32 --- /dev/null +++ b/.github/workflows/auto-format.yml @@ -0,0 +1,58 @@ +name: auto-format + +on: + workflow_dispatch: + pull_request: + paths: + - "src/**" + - "README.qmd" + +env: + GH_TOKEN: ${{ github.token }} + +jobs: + auto-format: + runs-on: ubuntu-latest + strategy: + fail-fast: false + matrix: + python-version: ["3.11"] + + steps: + - uses: actions/checkout@v4 + if: github.event_name == 'pull_request' + with: + fetch-depth: 0 + ref: ${{ github.event.pull_request.head.ref }} + + - uses: actions/checkout@v4 + if: github.event_name == 'push' + with: + fetch-depth: 0 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + cache: "pip" + - name: Install dependencies + run: | + python -m pip install .[dev] --upgrade pip + - name: format + uses: psf/black@stable + with: + options: "--verbose" + use_pyproject: true + - name: commit & push + run: | + git config --global user.name "github-actions[bot]" + git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com" + git add . 
+ git commit -m "ci: 🤖 black formatting" && git push || echo "nothing to commit" + - uses: quarto-dev/quarto-actions/setup@v2 + with: + version: 1.4.515 + - name: quarto render readme + run: | + quarto render README.qmd + git add README.md + git commit -m "ci: 🤖 render readme" && git push || echo "nothing to commit" diff --git a/.github/workflows/black.yml b/.github/workflows/black.yml deleted file mode 100644 index 30c469b..0000000 --- a/.github/workflows/black.yml +++ /dev/null @@ -1,32 +0,0 @@ -name: black - -on: workflow_dispatch - -env: - GH_TOKEN: ${{ github.token }} - -jobs: - black: - runs-on: ubuntu-latest - strategy: - fail-fast: false - matrix: - python-version: ["3.11"] - - steps: - - uses: actions/checkout@v4 - - name: Set up Python ${{ matrix.python-version }} - uses: actions/setup-python@v3 - with: - python-version: ${{ matrix.python-version }} - - name: format - uses: psf/black@stable - with: - options: "--verbose" - use_pyproject: true - - name: commit & push - run: | - git config --global user.name "github-actions[bot]" - git config --global user.email "41898282+github-actions[bot]@users.noreply.github.com" - git add . 
- git commit -m "ci: 🤖 black formatting" && git push || echo "nothing to commit" diff --git a/.github/workflows/build-python.yml b/.github/workflows/build-python.yml new file mode 100644 index 0000000..dda3d32 --- /dev/null +++ b/.github/workflows/build-python.yml @@ -0,0 +1,68 @@ +# This workflow will install Python dependencies, run tests and lint with a variety of Python versions +# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python + +name: build + +on: + push: + branches: + - main + - master + pull_request: + branches: + - main + - master + +env: + GH_TOKEN: ${{ github.token }} + +jobs: + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Set up Python + uses: actions/setup-python@v3 + with: + python-version: 3.11 + - name: Lint + uses: psf/black@stable + with: + options: "--check --verbose" + use_pyproject: true + test: + runs-on: ubuntu-latest + if: always() + strategy: + fail-fast: false + matrix: + python-version: ["3.11"] + + steps: + - uses: actions/checkout@v4 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v5 + with: + python-version: ${{ matrix.python-version }} + cache: "pip" + - name: Install dependencies + run: | + python -m pip install .[dev,test] --upgrade pip + - name: Test + run: | + python -m pytest --cov ccbr_tools + - uses: codecov/codecov-action@v4 + with: + token: ${{ secrets.CODECOV_TOKEN }} + + build-status: # https://github.com/orgs/community/discussions/4324#discussioncomment-3477871 + runs-on: ubuntu-latest + needs: [lint, test] + if: always() + steps: + - name: Successful build + if: ${{ !(contains(needs.*.result, 'failure')) }} + run: exit 0 + - name: Failing build + if: ${{ contains(needs.*.result, 'failure') }} + run: exit 1 diff --git a/.gitignore b/.gitignore index 12d5cc7..fa53c4e 100644 --- a/.gitignore +++ b/.gitignore @@ -3,10 +3,6 @@ .Rapp.history .RData -# Python Byte-compiled / optimized / 
DLL files -__pycache__/ -*.py[cod] - # OS generated files .DS_Store .DS_Store? @@ -16,3 +12,167 @@ __pycache__/ *~ *.bak **/.kop* + +# python files +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ +cover/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +.pybuilder/ +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +# For a library or package, you might want to ignore these files since the code is +# intended to run in multiple environments; otherwise, check them in: +# .python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# poetry +# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 
+# This is especially recommended for binary packages to ensure reproducibility, and is more +# commonly ignored for libraries. +# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control +#poetry.lock + +# pdm +# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. +#pdm.lock +# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it +# in version control. +# https://pdm.fming.dev/latest/usage/project/#working-with-version-control +.pdm.toml +.pdm-python +.pdm-build/ + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ + +# pytype static type analyzer +.pytype/ + +# Cython debug symbols +cython_debug/ + +# PyCharm +# JetBrains specific template is maintained in a separate JetBrains.gitignore that can +# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore +# and can be added to the global gitignore or merged into this file. For a more nuclear +# option (not recommended) you can uncomment the following to ignore the entire idea folder. 
+#.idea/ diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index a66140f..464516a 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -17,6 +17,10 @@ repos: rev: 23.7.0 hooks: - id: black + # - repo: https://github.com/numpy/numpydoc + # rev: v1.7.0 + # hooks: + # - id: numpydoc-validation # R formatting - repo: https://github.com/lorenzwalthert/precommit rev: v0.1.2 diff --git a/Biowulf/Box.md b/Biowulf/Box.md deleted file mode 100644 index 8e9598f..0000000 --- a/Biowulf/Box.md +++ /dev/null @@ -1,48 +0,0 @@ -## Setting up Box on helix: - -**Step 1.** _ssh_ into helix and `module load rclone/1.53.1` - -**Step 2**. Follow the instructions listed here until step **7.**: https://www.sussex.ac.uk/its/help/guide.php?id=245 - -**Step 3.** Once you get step **7.** in the guide above (where it mentions _Remote config_), please answer `n` and follow the instructions listed here: https://rclone.org/remote_setup/ - -**Step 4.** Open a new terminal (and keep your shell open on helix, you will need this later). From your local machine, you will need to install rclone. -If you have _brew_ installed on your local machine, you can simply run: -``` -brew install rclone -``` - -**Step 5.** Please ensure you are connected to VPN. If you are not, please connect now! Once rclone has finished installing on your local computer, please run: -``` -rclone authorize "box" -``` - -If your browser doesn't open automatically go to the following link: http://127.0.0.1:53682/auth - -**Step 6.** You maybe asked to login into Box. In your local terminal, there will be a json string, you want to copy the entire json string: - -> _**Paste the following into your remote machine --->**_ -``` -{"access_token":"XYZ","token_type":"bearer","refresh_token":"XYZ","expiry":"XYZ"} -``` -> _**<---End paste**_ - -**Step 7.** Copy everything between the bold, italicized print above, and then _paste it into your terminal on helix_, and then enter `y` - -That's it! 
Everything should be setup! - -rlcone is a module on Biowulf but just make sure to use the latest version of it (the default module is kind of old). I always use `rclone/1.53.1`. Once it is setup, you can access Box from helix/biowulf from the command line. rclone is really nice and has great documentation! It has a ton of functionality that I haven't explored. - -And then you can check how much space you have available or free by running this command: -``` -$ rclone about Box:/ -Total: 909.495T -Used: 63.867G -Free: 909.432T -``` - -Or list all your files: -``` -$ rclone ls Box:/ -``` - diff --git a/Biowulf/example.snakemake.log b/Biowulf/example.snakemake.log deleted file mode 100644 index 22971d1..0000000 --- a/Biowulf/example.snakemake.log +++ /dev/null @@ -1,3220 +0,0 @@ -Building DAG of jobs... -Pulling singularity image docker://nciccbr/ccbr_clear:latest. -Pulling singularity image docker://nciccbr/ccbr_venn:latest. -Using shell: /usr/bin/bash -Provided cluster nodes: 500 -Job counts: - count jobs - 8 alignment_stats - 1 all - 8 annotate_circRNA - 8 annotate_clear_output - 8 ciri - 8 clear - 8 create_BSJ_bam - 1 create_circexplorer_count_matrix - 1 create_ciri_count_matrix - 8 create_spliced_reads_bam - 8 cutadapt - 8 estimate_duplication - 8 fastqc - 1 merge_SJ_tabs - 1 merge_genecounts - 1 multiqc - 8 split_BAM_create_BW - 8 split_splice_reads_BAM_create_BW - 8 star1p - 8 star2p - 8 venn - 126 - -[Wed Mar 24 17:44:43 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-NT.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz - jobid: 8 - 
wildcards: sample=G_INPUT-293T-NT - threads: 56 - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-NT.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-NT.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz -fi - -Submitted job 8 with external jobid '11285152'. 
- -[Wed Mar 24 17:44:48 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293T-T.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz - jobid: 2 - wildcards: sample=G_293T-T - threads: 56 - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293T-T.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293T-T.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz -fi - -Submitted job 2 with external jobid '11285219'. - -[Wed Mar 24 17:44:50 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-R.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz - jobid: 3 - wildcards: sample=G_iSLK-R - threads: 56 - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-R.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-R.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz -fi - -Submitted job 3 with external jobid '11285242'. 
- -[Wed Mar 24 17:44:53 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-UR.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz - jobid: 5 - wildcards: sample=G_INPUT-iSLK-UR - threads: 56 - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-UR.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-UR.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz -fi - -Submitted job 5 with external jobid '11285257'. - -[Wed Mar 24 17:44:56 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-T.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz - jobid: 7 - wildcards: sample=G_INPUT-293T-T - threads: 56 - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-T.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-T.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz -fi - -Submitted job 7 with external jobid '11285258'. 
- -[Wed Mar 24 17:44:57 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-R.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz - jobid: 1 - wildcards: sample=G_INPUT-iSLK-R - threads: 56 - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-R.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-R.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz -fi - -Submitted job 1 with external jobid '11285260'. - -[Wed Mar 24 17:44:58 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-UR.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz - jobid: 4 - wildcards: sample=G_iSLK-UR - threads: 56 - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-UR.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-UR.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz -fi - -Submitted job 4 with external jobid '11285262'. 
- -[Wed Mar 24 17:44:59 2021] -rule cutadapt: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293t-NT.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz - jobid: 6 - wildcards: sample=G_293t-NT - threads: 56 - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT;fi -if [ "SE" == "PE" ];then - ## Paired-end - cutadapt --pair-filter=any --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35:35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -B file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293t-NT.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy -else - ## Single-end - cutadapt --nextseq-trim=2 --trim-n -n 5 -O 5 -q 10,10 -m 35 -b file:/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/TruSeq_and_nextera_adapters.consolidated.fa -j 56 -o 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293t-NT.R1.fastq.gz - touch /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz -fi - -Submitted job 6 with external jobid '11285264'. -[Wed Mar 24 18:13:25 2021] -Finished job 8. -1 of 126 steps (0.79%) done - -[Wed Mar 24 18:13:25 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1.SJ.out.tab - jobid: 24 - wildcards: sample=G_INPUT-293T-NT - threads: 56 - - -if [ -d /dev/shm/G_INPUT-293T-NT ];then rm -rf /dev/shm/G_INPUT-293T-NT;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-NT_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-293T-NT --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-NT_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-293T-NT --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1.Aligned.out.bam - -Submitted job 24 with external jobid '11287594'. - -[Wed Mar 24 18:13:26 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out - jobid: 74 - wildcards: sample=G_INPUT-293T-NT - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz > G_INPUT-293T-NT.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz > G_INPUT-293T-NT.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_INPUT-293T-NT.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.log -T 56 -samtools view -@56 -bS G_INPUT-293T-NT.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.bwa.bam -rm -rf G_INPUT-293T-NT.bwa.sam - -Submitted job 74 with external jobid '11287595'. 
- -[Wed Mar 24 18:13:28 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-NT.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-NT.R1.trim_fastqc.zip - jobid: 16 - wildcards: sample=G_INPUT-293T-NT - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-NT.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 16 with external jobid '11287596'. -[Wed Mar 24 18:27:21 2021] -Finished job 5. 
-2 of 126 steps (2%) done - -[Wed Mar 24 18:27:21 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-UR.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-UR.R1.trim_fastqc.zip - jobid: 13 - wildcards: sample=G_INPUT-iSLK-UR - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-UR.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 13 with external jobid '11288435'. 
- -[Wed Mar 24 18:27:22 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out - jobid: 71 - wildcards: sample=G_INPUT-iSLK-UR - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz > G_INPUT-iSLK-UR.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz > G_INPUT-iSLK-UR.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_INPUT-iSLK-UR.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.log -T 56 -samtools view -@56 -bS G_INPUT-iSLK-UR.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.bwa.bam -rm -rf G_INPUT-iSLK-UR.bwa.sam - -Submitted job 71 with external jobid '11288436'. - -[Wed Mar 24 18:27:24 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1.SJ.out.tab - jobid: 21 - wildcards: sample=G_INPUT-iSLK-UR - threads: 56 - - -if [ -d /dev/shm/G_INPUT-iSLK-UR ];then rm -rf /dev/shm/G_INPUT-iSLK-UR;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-UR_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-iSLK-UR --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-UR_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-iSLK-UR --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1.Aligned.out.bam - -Submitted job 21 with external jobid '11288437'. -[Wed Mar 24 18:27:43 2021] -Finished job 7. -3 of 126 steps (2%) done -[Wed Mar 24 18:27:43 2021] -Finished job 16. -4 of 126 steps (3%) done - -[Wed Mar 24 18:27:43 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-T.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-T.R1.trim_fastqc.zip - jobid: 15 - wildcards: sample=G_INPUT-293T-T - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-293T-T.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 15 with external jobid '11288438'. - -[Wed Mar 24 18:27:44 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out - jobid: 73 - wildcards: sample=G_INPUT-293T-T - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz > G_INPUT-293T-T.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz > G_INPUT-293T-T.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_INPUT-293T-T.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.log -T 56 -samtools view -@56 -bS G_INPUT-293T-T.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.bwa.bam -rm -rf G_INPUT-293T-T.bwa.sam - -Submitted job 73 with external jobid '11288439'. 
- -[Wed Mar 24 18:27:46 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1.SJ.out.tab - jobid: 23 - wildcards: sample=G_INPUT-293T-T - threads: 56 - - -if [ -d /dev/shm/G_INPUT-293T-T ];then rm -rf /dev/shm/G_INPUT-293T-T;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-T_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-293T-T --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-T_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-293T-T --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1.Aligned.out.bam - -Submitted job 23 with external jobid '11288440'. -[Wed Mar 24 18:28:38 2021] -Finished job 4. -5 of 126 steps (4%) done - -[Wed Mar 24 18:28:38 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-UR.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-UR.R1.trim_fastqc.zip - jobid: 12 - wildcards: sample=G_iSLK-UR - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-UR.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 12 with external jobid '11288444'. - -[Wed Mar 24 18:28:39 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1.SJ.out.tab - jobid: 20 - wildcards: sample=G_iSLK-UR - threads: 56 - - -if [ -d /dev/shm/G_iSLK-UR ];then rm -rf /dev/shm/G_iSLK-UR;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 
--alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-UR_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_iSLK-UR --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-UR_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_iSLK-UR --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1.Aligned.out.bam - -Submitted job 20 with external jobid '11288445'. - -[Wed Mar 24 18:28:41 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out - jobid: 70 - wildcards: sample=G_iSLK-UR - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz > G_iSLK-UR.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz > G_iSLK-UR.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_iSLK-UR.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.log -T 56 -samtools view -@56 -bS G_iSLK-UR.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.bwa.bam -rm -rf G_iSLK-UR.bwa.sam - -Submitted job 70 with external jobid '11288446'. -[Wed Mar 24 18:29:33 2021] -Finished job 2. 
-6 of 126 steps (5%) done - -[Wed Mar 24 18:29:33 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1.SJ.out.tab - jobid: 18 - wildcards: sample=G_293T-T - threads: 56 - - -if [ -d /dev/shm/G_293T-T ];then rm -rf /dev/shm/G_293T-T;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293T-T_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_293T-T --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293T-T_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_293T-T --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1.Aligned.out.bam - -Submitted job 18 with external jobid '11288669'. - -[Wed Mar 24 18:29:34 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out - jobid: 68 - wildcards: sample=G_293T-T - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz > G_293T-T.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz > G_293T-T.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_293T-T.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.log -T 56 -samtools view -@56 -bS G_293T-T.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.bwa.bam -rm -rf G_293T-T.bwa.sam - -Submitted job 68 with external jobid '11288671'. 
- -[Wed Mar 24 18:29:36 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293T-T.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293T-T.R1.trim_fastqc.zip - jobid: 10 - wildcards: sample=G_293T-T - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293T-T.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 10 with external jobid '11288672'. -[Wed Mar 24 18:33:58 2021] -Finished job 1. 
-7 of 126 steps (6%) done - -[Wed Mar 24 18:33:58 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out - jobid: 67 - wildcards: sample=G_INPUT-iSLK-R - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz > G_INPUT-iSLK-R.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz > G_INPUT-iSLK-R.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_INPUT-iSLK-R.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.log -T 56 -samtools view -@56 -bS G_INPUT-iSLK-R.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.bwa.bam -rm -rf G_INPUT-iSLK-R.bwa.sam - -Submitted job 67 with external jobid '11289106'. 
- -[Wed Mar 24 18:33:59 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-R.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-R.R1.trim_fastqc.zip - jobid: 9 - wildcards: sample=G_INPUT-iSLK-R - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_INPUT-iSLK-R.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 9 with external jobid '11289107'. 
- -[Wed Mar 24 18:34:00 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1.SJ.out.tab - jobid: 17 - wildcards: sample=G_INPUT-iSLK-R - threads: 56 - - -if [ -d /dev/shm/G_INPUT-iSLK-R ];then rm -rf /dev/shm/G_INPUT-iSLK-R;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-R_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-iSLK-R --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-R_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_INPUT-iSLK-R --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1.Aligned.out.bam - -Submitted job 17 with external jobid '11289108'. -[Wed Mar 24 18:36:55 2021] -Finished job 12. -8 of 126 steps (6%) done -[Wed Mar 24 18:37:49 2021] -Finished job 15. -9 of 126 steps (7%) done -[Wed Mar 24 18:38:11 2021] -Finished job 13. -10 of 126 steps (8%) done -[Wed Mar 24 18:40:34 2021] -Finished job 3. 
-11 of 126 steps (9%) done - -[Wed Mar 24 18:40:34 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-R.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-R.R1.trim_fastqc.zip - jobid: 11 - wildcards: sample=G_iSLK-R - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_iSLK-R.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -[Wed Mar 24 18:40:35 2021] -Finished job 10. -12 of 126 steps (10%) done -Submitted job 11 with external jobid '11290772'. 
- -[Wed Mar 24 18:40:35 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1.SJ.out.tab - jobid: 19 - wildcards: sample=G_iSLK-R - threads: 56 - - -if [ -d /dev/shm/G_iSLK-R ];then rm -rf /dev/shm/G_iSLK-R;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-R_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_iSLK-R --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-R_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_iSLK-R --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1.Aligned.out.bam - -Submitted job 19 with external jobid '11290773'. - -[Wed Mar 24 18:40:37 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out - jobid: 69 - wildcards: sample=G_iSLK-R - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz > G_iSLK-R.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz > G_iSLK-R.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_iSLK-R.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.log -T 56 -samtools view -@56 -bS G_iSLK-R.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.bwa.bam -rm -rf G_iSLK-R.bwa.sam - -Submitted job 69 with external jobid '11290775'. -[Wed Mar 24 18:44:25 2021] -Finished job 6. 
-13 of 126 steps (10%) done - -[Wed Mar 24 18:44:25 2021] -rule star1p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1.SJ.out.tab - jobid: 22 - wildcards: sample=G_293t-NT - threads: 56 - - -if [ -d /dev/shm/G_293t-NT ];then rm -rf /dev/shm/G_293t-NT;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p;fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293t-NT_p1. --chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_293t-NT --sjdbOverhang $overhang - -else - -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293t-NT_p1. 
--chimSegmentMin 20 --chimMultimapNmax 10 --chimOutType Junctions --alignTranscriptsPerReadNmax 20000 --outSAMtype None --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --outTmpDir=/dev/shm/G_293t-NT --sjdbOverhang $overhang - -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1._STARgenome -# rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1.Aligned.out.bam - -Submitted job 22 with external jobid '11292002'. - -[Wed Mar 24 18:44:27 2021] -rule ciri: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.bwa.log, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.bwa.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out - jobid: 72 - wildcards: sample=G_293t-NT - threads: 56 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri -if [ "SE" == "PE" ];then - ## paired-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz > G_293t-NT.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.bwa.log -else - ## single-end - bwa mem -t 56 -T 19 /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz > G_293t-NT.bwa.sam 2> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.bwa.log -fi -perl /data/Ziegelbauer_lab/tools/CIRI_v2.0.6/CIRI2.pl -I G_293t-NT.bwa.sam -O /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out -F /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -A /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf -G /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.log -T 56 -samtools view -@56 -bS G_293t-NT.bwa.sam > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.bwa.bam -rm -rf G_293t-NT.bwa.sam - -Submitted job 72 with external jobid '11292003'. 
- -[Wed Mar 24 18:44:28 2021] -rule fastqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293t-NT.R1.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293t-NT.R1.trim_fastqc.zip - jobid: 14 - wildcards: sample=G_293t-NT - threads: 16 - - -files="" -for f in /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/fastqs/G_293t-NT.R1.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/dummy /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz;do -if [ "$(wc $f|awk '{print $1}')" != "0" ];then -files=$(echo -ne "$files $f") -fi -done -fastqc $files -t 16 -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc - -Submitted job 14 with external jobid '11292004'. -[Wed Mar 24 18:48:17 2021] -Finished job 74. -14 of 126 steps (11%) done -[Wed Mar 24 18:49:46 2021] -Finished job 9. -15 of 126 steps (12%) done -[Wed Mar 24 18:51:13 2021] -Finished job 24. -16 of 126 steps (13%) done -[Wed Mar 24 18:54:43 2021] -Finished job 11. -17 of 126 steps (13%) done -[Wed Mar 24 18:54:53 2021] -Finished job 71. -18 of 126 steps (14%) done -[Wed Mar 24 18:55:59 2021] -Finished job 73. -19 of 126 steps (15%) done -[Wed Mar 24 19:00:02 2021] -Finished job 70. 
-20 of 126 steps (16%) done -[Wed Mar 24 19:00:46 2021] -Finished job 20. -21 of 126 steps (17%) done -[Wed Mar 24 19:01:08 2021] -Finished job 23. -22 of 126 steps (17%) done -[Wed Mar 24 19:01:19 2021] -Finished job 21. -23 of 126 steps (18%) done -[Wed Mar 24 19:01:19 2021] -Finished job 18. -24 of 126 steps (19%) done -[Wed Mar 24 19:02:40 2021] -Finished job 17. -25 of 126 steps (20%) done -[Wed Mar 24 19:05:20 2021] -Finished job 19. -26 of 126 steps (21%) done -[Wed Mar 24 19:06:00 2021] -Finished job 67. -27 of 126 steps (21%) done -[Wed Mar 24 19:07:50 2021] -Finished job 68. -28 of 126 steps (22%) done -[Wed Mar 24 19:10:31 2021] -Finished job 14. -29 of 126 steps (23%) done -[Wed Mar 24 19:18:51 2021] -Finished job 22. -30 of 126 steps (24%) done - -[Wed Mar 24 19:18:51 2021] -rule merge_SJ_tabs: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1.SJ.out.tab - output: 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - jobid: 25 - - -cat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1.SJ.out.tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1.SJ.out.tab | python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/apply_junction_filters.py --regions /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions --filter1regions hg38,rRNA,ERCC --filter1_noncanonical True --filter1_unannotated True --filter2_noncanonical False --filter2_unannotated True | cut -f1-4 | sort -k1,1 -k2,2n | uniq > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - -Submitted job 25 with external jobid '11294722'. -[Wed Mar 24 19:19:31 2021] -Finished job 25. 
-31 of 126 steps (25%) done - -[Wed Mar 24 19:19:31 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.ReadsPerGene.out.tab - jobid: 29 - wildcards: sample=G_iSLK-UR - threads: 56 - - -if [ -d /dev/shm/G_iSLK-UR ];then rm -rf /dev/shm/G_iSLK-UR;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-UR_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_iSLK-UR --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-UR_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_iSLK-UR --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 29 with external jobid '11295006'. 
- -[Wed Mar 24 19:19:33 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.ReadsPerGene.out.tab - jobid: 33 - wildcards: sample=G_INPUT-293T-NT - threads: 56 - - -if [ -d /dev/shm/G_INPUT-293T-NT ];then rm -rf /dev/shm/G_INPUT-293T-NT;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-NT_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-293T-NT --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-NT_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-293T-NT --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 33 with external jobid '11295008'. 
- -[Wed Mar 24 19:19:34 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.ReadsPerGene.out.tab - jobid: 26 - wildcards: sample=G_INPUT-iSLK-R - threads: 56 - - -if [ -d /dev/shm/G_INPUT-iSLK-R ];then rm -rf /dev/shm/G_INPUT-iSLK-R;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-R_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-iSLK-R --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-R_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-iSLK-R --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 26 with external jobid '11295010'. 
- -[Wed Mar 24 19:19:35 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.ReadsPerGene.out.tab - jobid: 30 - wildcards: sample=G_INPUT-iSLK-UR - threads: 56 - - -if [ -d /dev/shm/G_INPUT-iSLK-UR ];then rm -rf /dev/shm/G_INPUT-iSLK-UR;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-UR_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-iSLK-UR --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-iSLK-UR_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-iSLK-UR --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 30 with external jobid '11295012'. 
- -[Wed Mar 24 19:19:36 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.ReadsPerGene.out.tab - jobid: 27 - wildcards: sample=G_293T-T - threads: 56 - - -if [ -d /dev/shm/G_293T-T ];then rm -rf /dev/shm/G_293T-T;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293T-T_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_293T-T --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p - STAR --genomeDir 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293T-T_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_293T-T --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 27 with external jobid '11295014'. 
- -[Wed Mar 24 19:19:38 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.ReadsPerGene.out.tab - jobid: 31 - wildcards: sample=G_293t-NT - threads: 56 - - -if [ -d /dev/shm/G_293t-NT ];then rm -rf /dev/shm/G_293t-NT;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293t-NT_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_293t-NT --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p - STAR --genomeDir 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_293t-NT_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_293t-NT --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 31 with external jobid '11295015'. 
- -[Wed Mar 24 19:19:39 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.ReadsPerGene.out.tab - jobid: 28 - wildcards: sample=G_iSLK-R - threads: 56 - - -if [ -d /dev/shm/G_iSLK-R ];then rm -rf /dev/shm/G_iSLK-R;fi -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-R_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_iSLK-R --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p - STAR --genomeDir 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_iSLK-R_p2. --sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_iSLK-R --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 28 with external jobid '11295016'. 
- -[Wed Mar 24 19:19:40 2021] -rule star2p: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.ReadsPerGene.out.tab - jobid: 32 - wildcards: sample=G_INPUT-293T-T - threads: 56 - - -if [ -d /dev/shm/G_INPUT-293T-T ];then rm -rf /dev/shm/G_INPUT-293T-T;fi -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p;fi -limitSjdbInsertNsj=$(wc -l /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab|awk '{print $1+1}') -if [ "$limitSjdbInsertNsj" -lt "400000" ];then limitSjdbInsertNsj="400000";fi -if [ "SE" == "PE" ];then -# paired-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-T_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-293T-T --sjdbOverhang $overhang - -else -#single-end - overhang=$(zcat /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz | awk -v maxlen=100 'NR%4==2 {if (length($1) > maxlen+0) maxlen=length($1)}; END {print maxlen-1}') - echo "sjdbOverhang for STAR: ${overhang}" - cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p - STAR --genomeDir /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/STAR_index_no_GTF_2.7.6a --outSAMstrandField None --outFilterType BySJout --outFilterMultimapNmax 20 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.3 --alignIntronMin 20 --alignIntronMax 2000000 --alignMatesGapMax 2000000 --readFilesIn /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz --readFilesCommand zcat --runThreadN 56 --outFileNamePrefix G_INPUT-293T-T_p2. 
--sjdbFileChrStartEnd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab --chimSegmentMin 20 --chimOutType Junctions WithinBAM --chimMultimapNmax 10 --limitSjdbInsertNsj $limitSjdbInsertNsj --alignTranscriptsPerReadNmax 20000 --outSAMtype BAM SortedByCoordinate --alignEndsProtrude 10 ConcordantPair --outFilterIntronMotifs None --sjdbGTFfile /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.gtf --quantMode GeneCounts --outTmpDir=/dev/shm/G_INPUT-293T-T --sjdbOverhang $overhang -fi -rm -rf /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2._STARgenome -## ensure the star2p file is indexed ... is should already be sorted by STAR -if [ ! -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam.bai ];then -sambamba index /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam -fi - -Submitted job 32 with external jobid '11295017'. -[Wed Mar 24 19:21:08 2021] -Finished job 69. -32 of 126 steps (25%) done -[Wed Mar 24 19:28:20 2021] -Finished job 72. 
-33 of 126 steps (26%) done - -[Wed Mar 24 19:28:20 2021] -rule create_ciri_count_matrix: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/ciri_count_matrix.txt - jobid: 75 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/Create_ciri_count_matrix.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt - -Submitted job 75 with external jobid '11296218'. -[Wed Mar 24 19:29:00 2021] -Finished job 75. -34 of 126 steps (27%) done -[Wed Mar 24 19:33:10 2021] -Finished job 29. 
-35 of 126 steps (28%) done - -[Wed Mar 24 19:33:10 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam - jobid: 121 - wildcards: sample=G_iSLK-UR - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_iSLK-UR.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam /dev/shm/G_iSLK-UR.SR.bam -rm -f /dev/shm/G_iSLK-UR.SR.bam* - -Submitted job 121 with external jobid '11296731'. 
- -[Wed Mar 24 19:33:12 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam - jobid: 46 - wildcards: sample=G_iSLK-UR - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.readids | sort | uniq > /dev/shm/G_iSLK-UR.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... 
these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_iSLK-UR.chimeric.bam --readids /dev/shm/G_iSLK-UR.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_iSLK-UR.chimeric.sorted.bam /dev/shm/G_iSLK-UR.chimeric.bam -rm -f /dev/shm/G_iSLK-UR.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_iSLK-UR.chimeric.sorted.bam --outputBAM /dev/shm/G_iSLK-UR.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_iSLK-UR.BSJs.tmp.sorted.bam /dev/shm/G_iSLK-UR.BSJs.tmp.bam -rm -f /dev/shm/G_iSLK-UR.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... 
hence deduplicating -samtools view -H /dev/shm/G_iSLK-UR.BSJs.tmp.sorted.bam > /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_iSLK-UR.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.sam > /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_iSLK-UR.BSJs.tmp.dedup.bam* - -Submitted job 46 with external jobid '11296732'. - -[Wed Mar 24 19:33:13 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt - jobid: 62 - wildcards: sample=G_iSLK-UR - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction > G_iSLK-UR_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt) --low-confidence - -Submitted job 62 with external jobid '11296733'. - -[Wed Mar 24 19:33:14 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-UR.MarkDuplicates.metrics.txt - jobid: 38 - wildcards: sample=G_iSLK-UR - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_iSLK-UR.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-UR.MarkDuplicates.metrics.txt - -Submitted job 38 with external jobid '11296734'. -[Wed Mar 24 19:33:53 2021] -Finished job 30. -36 of 126 steps (29%) done -[Wed Mar 24 19:33:53 2021] -Finished job 32. 
-37 of 126 steps (29%) done - -[Wed Mar 24 19:33:53 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-UR.MarkDuplicates.metrics.txt - jobid: 39 - wildcards: sample=G_INPUT-iSLK-UR - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_INPUT-iSLK-UR.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-UR.MarkDuplicates.metrics.txt - -Submitted job 39 with external jobid '11296736'. - -[Wed Mar 24 19:33:55 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam - jobid: 47 - wildcards: sample=G_INPUT-iSLK-UR - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.readids | sort | uniq > /dev/shm/G_INPUT-iSLK-UR.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_INPUT-iSLK-UR.chimeric.bam --readids /dev/shm/G_INPUT-iSLK-UR.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-iSLK-UR.chimeric.sorted.bam /dev/shm/G_INPUT-iSLK-UR.chimeric.bam -rm -f /dev/shm/G_INPUT-iSLK-UR.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_INPUT-iSLK-UR.chimeric.sorted.bam --outputBAM /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.sorted.bam /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.bam -rm -f /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... hence deduplicating -samtools view -H /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.sorted.bam > /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.sam > /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_INPUT-iSLK-UR.BSJs.tmp.dedup.bam* - -Submitted job 47 with external jobid '11296737'. 
- -[Wed Mar 24 19:33:56 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam - jobid: 122 - wildcards: sample=G_INPUT-iSLK-UR - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_INPUT-iSLK-UR.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam /dev/shm/G_INPUT-iSLK-UR.SR.bam -rm -f /dev/shm/G_INPUT-iSLK-UR.SR.bam* - -Submitted job 122 with external jobid '11296738'. 
- -[Wed Mar 24 19:33:57 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt - jobid: 63 - wildcards: sample=G_INPUT-iSLK-UR - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction > G_INPUT-iSLK-UR_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt) --low-confidence - -Submitted job 63 with external jobid '11296739'. - -[Wed Mar 24 19:33:59 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt - jobid: 65 - wildcards: sample=G_INPUT-293T-T - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction > G_INPUT-293T-T_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt) --low-confidence - -Submitted job 65 with external jobid '11296740'. - -[Wed Mar 24 19:34:00 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam - jobid: 124 - wildcards: sample=G_INPUT-293T-T - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_INPUT-293T-T.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam /dev/shm/G_INPUT-293T-T.SR.bam -rm -f /dev/shm/G_INPUT-293T-T.SR.bam* - -Submitted job 124 with external jobid '11296741'. 
- -[Wed Mar 24 19:34:02 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-T.MarkDuplicates.metrics.txt - jobid: 41 - wildcards: sample=G_INPUT-293T-T - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_INPUT-293T-T.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-T.MarkDuplicates.metrics.txt - -Submitted job 41 with external jobid '11296742'. - -[Wed Mar 24 19:34:03 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam - jobid: 49 - wildcards: sample=G_INPUT-293T-T - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.readids | sort | uniq > /dev/shm/G_INPUT-293T-T.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_INPUT-293T-T.chimeric.bam --readids /dev/shm/G_INPUT-293T-T.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-293T-T.chimeric.sorted.bam /dev/shm/G_INPUT-293T-T.chimeric.bam -rm -f /dev/shm/G_INPUT-293T-T.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_INPUT-293T-T.chimeric.sorted.bam --outputBAM /dev/shm/G_INPUT-293T-T.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-293T-T.BSJs.tmp.sorted.bam /dev/shm/G_INPUT-293T-T.BSJs.tmp.bam -rm -f /dev/shm/G_INPUT-293T-T.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... hence deduplicating -samtools view -H /dev/shm/G_INPUT-293T-T.BSJs.tmp.sorted.bam > /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_INPUT-293T-T.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.sam > /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_INPUT-293T-T.BSJs.tmp.dedup.bam* - -Submitted job 49 with external jobid '11296743'. -[Wed Mar 24 19:34:48 2021] -Finished job 27. 
-38 of 126 steps (30%) done - -[Wed Mar 24 19:34:48 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293T-T.MarkDuplicates.metrics.txt - jobid: 36 - wildcards: sample=G_293T-T - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_293T-T.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293T-T.MarkDuplicates.metrics.txt - -Submitted job 36 with external jobid '11296744'. - -[Wed Mar 24 19:34:50 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam - jobid: 44 - wildcards: sample=G_293T-T - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.readids | sort | uniq > /dev/shm/G_293T-T.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_293T-T.chimeric.bam --readids /dev/shm/G_293T-T.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_293T-T.chimeric.sorted.bam /dev/shm/G_293T-T.chimeric.bam -rm -f /dev/shm/G_293T-T.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_293T-T.chimeric.sorted.bam --outputBAM /dev/shm/G_293T-T.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_293T-T.BSJs.tmp.sorted.bam /dev/shm/G_293T-T.BSJs.tmp.bam -rm -f /dev/shm/G_293T-T.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... hence deduplicating -samtools view -H /dev/shm/G_293T-T.BSJs.tmp.sorted.bam > /dev/shm/G_293T-T.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_293T-T.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_293T-T.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_293T-T.BSJs.tmp.dedup.sam > /dev/shm/G_293T-T.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam /dev/shm/G_293T-T.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_293T-T.BSJs.tmp.dedup.bam* - -Submitted job 44 with external jobid '11296745'. 
- -[Wed Mar 24 19:34:51 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam - jobid: 119 - wildcards: sample=G_293T-T - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_293T-T.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam /dev/shm/G_293T-T.SR.bam -rm -f /dev/shm/G_293T-T.SR.bam* - -Submitted job 119 with external jobid '11296746'. - -[Wed Mar 24 19:34:52 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt - jobid: 60 - wildcards: sample=G_293T-T - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction > G_293T-T_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt) --low-confidence - -Submitted job 60 with external jobid '11296747'. -[Wed Mar 24 19:35:24 2021] -Finished job 63. -39 of 126 steps (31%) done -[Wed Mar 24 19:35:24 2021] -Finished job 65. -40 of 126 steps (32%) done - -[Wed Mar 24 19:35:24 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.common.lst - jobid: 113 - wildcards: sample=G_INPUT-iSLK-UR - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out|grep -v circRNA_ID > /dev/shm/G_INPUT-iSLK-UR.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_INPUT-iSLK-UR.circExplorer.lst -2set_venn.R -l /dev/shm/G_INPUT-iSLK-UR.ciri.lst -r 
/dev/shm/G_INPUT-iSLK-UR.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.common.lst -t G_INPUT-iSLK-UR - -[Wed Mar 24 19:35:24 2021] -Finished job 60. -41 of 126 steps (33%) done -Submitted job 113 with external jobid '11296962'. - -[Wed Mar 24 19:35:26 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt - jobid: 101 - wildcards: sample=G_INPUT-iSLK-UR - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt - -Submitted job 101 
with external jobid '11296963'. - -[Wed Mar 24 19:35:28 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt - jobid: 95 - wildcards: sample=G_293T-T - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt - -Submitted job 95 with external jobid '11296965'. 
- -[Wed Mar 24 19:35:29 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.common.lst - jobid: 110 - wildcards: sample=G_293T-T - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out|grep -v circRNA_ID > /dev/shm/G_293T-T.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_293T-T.circExplorer.lst -2set_venn.R -l /dev/shm/G_293T-T.ciri.lst -r /dev/shm/G_293T-T.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.common.lst -t G_293T-T - -Submitted job 110 with external jobid '11296967'. 
- -[Wed Mar 24 19:35:30 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.common.lst - jobid: 115 - wildcards: sample=G_INPUT-293T-T - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out|grep -v circRNA_ID > /dev/shm/G_INPUT-293T-T.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_INPUT-293T-T.circExplorer.lst -2set_venn.R -l /dev/shm/G_INPUT-293T-T.ciri.lst -r /dev/shm/G_INPUT-293T-T.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.common.lst -t G_INPUT-293T-T - -Submitted job 115 with external jobid '11296969'. - -[Wed Mar 24 19:35:32 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt - jobid: 105 - wildcards: sample=G_INPUT-293T-T - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt - -Submitted job 105 with external jobid '11296971'. -[Wed Mar 24 19:35:35 2021] -Finished job 62. 
-42 of 126 steps (33%) done - -[Wed Mar 24 19:35:35 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt - jobid: 99 - wildcards: sample=G_iSLK-UR - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt - -Submitted job 99 with external jobid '11296972'. 
- -[Wed Mar 24 19:35:37 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.common.lst - jobid: 112 - wildcards: sample=G_iSLK-UR - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out|grep -v circRNA_ID > /dev/shm/G_iSLK-UR.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_iSLK-UR.circExplorer.lst -2set_venn.R -l /dev/shm/G_iSLK-UR.ciri.lst -r /dev/shm/G_iSLK-UR.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.common.lst -t G_iSLK-UR - -Submitted job 112 with external jobid '11296973'. 
-[Wed Mar 24 19:35:47 2021] -Finished job 26. -43 of 126 steps (34%) done - -[Wed Mar 24 19:35:47 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam - jobid: 43 - wildcards: sample=G_INPUT-iSLK-R - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.readids | sort | uniq > /dev/shm/G_INPUT-iSLK-R.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... 
these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_INPUT-iSLK-R.chimeric.bam --readids /dev/shm/G_INPUT-iSLK-R.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-iSLK-R.chimeric.sorted.bam /dev/shm/G_INPUT-iSLK-R.chimeric.bam -rm -f /dev/shm/G_INPUT-iSLK-R.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_INPUT-iSLK-R.chimeric.sorted.bam --outputBAM /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-iSLK-R.BSJs.tmp.sorted.bam /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.bam -rm -f /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... 
hence deduplicating -samtools view -H /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.sorted.bam > /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.sam > /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_INPUT-iSLK-R.BSJs.tmp.dedup.bam* - -Submitted job 43 with external jobid '11296974'. - -[Wed Mar 24 19:35:49 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam - jobid: 118 - wildcards: sample=G_INPUT-iSLK-R - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_INPUT-iSLK-R.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 
--out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam /dev/shm/G_INPUT-iSLK-R.SR.bam -rm -f /dev/shm/G_INPUT-iSLK-R.SR.bam* - -Submitted job 118 with external jobid '11296975'. - -[Wed Mar 24 19:35:50 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt - jobid: 59 - wildcards: sample=G_INPUT-iSLK-R - - -if [ ! -d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction > G_INPUT-iSLK-R_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt) --low-confidence - -Submitted job 59 with external jobid '11296976'. 
- -[Wed Mar 24 19:35:51 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-R.MarkDuplicates.metrics.txt - jobid: 35 - wildcards: sample=G_INPUT-iSLK-R - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_INPUT-iSLK-R.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-R.MarkDuplicates.metrics.txt - -Submitted job 35 with external jobid '11296977'. -[Wed Mar 24 19:36:35 2021] -Finished job 31. -44 of 126 steps (35%) done - -[Wed Mar 24 19:36:35 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt - jobid: 64 - wildcards: sample=G_293t-NT - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction > G_293t-NT_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt) --low-confidence - -Submitted job 64 with external jobid '11296978'. - -[Wed Mar 24 19:36:37 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam - jobid: 123 - wildcards: sample=G_293t-NT - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_293t-NT.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam /dev/shm/G_293t-NT.SR.bam -rm -f /dev/shm/G_293t-NT.SR.bam* - -Submitted job 123 with external jobid '11296979'. 
- -[Wed Mar 24 19:36:38 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293t-NT.MarkDuplicates.metrics.txt - jobid: 40 - wildcards: sample=G_293t-NT - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_293t-NT.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293t-NT.MarkDuplicates.metrics.txt - -Submitted job 40 with external jobid '11296980'. - -[Wed Mar 24 19:36:39 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam - jobid: 48 - wildcards: sample=G_293t-NT - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.readids | sort | uniq > /dev/shm/G_293t-NT.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_293t-NT.chimeric.bam --readids /dev/shm/G_293t-NT.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_293t-NT.chimeric.sorted.bam /dev/shm/G_293t-NT.chimeric.bam -rm -f /dev/shm/G_293t-NT.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. 
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_293t-NT.chimeric.sorted.bam --outputBAM /dev/shm/G_293t-NT.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_293t-NT.BSJs.tmp.sorted.bam /dev/shm/G_293t-NT.BSJs.tmp.bam -rm -f /dev/shm/G_293t-NT.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... hence deduplicating -samtools view -H /dev/shm/G_293t-NT.BSJs.tmp.sorted.bam > /dev/shm/G_293t-NT.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_293t-NT.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_293t-NT.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_293t-NT.BSJs.tmp.dedup.sam > /dev/shm/G_293t-NT.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam /dev/shm/G_293t-NT.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_293t-NT.BSJs.tmp.dedup.bam* - -Submitted job 48 with external jobid '11296981'. -[Wed Mar 24 19:38:19 2021] -Finished job 28. 
-45 of 126 steps (36%) done - -[Wed Mar 24 19:38:19 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam - jobid: 45 - wildcards: sample=G_iSLK-R - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.readids | sort | uniq > /dev/shm/G_iSLK-R.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... 
these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_iSLK-R.chimeric.bam --readids /dev/shm/G_iSLK-R.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_iSLK-R.chimeric.sorted.bam /dev/shm/G_iSLK-R.chimeric.bam -rm -f /dev/shm/G_iSLK-R.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_iSLK-R.chimeric.sorted.bam --outputBAM /dev/shm/G_iSLK-R.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_iSLK-R.BSJs.tmp.sorted.bam /dev/shm/G_iSLK-R.BSJs.tmp.bam -rm -f /dev/shm/G_iSLK-R.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... 
hence deduplicating -samtools view -H /dev/shm/G_iSLK-R.BSJs.tmp.sorted.bam > /dev/shm/G_iSLK-R.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_iSLK-R.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_iSLK-R.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_iSLK-R.BSJs.tmp.dedup.sam > /dev/shm/G_iSLK-R.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam /dev/shm/G_iSLK-R.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_iSLK-R.BSJs.tmp.dedup.bam* - -Submitted job 45 with external jobid '11296991'. - -[Wed Mar 24 19:38:20 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt - jobid: 61 - wildcards: sample=G_iSLK-R - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction > G_iSLK-R_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt) --low-confidence - -Submitted job 61 with external jobid '11296992'. - -[Wed Mar 24 19:38:21 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam - jobid: 120 - wildcards: sample=G_iSLK-R - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_iSLK-R.SR.bam --tab /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam /dev/shm/G_iSLK-R.SR.bam -rm -f /dev/shm/G_iSLK-R.SR.bam* - -Submitted job 120 with external jobid '11296993'. 
- -[Wed Mar 24 19:38:23 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-R.MarkDuplicates.metrics.txt - jobid: 37 - wildcards: sample=G_iSLK-R - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_iSLK-R.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-R.MarkDuplicates.metrics.txt - -Submitted job 37 with external jobid '11296994'. -[Wed Mar 24 19:38:32 2021] -Finished job 33. -46 of 126 steps (37%) done - -[Wed Mar 24 19:38:32 2021] -localrule merge_genecounts: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.ReadsPerGene.out.tab, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.ReadsPerGene.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/unstranded_STAR_GeneCounts.tsv, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/stranded_STAR_GeneCounts.tsv, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/revstranded_STAR_GeneCounts.tsv - jobid: 34 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results -Rscript /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/merge_ReadsPerGene_counts.R - -Activating environment modules: R/4.0.3 - -[Wed Mar 24 19:38:32 2021] -rule create_spliced_reads_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam - jobid: 125 - wildcards: sample=G_INPUT-293T-NT - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_splice_reads.py --inbam /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam --outbam /dev/shm/G_INPUT-293T-NT.SR.bam --tab 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam /dev/shm/G_INPUT-293T-NT.SR.bam -rm -f /dev/shm/G_INPUT-293T-NT.SR.bam* - -[-] Unloading singularity 3.7.2 on cn4487 -[-] Unloading snakemake 5.24.1 -[-] Unloading python 3.7 ... -Submitted job 125 with external jobid '11296995'. - -[Wed Mar 24 19:38:33 2021] -rule estimate_duplication: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-NT.MarkDuplicates.metrics.txt - jobid: 42 - wildcards: sample=G_INPUT-293T-NT - - -java -Xmx100G -jar ${PICARD_JARPATH}/picard.jar MarkDuplicates I=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam O=/dev/shm/G_INPUT-293T-NT.mark_dup.bam M=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-NT.MarkDuplicates.metrics.txt - -Submitted job 42 with external jobid '11296996'. 
- -[Wed Mar 24 19:38:34 2021] -rule create_BSJ_bam: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.readids, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam - jobid: 50 - wildcards: sample=G_INPUT-293T-NT - threads: 4 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p - -## get BSJ readids along with chrom,site,cigar etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/junctions2readids.py -j /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.readids - -## extract only the uniq readids -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.readids | sort | uniq > /dev/shm/G_INPUT-293T-NT.readids - -## downsize the star2p bam file to a new bam file with only BSJ reads ... 
these may still contain alignments which are chimeric but not BSJ -## note the argument --readids here is just a list of readids -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_by_readids.py --inputBAM /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam --outputBAM /dev/shm/G_INPUT-293T-NT.chimeric.bam --readids /dev/shm/G_INPUT-293T-NT.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-293T-NT.chimeric.sorted.bam /dev/shm/G_INPUT-293T-NT.chimeric.bam -rm -f /dev/shm/G_INPUT-293T-NT.chimeric.bam* - -## using the downsized star2p bam file containing chimeric alignments ...included all the BSJs... we now extract only the BSJs -## note the argument --readids here is a tab delimited file created by junctions2readids.py ... reaids,chrom,strand,sites,cigars,etc. -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/filter_bam_for_BSJs.py --inputBAM /dev/shm/G_INPUT-293T-NT.chimeric.sorted.bam --outputBAM /dev/shm/G_INPUT-293T-NT.BSJs.tmp.bam --readids /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.readids -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/dev/shm/G_INPUT-293T-NT.BSJs.tmp.sorted.bam /dev/shm/G_INPUT-293T-NT.BSJs.tmp.bam -rm -f /dev/shm/G_INPUT-293T-NT.BSJs.tmp.bam* - -## some alignments are repeated/duplicated in the output for some reason ... 
hence deduplicating -samtools view -H /dev/shm/G_INPUT-293T-NT.BSJs.tmp.sorted.bam > /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.sam -samtools view /dev/shm/G_INPUT-293T-NT.BSJs.tmp.sorted.bam | sort | uniq >> /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.sam -samtools view -bS /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.sam > /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=4 --out=/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.bam -rm -f /dev/shm/G_INPUT-293T-NT.BSJs.tmp.dedup.bam* - -[+] Loading gcc 9.2.0 ... -[+] Loading GSL 2.6 for GCC 9.2.0 ... -[-] Unloading gcc 9.2.0 ... -[+] Loading gcc 9.2.0 ... -[+] Loading openmpi 3.1.4 for GCC 9.2.0 -[+] Loading ImageMagick 7.0.8 on cn4487 -[+] Loading HDF5 1.10.4 -[-] Unloading gcc 9.2.0 ... -[+] Loading gcc 9.2.0 ... -[+] Loading NetCDF 4.7.4_gcc9.2.0 -[+] Loading pandoc 2.13 on cn4487 -[+] Loading pcre2 10.21 ... -[+] Loading R 4.0.3 -[INFO] Please allocate lscratch for batch/sinteractive R jobs -Submitted job 50 with external jobid '11296998'. - -[Wed Mar 24 19:38:36 2021] -rule annotate_circRNA: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.back_spliced_junction.bed, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt - jobid: 66 - wildcards: sample=G_INPUT-293T-NT - - -if [ ! 
-d /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer ];then mkdir /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer;fi -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction.original -grep -v junction_type /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction.original > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction -CIRCexplorer2 parse -t STAR /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction > G_INPUT-293T-NT_circexplorer_parse.log 2>&1 -mv back_spliced_junction.bed /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.back_spliced_junction.bed -mv /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction.original /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction -CIRCexplorer2 annotate -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -g 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.back_spliced_junction.bed -o $(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt) --low-confidence - -Submitted job 66 with external jobid '11296999'. -[Wed Mar 24 19:38:55 2021] -Finished job 34. -47 of 126 steps (37%) done -[Wed Mar 24 19:41:21 2021] -Finished job 46. -48 of 126 steps (38%) done - -[Wed Mar 24 19:41:21 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.hg38.bw - jobid: 80 - wildcards: sample=G_iSLK-UR - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then 
-samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 80 with external jobid '11297003'. -[Wed Mar 24 19:42:13 2021] -Finished job 49. -49 of 126 steps (39%) done - -[Wed Mar 24 19:42:13 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.hg38.bw - jobid: 83 - wildcards: sample=G_INPUT-293T-T - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 83 with external jobid '11297008'. -[Wed Mar 24 19:42:39 2021] -Finished job 121. -50 of 126 steps (40%) done - -[Wed Mar 24 19:42:39 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/alignmentstats.txt - jobid: 54 - wildcards: sample=G_iSLK-UR - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp2 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/tmp4 - -Submitted job 54 with external jobid '11297010'. - -[Wed Mar 24 19:42:40 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.hg38.bw - jobid: 88 - wildcards: sample=G_iSLK-UR - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 88 with external jobid 
'11297011'. -[Wed Mar 24 19:42:53 2021] -Finished job 44. -51 of 126 steps (40%) done - -[Wed Mar 24 19:42:53 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.hg38.bw - jobid: 78 - wildcards: sample=G_293T-T - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 78 with external jobid '11297012'. -[Wed Mar 24 19:43:45 2021] -Finished job 47. 
-52 of 126 steps (41%) done - -[Wed Mar 24 19:43:45 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.hg38.bw - jobid: 81 - wildcards: sample=G_INPUT-iSLK-UR - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 81 with external jobid '11297013'. -[Wed Mar 24 19:51:21 2021] -Finished job 113. -53 of 126 steps (42%) done -[Wed Mar 24 19:51:22 2021] -Finished job 110. -54 of 126 steps (43%) done -[Wed Mar 24 19:51:22 2021] -Finished job 115. 
-55 of 126 steps (44%) done -[Wed Mar 24 19:51:22 2021] -Finished job 112. -56 of 126 steps (44%) done -[Wed Mar 24 19:51:22 2021] -Finished job 59. -57 of 126 steps (45%) done - -[Wed Mar 24 19:51:22 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.common.lst - jobid: 109 - wildcards: sample=G_INPUT-iSLK-R - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out|grep -v circRNA_ID > /dev/shm/G_INPUT-iSLK-R.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_INPUT-iSLK-R.circExplorer.lst -2set_venn.R -l /dev/shm/G_INPUT-iSLK-R.ciri.lst -r /dev/shm/G_INPUT-iSLK-R.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.cirionly.lst -s 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.common.lst -t G_INPUT-iSLK-R - -[Wed Mar 24 19:51:23 2021] -Finished job 66. -58 of 126 steps (46%) done -Submitted job 109 with external jobid '11297295'. - -[Wed Mar 24 19:51:25 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt - jobid: 93 - wildcards: sample=G_INPUT-iSLK-R - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt - -Submitted job 93 with external jobid '11297338'. 
- -[Wed Mar 24 19:51:29 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.common.lst - jobid: 116 - wildcards: sample=G_INPUT-293T-NT - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out|grep -v circRNA_ID > /dev/shm/G_INPUT-293T-NT.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_INPUT-293T-NT.circExplorer.lst -2set_venn.R -l /dev/shm/G_INPUT-293T-NT.ciri.lst -r /dev/shm/G_INPUT-293T-NT.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.common.lst -t G_INPUT-293T-NT - -Submitted job 116 with external jobid '11297342'. - -[Wed Mar 24 19:51:31 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt - jobid: 107 - wildcards: sample=G_INPUT-293T-NT - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt - -Submitted job 107 with external jobid '11297344'. -[Wed Mar 24 19:51:33 2021] -Finished job 64. 
-59 of 126 steps (47%) done - -[Wed Mar 24 19:51:33 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.common.lst - jobid: 114 - wildcards: sample=G_293t-NT - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out|grep -v circRNA_ID > /dev/shm/G_293t-NT.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_293t-NT.circExplorer.lst -2set_venn.R -l /dev/shm/G_293t-NT.ciri.lst -r /dev/shm/G_293t-NT.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.common.lst -t G_293t-NT - -Submitted job 114 with 
external jobid '11297346'. - -[Wed Mar 24 19:51:34 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt - jobid: 103 - wildcards: sample=G_293t-NT - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt - -[Wed Mar 24 19:51:35 2021] -Finished job 61. -60 of 126 steps (48%) done -Submitted job 103 with external jobid '11297347'. 
- -[Wed Mar 24 19:51:36 2021] -rule clear: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt - jobid: 97 - wildcards: sample=G_iSLK-R - threads: 4 - - -circ_quant -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt -b /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam -t -r /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.genes.genepred_w_geneid -o /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt - -Submitted job 97 with external jobid '11297348'. 
- -[Wed Mar 24 19:51:37 2021] -rule venn: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.common.lst - jobid: 111 - wildcards: sample=G_iSLK-R - threads: 2 - - -cut -f1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out|grep -v circRNA_ID > /dev/shm/G_iSLK-R.ciri.lst -cut -f1-3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt|awk -F"\t" '{print $1":"$2+1"|"$3}' > /dev/shm/G_iSLK-R.circExplorer.lst -2set_venn.R -l /dev/shm/G_iSLK-R.ciri.lst -r /dev/shm/G_iSLK-R.circExplorer.lst -p /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.venn_mqc.png -m /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.cirionly.lst -s /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.circexploreronly.lst -c1 "CIRI2" -c2 "CircExplorer2" -c /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.common.lst -t G_iSLK-R - -Submitted job 111 with external jobid '11297349'. 
- -[Wed Mar 24 19:51:38 2021] -rule create_circexplorer_count_matrix: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/circExplorer_count_matrix.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/circExplorer_BSJ_count_matrix.txt - jobid: 76 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/Create_circExplorer_count_matrix.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt -python 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/Create_circExplorer_BSJ_count_matrix.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt - -Submitted job 76 with external jobid '11297350'. -[Wed Mar 24 19:53:43 2021] -Finished job 114. -61 of 126 steps (48%) done -[Wed Mar 24 19:53:46 2021] -Finished job 109. -62 of 126 steps (49%) done -[Wed Mar 24 19:53:46 2021] -Finished job 116. -63 of 126 steps (50%) done -[Wed Mar 24 19:53:46 2021] -Finished job 111. -64 of 126 steps (51%) done -[Wed Mar 24 19:53:46 2021] -Finished job 76. -65 of 126 steps (52%) done -[Wed Mar 24 19:55:41 2021] -Finished job 122. -66 of 126 steps (52%) done - -[Wed Mar 24 19:55:41 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/alignmentstats.txt - jobid: 55 - wildcards: sample=G_INPUT-iSLK-UR - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp1 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/tmp4 - -[Wed Mar 24 19:55:42 2021] -Finished job 118. -67 of 126 steps (53%) done -Submitted job 55 with external jobid '11297568'. 
- -[Wed Mar 24 19:55:43 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.hg38.bw - jobid: 89 - wildcards: sample=G_INPUT-iSLK-UR - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 89 with external jobid '11297570'. 
- -[Wed Mar 24 19:55:44 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.hg38.bw - jobid: 85 - wildcards: sample=G_INPUT-iSLK-R - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 85 with external jobid '11297571'. -[Wed Mar 24 19:56:19 2021] -Finished job 80. -68 of 126 steps (54%) done -[Wed Mar 24 19:56:33 2021] -Finished job 124. 
-69 of 126 steps (55%) done - -[Wed Mar 24 19:56:33 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/alignmentstats.txt - jobid: 57 - wildcards: sample=G_INPUT-293T-T - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam "$b";done < 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp3 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/tmp4 - -Submitted job 57 with external jobid '11297578'. - -[Wed Mar 24 19:56:35 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.hg38.bw - jobid: 91 - wildcards: sample=G_INPUT-293T-T - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 91 with external jobid '11297580'. 
-[Wed Mar 24 19:56:58 2021] -Finished job 81. -70 of 126 steps (56%) done -[Wed Mar 24 19:56:59 2021] -Finished job 99. -71 of 126 steps (56%) done - -[Wed Mar 24 19:56:59 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt.annotated - jobid: 100 - wildcards: sample=G_iSLK-UR - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt - -[Wed Mar 24 19:57:02 2021] -Finished job 100. -72 of 126 steps (57%) done -[Wed Mar 24 19:57:24 2021] -Finished job 78. -73 of 126 steps (58%) done -[Wed Mar 24 19:57:24 2021] -Finished job 83. -74 of 126 steps (59%) done -[Wed Mar 24 19:57:51 2021] -Finished job 105. 
-75 of 126 steps (60%) done - -[Wed Mar 24 19:57:51 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt.annotated - jobid: 106 - wildcards: sample=G_INPUT-293T-T - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt - -[Wed Mar 24 19:57:53 2021] -Finished job 106. -76 of 126 steps (60%) done -[Wed Mar 24 19:58:04 2021] -Finished job 101. 
-77 of 126 steps (61%) done - -[Wed Mar 24 19:58:04 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt.annotated - jobid: 102 - wildcards: sample=G_INPUT-iSLK-UR - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt - -[Wed Mar 24 19:58:06 2021] -Finished job 102. -78 of 126 steps (62%) done -[Wed Mar 24 19:58:15 2021] -Finished job 88. -79 of 126 steps (63%) done -[Wed Mar 24 20:00:05 2021] -Finished job 95. 
-80 of 126 steps (63%) done - -[Wed Mar 24 20:00:05 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt.annotated - jobid: 96 - wildcards: sample=G_293T-T - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt - -[Wed Mar 24 20:00:06 2021] -Finished job 50. -81 of 126 steps (64%) done - -[Wed Mar 24 20:00:06 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.hg38.bw - jobid: 84 - wildcards: sample=G_INPUT-293T-NT - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -[Wed Mar 24 20:00:07 2021] -Finished job 96. -82 of 126 steps (65%) done -Submitted job 84 with external jobid '11297999'. -[Wed Mar 24 20:00:16 2021] -Finished job 43. -83 of 126 steps (66%) done - -[Wed Mar 24 20:00:16 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.hg38.bw - jobid: 77 - wildcards: sample=G_INPUT-iSLK-R - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 77 with external jobid '11298001'. - -[Wed Mar 24 20:00:17 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/alignmentstats.txt - jobid: 51 - wildcards: sample=G_INPUT-iSLK-R - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp1 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/tmp4 - -Submitted job 51 with external jobid '11298002'. -[Wed Mar 24 20:01:16 2021] -Finished job 125. 
-84 of 126 steps (67%) done - -[Wed Mar 24 20:01:16 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/alignmentstats.txt - jobid: 58 - wildcards: sample=G_INPUT-293T-NT - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam "$b";done < 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp2 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/tmp4 - -Submitted job 58 with external jobid '11298020'. - -[Wed Mar 24 20:01:17 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.hg38.bw - jobid: 92 - wildcards: sample=G_INPUT-293T-NT - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -[Wed Mar 24 20:01:18 2021] -Finished job 107. -85 of 126 steps (67%) done -Submitted job 92 with external jobid '11298021'. - -[Wed Mar 24 20:01:19 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt.annotated - jobid: 108 - wildcards: sample=G_INPUT-293T-NT - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt - -[Wed Mar 24 20:01:21 2021] -Finished job 108. -86 of 126 steps (68%) done -[Wed Mar 24 20:02:05 2021] -Finished job 48. 
-87 of 126 steps (69%) done - -[Wed Mar 24 20:02:05 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.hg38.bw - jobid: 82 - wildcards: sample=G_293t-NT - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 82 with external jobid '11298034'. -[Wed Mar 24 20:02:17 2021] -Finished job 119. 
-88 of 126 steps (70%) done - -[Wed Mar 24 20:02:17 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.hg38.bw - jobid: 86 - wildcards: sample=G_293T-T - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 86 with external jobid '11298035'. 
- -[Wed Mar 24 20:02:18 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/alignmentstats.txt - jobid: 52 - wildcards: sample=G_293T-T - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/tmp4 - -Submitted job 52 with external jobid '11298036'. -[Wed Mar 24 20:02:52 2021] -Finished job 103. 
-89 of 126 steps (71%) done - -[Wed Mar 24 20:02:52 2021] -localrule annotate_clear_output: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt.annotated - jobid: 104 - wildcards: sample=G_293t-NT - - -## cleanup quant.txt* dirs before annotation -find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \; -python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt - -[Wed Mar 24 20:02:54 2021] -Finished job 104. -90 of 126 steps (71%) done -[Wed Mar 24 20:02:54 2021] -Finished job 91. -91 of 126 steps (72%) done -[Wed Mar 24 20:03:05 2021] -Finished job 85. -92 of 126 steps (73%) done -[Wed Mar 24 20:03:17 2021] -Finished job 89. -93 of 126 steps (74%) done -[Wed Mar 24 20:03:53 2021] -Finished job 97. 
-94 of 126 steps (75%) done
-
-[Wed Mar 24 20:03:53 2021]
-localrule annotate_clear_output:
- input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt
- output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt.annotated
- jobid: 98
- wildcards: sample=G_iSLK-R
-
-
-## cleanup quant.txt* dirs before annotation
-find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \;
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt
-
-[Wed Mar 24 20:03:55 2021]
-Finished job 98.
-95 of 126 steps (75%) done
-[Wed Mar 24 20:04:16 2021]
-Finished job 38.
-96 of 126 steps (76%) done
-[Wed Mar 24 20:04:29 2021]
-Finished job 93.
-97 of 126 steps (77%) done
-
-[Wed Mar 24 20:04:29 2021]
-localrule annotate_clear_output:
- input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt
- output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt.annotated
- jobid: 94
- wildcards: sample=G_INPUT-iSLK-R
-
-
-## cleanup quant.txt* dirs before annotation
-find /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR -maxdepth 1 -type d -name "quant.txt*" -exec rm -rf {} \;
-python /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/annotate_clear_quant.py /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/resources/hg38_2_hg19_lookup.txt /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt
-
-[Wed Mar 24 20:04:30 2021]
-Finished job 94.
-98 of 126 steps (78%) done
-[Wed Mar 24 20:06:16 2021]
-Finished job 45.
-99 of 126 steps (79%) done - -[Wed Mar 24 20:06:16 2021] -rule split_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.hg38.bw - jobid: 79 - wildcards: sample=G_iSLK-R - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 79 with external jobid '11298116'. -[Wed Mar 24 20:07:04 2021] -Finished job 77. -100 of 126 steps (79%) done -[Wed Mar 24 20:07:05 2021] -Finished job 84. -101 of 126 steps (80%) done -[Wed Mar 24 20:07:16 2021] -Finished job 41. -102 of 126 steps (81%) done -[Wed Mar 24 20:07:38 2021] -Finished job 36. 
-103 of 126 steps (82%) done -[Wed Mar 24 20:07:39 2021] -Finished job 92. -104 of 126 steps (83%) done -[Wed Mar 24 20:08:22 2021] -Finished job 39. -105 of 126 steps (83%) done -[Wed Mar 24 20:09:07 2021] -Finished job 82. -106 of 126 steps (84%) done -[Wed Mar 24 20:09:18 2021] -Finished job 86. -107 of 126 steps (85%) done -[Wed Mar 24 20:12:25 2021] -Finished job 35. -108 of 126 steps (86%) done -[Wed Mar 24 20:12:36 2021] -Finished job 120. -109 of 126 steps (87%) done - -[Wed Mar 24 20:12:36 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/alignmentstats.txt - jobid: 53 - wildcards: sample=G_iSLK-R - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/alignmentstats.txt -rm -f 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/tmp4 - -Submitted job 53 with external jobid '11298165'. - -[Wed Mar 24 20:12:37 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.hg38.bw - jobid: 87 - wildcards: sample=G_iSLK-R - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg 
$sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 87 with external jobid '11298166'. -[Wed Mar 24 20:12:58 2021] -Finished job 79. -110 of 126 steps (87%) done -[Wed Mar 24 20:14:59 2021] -Finished job 40. -111 of 126 steps (88%) done -[Wed Mar 24 20:15:54 2021] -Finished job 42. -112 of 126 steps (89%) done -[Wed Mar 24 20:19:34 2021] -Finished job 37. -113 of 126 steps (90%) done - -[Wed Mar 24 20:19:34 2021] -localrule multiqc: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-R.MarkDuplicates.metrics.txt, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293T-T.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-R.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-UR.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-UR.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293t-NT.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-T.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-NT.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-R.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293T-T.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-R.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-UR.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-UR.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293t-NT.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-T.R1.trim_fastqc.zip, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-NT.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/multiqc_report.html - jobid: 117 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16 -multiqc . - -Activating environment modules: multiqc/1.9 -[-] Unloading singularity 3.7.2 on cn4487 -[-] Unloading snakemake 5.24.1 -[-] Unloading python 3.7 ... -[+] Loading singularity 3.7.2 on cn4487 -[+] Loading multiqc 1.9 -[WARNING] multiqc : MultiQC Version v1.10 now available! -[INFO ] multiqc : This is MultiQC v1.9 -[INFO ] multiqc : Template : default -[INFO ] multiqc : Searching : /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16 -[Wed Mar 24 20:20:25 2021] -Finished job 123. 
-114 of 126 steps (90%) done - -[Wed Mar 24 20:20:25 2021] -rule alignment_stats: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/alignmentstats.txt - jobid: 56 - wildcards: sample=G_293t-NT - threads: 4 - - -while read a b;do echo $a;done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp1 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp2 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam "$b";done < 
/data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp3 -while read a b;do bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/scripts/circRNA/workflow/scripts/bam_get_uniquely_aligned_fragments.bash /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam "$b";done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp4 -echo -ne "#region aligned_fragments spliced_fragments BSJ_fragments -" > /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/alignmentstats.txt -paste /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp3 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp4 >> /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/alignmentstats.txt -rm -f /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp1 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp2 /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp3 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/tmp4 - -Submitted job 56 with external jobid '11298338'. - -[Wed Mar 24 20:20:27 2021] -rule split_splice_reads_BAM_create_BW: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam - output: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.hg38.bw - jobid: 90 - wildcards: sample=G_293t-NT - threads: 2 - - -cd /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p -bam_basename="$(basename /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam)" -while read a b;do -bam="${bam_basename%.*}.${a}.bam" -samtools view /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.bam $b -b > /dev/shm/${bam%.*}.tmp.bam -sambamba sort --memory-limit=100G --tmpdir=/dev/shm --nthreads=2 --out=$bam /dev/shm/${bam%.*}.tmp.bam -bw="${bam%.*}.bw" -bdg="${bam%.*}.bdg" -sizes="${bam%.*}.sizes" -bedtools genomecov -bga -split -ibam $bam > $bdg -bedSort $bdg $bdg -if [ "$(wc -l $bdg|awk '{print $1}')" != "0" ];then -samtools view -H $bam|grep ^@SQ|cut -f2,3|sed "s/SN://g"|sed "s/LN://g" > $sizes -bedGraphToBigWig $bdg $sizes $bw -else -touch $bw -fi -rm -f $bdg $sizes -done < /data/Ziegelbauer_lab/resources/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.v3/hg38_rRNA_masked_plus_rRNA_plus_viruses_plus_ERCC.fa.regions - -Submitted job 90 with external jobid '11298339'. -[Wed Mar 24 20:20:35 2021] -Finished job 87. 
-115 of 126 steps (91%) done
-[INFO ] custom_content : G_iSLK-UR.venn: Found 1 sample (image)
-[INFO ] custom_content : G_INPUT-iSLK-R.venn: Found 1 sample (image)
-[INFO ] custom_content : G_INPUT-293T-T.venn: Found 1 sample (image)
-[INFO ] custom_content : G_INPUT-iSLK-UR.venn: Found 1 sample (image)
-[INFO ] custom_content : G_293t-NT.venn: Found 1 sample (image)
-[INFO ] custom_content : G_293T-T.venn: Found 1 sample (image)
-[INFO ] custom_content : G_INPUT-293T-NT.venn: Found 1 sample (image)
-[INFO ] custom_content : G_iSLK-R.venn: Found 1 sample (image)
-[INFO ] picard : Found 8 MarkDuplicates reports
-[INFO ] star : Found 16 reports and 8 gene count files
-[INFO ] cutadapt : Found 8 reports
-[INFO ] fastqc : Found 16 reports
-[INFO ] multiqc : Compressing plot data
-[INFO ] multiqc : Report : multiqc_report.html
-[INFO ] multiqc : Data : multiqc_data
-[INFO ] multiqc : MultiQC complete
-[Wed Mar 24 20:20:58 2021]
-Finished job 117.
-116 of 126 steps (92%) done
-[Wed Mar 24 20:21:46 2021]
-Finished job 54.
-117 of 126 steps (93%) done
-[Wed Mar 24 20:25:07 2021]
-Finished job 57.
-118 of 126 steps (94%) done
-[Wed Mar 24 20:29:07 2021]
-Finished job 90.
-119 of 126 steps (94%) done
-[Wed Mar 24 20:31:27 2021]
-Finished job 52.
-120 of 126 steps (95%) done
-[Wed Mar 24 20:33:37 2021]
-Finished job 55.
-121 of 126 steps (96%) done
-[Wed Mar 24 20:38:38 2021]
-Finished job 58.
-122 of 126 steps (97%) done
-[Wed Mar 24 20:39:38 2021]
-Finished job 51.
-123 of 126 steps (98%) done
-[Wed Mar 24 20:58:49 2021]
-Finished job 53.
-124 of 126 steps (98%) done
-[Wed Mar 24 21:05:19 2021]
-Finished job 56.
-125 of 126 steps (99%) done - -[Wed Mar 24 21:05:19 2021] -localrule all: - input: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R1.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/trim/G_INPUT-iSLK-R.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/trim/G_293T-T.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/trim/G_iSLK-R.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/trim/G_iSLK-UR.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/trim/G_INPUT-iSLK-UR.R2.trim.fastq.gz, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/trim/G_293t-NT.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/trim/G_INPUT-293T-T.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/trim/G_INPUT-293T-NT.R2.trim.fastq.gz, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-R.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293T-T.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-R.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_iSLK-UR.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-iSLK-UR.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_293t-NT.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-T.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/fastqc/G_INPUT-293T-NT.R1.trim_fastqc.zip, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR1p/G_INPUT-iSLK-R_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR1p/G_293T-T_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR1p/G_iSLK-R_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR1p/G_iSLK-UR_p1.SJ.out.tab, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR1p/G_INPUT-iSLK-UR_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR1p/G_293t-NT_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR1p/G_INPUT-293T-T_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR1p/G_INPUT-293T-NT_p1.SJ.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/pass1.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Chimeric.out.junction, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Chimeric.out.junction, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.Aligned.sortedByCoord.out.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR_p2.ReadsPerGene.out.tab, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT_p2.ReadsPerGene.out.tab, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/stranded_STAR_GeneCounts.tsv, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-R.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293T-T.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-R.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_iSLK-UR.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-iSLK-UR.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_293t-NT.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-T.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/qc/picard_MarkDuplicates/G_INPUT-293T-NT.MarkDuplicates.metrics.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.bam, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/alignmentstats.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/alignmentstats.txt, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/circExplorer/G_INPUT-iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/circExplorer/G_293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/circExplorer/G_iSLK-R.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/circExplorer/G_iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/circExplorer/G_INPUT-iSLK-UR.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/circExplorer/G_293t-NT.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/circExplorer/G_INPUT-293T-T.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/circExplorer/G_INPUT-293T-NT.circularRNA_known.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/ciri/G_INPUT-iSLK-R.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/ciri/G_293T-T.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/ciri/G_iSLK-R.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/ciri/G_iSLK-UR.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/ciri/G_INPUT-iSLK-UR.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/ciri/G_293t-NT.ciri.out, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/ciri/G_INPUT-293T-T.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/ciri/G_INPUT-293T-NT.ciri.out, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/ciri_count_matrix.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/circExplorer_count_matrix.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/circExplorer_BSJ_count_matrix.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.BSJ.hg38.bw, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.BSJ.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.hg38.bam, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.hg38.bam, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/STAR2p/G_INPUT-iSLK-R.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/STAR2p/G_293T-T.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/STAR2p/G_iSLK-R.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/STAR2p/G_iSLK-UR.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/STAR2p/G_INPUT-iSLK-UR.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/STAR2p/G_293t-NT.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/STAR2p/G_INPUT-293T-T.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/STAR2p/G_INPUT-293T-NT.spliced_reads.hg38.bw, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/CLEAR/quant.txt.annotated, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/CLEAR/quant.txt.annotated, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.venn_mqc.png, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.venn_mqc.png, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.cirionly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.circexploreronly.lst, 
/gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.circexploreronly.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-R/G_INPUT-iSLK-R.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293T-T/G_293T-T.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-R/G_iSLK-R.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_iSLK-UR/G_iSLK-UR.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-iSLK-UR/G_INPUT-iSLK-UR.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_293t-NT/G_293t-NT.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-T/G_INPUT-293T-T.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/results/G_INPUT-293T-NT/G_INPUT-293T-NT.common.lst, /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/multiqc_report.html - jobid: 0 - -[Wed Mar 24 21:05:19 2021] -Finished job 0. 
-126 of 126 steps (100%) done -Complete log: /gpfs/gsfs12/users/Ziegelbauer_lab/circRNADetection/circRNA_daq_v0.4.2/samples_16/.snakemake/log/2021-03-24T174440.557790.snakemake.log diff --git a/Biowulf/jobinfo b/Biowulf/jobinfo deleted file mode 100755 index ae80ed2..0000000 --- a/Biowulf/jobinfo +++ /dev/null @@ -1,210 +0,0 @@ -#!/usr/bin/env python3 - -""" -About: - This wrapper script works only on BIOWULF! - This script usage the "dashboard_cli" utility on biowulf to get HPC usage metadata - for a list of slurm jobids. These slurm jobids can be either provided at command - line or extracted from a snakemake.log file. Using snakemake.log file option together - with --failonly option lists path to the STDERR files for failed jobs. This can be - very useful to debug failed Snakemake workflows. -USAGE: - $ jobinfo -h -Example: - $ jobinfo -j 123456,7891011 - $ jobinfo -s /path/to/snakemake.log - $ jobinfo -j 123456,7891011 -o /path/to/report.tsv - $ jobinfo -s /path/to/snakemake.log --failonly -""" - -__version__ = 'v1.0.0' -__author__ = 'Vishal Koparde' -__email__ = 'vishal.koparde@nih.gov' - -import argparse,subprocess,json,os,datetime,time,textwrap,sys -import pandas as pd - -# SHORT_FIELDS used to display on screen -SHORT_FIELDS="jobid,state,jobname,elapsed_time,timelimit,time_util,cpus,max_cpu_util,mem,max_mem_util,exit_code" -FAILONLY_FIELDS="jobid,jobname,elapsed_time,timelimit,time_util,cpus,max_cpu_util,mem,max_mem_util,state_reason,eval,exit_code,std_err" -# LONG_FIELDS used to write to output file -LONG_FIELDS="jobid,jobname,state,state_reason,eval,exit_code,nodelist,partition,qos,submit_time,queued_time,queued_time_seconds,elapsed_time,elapsed_time_seconds,timelimit,timelimit_seconds,user,cpus,cpu_min,cpu_avg,cpu_max,mem,mem_min,mem_avg,mem_max,gres,work_dir,std_out,std_err" -FAILONLY="FAILED,TIMEOUT" - -# change FAILONLY state .. 
for debugging only -# FAILONLY="TIMEOUT" - -def exit_w_msg(message): - """ Gracefully exit with proper messsage""" - print('{} : EXITING!!'.format(__file__)) - print(message) - sys.exit() - -def check_help(parser): - """check if usage needs to be printed""" - if '-h' in sys.argv or '--help' in sys.argv or len(sys.argv) == 1: - print(__doc__) - parser.print_help() - parser.exit() - return - -def check_host(): - if os.environ.get('HOSTNAME') == "biowulf.nih.gov" or os.environ.get('HOSTNAME') == "helix.nih.gov" : - pass - else: - exit_w_msg("This script only works on BIOWULF!") - -def collect_args(): - # create parser - parser = argparse.ArgumentParser(description = 'Get slurm job information using slurm job id or snakemake.log file') - - # add version - parser.add_argument('-v','--version', action = 'version', version='%(prog)s {}'.format(__version__)) - - # add joblist - parser.add_argument('-j','--joblist', help='comma separated list of jobids. Cannot be used together with -s option.', required = False, type=str) - - # add snakemakelog - parser.add_argument('-s','--snakemakelog', help='snakemake.log file. Slurm jobids are extracted from here. Cannot be used together with -j option.',required = False, type=argparse.FileType('r')) - - # output file - parser.add_argument('-o','--output', help='Path to output file. All jobs (all states) and all columns are reported in output file.', type=str, required=False) - - # output only failed jobs - parser.add_argument('-f','--failonly', help='output FAILED jobs only (onscreen). Path to the STDERR files for failed jobs. 
All jobs are reported with -o option.', action='store_true', required=False) - - check_help(parser) - - # extract parsed arguments - args = parser.parse_args() - - - if args.output: - args.output=os.path.abspath(args.output) - if not os.access(os.path.dirname(args.output), os.W_OK): - msg = "File is not writable: {}".format(args.output) - exit_w_msg(msg) - - if args.joblist and args.snakemakelog: - exit_w_msg("Either -j or -s (not BOTH) is required!") - - if args.joblist: - jobids = args.joblist - args.joblist = jobids.split(",") - - if args.snakemakelog: # if snakemakelog file is given then extract the jobids from it. - cmd = 'grep "external jobid" ' + args.snakemakelog.name + ' | awk \'{print $NF}\' | sed "s/\'//g" | sed "s/\.//g"' - p1 = subprocess.run(cmd,capture_output=True,text=True,shell=True) - args.joblist = p1.stdout.strip().split("\n") - - return args - -def mem2gb(memstr): - if memstr == "0": - return float("0") - value,unit = memstr.split() - if unit == "GB": - return float(value) - elif unit == "MB": - return float(value) / 1024 - elif unit == "KB": - return float(value) / 1024 / 1024 - -def check_int_set_zero(s): - if s == '': - s = 0 - else: - s = int(s) - return s - -def time2sec(timestr): - debug=0 - dayHMSstr_list = timestr.split("-") - if debug==1: print(timestr) - if debug==1: print(dayHMSstr_list) - if debug==1: print(len(dayHMSstr_list)) - if len(dayHMSstr_list) == 2: - day = check_int_set_zero(dayHMSstr_list[0]) - HMSstr = dayHMSstr_list[1] - else: - day = 0 - HMSstr = dayHMSstr_list[0] - HMSstr_list = HMSstr.split(":") - if debug==1: print(HMSstr) - if debug==1: print(HMSstr_list) - if len(HMSstr_list) == 3: - hour = check_int_set_zero(HMSstr_list[0]) - minutes = check_int_set_zero(HMSstr_list[1]) - sec = check_int_set_zero(HMSstr_list[2]) - elif len(HMSstr_list) == 2: - hour = 0 - minutes = check_int_set_zero(HMSstr_list[0]) - sec = check_int_set_zero(HMSstr_list[1]) - elif len(HMSstr_list) == 1: - hour = 0 - minutes = 0 - sec = 
check_int_set_zero(HMSstr_list[0]) - if debug==1: print(day,hour,minutes,sec) - sec += int(day)*24*60*60 - if debug==1: print(day,hour,minutes,sec) - sec += int(hour)*60*60 - if debug==1: print(day,hour,minutes,sec) - sec += int(minutes)*60 - if debug==1: print(day,hour,minutes,sec) - return float(sec) - - -def get_jobinfo(args): - #cmd = '/usr/local/bin/dashboard_cli jobs --joblist ' + ",".join(args.joblist[0:10]) + " --archive --json --fields " + LONG_FIELDS - cmd = '/usr/local/bin/dashboard_cli jobs --joblist ' + ",".join(args.joblist) + " --archive --json --fields " + LONG_FIELDS - p1 = subprocess.run(cmd,capture_output=True,text=True,shell=True) - if p1.returncode != 0: - exit_w_msg("dashboard_cli failed!") - p1_json = json.loads(p1.stdout) - p1_table = pd.json_normalize(p1_json) - p1_table['epochtime'] = p1_table.apply( lambda row: time.mktime(datetime.datetime.strptime(row.submit_time,"%Y-%m-%dT%H:%M:%S").timetuple()), axis = 1) - p1_table = p1_table.sort_values(by=['epochtime']) - p1_table['max_cpu_util'] = p1_table.apply ( lambda row: "-" if row['cpu_max'] == "-" else "%.2f"%(float(row['cpu_max'])*100/int(row['cpus'])) + " %" , axis = 1) - p1_table['max_mem_util'] = p1_table.apply ( lambda row: "-" if row['mem_max'] == "-" else "%.2f"%(mem2gb(row['mem_max'])*100/mem2gb(row['mem'])) + " %" , axis = 1) - p1_table['queued_time_seconds'] = p1_table.apply ( lambda row: "%d"%(int(time2sec(row['queued_time']))), axis = 1) - p1_table['elapsed_time_seconds'] = p1_table.apply ( lambda row: "%d"%(int(time2sec(row['elapsed_time']))), axis = 1) - p1_table['timelimit_seconds'] = p1_table.apply ( lambda row: "%d"%(int(time2sec(row['timelimit']))), axis = 1) - p1_table['time_util'] = p1_table.apply ( lambda row: "%.2f"%(float(row['elapsed_time_seconds'])*100/float(row['timelimit_seconds'])) + " %" if float(row['timelimit_seconds']) != 0 else "- %",axis = 1) - if args.output: - try: - if not p1_table.empty: - 
p1_table.to_csv(args.output,sep="\t",header=True,index=False,columns=LONG_FIELDS.split(",")) - except: - msg = "File is not writable: {}".format(args.output) - exit_w_msg(msg) - return p1_table - -def filter_rows(func): - def wrapper(t,args): - if args.failonly: - t = t[t['state'].isin(FAILONLY.split(","))] - func(t,args) - return wrapper - -@filter_rows -def print2screen(t,args): - onscreenfields=SHORT_FIELDS - if args.failonly: - onscreenfields = FAILONLY_FIELDS - if t.empty: - print("Good News!! You have ZERO FAILED jobs!") - else: - print(t.to_string(index=False,justify="left",columns=onscreenfields.split(","))) - -def main(): - # check host - check_host() - # collect all arguments - args = collect_args() - # query dashboard_cli to get details as a pandas table - t = get_jobinfo(args) - # filter table, print to screen and write to output file - print2screen(t,args) - -if __name__ == '__main__': - main() diff --git a/Biowulf/peek b/Biowulf/peek deleted file mode 100755 index 2effbbb..0000000 --- a/Biowulf/peek +++ /dev/null @@ -1,123 +0,0 @@ -#!/usr/local/bin/python -# -*- coding: utf-8 -*- -from __future__ import print_function -import sys - - -def usage(): - """Print usage information and exit program""" - print("USAGE: {} [buffer]\n".format(sys.argv[0])) - print("Assumptions:\n\tInput file is tab delimted") - print("\t └── Globbing suported: *.txt\n") - print("Optional:\n\tbuffer = 40 (default)") - print("\t └── Changing buffer will increase/decrease output justification") - sys.exit() - - -def pargs(): - """Basic command-line parser """ - if '-h' in sys.argv or '--help' in sys.argv or len(sys.argv) == 1: - usage() - try: - fname = sys.argv[1] - except IndexError: - usage() - return - - -def max_string(data): - """Given a list of strings, finds the maximum strign length""" - max = -1 - for value in data: - if len(value) > max: - max = len(value) - return max - - -def print_header(filename, length): - """Print filenames and divider""" - print("# 
{}".format(filename)) - print("{}".format("="*length)) - -def justify(h, d, n, nr): - """Calculates the spacing for justifying to the right """ - xspaces = n - (h + d) - if nr < 10: - xspaces = xspaces - 2 - else: - xspaces = xspaces - 3 - spacing = xspaces * " " - return spacing - -def pprint(headlist, data, linelength, fn): - """Re-formats first two lines on file so columns are left justifed and values are right justified """ - # Print Filename - print_header(fn, linelength) - - # Print NR and justified contents of 1st and 2nd line - for i in range(len(headlist)): - rownumber = i + 1 - - # Attribute name and correspoding value - column = headlist[i].lstrip().rstrip() - if not column: - column = 'NULL' - value = data[i].lstrip().rstrip() - - # Calculate spacing for justifying to the right - insert_spaces = justify(len(column), len(value), linelength, rownumber) - print("{} {}{}{}".format(rownumber, column, insert_spaces, value)) - - -def peek(filename, buffer, delimeter='\t'): - pargs() - delim = delimeter - - # Getting contents of first line - try: - fh = open(filename, "r") - except IOError as e: - # File does not exist - print("\n{}\nPlease check you filename!\n\n".format(e)) - usage() - - headerlist = fh.readline().split(delim) - fh.close() - - # Getting contents of second line - fh = open(filename, "r") - try: - datalist = fh.readlines()[1].split(delim) - except IndexError: - datalist = ['EMPTY_FIELD'] - fh.close() - - max_attr_length = max_string(datalist) - total_length = max_attr_length + buffer - - # Pretty print data (Right justify results) - pprint(headerlist, datalist, total_length, filename) - print() - - - -def main(): - # Checking command-line usage before parsing - pargs() - - try: - buffer = int(sys.argv[-1]) - sys.argv.pop(-1) - except IndexError: - buffer = 40 - except ValueError: - buffer = 40 - - # Paring file(s) contents to support globbing - for file in sys.argv[1:]: - peek(file, buffer) - - -if __name__ == "__main__": - main() - diff --git 
a/Biowulf/run_jobby_on_nextflow_log b/Biowulf/run_jobby_on_nextflow_log deleted file mode 100755 index efecb52..0000000 --- a/Biowulf/run_jobby_on_nextflow_log +++ /dev/null @@ -1,4 +0,0 @@ -#!/usr/bin/env bash -nextflowlog=$1 -SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) -${SCRIPT_DIR}/jobby $(awk -F" jobId: " '{print $2}' ${nextflowlog} | awk -F";" '{print $1}' | grep -v "^$" | sort | uniq | tr "\\n" " ") |cut -f2,3,18 diff --git a/Biowulf/run_jobby_on_nextflow_log_full_format b/Biowulf/run_jobby_on_nextflow_log_full_format deleted file mode 100755 index c82de46..0000000 --- a/Biowulf/run_jobby_on_nextflow_log_full_format +++ /dev/null @@ -1,4 +0,0 @@ -#!/usr/bin/env bash -nextflowlog=$1 -SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) -${SCRIPT_DIR}/jobby $(awk -F" jobId: " '{print $2}' ${nextflowlog} | awk -F";" '{print $1}' | grep -v "^$" | sort | uniq | tr "\\n" " ") diff --git a/Biowulf/run_jobby_on_snakemake_log b/Biowulf/run_jobby_on_snakemake_log deleted file mode 100755 index f46ab22..0000000 --- a/Biowulf/run_jobby_on_snakemake_log +++ /dev/null @@ -1,4 +0,0 @@ -#!/usr/bin/env bash -snakemakelog=$1 -SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) -${SCRIPT_DIR}/jobby $(grep --color=never "^Submitted .* with external jobid" $snakemakelog | awk '{{print $NF}}' | sed "s/['.]//g" | sort | uniq | tr "\\n" " ") |cut -f2,3,18 diff --git a/Biowulf/run_jobby_on_snakemake_log_full_format b/Biowulf/run_jobby_on_snakemake_log_full_format deleted file mode 100755 index 1dc03e1..0000000 --- a/Biowulf/run_jobby_on_snakemake_log_full_format +++ /dev/null @@ -1,4 +0,0 @@ -#!/usr/bin/env bash -snakemakelog=$1 -SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd ) -${SCRIPT_DIR}/jobby $(grep --color=never "^Submitted .* with external jobid" $snakemakelog | awk '{{print $NF}}' | sed "s/['.]//g" | sort | uniq | tr "\\n" " ") diff --git a/CHANGELOG.md 
b/CHANGELOG.md new file mode 100644 index 0000000..e69de29 diff --git a/GSEA/deg2gs.py b/GSEA/deg2gs.py deleted file mode 100755 index f32f234..0000000 --- a/GSEA/deg2gs.py +++ /dev/null @@ -1,198 +0,0 @@ -#!/usr/bin/env python - -''' -Susan Huse -NIAID Center for Biological Research -Frederick National Laboratory for Cancer Research -Leidos Biomedical - -deg2gs.py - Reads a rnaseq pipeliner *_DEG_all_genes.txt file - and outputs a prioritized list of Ensembl gene IDs for ToppFun - -v 1.0 - initial code version. -v 1.1 - updated for new column headers in pipeliner limma_DEG_all_genes.txt -v 1.2 - top2Excel format is now csv rather than tab-delimited - -''' - -__author__ = 'Susan Huse' -__date__ = 'August 6, 2018' -__version__ = '1.1' -__copyright__ = 'No copyright protection, can be used freely' - -import sys -import os -import re -import datetime -import pandas as pd - -import argparse -from argparse import RawTextHelpFormatter -from ncbr_huse import send_update - - - -#################################### -# -# Functions -# -#################################### -def filter_by_p(x, nhits, pvalue, qvalue): - # Filter the data to nhits, pvalue, qvalue - x.sort_values(by=['p'], inplace=True) - x = x[ (x['p'] <= pvalue) & (x['q'] <= qvalue)] - if x.shape[0] > nhits: - x = x.head(nhits) - return(x) - -#################################### -# -# Main -# -#################################### - -def main(): - # Usage statement - parseStr = 'Reads RNASeq differential expression output files\n\ -and outputs a prioritized list of genes for use in GSEA or ToppFun.\n\ -Will filter by both p and fdr values, and export up to nhits values.\n\ -Reads several limma output versions (topTable, pipeliner, top2Excel).\n\ -Outputs gene name and gsea rank for GSEA or Enssemble IDs for ToppFun.\n\ -NB: GSEA cannot support hyphens in filenames, they will be replaced with underscores.\n\n\ -Usage:\n\ - deg2gs.py -i infile -o outfile -n nHits -p pvalue -q fdrvalue -m method -s sheetname -f 
format\n\n\ -Example:\n\ - deg2gs.py -i ExpAll_limma_DEG_all_genes.txt -o ExpAll_limma_all_genes.gsea.rnk -n 1000 -q 0.05 -m gsea -f pipeliner\n\ - deg2gs.py -i DEanalysis.xlsx -o DEanalysis.topp.rnk -n 1000 -p 1e-05 -q 5e-02 -m toppfun -f topTable -s Sheet1\n\n' - - parser = argparse.ArgumentParser(description=parseStr, formatter_class=RawTextHelpFormatter) - parser.add_argument('-i', '--infile', required=True, nargs='?', type=argparse.FileType('r'), default=None, - help='Input file containing important things') - parser.add_argument('-o', '--outfile', required=True, action='store', type=str, default=None, - help='Output file for important results') - parser.add_argument('-m', '--method', type=str, action='store', default=None, choices=['gsea','toppfun'], - help='Method for gene set analysis (gsea or toppfun)') - parser.add_argument('-n', '--nhits', type=int, action='store', required=False, default=500, - help='Maximum number of top hits to extract, default = 500') - parser.add_argument('-p', '--pvalue', type=float, action='store', required=False, default=0.05, - help='Maxium p-value threshold to export, default = 0.05') - parser.add_argument('-q', '--qvalue', type=float, action='store', required=False, default=0.10, - help='Maximum FDR correction value to export, default = 0.10') - parser.add_argument('-s', '--sheetname', type=str, action='store', default=None, - help='Sheetname if input file is Excel rather than text (required for *.xlsx)') - parser.add_argument('-f', '--fformat', type=str, action='store', default="pipeliner", - choices=['pipeliner', 'topTable', 'top2Excel'], - help="Input file format for running gsea." 
+\ - 'Pipeliner output has different column names than limma topTable') - - # - # Set up the variables and the log file - # - args = parser.parse_args() - infile = args.infile - outfile = args.outfile - nhits = args.nhits - pvalue = args.pvalue - qvalue = args.qvalue - method = args.method - sheetname = args.sheetname - fformat = args.fformat - - # don't really need a log file for this - keepLog = False - - # replace hyphens with underscore, GSEA can't support hyphens - # NB: this replaces for toppfun as well, so the names are consistent, but you don't have to do both - fname = re.sub("-","_",outfile) - - # Column names for each format, to read and write - # NB: in_cols is not the same names as in the input file, but standardizes for this code - # Assumes that ensid_gene is the row index - if fformat == "pipeliner": - in_cols = ['gene', 'fc', 'log2FC', 'p', 'q', 'gsea'] - - elif fformat == "topTable": - in_cols = ['log2FC', 'AveExpr', 'gsea', 'p', 'q', 'B'] - - elif fformat == "top2Excel": - in_cols = ['ensid', 'gene', 'log2FC', 'AveExpr', 'gsea', 'p', 'q', 'B'] - - gsea_cols = ['gene','gsea'] - #top_cols = ['ensid'] - top_cols = ['gene'] - - # If keepLog then set it up - if keepLog: - thedate = str(datetime.datetime.now()).split()[0] - thedate = re.sub("-","",thedate) - - log = open('deg2gs' + '.log', 'a') - log.write('\n' + str(datetime.datetime.now()) + '\n') - log.write(' '.join(sys.argv) + '\n') - log.write('deg2gs.py version ' + __version__ + '\n') - log.write("Exporting genes to {}, n={}, p={}, q={}, ".format(outfile.name, nhits, pvalue, qvalue)) - log.flush() - - # - # Import from Excel or tabSV, if Excel extension than excel otherwise csv - # - infile_name, infile_extension = os.path.splitext(infile.name) - if infile_extension in [".xls", ".xlsx"]: - if sheetname is None: - err_out("Input Excel file requires sheet name", log) - df = pd.read_excel(infile.name, sheet_name = sheetname, header=0, index_col=0) - - else: - #df = pd.read_csv(infile, sep='\t', 
header=0, index_col=0) - df = pd.read_csv(infile, sep=',', header=0, index_col=0) - - # - # Set the columns based on the input format and the analysis method, but only if they match - # - if df.shape[1] != len(in_cols): - errMsg = '\nYour input file does not match the expected format "{}".\n'.format(fformat) + \ - 'Please check the file or the selected format and try again\n.' - if keepLog: - err_out(errMsg, log) - else: - print(errMsg) - sys.exit(1) - - df.columns = in_cols - - # split the ensemblID|Gene if necessary - if fformat == 'topTable': #and method == 'gsea': - df['gene'] = [re.sub("^.*\|", "", i) for i in df.index.values.tolist()] - - # Filter the df to the p-values, FDR, and number of hits specified - df = filter_by_p(df, nhits, pvalue, qvalue) - - # - # Grab the relevant columns and write out the file - # - if method == 'gsea': - df = df.filter(items=gsea_cols) - df.to_csv(fname, index=False, header=False, sep='\t') - - elif method == 'toppfun': - # # Get clean ensemble IDs for toppfun, strip off the .*$ from the Ensembl IDs and export them - # # top2Excel already has it - # if fformat != 'top2Excel': - # df['ensid'] = [re.sub("\..*$", "", i) for i in df.index.values.tolist()] - df = df.filter(items=top_cols) - df.to_csv(fname, index=False, header=False) - - # - # Close out the log file - # - if keepLog: - send_update("deg2gs.py successfully completed. {} written.".format(fname), log) - send_update(str(datetime.datetime.now()) + '\n', log) - log.close() - else: - print("deg2gs.py successfully completed. 
{} written.".format(fname)) - -if __name__ == '__main__': - main() - diff --git a/GSEA/multitext2excel.py b/GSEA/multitext2excel.py deleted file mode 100644 index fc9ea1f..0000000 --- a/GSEA/multitext2excel.py +++ /dev/null @@ -1,132 +0,0 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -""" -Created on Mon Aug 6 14:59:13 2018 - -Susan Huse -NIAID Center for Biological Research -Frederick National Laboratory for Cancer Research -Leidos Biomedical - -multitext2excel.py - Reads a list of files to import as separate tabs in Excel - -v 1.0 - initial code version. -v 1.1 - updated to include first splitter markowitzte@nih.gov - -""" -__author__ = 'Susan Huse' -__date__ = 'August 6, 2018' -__version__ = '1.1' -__copyright__ = 'No copyright protection, can be used freely' - -#import csv -import sys -import os -import re -import datetime -import pandas as pd -import glob -#import scipy -#import numpy - -import argparse -from argparse import RawTextHelpFormatter -from ncbr_huse import run_cmd, run_os_cmd, un_gzip, send_update, err_out, fasta_count, fasta_list - - -#################################### -# -# Functions -# -#################################### - -# -# Set up the parameters for the USEARCH command and run it -# - - -#################################### -# -# Main -# -#################################### - -def main(): - # Usage statement - parseStr = 'Reads a list of files and imports them each into a separate tab in one Excel spreadsheet.\n\n\ - Usage:\n\ - multitext2excel.py -o outfile -d directory -p filepattern -k delimiter -s namesplitter\n\ - Example:\n\ - multitext2excel.py -o MyResults.xlsx -d analysis -p ".txt" -k "\t" -s "."\n' - - - parser = argparse.ArgumentParser(description=parseStr, formatter_class=RawTextHelpFormatter) -# parser.add_argument('-i', '--infile', required=True, nargs='?', type=argparse.FileType('r'), default=None, -# help='Input file containing important things') - parser.add_argument('-o', '--outfile', required=True, nargs='?', 
type=argparse.FileType('w'), default=None, - help='Output file for important results') - parser.add_argument('-d', '--indir', required=False, type=str, action='store', default = '.', - help='Input directory containing data files to import [default="."]') - parser.add_argument('-p', '--pattern', required=True, type=str, - help='Pattern used to create list of input data files') - parser.add_argument('-k', '--delimiter', required=False, type=str, default='\t', - help='character delimiter that separates columns in each of the input data files [default="\t"]') - parser.add_argument('-s', '--splitter', required=False, type=str, default='.', - help='character to split input filenames to create output tab names. Cuts everything to the right [default="."]') - parser.add_argument('-f', '--firstsplitter', required=False, type=str, default='', - help='character to split input filenames to create output tab names. Cuts everything to the left [default=""]') - - - # - # Set up the variables and the log file - # - args = parser.parse_args() -# infile = args.infile - outfile = args.outfile - pattern = args.pattern - delimiter = args.delimiter - splitter = args.splitter - firstsplitter = args.firstsplitter - indir = args.indir - - thedate = str(datetime.datetime.now()).split()[0] - thedate = re.sub("-","",thedate) - - # Set up the log file - log = open('multitext2excel' + '.log', 'a') - log.write('\n' + str(datetime.datetime.now()) + '\n') - log.write(' '.join(sys.argv) + '\n') - log.write('multitext2excel.py version ' + __version__ + '\n') - log.flush() - - # Read for each matching file, read in and export to the output file - pattern = '*' + pattern + '*' - writer = pd.ExcelWriter(outfile.name) - for filename in glob.glob(os.path.join(indir, pattern)): - - # Extract the output tab name - #sheet_name = os.path.basename(filename).split(splitter)[0] - sheet_name = re.sub(indir + "/", "", filename).split(splitter)[0] - if firstsplitter != "": - sheet_name = 
sheet_name.split(firstsplitter)[1] - print("Writing data from input file: {} to output tab: {}".format(filename, sheet_name)) - - # Read in the data - df = pd.read_csv(filename, sep=delimiter, header=0, encoding = 'unicode_escape') - - # Write out the data - df.to_excel(writer, index=False, sheet_name=sheet_name) - - # Close it up! - writer.save() - - # - # Close out the log file - # - send_update("multitext2excel.py successfully completed. {} written.".format(outfile.name), log) - send_update(str(datetime.datetime.now()) + '\n', log) - log.close() - -if __name__ == '__main__': - main() diff --git a/Karyoplot/karyoploter.R b/Karyoplot/karyoploter.R deleted file mode 100644 index 275f83d..0000000 --- a/Karyoplot/karyoploter.R +++ /dev/null @@ -1,112 +0,0 @@ -#Author: Vishal Koparde, PhD -#Take reformatted DEG out file from RNASeq contrast and the geneinfo.bed to make a Karyoplot with -#updregulated genes in red and downregulated genes in blue - -library("argparse") - -parser <- ArgumentParser() -parser$add_argument("-d", "--degout", type="character", required=TRUE, - help="Reformmated DEG out file from limma/edgeR/DESeq2") -parser$add_argument("-c", "--gene2coord", type="character", required=TRUE, - help="Gene to coordinate file ie geneinfo.bed") -parser$add_argument("-g", "--genome", type="character", required=TRUE, - help="Genome .. either hg19/hg38/mm9/mm10/Mmul8.0.1/canFam3") -parser$add_argument("-f", "--fdr", type="double", default=0.05, - help="FDR cutoff to use") -args <- parser$parse_args() - -# setwd("/Users/kopardevn/Documents/Work/Projects/ccbr983/fastq2/GI_Skin_compares") -# args$degout="DESeq2_DEG_Skin_T-Skin_N_all_genes.txt" -# args$gene2coord="geneinfo.bed" -# args$fdr=0.05 -# args$genome="hg19" - - -for (f in c(args$degout,args$gene2coord)) -if (! file.exists(f)) { - stop(paste("File does not exist:",f)) -} - -if (! 
args$genome %in% c("hg19","hg38","mm9","mm10","Mmul8.0.1","canFam3")) { - stop("Only hg19/hg38/mm9/mm10/Mmul8.0.1/canFam3 genomes are supported!") -} - - -library("karyoploteR") -library("BSgenome.Mmusculus.UCSC.mm9") -library("BSgenome.Mmusculus.UCSC.mm10") -library("BSgenome.Hsapiens.UCSC.hg19") -library("BSgenome.Hsapiens.UCSC.hg38") -library("BSgenome.Mmulatta.UCSC.rheMac8") -library("BSgenome.Cfamiliaris.UCSC.canFam3.masked") - -if (args$genome=="Mmul8.0.1"){args$genome="rheMac8"} - -deseqout=read.delim(args$degout) -dim(deseqout) -fdr_filter=deseqout$fdr < args$fdr -positive_lfc_filter=deseqout$log2fc>1 -negative_lfc_filter=deseqout$log2fc < -1 -table(( negative_lfc_filter | positive_lfc_filter ) & fdr_filter ) -deseqout_filtered=deseqout[( ( negative_lfc_filter | positive_lfc_filter ) & fdr_filter ),] - -coordinates=read.delim(args$gene2coord,header=FALSE) -colnames(coordinates)=c("chr","start","end","strand","ensid","biotype","gene_name") -deseqout_filtered_w_coord=merge(deseqout_filtered,coordinates,by.x="gene",by.y="gene_name") -dim(deseqout_filtered_w_coord) - -if(nrow(deseqout_filtered_w_coord)==0){ - stop("No DEGs found. Try increasing FDR cutoff") -} - -genome=args$genome -chrs=c() -maxchrs=0 -if (genome %in% c("hg19","hg38")) {maxchrs=22} -if (genome %in% c("mm10","mm9")) {maxchrs=19} -if (genome %in% c("rheMac8")) {maxchrs=20} -if (genome %in% c("canFam3")) {maxchrs=38} - -for (i in seq(1,maxchrs)) {chrs=c(chrs,paste("chr",i,sep=""))} -chrs=c(chrs,"chrX") -if (! 
genome %in% c("canFam3")){chrs=c(chrs,"chrY")} - -y=round(length(chrs)/2) -a=chrs[seq(1,y)] -b=chrs[seq(y+1,length(chrs))] -chrs_subsets=list(a,b) - -deseqout_filtered_w_coord=deseqout_filtered_w_coord[deseqout_filtered_w_coord$chr %in% chrs,] -dim(deseqout_filtered_w_coord) - -pos_scale_limit=abs(floor(fivenum(deseqout_filtered_w_coord$log2fc)[2]))+0.5 -neg_scale_limit=-1*(abs(ceiling(fivenum(deseqout_filtered_w_coord$log2fc)[4]))+0.5) - -deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc > pos_scale_limit,]$log2fc=pos_scale_limit -deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc < neg_scale_limit,]$log2fc=neg_scale_limit - -upregulated=deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc>0,] -downregulated=deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc<0,] - - -for (i in seq(1,length(chrs_subsets))) { - chrs2=unlist(chrs_subsets[i]) - pdf(paste("karyoplot",i,".pdf",sep="")) - kp <- plotKaryotype(genome=genome, plot.type=2, chromosomes = chrs2, cytobands=NULL) - - kpDataBackground(kp, data.panel = 1, r0=0, r1=1) - kpDataBackground(kp, data.panel = 2, r0=0, r1=1) - kpHeatmap(kp, chr=upregulated$chr, - x0=upregulated$start, - x1=upregulated$end, - y=upregulated$log2fc, - data.panel = 1, - colors = c("white", "red")) - kpHeatmap(kp, chr=downregulated$chr, - x0=downregulated$start, - x1=downregulated$end, - y=downregulated$log2fc, - data.panel = 2, - colors = c("blue", "white")) - dev.off() -} diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 0000000..c1c5435 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,3 @@ +include CITATION.cff +include VERSION +include pyproject.toml diff --git a/README.md b/README.md index 9bfd0ef..ec16045 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,108 @@ -# Tools -Tools created by CCBR/NCBR members +# CCBR Tools + + + +Utilities for CCBR Bioinformatics Software + 
+[![build](https://github.com/CCBR/Tools/actions/workflows/build-python.yml/badge.svg)](https://github.com/CCBR/Tools/actions/workflows/build-python.yml) +[![codecov](https://codecov.io/gh/CCBR/Tools/graph/badge.svg?token=O73NOR65B3)](https://codecov.io/gh/CCBR/Tools) + +## Installation + +``` sh +pip install git+https://github.com/CCBR/Tools +``` + +## Usage + +``` python +!ccbr_tools --help +``` + + Usage: ccbr_tools [OPTIONS] COMMAND [ARGS]... + + Utilities for CCBR Bioinformatics Software + + For more options, run: tool_name [command] --help + + Options: + -v, --version Show the version and exit. + -h, --help Show this message and exit. + + Commands: + cite Print the citation in the desired format + version Print the version of ccbr_tools + + All installed tools: + ccbr_tools + gb2gtf + hf + intersect + jobby + jobinfo + peek + +``` python +import ccbr_tools.util +print(ccbr_tools.util.get_version()) +``` + + 0.1.0-dev + +## CLI Utilities + +Command-line utilities in CCBR Tools. + +- `ccbr_tools` +- `gb2gtf` +- `hf` +- `intersect` +- `jobby` +- `jobinfo` +- `peek` + +## External Scripts + +There are additional standalone scripts for various common tasks in +[scripts/](scripts/). They are less robust than the CLI Utilities +included in the package and do not have any unit tests. 
+ +- `add_gene_name_to_count_matrix.R` +- `aggregate_data_tables.R` +- `argparse.bash` +- `cancel_snakemake_jobs.sh` +- `create_hpc_link.sh` +- `extract_value_from_json.py` +- `extract_value_from_yaml.py` +- `filter_bam_by_readids.py` +- `filter_fastq_by_readids_highmem.py` +- `filter_fastq_by_readids_highmem_pe.py` +- `gather_cluster_stats.sh` +- `gather_cluster_stats_biowulf.sh` +- `get_buyin_partition_list.bash` +- `get_slurm_file_with_error.sh` +- `gsea_preranked.sh` +- `karyoploter.R` +- `make_labels_for_pipeliner.sh` +- `rawcounts2normalizedcounts_DESeq2.R` +- `rawcounts2normalizedcounts_limmavoom.R` +- `run_jobby_on_nextflow_log` +- `run_jobby_on_nextflow_log_full_format` +- `run_jobby_on_snakemake_log` +- `run_jobby_on_snakemake_log_full_format` +- `spooker` +- `which_vpn.sh` + +## Citation + +Please cite this software if you use it in a publication: + + Sovacool K., Koparde V., Kuhn S. CCBR Tools: Utilities for CCBR Bioinformatics Software URL: https://ccbr.github.io/Tools/ + +### Bibtex entry + + @misc{YourReferenceHere, + author = {Sovacool, Kelly and Koparde, Vishal and Kuhn, Skyler}, + title = {CCBR Tools: Utilities for CCBR Bioinformatics Software}, + url = {https://ccbr.github.io/Tools/} + } diff --git a/README.qmd b/README.qmd new file mode 100644 index 0000000..d2c8e5e --- /dev/null +++ b/README.qmd @@ -0,0 +1,73 @@ +--- +format: gfm +--- + + + +# CCBR Tools + +Utilities for CCBR Bioinformatics Software + +[![build](https://github.com/CCBR/Tools/actions/workflows/build-python.yml/badge.svg)](https://github.com/CCBR/Tools/actions/workflows/build-python.yml) +[![codecov](https://codecov.io/gh/CCBR/Tools/graph/badge.svg?token=O73NOR65B3)](https://codecov.io/gh/CCBR/Tools) + +## Installation + +```sh +pip install git+https://github.com/CCBR/Tools +``` + +## Usage + +```{python} +!ccbr_tools --help +``` + + +```{python} +import ccbr_tools.util +print(ccbr_tools.util.get_version()) +``` + + +## CLI Utilities + +Command-line utilities in CCBR Tools. 
+ +```{python} +#| echo: false +#| output: asis +print("\n".join([f" - `{cmd}`" for cmd in ccbr_tools.util.get_project_scripts()])) +``` + + +## External Scripts + +There are additional standalone scripts for various common tasks in [scripts/](scripts/). +They are less robust than the CLI Utilities included in the package and do not have any unit tests. + +```{python} +#| echo: false +#| output: asis +from ccbr_tools.util import get_pyproject_toml +from pathlib import Path +print('\n'.join([f" - `{Path(script).name}`" for script in get_pyproject_toml()['tool']['setuptools']['script-files']])) +``` + + +## Citation + +Please cite this software if you use it in a publication: + +```{python} +#| echo: false +!ccbr_tools cite -f apalike +``` + + +### Bibtex entry + +```{python} +#| echo: false +!ccbr_tools cite -f bibtex +``` diff --git a/Biowulf/example.snakemake.log.HPC_stats.biowulf.tsv b/docs/example.snakemake.log.HPC_stats.biowulf.tsv similarity index 100% rename from Biowulf/example.snakemake.log.HPC_stats.biowulf.tsv rename to docs/example.snakemake.log.HPC_stats.biowulf.tsv diff --git a/Biowulf/example.snakemake.log.HPC_stats.tsv b/docs/example.snakemake.log.HPC_stats.tsv similarity index 100% rename from Biowulf/example.snakemake.log.HPC_stats.tsv rename to docs/example.snakemake.log.HPC_stats.tsv diff --git a/gb2gtf/gb2gtf.py b/gb2gtf/gb2gtf.py deleted file mode 100644 index fb55bf6..0000000 --- a/gb2gtf/gb2gtf.py +++ /dev/null @@ -1,117 +0,0 @@ -#download GenBank file from NCBI and then -#Usage:python gb2gtf.py sequence.gb > sequence.gtf - -import os,sys -from Bio.Seq import Seq -from Bio.SeqRecord import SeqRecord -from Bio.SeqFeature import SeqFeature, FeatureLocation -from Bio import SeqIO -import Bio - -# get all sequence records for the specified genbank file -recs = [rec for rec in SeqIO.parse(sys.argv[1], "genbank")] - -# print the number of sequence records that were extracted -#print(len(recs)) - -# print annotations for each sequence record -#for 
rec in recs: -# print(rec.annotations) - -# print the CDS sequence feature summary information for each feature in each -# sequence record -for rec in recs: - #print(type(rec)) - seqname=rec.id - #feats = [feat for feat in rec.features if feat.type == "CDS"] - feats = [feat for feat in rec.features] - for feat in feats: -# print(feat) - l=feat.location - start=l.start - end=l.end - if feat.strand==1: - strand="+" - else: - strand="-" - if feat.type=="gene": - gffstring=list() - gffstring.append(seqname) - gffstring.append("RefSeq") - gffstring.append("gene") - gffstring.append(str(start)) - gffstring.append(str(end)) - gffstring.append(".") - gffstring.append(strand) - gffstring.append(".") - q=feat.qualifiers - try: - gene=q["gene"][0] - except: - try: - gene=q["locus_tag"][0] - except: - exit("Something fishy!") - - x="gene_name \"%s\"; gene_id \"%s\""%(gene,gene) - gffstring.append(x) - print("\t".join(gffstring)+";") -# #print(feat.qualifiers.keys()) -# #print(feat.qualifiers.values()) - elif feat.type=="CDS": - #if feat.type=="CDS": - gffstring=list() - gffstring.append(seqname) - gffstring.append("RefSeq") - gffstring.append("transcript") - gffstring.append(str(start)) - gffstring.append(str(end)) - gffstring.append(".") - gffstring.append(strand) - gffstring.append(".") - q=feat.qualifiers - try: - gene=q["gene"][0] - except: - try: - gene=q["locus_tag"][0] - except: - exit("Something fishy!") - x="gene_name \"%s\"; gene_id \"%s\"; transcript_id \"%s\"; transcript_name \"%s\""%(gene,gene,gene,gene) - gffstring.append(x) - print("\t".join(gffstring)+";") - gffstring[2]="exon" - if isinstance(l,Bio.SeqFeature.CompoundLocation): - parts=l.parts - #lenparts=len(parts) - for i,part in enumerate(parts): - j=i+1 - start=part.start - end=part.end - gffstring2=gffstring - gffstring2[3]=str(start) - gffstring2[4]=str(end) - y=x+"; exon_number %s"%(str(j)) - gffstring2[8]=y - print("\t".join(gffstring2)+";") - else: - y=x+"; exon_number 1" - gffstring[8]=y - 
print("\t".join(gffstring)+";") - - # print(j,part) - else: - continue - - #else: - -# print(l.start) -# exit() -# for l in feat.location: -# print(l.start) -# print(l.end) -# print(l.strand) -# exit() -# print(type(feat.location)) -# print(feat.strand) -# exit() diff --git a/git/README.md b/git/README.md deleted file mode 100644 index f6678c1..0000000 --- a/git/README.md +++ /dev/null @@ -1,205 +0,0 @@ - - -- [List Branches](#list-branches) -- [Create Branch](#create-branch) -- [Delete Branch](#delete-branch) -- [View and Set remote](#view-and-set-remote) -- [Adding files](#adding-files) -- [Removing files](#removing-files) -- [Commiting](#commiting) -- [Pulling](#pulling) -- [Pushing](#pushing) -- [PAT or Personal Access Token](#pat-or-personal-access-token) -- [References](#references) - - - -### List Branches - -The command to list all branches in local and remote repositories is: - -```bash -> git branch -a -``` - -Show only remote branches - -```bash -> git branch -r -``` - -Show only local branches - -```bash -> git branch -a | grep -v remotes -``` - -### Create Branch - -Create and switch to local branch - -```bash -> git branch newbr1 -> git checkout newbr1 -OR -> git checkout -b newbr1 -``` - -Move the local branch to remote - -```bash -> git push origin newbr1 -``` - -### Delete Branch - -Delete local branch: - -```bash -> git branch -d newbr1 -``` - -If branch has unmerged changes then - -```bash -> git branch -d -f newbr1 -``` - -Delete remote branch: - -```bash -> git push origin --delete newbr1 -``` - -Local and remote branches are distinct git objects, deleting one does not delete the other. You need to delete each explicitly. 
- -### View and Set remote - -View: - -```bash -> git remote -v -``` - -Set: - -```bash -> git remote add origin "repo URL" -``` - -### Adding files - -```bash -> git add file1 # adds one file -> git add folder1/file1 # adds one file inside a folder -> git add file1 file2 file3 # add multiple files at the same time -> git add . # add all files and folders in the current folder -> git add -all # add all folders in the curre -``` - -### Removing files - -Remove file from git working tree and local file system - -```bash -> git rm file1 -``` - -Remove file from git working tree but keep the local copy - -```bash -> git rm --cached file1 -``` - -Similar commands for removing folders recursively: - -```bash -> git rm -r folder1 # rm from git working tree and local file system -> git rm -r --cached folder1 # rm from git working tree only -``` - -### Commiting - -```bash -> git commit -m "commit message" -``` - -Change/Amend the commit message - -```bash -> git commit --amend -m "new commit message" -``` - -Undo last commit: **hard** ... this will undo commit and roll back all the changed files/folders - -```bash -> git reset --hard HEAD~1 -``` - -Undo last commit: **sort** ... this will undo commit, but keep all the changed files/folders - -```bash -> git reset --soft HEAD~1 -``` - -### Pulling - -Download content from remote and merge with local files: - -`pull` is basically a `fetch` + `merge` - -```bash -> git pull origin master -OR simply -> git pull -``` - -`origin` is the convention name for the remote repository and `master` is the branch being pulled - -### Pushing - -To push the currently checkedout branch to remote repo's `master` branch: - -```bash -> git push origin master -``` - -To push multiple (all) branches at the same time - -```bash -> git push origin --all -``` - -### PAT or Personal Access Token - -Github will stop letting users log in with simple username/password from Friday the 13th [08/13/21]! 
You can create PAT easily going to [Setting-->DeveloperSetting-->PAT](https://github.com/settings/tokens). Copy this and save it somewhere safe and it cannot be seen again if you leave the page and will have to be regenerated. FYI, most of use will be "ok" with just "**repo**" as the OAuth scope while creating the PAT. - -![image-20210813161235424](https://tva1.sinaimg.cn/large/008i3skNgy1gtft3sql0jj60i606q74n02.jpg) - -Once you have created your token, you can use it two ways: - -1. Continue the *old school* ways, i.e., whenever GH asks for password, copy and paste the token instead and you will be good to go! -2. Automate the whole process so GH does not ask for password at the time of `pull`, `push`, `commit` etc. Here are the steps for this: - - a. ensure that your GH *user.email* and *user.name* as set correctly. You can check the current settings using - ``` - git config -l - ``` - It should match your GH username/handle and GH email address. If it is NOT, then you can set it using - ``` - git config --global user.name "kopardev" - git config --global user.email "vishal.koparde@nih.gov" - ``` - b. Next run - ``` - git config --global credential.helper cache - ``` - c. The next time you do a `pull`, `push`, `commit` GH will ask for username and password, provide your \ and \. - - Done! GH will not ask for any password going forward. 
- - - -### References - -https://www.jquery-az.com/git-commands/ \ No newline at end of file diff --git a/homologfinder/create_human_mouse_homolog_table.py b/homologfinder/create_human_mouse_homolog_table.py deleted file mode 100644 index 8cc0172..0000000 --- a/homologfinder/create_human_mouse_homolog_table.py +++ /dev/null @@ -1,29 +0,0 @@ -#!/usr/bin/env python3 -import pandas as pd -cols = ["DB Class Key","Common Organism Name","Symbol"] -df = pd.read_csv("HOM_MouseHumanSequence.rpt", usecols=cols,sep="\t") -# human-mouse homologs file --> HOM_MouseHumanSequence.rpt -# can be downloaded from http://www.informatics.jax.org/faq/ORTH_dload.shtml -lookup = dict() -lookup2 = dict() -for index, row in df.iterrows(): - if not row["DB Class Key"] in lookup: - lookup[row["DB Class Key"]] = dict() - lookup[row["DB Class Key"]]["mouse, laboratory"] = list() - lookup[row["DB Class Key"]]["human"] = list() - if not row["Common Organism Name"] in lookup[row["DB Class Key"]]: - continue - lookup[row["DB Class Key"]][row["Common Organism Name"]].append(row["Symbol"]) -for k,v in lookup.items(): - #print(",".join(v["mouse, laboratory"]),",".join(v["human"]),sep="\t") - for l in v["mouse, laboratory"]: - if not l in lookup2: - lookup2[l] = list() - lookup2[l].extend(v["human"]) - for l in v["human"]: - if not l in lookup2: - lookup2[l] = list() - lookup2[l].extend(v["mouse, laboratory"]) - -for k,v in lookup2.items(): - print(k,",".join(v),sep="\t") diff --git a/homologfinder/hf b/homologfinder/hf deleted file mode 100755 index 36543ea..0000000 --- a/homologfinder/hf +++ /dev/null @@ -1,111 +0,0 @@ -#!/usr/bin/env python3 - -""" -About: - hf or HomologFinder finds homologs in human and mouse. 
- if the input gene or genelist is human, then it returns mouse homolog(s) and vice versa -USAGE: - $ hf -h -Example: - $ hf -g ZNF365 - $ hf -l Wdr53,Zfp365 - $ hf -f genelist.txt -""" - -__version__ = 'v1.0.0' -__author__ = 'Vishal Koparde' -__email__ = 'vishal.koparde@nih.gov' - -import argparse,subprocess,json,os,datetime,time,textwrap,sys -import pandas as pd -import requests -import io - -def exit_w_msg(message): - """ Gracefully exit with proper messsage""" - print('{} : EXITING!!'.format(__file__)) - print(message) - sys.exit() - -def check_help(parser): - """check if usage needs to be printed""" - if '-h' in sys.argv or '--help' in sys.argv or len(sys.argv) == 1: - print(__doc__) - parser.print_help() - parser.exit() - return - -def collect_args(): - """collect all the cli arguments""" - # create parser - parser = argparse.ArgumentParser(description = 'Get Human2Mouse (or Mouse2Human) homolog gene or genelist') - - # add version - parser.add_argument('-v','--version', action = 'version', version='%(prog)s {}'.format(__version__)) - - # add joblist - parser.add_argument('-g','--gene', help='single gene name', required = False, type=str) - - # add snakemakelog - parser.add_argument('-l','--genelist', help='comma separated gene list',required = False, type=str) - - # output file - parser.add_argument('-f','--genelistfile', help='genelist in file (one gene per line)', type=str, required=False) - - check_help(parser) - - # extract parsed arguments - args = parser.parse_args() - - if (args.gene and args.genelist) or (args.gene and args.genelistfile) or (args.genelist and args.genelistfile) or (args.gene and args.genelist and args.genelistfile): - msg = "Only one can be provided -g or -l or -f" - exit_w_msg(msg) - - return args - -def process_genelist(gl,lookup): - result = [] - for g in gl: - if g in lookup: - result.extend(lookup[g].split(",")) - return result - -def process_args(args,lookup): - if args.gene: - r = process_genelist([args.gene],lookup) - if 
args.genelist: - gl = args.genelist - r = process_genelist(gl.split(","),lookup) - if args.genelistfile: - with open(args.genelistfile) as f: - lines = f.readlines() - lines = list(map(lambda x:x.strip(),lines)) - r = process_genelist(lines,lookup) - return r - -def print_results(result): - for g in result: - print(g) - -def read_lookup(): - lookup = dict() - # read in lookup table from github - url = "https://raw.githubusercontent.com/CCBR/Tools/master/homologfinder/human_mouse_homolog_lookup.txt" - download = requests.get(url).content - lookupdf = pd.read_csv(io.StringIO(download.decode('utf-8')),sep="\t") - lookupdf.columns = ["geneName","homologs"] - for index, row in lookupdf.iterrows(): - lookup[row["geneName"]]=row["homologs"] - return lookup - -def main(): - # collect all arguments - args = collect_args() - # now that args are correct load in the lookup - lookup = read_lookup() - # process the arguments - result = process_args(args,lookup) - print_results(result) - -if __name__ == '__main__': - main() \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml new file mode 120000 index 0000000..76308dc --- /dev/null +++ b/pyproject.toml @@ -0,0 +1 @@ +src/ccbr_tools/pyproject.toml \ No newline at end of file diff --git a/scripts/DEG/add_gene_name_to_count_matrix.R b/scripts/DEG/add_gene_name_to_count_matrix.R deleted file mode 100644 index 2568291..0000000 --- a/scripts/DEG/add_gene_name_to_count_matrix.R +++ /dev/null @@ -1,49 +0,0 @@ -#!/usr/bin/env Rscript -# This script is used to lookup the gene_name for a column of gene_ids and -# add the gene_names as an extra column to the output. -# Example use case: RSEM outputs do not have gene_name but only gene_id. -# This script can then be used to add the gene_name column to the count matrix -# gene_id to gene_name lookup is created on the fly from a user provided GTF. 
-# -# Eg: -# -# Rscript add_gene_name_to_count_matrix.R \ -# -r rsem.raw_counts_matrix.tsv.tmp \ -# -g /data/RBL_NCI/Wolin/mESC_slam_analysis/resources/mm10/mm10_plus_45S_plus_5S.v2.genes.gtf \ -# -o rsem.raw_counts_matrix.tsv -# -suppressPackageStartupMessages(library("argparse")) - -# create parser object -parser <- ArgumentParser() - -# specify our desired options -# by default ArgumentParser will add an help option - -parser$add_argument("-r", "--rawcountsmatrix", - type="character", - help="file with raw counts matrix with gene_id column", - required=TRUE) -parser$add_argument("-g", "--gtf", - type="character", - help="GTF file with gene_name and gene_id", - required=TRUE) -parser$add_argument("-o","--outfile", - type="character", - help="count matrix with gene_name column included", - required=TRUE) - - -args <- parser$parse_args() - -library("rtracklayer") -library("tidyverse") -gtf2<-rtracklayer::import(args$gtf) -gtf2<-as.data.frame(gtf2) -unique(data.frame(gene_id=gtf2$gene_id,gene_name=gtf2$gene_name)) %>% drop_na() -> lookuptable - -in_df=read.csv(args$rawcountsmatrix,sep="\t",header=TRUE,check.names=FALSE) - -out_df=merge(in_df,lookuptable,by=c("gene_id")) - -write.table(out_df,file=args$outfile,sep="\t",quote=FALSE,row.names=FALSE,col.names=TRUE) diff --git a/scripts/DEG/aggregate_data_tables.R b/scripts/DEG/aggregate_data_tables.R deleted file mode 100644 index 8cec490..0000000 --- a/scripts/DEG/aggregate_data_tables.R +++ /dev/null @@ -1,113 +0,0 @@ -#!/usr/bin/env Rscript -# This script is used to extract 1 column each from multiple files and report all columns -# in a single file. 
-# Example use case for this script is extracting raw counts from multiple per sample RSEM -# files and creating a single raw counts matrix -# -# Eg: -# -# Rscript aggregate_data_tables.R \ -# -s KO1_resent_mutated,KO1_resent,KO1_resent_unmutated,KO1_slam_mutated \ -# -l ./KO1_resent/KO1_resent_mutated.RSEM.genes.results,./KO1_resent/KO1_resent.RSEM.genes.results,./KO1_resent/KO1_resent_unmutated.RSEM.genes.results \ -# -c expected_count \ -# -i gene_id \ -# -o rsem.raw_counts_matrix.tsv -# -suppressPackageStartupMessages(library("argparse")) -suppressPackageStartupMessages(library("sets")) - -## functions -checkfile <- function(filename) { - if( file.access(filename) == -1) { - stop(sprintf("Specified file ( %s ) does not exist", filename)) - } -} - -readfile <- function(filename,indexcols,datacol,samplename) { - d=read.csv(filename,header=TRUE,sep="\t") - cols=colnames(d) - reqdcols=c(indexcols,datacol) - if(set_is_subset(as.set(reqdcols),as.set(cols))){ - d=d[,reqdcols] - cols=colnames(d) - cols=gsub(pattern = datacol,replacement = samplename,cols) - colnames(d)=cols - return(d) - } else { - stop(sprintf("Required columns: %s are missing in file %s", reqdcols,filename)) - } -} - -debug=0 - - - -# create parser object -parser <- ArgumentParser() - -# specify our desired options -# by default ArgumentParser will add an help option - -parser$add_argument("-l", "--filelist", - type="character", - help="comma separated list of files", - required=TRUE) -parser$add_argument("-i", "--indexcols", - type="character", - help="comma separated list of columns to use as index while merging. These need to exist in all files provided in the -l option.", - required=TRUE) -parser$add_argument("-c","--datacol", - type="character", - help="column to be extracted from all samples and aggregated in the outputfile. Should exist in all files provided to -l. 
Its name will be replaced by corresponding value in -s argument.", - required=TRUE) -parser$add_argument("-s","--samplenames", - type="character", - help="comma separated list of sample names. Need to be unique. Will replace datacol (-c) when reported in the output file. ", - required=TRUE) -parser$add_argument("-o","--outfile", - type="character", - help="aggregated outfile", - required=TRUE) - - -args <- parser$parse_args() - -filelist=unlist(strsplit(args$filelist,",")) -indexcols=unlist(strsplit(args$indexcols,",")) -datacol=args$datacol -samplenames=unlist(strsplit(args$samplenames,",")) -outfile=args$outfile - -if(debug==1){ - filelist=unlist(strsplit("/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/a.tsv,/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/b.tsv",",")) - indexcols=unlist(strsplit("ensemblID,gene_name,mRNA_length",",")) - datacol="tpm" - samplenames=unlist(strsplit("A,B",",")) - outfile="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/test.tpm.tsv" -} - -# 2 or more files required -if (length(filelist)<=1) { - stop(sprintf("Two are more comma separated files needed")) -} -# nfiles and nsamplenames should match -if (length(unique(filelist))!=length(unique(samplenames))){ - stop(sprintf("Number of files need to match number of samplenames. 
Duplicate files and samplenames not allowed!")) -} - -for (f in filelist) {checkfile(f)} - -m=readfile(filename = filelist[1], - indexcols = indexcols, - datacol = datacol, - samplename = samplenames[1]) - -for (i in 2:length(filelist)){ - n=readfile(filename = filelist[i], - indexcols = indexcols, - datacol = datacol, - samplename = samplenames[i]) - m=merge(m,n,by=indexcols) -} - -write.table(m,file=outfile,quote = FALSE,sep="\t",row.names = FALSE) diff --git a/scripts/DEG/rawcounts2normalizedcounts_DESeq2.R b/scripts/DEG/rawcounts2normalizedcounts_DESeq2.R deleted file mode 100644 index b96bfe7..0000000 --- a/scripts/DEG/rawcounts2normalizedcounts_DESeq2.R +++ /dev/null @@ -1,113 +0,0 @@ -#!/usr/bin/env Rscript -# This script is used to normalize raw count matrix with DESeq2 normalization. -# Output is a log2 transformed DESeq2 normalized counts matrix. -# Note: round() is used to convert float raw counts to integers prior to DESeq2 -# normalization. -# Note: offset of 1 is added to normalized counts before log2 transformation -# -# Eg: -# -# Rscript rawcounts2normalizedcounts_DESeq2.R \ -# -r rsem.raw_counts_matrix.tsv \ -# -c rsem.raw_counts_matrix.tsv.colData \ -# -i gene_id,gene_name \ -# -o rsem.DESeq2_normalized_counts_matrix.tsv -# -suppressPackageStartupMessages(library("argparse")) - -# create parser object -parser <- ArgumentParser() - -parser$add_argument("-r", "--rawcountsmatrix", - type="character", - help="file with raw counts matrix", - required=TRUE) -parser$add_argument("-c", "--coldata", - type="character", - help="two tab delimited columns.. sample_name and condition", - required=FALSE) -parser$add_argument("-i", "--indexcols", - type="character", - help="comma separated list of columns that do not contain any counts eg. 
ensemblID, geneName, etc., ie., columns to be excluded from normalization by included in the output file.", - required=TRUE) -parser$add_argument("-x","--excludecols", - type="character", - help="comma separated list of columns in the input that should be excluded from the output file.", - required=FALSE) -parser$add_argument("-o","--outfile", - type="character", - help="name of outfile", - required=TRUE) - - -args <- parser$parse_args() - - -suppressPackageStartupMessages(library("DESeq2")) -suppressPackageStartupMessages(library("tidyverse")) -debug=0 - -rawcountsmatrix=args$rawcountsmatrix -coldata=args$coldata -indexcols=unlist(strsplit(args$indexcols,",")) -if (length(args$excludecols)==0){ - excludecols=c() -} else { - excludecols=unlist(strsplit(args$excludecols,",")) -} -outfile=args$outfil - - -if(debug==1){ - rawcountsmatrix="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.tsv" - coldata="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.coldata" - indexcols=unlist(strsplit("ensemblID,gene_name,mRNA_length",",")) - excludecols=unlist(strsplit("596-7-2_p1",",")) - outfile="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_DESeq2_normalized_counts.tsv" -} - - -# read in raw counts -d=read.csv(rawcountsmatrix,header=TRUE,sep="\t",check.names = FALSE) - -# remove excludecols, concate includecols into a single column and use it as index -d %>% select(-all_of(excludecols)) %>% - unite("geneID",all_of(indexcols),sep="##",remove=TRUE) %>% - column_to_rownames(.,var="geneID") -> e - -e <- round(as.matrix(e),0) - -if ( is.null(coldata) ){ - # convert count matrix df to SE object - se <- SummarizedExperiment(list(counts=as.matrix(e))) - # head(assay(se)) - - # convert SE object to DESeqDataSet - dds <- DESeqDataSet( se, design = ~ 1 ) - -} else { - as.data.frame(read.csv(coldata,header = TRUE,sep="\t")) %>% - 
select(c("sample_name","condition")) %>% - column_to_rownames(.,var="sample_name") -> cdata - # change hyphen to underscore in conditions - cdata$condition = as.factor(gsub("-","_",cdata$condition)) - dds <- DESeqDataSetFromMatrix(countData = as.matrix(e), - colData = cdata, - design = ~ condition) -} - -#Estimate size factors -dds <- estimateSizeFactors( dds ) -# sizeFactors(dds) - -# Plot column sums according to size factor -plot(sizeFactors(dds), colSums(counts(dds))) -abline(lm(colSums(counts(dds)) ~ sizeFactors(dds) + 0)) - -# get normalized counts -logcounts <- log2( counts(dds, normalized=TRUE) + 1 ) -as.data.frame(logcounts) %>% rownames_to_column(.,var="geneID") %>% - separate(col="geneID",into=indexcols,sep="##",remove = TRUE) -> outdf - -# write output -write.table(outdf,file=outfile,sep="\t",quote = FALSE,row.names = FALSE) diff --git a/scripts/DEG/rawcounts2normalizedcounts_limmavoom.R b/scripts/DEG/rawcounts2normalizedcounts_limmavoom.R deleted file mode 100644 index 249b0e0..0000000 --- a/scripts/DEG/rawcounts2normalizedcounts_limmavoom.R +++ /dev/null @@ -1,132 +0,0 @@ -#!/usr/bin/env Rscript -suppressPackageStartupMessages(library("argparse")) - -# create parser object -parser <- ArgumentParser() - -# specify our desired options -# by default ArgumentParser will add an help option - -parser$add_argument("-r", "--rawcountsmatrix", - type="character", - help="file with raw counts matrix", - required=TRUE) -parser$add_argument("-c", "--coldata", - type="character", - help="two tab delimited columns.. sample_name and condition", - required=TRUE) -parser$add_argument("-i", "--indexcols", - type="character", - help="comma separated list of columns that do not contain any counts eg. 
ensemblID, geneName, etc., ie., columns to be excluded from normalization by included in the output file.", - required=TRUE) -parser$add_argument("-x","--excludecols", - type="character", - help="comma separated list of columns in the input that should be excluded from the output file.", - required=FALSE) -parser$add_argument("-t","--cpmthreshold", - type="character", default="1", - help="cpm threshold (Default=1.0). Genes will cpm less than threshold are filtered out.", - required=FALSE) -parser$add_argument("-f","--mingroupfraction", - type="character", default="0.5", - help="Fraction of samples per group that should meet the CPM threshold", - required=FALSE) -parser$add_argument("-o","--outfile", - type="character", - help="name of outfile", - required=TRUE) - - -args <- parser$parse_args() - - -suppressPackageStartupMessages(library("limma")) -suppressPackageStartupMessages(library("edgeR")) -suppressPackageStartupMessages(library("tidyverse")) -debug=0 - -rawcountsmatrix=args$rawcountsmatrix -coldata=args$coldata -indexcols=unlist(strsplit(args$indexcols,",")) -#print(indexcols) -if (length(args$excludecols)==0){ -excludecols=c() -} else { -excludecols=unlist(strsplit(args$excludecols,",")) -} -#print(excludecols) -outfile=args$outfil -cpmthreshold=as.numeric(args$cpmthreshold) -min_group_fraction=as.numeric(args$mingroupfraction) -outfile2=paste0(outfile,".antilog") - - -if(debug==1){ - rawcountsmatrix="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.tsv" - coldata="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.coldata" - indexcols=unlist(strsplit("ensemblID,gene_name,mRNA_length",",")) - excludecols=unlist(strsplit("596-7-2_p1",",")) - cpmthreshold=as.numeric("1.0") - min_group_fraction=as.numeric("0.5") - outfile="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_limmavoom_normalized_counts.tsv" - 
 outfile2="/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_limmavoom_normalized_counts.antilog.tsv" -} - - -# read in raw counts -d=read.csv(rawcountsmatrix,header=TRUE,sep="\t",check.names = FALSE) - -# remove excludecols, concate includecols into a single column and use it as index -d %>% select(-all_of(excludecols)) %>% - unite("geneID",all_of(indexcols),sep="##",remove=TRUE) %>% - column_to_rownames(.,var="geneID") -> e - -# load coldata -as.data.frame(read.csv(coldata,header = TRUE,sep="\t")) %>% - select(c("sample_name","condition")) -> cdata - # column_to_rownames(.,var="sample_name") -> cdata -# change hyphen to underscore in conditions -cdata$condition = as.factor(gsub("-","_",cdata$condition)) -design=model.matrix(~0+cdata$condition) - -d0 <- DGEList(as.matrix(e)) -d0 <- calcNormFactors(d0) - -# apply cpm filters -conditions <- as.vector(unique(cdata$condition)) -cpmd0 <- cpm(d0) -group = conditions[1] -print(group) -cpmsubset = cpmd0[,cdata[cdata$condition==group,]$sample_name] -nsamples = ncol(cpmsubset) -keep = !(rowSums(cpmsubset<cpmthreshold)/nsamples>min_group_fraction) -for (i in 2:length(conditions)){ - group=conditions[i] - print(group) - cpmsubset = cpmd0[,cdata[cdata$condition==group,]$sample_name] - nsamples = ncol(cpmsubset) - k = !(rowSums(cpmsubset<cpmthreshold)/nsamples>min_group_fraction) - keep = (keep|k) -} -d <- d0[keep,] - -# apply voom -v <- voom(as.matrix(d),design,plot=FALSE,normalize="quantile") - - -# Plot column sums according to size factor -# plot(d0$samples$norm.factors, d0$samples$lib.size) -# abline(lm(d0$samples$norm.factors ~ d0$samples$lib.size + 0)) - -# get normalized counts -logcounts <- v$E -as.data.frame(logcounts) %>% rownames_to_column(.,var="geneID") %>% - separate(col="geneID",into=indexcols,sep="##",remove = TRUE) -> outdf - -# write output -write.table(outdf,file=outfile,sep="\t",quote = FALSE,row.names = FALSE) - -antilogcounts <- 2^logcounts -as.data.frame(antilogcounts) %>% rownames_to_column(.,var="geneID") %>% - 
separate(col="geneID",into=indexcols,sep="##",remove = TRUE) -> outdf2 -write.table(outdf2,file=outfile2,sep="\t",quote = FALSE,row.names = FALSE) diff --git a/scripts/add_gene_name_to_count_matrix.R b/scripts/add_gene_name_to_count_matrix.R new file mode 100755 index 0000000..24adb47 --- /dev/null +++ b/scripts/add_gene_name_to_count_matrix.R @@ -0,0 +1,52 @@ +#!/usr/bin/env Rscript +# This script is used to lookup the gene_name for a column of gene_ids and +# add the gene_names as an extra column to the output. +# Example use case: RSEM outputs do not have gene_name but only gene_id. +# This script can then be used to add the gene_name column to the count matrix +# gene_id to gene_name lookup is created on the fly from a user provided GTF. +# +# Eg: +# +# Rscript add_gene_name_to_count_matrix.R \ +# -r rsem.raw_counts_matrix.tsv.tmp \ +# -g /data/RBL_NCI/Wolin/mESC_slam_analysis/resources/mm10/mm10_plus_45S_plus_5S.v2.genes.gtf \ +# -o rsem.raw_counts_matrix.tsv +# +suppressPackageStartupMessages(library("argparse")) + +# create parser object +parser <- ArgumentParser() + +# specify our desired options +# by default ArgumentParser will add an help option + +parser$add_argument("-r", "--rawcountsmatrix", + type = "character", + help = "file with raw counts matrix with gene_id column", + required = TRUE +) +parser$add_argument("-g", "--gtf", + type = "character", + help = "GTF file with gene_name and gene_id", + required = TRUE +) +parser$add_argument("-o", "--outfile", + type = "character", + help = "count matrix with gene_name column included", + required = TRUE +) + + +args <- parser$parse_args() + +library("rtracklayer") +library("tidyverse") +gtf2 <- rtracklayer::import(args$gtf) +gtf2 <- as.data.frame(gtf2) +unique(data.frame(gene_id = gtf2$gene_id, gene_name = gtf2$gene_name)) %>% drop_na() -> lookuptable + +in_df <- read.csv(args$rawcountsmatrix, sep = "\t", header = TRUE, check.names = FALSE) + +out_df <- merge(in_df, lookuptable, by = c("gene_id")) + 
+write.table(out_df, file = args$outfile, sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE) diff --git a/scripts/aggregate_data_tables.R b/scripts/aggregate_data_tables.R new file mode 100755 index 0000000..eea5d36 --- /dev/null +++ b/scripts/aggregate_data_tables.R @@ -0,0 +1,124 @@ +#!/usr/bin/env Rscript +# This script is used to extract 1 column each from multiple files and report all columns +# in a single file. +# Example use case for this script is extracting raw counts from multiple per sample RSEM +# files and creating a single raw counts matrix +# +# Eg: +# +# Rscript aggregate_data_tables.R \ +# -s KO1_resent_mutated,KO1_resent,KO1_resent_unmutated,KO1_slam_mutated \ +# -l ./KO1_resent/KO1_resent_mutated.RSEM.genes.results,./KO1_resent/KO1_resent.RSEM.genes.results,./KO1_resent/KO1_resent_unmutated.RSEM.genes.results \ +# -c expected_count \ +# -i gene_id \ +# -o rsem.raw_counts_matrix.tsv +# +suppressPackageStartupMessages(library("argparse")) +suppressPackageStartupMessages(library("sets")) + +## functions +checkfile <- function(filename) { + if (file.access(filename) == -1) { + stop(sprintf("Specified file ( %s ) does not exist", filename)) + } +} + +readfile <- function(filename, indexcols, datacol, samplename) { + d <- read.csv(filename, header = TRUE, sep = "\t") + cols <- colnames(d) + reqdcols <- c(indexcols, datacol) + if (set_is_subset(as.set(reqdcols), as.set(cols))) { + d <- d[, reqdcols] + cols <- colnames(d) + cols <- gsub(pattern = datacol, replacement = samplename, cols) + colnames(d) <- cols + return(d) + } else { + stop(sprintf("Required columns: %s are missing in file %s", reqdcols, filename)) + } +} + +debug <- 0 + + + +# create parser object +parser <- ArgumentParser() + +# specify our desired options +# by default ArgumentParser will add an help option + +parser$add_argument("-l", "--filelist", + type = "character", + help = "comma separated list of files", + required = TRUE +) +parser$add_argument("-i", "--indexcols", 
+ type = "character", + help = "comma separated list of columns to use as index while merging. These need to exist in all files provided in the -l option.", + required = TRUE +) +parser$add_argument("-c", "--datacol", + type = "character", + help = "column to be extracted from all samples and aggregated in the outputfile. Should exist in all files provided to -l. Its name will be replaced by corresponding value in -s argument.", + required = TRUE +) +parser$add_argument("-s", "--samplenames", + type = "character", + help = "comma separated list of sample names. Need to be unique. Will replace datacol (-c) when reported in the output file. ", + required = TRUE +) +parser$add_argument("-o", "--outfile", + type = "character", + help = "aggregated outfile", + required = TRUE +) + + +args <- parser$parse_args() + +filelist <- unlist(strsplit(args$filelist, ",")) +indexcols <- unlist(strsplit(args$indexcols, ",")) +datacol <- args$datacol +samplenames <- unlist(strsplit(args$samplenames, ",")) +outfile <- args$outfile + +if (debug == 1) { + filelist <- unlist(strsplit("/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/a.tsv,/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/b.tsv", ",")) + indexcols <- unlist(strsplit("ensemblID,gene_name,mRNA_length", ",")) + datacol <- "tpm" + samplenames <- unlist(strsplit("A,B", ",")) + outfile <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis/results/596-7-2/STAR/withChimericJunctions/test.tpm.tsv" +} + +# 2 or more files required +if (length(filelist) <= 1) { + stop(sprintf("Two are more comma separated files needed")) +} +# nfiles and nsamplenames should match +if (length(unique(filelist)) != length(unique(samplenames))) { + stop(sprintf("Number of files need to match number of samplenames. 
Duplicate files and samplenames not allowed!")) +} + +for (f in filelist) { + checkfile(f) +} + +m <- readfile( + filename = filelist[1], + indexcols = indexcols, + datacol = datacol, + samplename = samplenames[1] +) + +for (i in 2:length(filelist)) { + n <- readfile( + filename = filelist[i], + indexcols = indexcols, + datacol = datacol, + samplename = samplenames[i] + ) + m <- merge(m, n, by = indexcols) +} + +write.table(m, file = outfile, quote = FALSE, sep = "\t", row.names = FALSE) diff --git a/Biowulf/cancel_snakemake_jobs.sh b/scripts/cancel_snakemake_jobs.sh similarity index 95% rename from Biowulf/cancel_snakemake_jobs.sh rename to scripts/cancel_snakemake_jobs.sh index 81eae7a..9dc3db2 100644 --- a/Biowulf/cancel_snakemake_jobs.sh +++ b/scripts/cancel_snakemake_jobs.sh @@ -1,5 +1,6 @@ +#!/usr/bin/env bash ## Usage: ./cancel_snakemake_jobs.sh [ SNAKEMAKE_LOG_FILE ] -## +## ## This script will find all Slurm IDs in a snakemake log file ## and issue 'scancel' to cancel them diff --git a/Biowulf/create_hpc_link.sh b/scripts/create_hpc_link.sh similarity index 91% rename from Biowulf/create_hpc_link.sh rename to scripts/create_hpc_link.sh index bd402c2..f794bb9 100644 --- a/Biowulf/create_hpc_link.sh +++ b/scripts/create_hpc_link.sh @@ -1,5 +1,6 @@ +#!/usr/bin/env bash ## Usage: ./create_hpc_link.sh -## +## ## This script will print the HPC Datashare URL for accessing files in /data/CCBR/datashare ## for all files in the current directory diff --git a/scripts/extract_value_from_json.py b/scripts/extract_value_from_json.py old mode 100644 new mode 100755 index 6640753..3acbd62 --- a/scripts/extract_value_from_json.py +++ b/scripts/extract_value_from_json.py @@ -1,8 +1,12 @@ +#!/usr/bin/env python import json import argparse -parser = argparse.ArgumentParser(description='extract value for key from JSON') -parser.add_argument('-j',dest="json",required=True,help="input JSON file") -parser.add_argument('-k',dest="key",required=True,help="key whose value is to be 
extracted") + +parser = argparse.ArgumentParser(description="extract value for key from JSON") +parser.add_argument("-j", dest="json", required=True, help="input JSON file") +parser.add_argument( + "-k", dest="key", required=True, help="key whose value is to be extracted" +) args = parser.parse_args() data = json.load(open(args.json)) print(data[args.key]) diff --git a/scripts/extract_value_from_yaml.py b/scripts/extract_value_from_yaml.py old mode 100644 new mode 100755 index e57b0b9..56f7b92 --- a/scripts/extract_value_from_yaml.py +++ b/scripts/extract_value_from_yaml.py @@ -1,8 +1,12 @@ +#!/usr/bin/env python import yaml import argparse -parser = argparse.ArgumentParser(description='extract value for key from YAML') -parser.add_argument('-y',dest="yaml",required=True,help="input YAML file") -parser.add_argument('-k',dest="key",required=True,help="key whose value is to be extracted") + +parser = argparse.ArgumentParser(description="extract value for key from YAML") +parser.add_argument("-y", dest="yaml", required=True, help="input YAML file") +parser.add_argument( + "-k", dest="key", required=True, help="key whose value is to be extracted" +) args = parser.parse_args() data = yaml.safe_load(open(args.yaml)) print(data[args.key]) diff --git a/scripts/filter_bam_by_readids.py b/scripts/filter_bam_by_readids.py old mode 100644 new mode 100755 index a11d880..f39a847 --- a/scripts/filter_bam_by_readids.py +++ b/scripts/filter_bam_by_readids.py @@ -1,32 +1,45 @@ +#!/usr/bin/env python import pysam import sys import argparse import os -parser = argparse.ArgumentParser(description='Filter BAM by readids') -parser.add_argument('--inputBAM', dest='inputBAM', type=str, required=True, - help='input BAM file') -parser.add_argument('--outputBAM', dest='outputBAM', type=str, required=True, - help='filtered output BAM file') -parser.add_argument('--readids', dest='readids', type=str, required=True, - help='file with readids to keep (one readid per line)') + +parser = 
argparse.ArgumentParser(description="Filter BAM by readids") +parser.add_argument( + "--inputBAM", dest="inputBAM", type=str, required=True, help="input BAM file" +) +parser.add_argument( + "--outputBAM", + dest="outputBAM", + type=str, + required=True, + help="filtered output BAM file", +) +parser.add_argument( + "--readids", + dest="readids", + type=str, + required=True, + help="file with readids to keep (one readid per line)", +) args = parser.parse_args() -rids = list(map(lambda x:x.strip(),open(args.readids,'r').readlines())) +rids = list(map(lambda x: x.strip(), open(args.readids, "r").readlines())) inBAM = pysam.AlignmentFile(args.inputBAM, "rb") outBAM = pysam.AlignmentFile(args.outputBAM, "wb", template=inBAM) bigdict = dict() -count=0 +count = 0 for read in inBAM.fetch(): - count+=1 - if count%1000000 == 0: - print("%d reads read!"%(count)) - qn=read.query_name - if not qn in bigdict: - bigdict[qn]=list() - bigdict[qn].append(read) + count += 1 + if count % 1000000 == 0: + print("%d reads read!" 
% (count)) + qn = read.query_name + if not qn in bigdict: + bigdict[qn] = list() + bigdict[qn].append(read) inBAM.close() for r in rids: - for read in bigdict[r]: - outBAM.write(read) + for read in bigdict[r]: + outBAM.write(read) outBAM.close() diff --git a/scripts/filter_fastq_by_readids_highmem.py b/scripts/filter_fastq_by_readids_highmem.py old mode 100644 new mode 100755 index 0680403..d895cf6 --- a/scripts/filter_fastq_by_readids_highmem.py +++ b/scripts/filter_fastq_by_readids_highmem.py @@ -1,35 +1,50 @@ +#!/usr/bin/env python3 import HTSeq import sys import argparse import os + + def get_sname(s): - sname=s.name - sname=sname.split()[0] - return sname + sname = s.name + sname = sname.split()[0] + return sname + -parser = argparse.ArgumentParser(description='Filter FASTQ by readids') -parser.add_argument('--infq', dest='infq', type=str, required=True, - help='input FASTQ file') -parser.add_argument('--outfq', dest='outfq', type=str, required=True, - help='filtered output FASTQ file') -parser.add_argument('--readids', dest='readids', type=str, required=True, - help='file with readids to keep (one readid per line)') -parser.add_argument('--complement', dest='complement', action='store_true', - help='complement the readid list, ie., include readids NOT in the list') +parser = argparse.ArgumentParser(description="Filter FASTQ by readids") +parser.add_argument( + "--infq", dest="infq", type=str, required=True, help="input FASTQ file" +) +parser.add_argument( + "--outfq", dest="outfq", type=str, required=True, help="filtered output FASTQ file" +) +parser.add_argument( + "--readids", + dest="readids", + type=str, + required=True, + help="file with readids to keep (one readid per line)", +) +parser.add_argument( + "--complement", + dest="complement", + action="store_true", + help="complement the readid list, ie., include readids NOT in the list", +) args = parser.parse_args() -rids=set(map(lambda x:x.strip(),open(args.readids,'r').readlines())) -sequences = dict( 
(get_sname(s), s) for s in HTSeq.FastqReader(args.infq)) +rids = set(map(lambda x: x.strip(), open(args.readids, "r").readlines())) +sequences = dict((get_sname(s), s) for s in HTSeq.FastqReader(args.infq)) if args.complement: - rids=set(sequences.keys())-rids -outfqfilename=args.outfq -dummy=outfqfilename.strip().split(".") -if dummy[-1]=="gz": - dummy.pop(-1) - outfqfilename=".".join(dummy) -outfqfile = open(outfqfilename,'w') + rids = set(sequences.keys()) - rids +outfqfilename = args.outfq +dummy = outfqfilename.strip().split(".") +if dummy[-1] == "gz": + dummy.pop(-1) + outfqfilename = ".".join(dummy) +outfqfile = open(outfqfilename, "w") for rid in rids: - s=sequences[rid] - s.write_to_fastq_file(outfqfile) + s = sequences[rid] + s.write_to_fastq_file(outfqfile) outfqfile.close() -if dummy[-1]=="gz": - os.system("pigz -p4 -f "+outfqfilename) +if dummy[-1] == "gz": + os.system("pigz -p4 -f " + outfqfilename) diff --git a/scripts/filter_fastq_by_readids_highmem_pe.py b/scripts/filter_fastq_by_readids_highmem_pe.py old mode 100644 new mode 100755 index f68ea39..cf08e92 --- a/scripts/filter_fastq_by_readids_highmem_pe.py +++ b/scripts/filter_fastq_by_readids_highmem_pe.py @@ -1,53 +1,76 @@ +#!/usr/bin/env python import HTSeq import sys import argparse import os + + def get_sname(s): - sname=s.name - sname=sname.split()[0] - return sname + sname = s.name + sname = sname.split()[0] + return sname + def fixoutfilename(f): - outfqfilename=f - dummy=outfqfilename.strip().split(".") - if dummy[-1]=="gz": - dummy.pop(-1) - outfqfilename=".".join(dummy) - return outfqfilename + outfqfilename = f + dummy = outfqfilename.strip().split(".") + if dummy[-1] == "gz": + dummy.pop(-1) + outfqfilename = ".".join(dummy) + return outfqfilename -parser = argparse.ArgumentParser(description='Filter FASTQ by readids from PE reads') -parser.add_argument('--infq', dest='infq', type=str, required=True, - help='input FASTQ file') -parser.add_argument('--infq2', dest='infq2', type=str, 
required=True, - help='input FASTQ file') -parser.add_argument('--outfq', dest='outfq', type=str, required=True, - help='filtered output FASTQ file') -parser.add_argument('--outfq2', dest='outfq2', type=str, required=True, - help='filtered output FASTQ file') -parser.add_argument('--readids', dest='readids', type=str, required=True, - help='file with readids to keep (one readid per line)') -parser.add_argument('--complement', dest='complement', action='store_true', - help='complement the readid list, ie., include readids NOT in the list') +parser = argparse.ArgumentParser(description="Filter FASTQ by readids from PE reads") +parser.add_argument( + "--infq", dest="infq", type=str, required=True, help="input FASTQ file" +) +parser.add_argument( + "--infq2", dest="infq2", type=str, required=True, help="input FASTQ file" +) +parser.add_argument( + "--outfq", dest="outfq", type=str, required=True, help="filtered output FASTQ file" +) +parser.add_argument( + "--outfq2", + dest="outfq2", + type=str, + required=True, + help="filtered output FASTQ file", +) +parser.add_argument( + "--readids", + dest="readids", + type=str, + required=True, + help="file with readids to keep (one readid per line)", +) +parser.add_argument( + "--complement", + dest="complement", + action="store_true", + help="complement the readid list, ie., include readids NOT in the list", +) args = parser.parse_args() -rids=set(map(lambda x:x.strip(),open(args.readids,'r').readlines())) -sequences = dict( (get_sname(s), s) for s in HTSeq.FastqReader(args.infq)) -sequences2 = dict( (get_sname(s), s) for s in HTSeq.FastqReader(args.infq2)) -if len(set(sequences.keys())) != len(set(sequences.keys()).intersection(set(sequences2.keys()))): - print("readids differ between input paired end mates") - exit() +rids = set(map(lambda x: x.strip(), open(args.readids, "r").readlines())) +sequences = dict((get_sname(s), s) for s in HTSeq.FastqReader(args.infq)) +sequences2 = dict((get_sname(s), s) for s in 
HTSeq.FastqReader(args.infq2)) +if len(set(sequences.keys())) != len( + set(sequences.keys()).intersection(set(sequences2.keys())) +): + print("readids differ between input paired end mates") + exit() if args.complement: - rids=set(sequences.keys())-rids -outfqfilename=fixoutfilename(args.outfq) -outfqfile = open(outfqfilename,'w') -outfqfilename2=fixoutfilename(args.outfq2) -outfqfile2 = open(outfqfilename2,'w') + rids = set(sequences.keys()) - rids +outfqfilename = fixoutfilename(args.outfq) +outfqfile = open(outfqfilename, "w") +outfqfilename2 = fixoutfilename(args.outfq2) +outfqfile2 = open(outfqfilename2, "w") for rid in rids: - s=sequences[rid] - s.write_to_fastq_file(outfqfile) - s=sequences2[rid] - s.write_to_fastq_file(outfqfile2) + s = sequences[rid] + s.write_to_fastq_file(outfqfile) + s = sequences2[rid] + s.write_to_fastq_file(outfqfile2) outfqfile.close() outfqfile2.close() -os.system("pigz -p4 -f "+outfqfilename) -os.system("pigz -p4 -f "+outfqfilename2) +os.system("pigz -p4 -f " + outfqfilename) +os.system("pigz -p4 -f " + outfqfilename2) diff --git a/Biowulf/gather_cluster_stats.sh b/scripts/gather_cluster_stats.sh similarity index 98% rename from Biowulf/gather_cluster_stats.sh rename to scripts/gather_cluster_stats.sh index c53c6bd..9895d47 100644 --- a/Biowulf/gather_cluster_stats.sh +++ b/scripts/gather_cluster_stats.sh @@ -20,7 +20,7 @@ function get_sacct_info { attribute=$2 x=$(sacct -j $jobid --noheader --format="${attribute}%500"|head -n1|awk '{print $1}') echo $x -} +} function displaytime { local T=$1 @@ -86,7 +86,7 @@ END { print str } }' > /dev/shm/${jobid}.sacct.batchline -#batch line variables +#batch line variables jobdataarray["elapsed"]=$(get_batchline_variable "Elapsed") jobdataarray["reqcpus"]=$(get_batchline_variable "ReqCPUS") @@ -119,7 +119,7 @@ END { jobdataarray["runtime"]=$(displaytime $rt) jobdataarray["job_name"]=$(get_secondline_variable "JobName") jobdataarray["time_limit"]=$(get_secondline_variable "Timelimit") - 
jobdataarray["node_list"]=$(get_secondline_variable "NodeList") + jobdataarray["node_list"]=$(get_secondline_variable "NodeList") jobdataarray["run_node_partition"]=$(get_secondline_variable "Partition") jobdataarray["qos"]=$(get_secondline_variable "QOS") jobdataarray["username"]=$(get_secondline_variable "User") @@ -151,4 +151,4 @@ echo -ne "##SubmitTime\tHumanSubmitTime\tJobID:JobState:JobName\tNode;Partition: while read jid;do print_jobid_stats $jid done < $externalidslst |sort -k1,1n -rm -f $externalidslst /dev/shm/${jobid}.sacct* \ No newline at end of file +rm -f $externalidslst /dev/shm/${jobid}.sacct* diff --git a/Biowulf/gather_cluster_stats_biowulf.sh b/scripts/gather_cluster_stats_biowulf.sh similarity index 99% rename from Biowulf/gather_cluster_stats_biowulf.sh rename to scripts/gather_cluster_stats_biowulf.sh index e425d74..ff65fe0 100644 --- a/Biowulf/gather_cluster_stats_biowulf.sh +++ b/scripts/gather_cluster_stats_biowulf.sh @@ -88,4 +88,4 @@ echo -ne "##SubmitTime\tHumanSubmitTime\tJobID;JobState;JobName\tAllocNode;Alloc while read jid;do get_jobid_stats $jid done < ${snakemakelogfile}.jobids.lst |sort -k1,1n -rm -f ${snakemakelogfile}.jobids.lst \ No newline at end of file +rm -f ${snakemakelogfile}.jobids.lst diff --git a/Biowulf/get_buyin_partition_list.bash b/scripts/get_buyin_partition_list.bash similarity index 100% rename from Biowulf/get_buyin_partition_list.bash rename to scripts/get_buyin_partition_list.bash diff --git a/Biowulf/get_slurm_file_with_error.sh b/scripts/get_slurm_file_with_error.sh similarity index 94% rename from Biowulf/get_slurm_file_with_error.sh rename to scripts/get_slurm_file_with_error.sh index f34eafe..98074fe 100644 --- a/Biowulf/get_slurm_file_with_error.sh +++ b/scripts/get_slurm_file_with_error.sh @@ -1,16 +1,17 @@ +#!/usr/bin/env bash ## Usage: ./get_slurm_file_with_error.sh [ SNAKEMAKE_LOG_FILE ] -## +## ## This script tries to find the first failed job and returns the slurm output file -## that hopefully 
contains the error. This was written for troubleshooting CCBR +## that hopefully contains the error. This was written for troubleshooting CCBR ## Pipeliner, but should basically work for any Snakemake job on Biowulf you provide ## as an argument ## Specifically, it does this: -## 1. Find the first occurence of "Job failed" in the snakemake log file +## 1. Find the first occurrence of "Job failed" in the snakemake log file ## 2. Find the job id/rule name associated with it (usually occurs 3/4 lines before) ## 3. Find the Slurm ("external") ID of that job from when it was submitted ## 4. Find the slurm output file with that ID -## +## ## The analyst can then go through the slurm file to find/evaluate the error ## One liner version: @@ -32,12 +33,12 @@ then echo "Couldn't find any explicit failures." echo "" else - + slurmids=($(grep "job $jobid " $SNAKEMAKE_LOG | sed "s/^.*jobid '\(.*\)'\.$/\1/")) echo -e 'Rule Name\tSlurm ID(s)' echo -e $rulename'\t'$(IFS=, ; echo "${slurmids[*]}") echo "" - + if [ ${#slurmids[@]} -gt 0 ]; then echo "Slurm output files are listed below:" for slurmid in "${slurmids[@]}"; do @@ -53,9 +54,8 @@ else echo "Hmm... can't find the slurm error file..." 
fi fi - + ls -lh $errorfile done fi -fi - +fi diff --git a/GSEA/gsea_preranked b/scripts/gsea_preranked.sh similarity index 99% rename from GSEA/gsea_preranked rename to scripts/gsea_preranked.sh index c96ee7b..85d1f03 100755 --- a/GSEA/gsea_preranked +++ b/scripts/gsea_preranked.sh @@ -10,7 +10,7 @@ set -e if [ $# -lt 1 ]; then - echo + echo echo 'Usage: run_gsea_preranked rankfile gmtfile label' echo 'Example: run_gsea_preranked KO_WT.rnk MousePath_GO_gmt.gmt KO_WT.GO' echo diff --git a/scripts/karyoploter.R b/scripts/karyoploter.R new file mode 100644 index 0000000..5981941 --- /dev/null +++ b/scripts/karyoploter.R @@ -0,0 +1,140 @@ +#!/usr/bin/env Rscript +# Author: Vishal Koparde, PhD +# Take reformatted DEG out file from RNASeq contrast and the geneinfo.bed to make a Karyoplot with +# updregulated genes in red and downregulated genes in blue + +library("argparse") + +parser <- ArgumentParser() +parser$add_argument("-d", "--degout", + type = "character", required = TRUE, + help = "Reformmated DEG out file from limma/edgeR/DESeq2" +) +parser$add_argument("-c", "--gene2coord", + type = "character", required = TRUE, + help = "Gene to coordinate file ie geneinfo.bed" +) +parser$add_argument("-g", "--genome", + type = "character", required = TRUE, + help = "Genome .. 
either hg19/hg38/mm9/mm10/Mmul8.0.1/canFam3" +) +parser$add_argument("-f", "--fdr", + type = "double", default = 0.05, + help = "FDR cutoff to use" +) +args <- parser$parse_args() + +# setwd("/Users/kopardevn/Documents/Work/Projects/ccbr983/fastq2/GI_Skin_compares") +# args$degout="DESeq2_DEG_Skin_T-Skin_N_all_genes.txt" +# args$gene2coord="geneinfo.bed" +# args$fdr=0.05 +# args$genome="hg19" + + +for (f in c(args$degout, args$gene2coord)) { + if (!file.exists(f)) { + stop(paste("File does not exist:", f)) + } +} + +if (!args$genome %in% c("hg19", "hg38", "mm9", "mm10", "Mmul8.0.1", "canFam3")) { + stop("Only hg19/hg38/mm9/mm10/Mmul8.0.1/canFam3 genomes are supported!") +} + + +library("karyoploteR") +library("BSgenome.Mmusculus.UCSC.mm9") +library("BSgenome.Mmusculus.UCSC.mm10") +library("BSgenome.Hsapiens.UCSC.hg19") +library("BSgenome.Hsapiens.UCSC.hg38") +library("BSgenome.Mmulatta.UCSC.rheMac8") +library("BSgenome.Cfamiliaris.UCSC.canFam3.masked") + +if (args$genome == "Mmul8.0.1") { + args$genome <- "rheMac8" +} + +deseqout <- read.delim(args$degout) +dim(deseqout) +fdr_filter <- deseqout$fdr < args$fdr +positive_lfc_filter <- deseqout$log2fc > 1 +negative_lfc_filter <- deseqout$log2fc < -1 +table((negative_lfc_filter | positive_lfc_filter) & fdr_filter) +deseqout_filtered <- deseqout[((negative_lfc_filter | positive_lfc_filter) & fdr_filter), ] + +coordinates <- read.delim(args$gene2coord, header = FALSE) +colnames(coordinates) <- c("chr", "start", "end", "strand", "ensid", "biotype", "gene_name") +deseqout_filtered_w_coord <- merge(deseqout_filtered, coordinates, by.x = "gene", by.y = "gene_name") +dim(deseqout_filtered_w_coord) + +if (nrow(deseqout_filtered_w_coord) == 0) { + stop("No DEGs found. 
Try increasing FDR cutoff") +} + +genome <- args$genome +chrs <- c() +maxchrs <- 0 +if (genome %in% c("hg19", "hg38")) { + maxchrs <- 22 +} +if (genome %in% c("mm10", "mm9")) { + maxchrs <- 19 +} +if (genome %in% c("rheMac8")) { + maxchrs <- 20 +} +if (genome %in% c("canFam3")) { + maxchrs <- 38 +} + +for (i in seq(1, maxchrs)) { + chrs <- c(chrs, paste("chr", i, sep = "")) +} +chrs <- c(chrs, "chrX") +if (!genome %in% c("canFam3")) { + chrs <- c(chrs, "chrY") +} + +y <- round(length(chrs) / 2) +a <- chrs[seq(1, y)] +b <- chrs[seq(y + 1, length(chrs))] +chrs_subsets <- list(a, b) + +deseqout_filtered_w_coord <- deseqout_filtered_w_coord[deseqout_filtered_w_coord$chr %in% chrs, ] +dim(deseqout_filtered_w_coord) + +pos_scale_limit <- abs(floor(fivenum(deseqout_filtered_w_coord$log2fc)[2])) + 0.5 +neg_scale_limit <- -1 * (abs(ceiling(fivenum(deseqout_filtered_w_coord$log2fc)[4])) + 0.5) + +deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc > pos_scale_limit, ]$log2fc <- pos_scale_limit +deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc < neg_scale_limit, ]$log2fc <- neg_scale_limit + +upregulated <- deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc > 0, ] +downregulated <- deseqout_filtered_w_coord[deseqout_filtered_w_coord$log2fc < 0, ] + + +for (i in seq(1, length(chrs_subsets))) { + chrs2 <- unlist(chrs_subsets[i]) + pdf(paste("karyoplot", i, ".pdf", sep = "")) + kp <- plotKaryotype(genome = genome, plot.type = 2, chromosomes = chrs2, cytobands = NULL) + + kpDataBackground(kp, data.panel = 1, r0 = 0, r1 = 1) + kpDataBackground(kp, data.panel = 2, r0 = 0, r1 = 1) + kpHeatmap(kp, + chr = upregulated$chr, + x0 = upregulated$start, + x1 = upregulated$end, + y = upregulated$log2fc, + data.panel = 1, + colors = c("white", "red") + ) + kpHeatmap(kp, + chr = downregulated$chr, + x0 = downregulated$start, + x1 = downregulated$end, + y = downregulated$log2fc, + data.panel = 2, + colors = c("blue", "white") + ) + dev.off() +} diff --git 
a/Biowulf/make_labels_for_pipeliner.sh b/scripts/make_labels_for_pipeliner.sh similarity index 99% rename from Biowulf/make_labels_for_pipeliner.sh rename to scripts/make_labels_for_pipeliner.sh index 09298b4..9b249fb 100644 --- a/Biowulf/make_labels_for_pipeliner.sh +++ b/scripts/make_labels_for_pipeliner.sh @@ -1,5 +1,5 @@ ## Usage: ./make_labels_for_pipeliner.sh -## +## ## This script will look for *fastq.gz files in the current directory and ## make a 'labels.txt' file for pipeliner when the FASTQs do not conform ## to pipeliner's nomenclature (i.e. *_R1_001.fastq.gz instead of *.R1.fastq.gz) diff --git a/scripts/rawcounts2normalizedcounts_DESeq2.R b/scripts/rawcounts2normalizedcounts_DESeq2.R new file mode 100755 index 0000000..4325b41 --- /dev/null +++ b/scripts/rawcounts2normalizedcounts_DESeq2.R @@ -0,0 +1,121 @@ +#!/usr/bin/env Rscript +# This script is used to normalize raw count matrix with DESeq2 normalization. +# Output is a log2 transformed DESeq2 normalized counts matrix. +# Note: round() is used to convert float raw counts to integers prior to DESeq2 +# normalization. +# Note: offset of 1 is added to normalized counts before log2 transformation +# +# Eg: +# +# Rscript rawcounts2normalizedcounts_DESeq2.R \ +# -r rsem.raw_counts_matrix.tsv \ +# -c rsem.raw_counts_matrix.tsv.colData \ +# -i gene_id,gene_name \ +# -o rsem.DESeq2_normalized_counts_matrix.tsv +# +suppressPackageStartupMessages(library("argparse")) + +# create parser object +parser <- ArgumentParser() + +parser$add_argument("-r", "--rawcountsmatrix", + type = "character", + help = "file with raw counts matrix", + required = TRUE +) +parser$add_argument("-c", "--coldata", + type = "character", + help = "two tab delimited columns.. sample_name and condition", + required = FALSE +) +parser$add_argument("-i", "--indexcols", + type = "character", + help = "comma separated list of columns that do not contain any counts eg. 
ensemblID, geneName, etc., ie., columns to be excluded from normalization by included in the output file.", + required = TRUE +) +parser$add_argument("-x", "--excludecols", + type = "character", + help = "comma separated list of columns in the input that should be excluded from the output file.", + required = FALSE +) +parser$add_argument("-o", "--outfile", + type = "character", + help = "name of outfile", + required = TRUE +) + + +args <- parser$parse_args() + + +suppressPackageStartupMessages(library("DESeq2")) +suppressPackageStartupMessages(library("tidyverse")) +debug <- 0 + +rawcountsmatrix <- args$rawcountsmatrix +coldata <- args$coldata +indexcols <- unlist(strsplit(args$indexcols, ",")) +if (length(args$excludecols) == 0) { + excludecols <- c() +} else { + excludecols <- unlist(strsplit(args$excludecols, ",")) +} +outfile <- args$outfil + + +if (debug == 1) { + rawcountsmatrix <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.tsv" + coldata <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.coldata" + indexcols <- unlist(strsplit("ensemblID,gene_name,mRNA_length", ",")) + excludecols <- unlist(strsplit("596-7-2_p1", ",")) + outfile <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_DESeq2_normalized_counts.tsv" +} + + +# read in raw counts +d <- read.csv(rawcountsmatrix, header = TRUE, sep = "\t", check.names = FALSE) + +# remove excludecols, concate includecols into a single column and use it as index +d %>% + select(-all_of(excludecols)) %>% + unite("geneID", all_of(indexcols), sep = "##", remove = TRUE) %>% + column_to_rownames(., var = "geneID") -> e + +e <- round(as.matrix(e), 0) + +if (is.null(coldata)) { + # convert count matrix df to SE object + se <- SummarizedExperiment(list(counts = as.matrix(e))) + # head(assay(se)) + + # convert SE object to DESeqDataSet + dds <- DESeqDataSet(se, design = ~1) +} 
else { + as.data.frame(read.csv(coldata, header = TRUE, sep = "\t")) %>% + select(c("sample_name", "condition")) %>% + column_to_rownames(., var = "sample_name") -> cdata + # change hyphen to underscore in conditions + cdata$condition <- as.factor(gsub("-", "_", cdata$condition)) + dds <- DESeqDataSetFromMatrix( + countData = as.matrix(e), + colData = cdata, + design = ~condition + ) +} + +# Estimate size factors +dds <- estimateSizeFactors(dds) +# sizeFactors(dds) + +# Plot column sums according to size factor +plot(sizeFactors(dds), colSums(counts(dds))) +abline(lm(colSums(counts(dds)) ~ sizeFactors(dds) + 0)) + +# get normalized counts +logcounts <- log2(counts(dds, normalized = TRUE) + 1) +as.data.frame(logcounts) %>% + rownames_to_column(., var = "geneID") %>% + separate(col = "geneID", into = indexcols, sep = "##", remove = TRUE) -> outdf + +# write output +write.table(outdf, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE) diff --git a/scripts/rawcounts2normalizedcounts_limmavoom.R b/scripts/rawcounts2normalizedcounts_limmavoom.R new file mode 100755 index 0000000..d5734c8 --- /dev/null +++ b/scripts/rawcounts2normalizedcounts_limmavoom.R @@ -0,0 +1,142 @@ +#!/usr/bin/env Rscript +suppressPackageStartupMessages(library("argparse")) + +# create parser object +parser <- ArgumentParser() + +# specify our desired options +# by default ArgumentParser will add an help option + +parser$add_argument("-r", "--rawcountsmatrix", + type = "character", + help = "file with raw counts matrix", + required = TRUE +) +parser$add_argument("-c", "--coldata", + type = "character", + help = "two tab delimited columns.. sample_name and condition", + required = TRUE +) +parser$add_argument("-i", "--indexcols", + type = "character", + help = "comma separated list of columns that do not contain any counts eg. 
ensemblID, geneName, etc., ie., columns to be excluded from normalization by included in the output file.", + required = TRUE +) +parser$add_argument("-x", "--excludecols", + type = "character", + help = "comma separated list of columns in the input that should be excluded from the output file.", + required = FALSE +) +parser$add_argument("-t", "--cpmthreshold", + type = "character", default = "1", + help = "cpm threshold (Default=1.0). Genes will cpm less than threshold are filtered out.", + required = FALSE +) +parser$add_argument("-f", "--mingroupfraction", + type = "character", default = "0.5", + help = "Fraction of samples per group that should meet the CPM threshold", + required = FALSE +) +parser$add_argument("-o", "--outfile", + type = "character", + help = "name of outfile", + required = TRUE +) + + +args <- parser$parse_args() + + +suppressPackageStartupMessages(library("limma")) +suppressPackageStartupMessages(library("edgeR")) +suppressPackageStartupMessages(library("tidyverse")) +debug <- 0 + +rawcountsmatrix <- args$rawcountsmatrix +coldata <- args$coldata +indexcols <- unlist(strsplit(args$indexcols, ",")) +# print(indexcols) +if (length(args$excludecols) == 0) { + excludecols <- c() +} else { + excludecols <- unlist(strsplit(args$excludecols, ",")) +} +# print(excludecols) +outfile <- args$outfil +cpmthreshold <- as.numeric(args$cpmthreshold) +min_group_fraction <- as.numeric(args$mingroupfraction) +outfile2 <- paste0(outfile, ".antilog") + + +if (debug == 1) { + rawcountsmatrix <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.tsv" + coldata <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_raw_counts_counts.coldata" + indexcols <- unlist(strsplit("ensemblID,gene_name,mRNA_length", ",")) + excludecols <- unlist(strsplit("596-7-2_p1", ",")) + cpmthreshold <- as.numeric("1.0") + min_group_fraction <- as.numeric("0.5") + outfile <- 
"/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_limmavoom_normalized_counts.tsv" + outfile2 <- "/Volumes/CCBR/projects/ccbr1060/Hg38_shRNA_hybrid/HGHY2DRXY_analysis_v2/results/all_limmavoom_normalized_counts.antilog.tsv" +} + + +# read in raw counts +d <- read.csv(rawcountsmatrix, header = TRUE, sep = "\t", check.names = FALSE) + +# remove excludecols, concate includecols into a single column and use it as index +d %>% + select(-all_of(excludecols)) %>% + unite("geneID", all_of(indexcols), sep = "##", remove = TRUE) %>% + column_to_rownames(., var = "geneID") -> e + +# load coldata +as.data.frame(read.csv(coldata, header = TRUE, sep = "\t")) %>% + select(c("sample_name", "condition")) -> cdata +# column_to_rownames(.,var="sample_name") -> cdata +# change hyphen to underscore in conditions +cdata$condition <- as.factor(gsub("-", "_", cdata$condition)) +design <- model.matrix(~ 0 + cdata$condition) + +d0 <- DGEList(as.matrix(e)) +d0 <- calcNormFactors(d0) + +# apply cpm filters +conditions <- as.vector(unique(cdata$condition)) +cpmd0 <- cpm(d0) +group <- conditions[1] +print(group) +cpmsubset <- cpmd0[, cdata[cdata$condition == group, ]$sample_name] +nsamples <- ncol(cpmsubset) +keep <- !(rowSums(cpmsubset < cpmthreshold) / nsamples > min_group_fraction) +for (i in 2:length(conditions)) { + group <- conditions[i] + print(group) + cpmsubset <- cpmd0[, cdata[cdata$condition == group, ]$sample_name] + nsamples <- ncol(cpmsubset) + k <- !(rowSums(cpmsubset < cpmthreshold) / nsamples > min_group_fraction) + keep <- (keep | k) +} +d <- d0[keep, ] + +# apply voom +v <- voom(as.matrix(d), design, plot = FALSE, normalize = "quantile") + + +# Plot column sums according to size factor +# plot(d0$samples$norm.factors, d0$samples$lib.size) +# abline(lm(d0$samples$norm.factors ~ d0$samples$lib.size + 0)) + +# get normalized counts +logcounts <- v$E +as.data.frame(logcounts) %>% + rownames_to_column(., var = "geneID") %>% + separate(col = 
"geneID", into = indexcols, sep = "##", remove = TRUE) -> outdf + +# write output +write.table(outdf, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE) + +antilogcounts <- 2^logcounts +as.data.frame(antilogcounts) %>% + rownames_to_column(., var = "geneID") %>% + separate(col = "geneID", into = indexcols, sep = "##", remove = TRUE) -> outdf2 +write.table(outdf2, file = outfile2, sep = "\t", quote = FALSE, row.names = FALSE) diff --git a/scripts/run_jobby_on_nextflow_log b/scripts/run_jobby_on_nextflow_log new file mode 100755 index 0000000..c589a1c --- /dev/null +++ b/scripts/run_jobby_on_nextflow_log @@ -0,0 +1,3 @@ +#!/usr/bin/env bash +nextflowlog=$1 +jobby $(awk -F" jobId: " '{print $2}' ${nextflowlog} | awk -F";" '{print $1}' | grep -v "^$" | sort | uniq | tr "\\n" " ") |cut -f2,3,18 diff --git a/scripts/run_jobby_on_nextflow_log_full_format b/scripts/run_jobby_on_nextflow_log_full_format new file mode 100755 index 0000000..d912be7 --- /dev/null +++ b/scripts/run_jobby_on_nextflow_log_full_format @@ -0,0 +1,3 @@ +#!/usr/bin/env bash +nextflowlog=$1 +jobby $(awk -F" jobId: " '{print $2}' ${nextflowlog} | awk -F";" '{print $1}' | grep -v "^$" | sort | uniq | tr "\\n" " ") diff --git a/scripts/run_jobby_on_snakemake_log b/scripts/run_jobby_on_snakemake_log new file mode 100755 index 0000000..4235b84 --- /dev/null +++ b/scripts/run_jobby_on_snakemake_log @@ -0,0 +1,3 @@ +#!/usr/bin/env bash +snakemakelog=$1 +jobby $(grep --color=never "^Submitted .* with external jobid" $snakemakelog | awk '{{print $NF}}' | sed "s/['.]//g" | sort | uniq | tr "\\n" " ") |cut -f2,3,18 diff --git a/scripts/run_jobby_on_snakemake_log_full_format b/scripts/run_jobby_on_snakemake_log_full_format new file mode 100755 index 0000000..412fec8 --- /dev/null +++ b/scripts/run_jobby_on_snakemake_log_full_format @@ -0,0 +1,3 @@ +#!/usr/bin/env bash +snakemakelog=$1 +jobby $(grep --color=never "^Submitted .* with external jobid" $snakemakelog | awk '{{print $NF}}' | sed "s/['.]//g" | 
sort | uniq | tr "\\n" " ") diff --git a/scripts/spooker b/scripts/spooker new file mode 100755 index 0000000..2f15692 --- /dev/null +++ b/scripts/spooker @@ -0,0 +1,149 @@ +#!/usr/bin/env bash +# This script is designed to be part of the +# - onerror +# - oncomplete +# part of Snakefiles to gather info about: +# 1. the pipeline +# 2. the user +# 3. other metadata +# This data is then tar-gzipped and saved to a common location. +# For runs on BIOWULF: +# - tarball is saved to /scratch/ccbrpipeliner using the spook tool +# - a cronjob then picks up the tarball and saves it to /data/CCBR_Pipeliner/userdata/ccbrpipeliner +# - [TODO] another cronjob then reads files under /data/CCBR_Pipeliner/userdata/ccbrpipeliner and generates +# detailed HTML reports about pipeline usages. +# For runs on FRCE: +# - tarball is saved to /mnt/projects/CCBR-Pipelines/pipelines/userdata/ccbrpipeliner by simple cp command +# - [TODO] a cronjob then picks up the tarball and moves it to biowulf at /data/CCBR_Pipeliner/userdata/ccbrpipeliner/frce +# - [TODO] another cronjob then reads files under /data/CCBR_Pipeliner/userdata/ccbrpipeliner/frce and adds to the +# detailed HTML report about pipeline usages. + +# requires 2 inputs: +# 1. Pipelines outputdir ... absolute path +# 2. name of the pipeline ... eg. RENEE or XAVIER + +SCRIPTNAME="$0" +SCRIPTBASENAME=$(basename $SCRIPTNAME) + +# 2 arguments are required ... PIPELINE_OUTDIR and PIPELINE_NAME +if [[ "$#" != "2" ]];then + echo "$SCRIPTBASENAME FAILED!: ERROR: 2 arguments expected!" 
+ echo "$SCRIPTBASENAME FAILED!: ERROR: Argument 1: pipeline outdir" + echo "$SCRIPTBASENAME FAILED!: ERROR: Argument 2: pipeline name" + exit 1 +fi + +set -o pipefail +PIPELINE_OUTDIR=$1 +PIPELINE_NAME=$2 + +PIPELINE_OUTDIR_SIZE=$(du -bs $PIPELINE_OUTDIR | awk '{print $1}') +PIPELINE_NAME_UPPER=$(echo "$PIPELINE_NAME" | tr '[:lower:]' '[:upper:]') +PIPELINE_NAME_LOWER=$(echo "$PIPELINE_NAME" | tr '[:upper:]' '[:lower:]') +PIPELINE_PATH=$(which $PIPELINE_NAME_LOWER) +PIPELINE_VERSION=$($PIPELINE_NAME_LOWER --version 2>/dev/null | tail -n1 | awk '{print $NF}' || echo "UNKNOWN") + +DT=$(date +%y%m%d%H%M%S) +archivefile="${PIPELINE_OUTDIR}/${DT}.tar.gz" +treefile="${PIPELINE_OUTDIR}/${DT}.tree.json" +metadata="${PIPELINE_OUTDIR}/${DT}.json" + +SCONTROL=$(type -P scontrol) +if [[ "$SCONTROL" == "" ]];then + echo "$SCRIPTBASENAME FAILED!: ERROR: scontrol command not in PATH!" + echo "$SCRIPTBASENAME FAILED!: ERROR: usage metadata cannot be collected!!" + exit 1 +fi + +# create the archive with all metadata +dryrunlogfile="" +if [[ -d "$PIPELINE_OUTDIR" ]];then + # find the newest dryrun file + dryrunlogfile=$(ls -rt ${PIPELINE_OUTDIR}/dryrun*log 2>/dev/null |tail -n1 || echo "") + cmd="tar czvf ${archivefile}" + if [[ "$dryrunlogfile" != "" ]];then + cmd="$cmd $dryrunlogfile" + fi + # gather some info + echo "PIPELINE_OUTDIR: $PIPELINE_OUTDIR" > $metadata + echo "PIPELINE_OUTDIR_SIZE: $PIPELINE_OUTDIR_SIZE" >> $metadata + echo "PIPELINE_NAME: $PIPELINE_NAME_UPPER" >> $metadata + echo "PIPELINE_PATH: $PIPELINE_PATH" >> $metadata + echo "PIPELINE_VERSION: $PIPELINE_VERSION" >> $metadata + echo "USER: $USER" >> $metadata + #GROUPS=$(groups 2>/dev/null) + echo "GROUPS:" $(groups) >> $metadata + echo "DATE: $DT" >> $metadata + tree -J $PIPELINE_OUTDIR > $treefile + cmd="$cmd $metadata $treefile" + +# files from pipelines in written in snakemake + if [[ -d "${PIPELINE_OUTDIR}/logfiles" ]];then + logdir="${PIPELINE_OUTDIR}/logfiles" + for thisfile in "snakemake.log" 
"snakemake.log.jobby" "master.log" "runtime_statistics.json";do + absthisfile="${logdir}/${thisfile}" + if [[ -f "$absthisfile" ]];then + cmd="$cmd $absthisfile" + fi + done + fi + +# [TODO] files from pipelines in written in nextflow +# [TODO] ... add nextflow related files here ... + + echo "$SCRIPTBASENAME: $cmd" + $cmd && echo "$SCRIPTBASENAME: $archivefile created!" + rm -f $metadata $treefile + +else # PIPELINE_OUTDIR does not exist! + echo "$SCRIPTBASENAME FAILED!: ERROR: $PIPELINE_OUTDIR does not exist!" + echo "$SCRIPTBASENAME FAILED!: ERROR: usage metadata cannot be collected!!" + exit 1 +fi + +# check if you are on BIOWULF or FRCE +clustername=$(scontrol show config|grep -i clustername|awk '{print $NF}') +if [[ "$clustername" == "biowulf" ]];then ISBIOWULF=true;else ISBIOWULF=false;fi +if [[ "$clustername" == "fnlcr" ]];then ISFRCE=true;else ISFRCE=false;fi + +if [[ $ISBIOWULF == true || $ISFRCE == true ]];then + if [[ $ISBIOWULF == true ]];then + SPOOK=$(type -P spook) + if [[ "$SPOOK" == "" ]];then + echo "$SCRIPTBASENAME: spook is NOT in PATH." + echo "$SCRIPTBASENAME: trying to add it by sourcing /data/CCBR_Pipeliner/cronjobs/scripts/setup" + . "/data/CCBR_Pipeliner/cronjobs/scripts/setup" + SPOOK=$(type -P spook) + if [[ "$SPOOK" == "" ]];then + echo "$SCRIPTBASENAME FAILED!: ERROR: spook is still not in PATH!" + echo "$SCRIPTBASENAME FAILED!: ERROR: usage metadata cannot be collected!!" 
+ exit 1 + fi + fi + echo "$SCRIPTBASENAME: spook is now in PATH:$SPOOK" + SPOOK_COPY2DIR="/scratch/ccbrpipeliner" + fi + if [[ $ISFRCE == true ]];then + SPOOK_COPY2DIR="/mnt/projects/CCBR-Pipelines/pipelines/userdata/ccbrpipeliner" + fi + echo "$SCRIPTBASENAME: SPOOK_COPY2DIR: $SPOOK_COPY2DIR" + + # copy over the metadata archive + if [ -f "${archivefile}" ];then + if [[ $ISBIOWULF == true ]]; then + cmd="$SPOOK -f ${archivefile} -d $SPOOK_COPY2DIR" + echo "$SCRIPTBASENAME: $cmd" + $cmd + fi + if [[ $ISFRCE == true ]];then + cmd="cp -rv ${archivefile} $SPOOK_COPY2DIR" + echo "$SCRIPTBASENAME: $cmd" + $cmd + fi + fi + +else # not biowulf or frce ... so exit + echo "$SCRIPTBASENAME FAILED!: ERROR: Neither on BIOWULF Nor on FRCE" + echo "$SCRIPTBASENAME FAILED!: ERROR: $archivefile created but NOT copied!" + exit 1 +fi diff --git a/scripts/which_vpn.sh b/scripts/which_vpn.sh index 0a83f06..d12fe7d 100755 --- a/scripts/which_vpn.sh +++ b/scripts/which_vpn.sh @@ -20,7 +20,7 @@ ip=$(ifconfig -a|grep "inet 10."|awk '{print $2}') if [[ "$ip" == "" ]] then - echo "Are you really connected to VPN?? Doesnt look like it!" + echo "Are you really connected to VPN?? Doesn't look like it!" exit 1 fi @@ -36,7 +36,7 @@ elif [[ "$numbertwo" == "242" || "$numbertwo" == "243" ]] then echo "You are connected to the BETHESDA VPN!" exit 0 -else +else echo "Sorry, I cannot guess which VPN you are connect to!" 
exit 0 fi diff --git a/src/ccbr_tools/CITATION.cff b/src/ccbr_tools/CITATION.cff new file mode 100644 index 0000000..a80373d --- /dev/null +++ b/src/ccbr_tools/CITATION.cff @@ -0,0 +1,24 @@ +cff-version: 1.2.0 +message: "Please cite CCBR Tools as below" +authors: + - family-names: Sovacool + given-names: Kelly + orcid: https://orcid.org/0000-0003-3283-829X + affiliation: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA + - family-names: Koparde + given-names: Vishal + orcid: https://orcid.org/0000-0001-8978-8495 + affiliation: Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA + - family-names: Kuhn + given-names: Skyler +title: "CCBR Tools: Utilities for CCBR Bioinformatics Software" +url: https://ccbr.github.io/Tools/ +repository-code: https://github.com/CCBR/Tools +license: MIT +type: software +#identifiers: +# - description: "Archived snapshots of all versions" +# type: doi +# value: # TODO add all-versions DOI from Zenodo here +#version: # TODO update the version here *before* cutting a new release +#date-released: # TODO update the release date *before* cutting a new release diff --git a/GSEA/NCBRgseaheatmap/DESCRIPTION b/src/ccbr_tools/GSEA/NCBRgseaheatmap/DESCRIPTION similarity index 100% rename from GSEA/NCBRgseaheatmap/DESCRIPTION rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/DESCRIPTION diff --git a/GSEA/NCBRgseaheatmap/NAMESPACE b/src/ccbr_tools/GSEA/NCBRgseaheatmap/NAMESPACE similarity index 100% rename from GSEA/NCBRgseaheatmap/NAMESPACE rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/NAMESPACE diff --git a/GSEA/NCBRgseaheatmap/R/gsea_to_mtx.R b/src/ccbr_tools/GSEA/NCBRgseaheatmap/R/gsea_to_mtx.R similarity index 100% rename from GSEA/NCBRgseaheatmap/R/gsea_to_mtx.R rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/R/gsea_to_mtx.R diff --git a/GSEA/NCBRgseaheatmap/R/pathway_heatmap.R 
b/src/ccbr_tools/GSEA/NCBRgseaheatmap/R/pathway_heatmap.R similarity index 100% rename from GSEA/NCBRgseaheatmap/R/pathway_heatmap.R rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/R/pathway_heatmap.R diff --git a/GSEA/NCBRgseaheatmap/man/gsea_to_mtx.Rd b/src/ccbr_tools/GSEA/NCBRgseaheatmap/man/gsea_to_mtx.Rd similarity index 100% rename from GSEA/NCBRgseaheatmap/man/gsea_to_mtx.Rd rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/man/gsea_to_mtx.Rd diff --git a/GSEA/NCBRgseaheatmap/man/pathway_heatmap.Rd b/src/ccbr_tools/GSEA/NCBRgseaheatmap/man/pathway_heatmap.Rd similarity index 100% rename from GSEA/NCBRgseaheatmap/man/pathway_heatmap.Rd rename to src/ccbr_tools/GSEA/NCBRgseaheatmap/man/pathway_heatmap.Rd diff --git a/GSEA/NCBRgseaheatmap_1.0.tar.gz b/src/ccbr_tools/GSEA/NCBRgseaheatmap_1.0.tar.gz similarity index 100% rename from GSEA/NCBRgseaheatmap_1.0.tar.gz rename to src/ccbr_tools/GSEA/NCBRgseaheatmap_1.0.tar.gz diff --git a/GSEA/README.md b/src/ccbr_tools/GSEA/README.md similarity index 100% rename from GSEA/README.md rename to src/ccbr_tools/GSEA/README.md diff --git a/src/ccbr_tools/GSEA/deg2gs.py b/src/ccbr_tools/GSEA/deg2gs.py new file mode 100755 index 0000000..5701f0d --- /dev/null +++ b/src/ccbr_tools/GSEA/deg2gs.py @@ -0,0 +1,263 @@ +#!/usr/bin/env python + +""" +Susan Huse +NIAID Center for Biological Research +Frederick National Laboratory for Cancer Research +Leidos Biomedical + +deg2gs.py + Reads a rnaseq pipeliner *_DEG_all_genes.txt file + and outputs a prioritized list of Ensembl gene IDs for ToppFun + +v 1.0 - initial code version. 
+v 1.1 - updated for new column headers in pipeliner limma_DEG_all_genes.txt +v 1.2 - top2Excel format is now csv rather than tab-delimited + +""" + +__author__ = "Susan Huse" +__date__ = "August 6, 2018" +__version__ = "1.1" +__copyright__ = "No copyright protection, can be used freely" + +import sys +import os +import re +import datetime +import pandas as pd + +import argparse +from argparse import RawTextHelpFormatter +from ncbr_huse import send_update + + +#################################### +# +# Functions +# +#################################### +def filter_by_p(x, nhits, pvalue, qvalue): + # Filter the data to nhits, pvalue, qvalue + x.sort_values(by=["p"], inplace=True) + x = x[(x["p"] <= pvalue) & (x["q"] <= qvalue)] + if x.shape[0] > nhits: + x = x.head(nhits) + return x + + +#################################### +# +# Main +# +#################################### + + +def main(): + # Usage statement + parseStr = "Reads RNASeq differential expression output files\n\ +and outputs a prioritized list of genes for use in GSEA or ToppFun.\n\ +Will filter by both p and fdr values, and export up to nhits values.\n\ +Reads several limma output versions (topTable, pipeliner, top2Excel).\n\ +Outputs gene name and gsea rank for GSEA or Enssemble IDs for ToppFun.\n\ +NB: GSEA cannot support hyphens in filenames, they will be replaced with underscores.\n\n\ +Usage:\n\ + deg2gs.py -i infile -o outfile -n nHits -p pvalue -q fdrvalue -m method -s sheetname -f format\n\n\ +Example:\n\ + deg2gs.py -i ExpAll_limma_DEG_all_genes.txt -o ExpAll_limma_all_genes.gsea.rnk -n 1000 -q 0.05 -m gsea -f pipeliner\n\ + deg2gs.py -i DEanalysis.xlsx -o DEanalysis.topp.rnk -n 1000 -p 1e-05 -q 5e-02 -m toppfun -f topTable -s Sheet1\n\n" + + parser = argparse.ArgumentParser( + description=parseStr, formatter_class=RawTextHelpFormatter + ) + parser.add_argument( + "-i", + "--infile", + required=True, + nargs="?", + type=argparse.FileType("r"), + default=None, + help="Input file containing 
important things", + ) + parser.add_argument( + "-o", + "--outfile", + required=True, + action="store", + type=str, + default=None, + help="Output file for important results", + ) + parser.add_argument( + "-m", + "--method", + type=str, + action="store", + default=None, + choices=["gsea", "toppfun"], + help="Method for gene set analysis (gsea or toppfun)", + ) + parser.add_argument( + "-n", + "--nhits", + type=int, + action="store", + required=False, + default=500, + help="Maximum number of top hits to extract, default = 500", + ) + parser.add_argument( + "-p", + "--pvalue", + type=float, + action="store", + required=False, + default=0.05, + help="Maximum p-value threshold to export, default = 0.05", + ) + parser.add_argument( + "-q", + "--qvalue", + type=float, + action="store", + required=False, + default=0.10, + help="Maximum FDR correction value to export, default = 0.10", + ) + parser.add_argument( + "-s", + "--sheetname", + type=str, + action="store", + default=None, + help="Sheetname if input file is Excel rather than text (required for *.xlsx)", + ) + parser.add_argument( + "-f", + "--fformat", + type=str, + action="store", + default="pipeliner", + choices=["pipeliner", "topTable", "top2Excel"], + help="Input file format for running gsea." 
+ + "Pipeliner output has different column names than limma topTable", + ) + + # + # Set up the variables and the log file + # + args = parser.parse_args() + infile = args.infile + outfile = args.outfile + nhits = args.nhits + pvalue = args.pvalue + qvalue = args.qvalue + method = args.method + sheetname = args.sheetname + fformat = args.fformat + + # don't really need a log file for this + keepLog = False + + # replace hyphens with underscore, GSEA can't support hyphens + # NB: this replaces for toppfun as well, so the names are consistent, but you don't have to do both + fname = re.sub("-", "_", outfile) + + # Column names for each format, to read and write + # NB: in_cols is not the same names as in the input file, but standardizes for this code + # Assumes that ensid_gene is the row index + if fformat == "pipeliner": + in_cols = ["gene", "fc", "log2FC", "p", "q", "gsea"] + + elif fformat == "topTable": + in_cols = ["log2FC", "AveExpr", "gsea", "p", "q", "B"] + + elif fformat == "top2Excel": + in_cols = ["ensid", "gene", "log2FC", "AveExpr", "gsea", "p", "q", "B"] + + gsea_cols = ["gene", "gsea"] + # top_cols = ['ensid'] + top_cols = ["gene"] + + # If keepLog then set it up + if keepLog: + thedate = str(datetime.datetime.now()).split()[0] + thedate = re.sub("-", "", thedate) + + log = open("deg2gs" + ".log", "a") + log.write("\n" + str(datetime.datetime.now()) + "\n") + log.write(" ".join(sys.argv) + "\n") + log.write("deg2gs.py version " + __version__ + "\n") + log.write( + "Exporting genes to {}, n={}, p={}, q={}, ".format( + outfile.name, nhits, pvalue, qvalue + ) + ) + log.flush() + + # + # Import from Excel or tabSV, if Excel extension than excel otherwise csv + # + infile_name, infile_extension = os.path.splitext(infile.name) + if infile_extension in [".xls", ".xlsx"]: + if sheetname is None: + err_out("Input Excel file requires sheet name", log) + df = pd.read_excel(infile.name, sheet_name=sheetname, header=0, index_col=0) + + else: + # df = 
pd.read_csv(infile, sep='\t', header=0, index_col=0) + df = pd.read_csv(infile, sep=",", header=0, index_col=0) + + # + # Set the columns based on the input format and the analysis method, but only if they match + # + if df.shape[1] != len(in_cols): + errMsg = ( + '\nYour input file does not match the expected format "{}".\n'.format( + fformat + ) + + "Please check the file or the selected format and try again\n." + ) + if keepLog: + err_out(errMsg, log) + else: + print(errMsg) + sys.exit(1) + + df.columns = in_cols + + # split the ensemblID|Gene if necessary + if fformat == "topTable": # and method == 'gsea': + df["gene"] = [re.sub("^.*\|", "", i) for i in df.index.values.tolist()] + + # Filter the df to the p-values, FDR, and number of hits specified + df = filter_by_p(df, nhits, pvalue, qvalue) + + # + # Grab the relevant columns and write out the file + # + if method == "gsea": + df = df.filter(items=gsea_cols) + df.to_csv(fname, index=False, header=False, sep="\t") + + elif method == "toppfun": + # # Get clean ensemble IDs for toppfun, strip off the .*$ from the Ensembl IDs and export them + # # top2Excel already has it + # if fformat != 'top2Excel': + # df['ensid'] = [re.sub("\..*$", "", i) for i in df.index.values.tolist()] + df = df.filter(items=top_cols) + df.to_csv(fname, index=False, header=False) + + # + # Close out the log file + # + if keepLog: + send_update("deg2gs.py successfully completed. {} written.".format(fname), log) + send_update(str(datetime.datetime.now()) + "\n", log) + log.close() + else: + print("deg2gs.py successfully completed. 
{} written.".format(fname)) + + +if __name__ == "__main__": + main() diff --git a/src/ccbr_tools/GSEA/multitext2excel.py b/src/ccbr_tools/GSEA/multitext2excel.py new file mode 100644 index 0000000..cfea3fc --- /dev/null +++ b/src/ccbr_tools/GSEA/multitext2excel.py @@ -0,0 +1,186 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +""" +Created on Mon Aug 6 14:59:13 2018 + +Susan Huse +NIAID Center for Biological Research +Frederick National Laboratory for Cancer Research +Leidos Biomedical + +multitext2excel.py + Reads a list of files to import as separate tabs in Excel + +v 1.0 - initial code version. +v 1.1 - updated to include first splitter markowitzte@nih.gov + +""" +__author__ = "Susan Huse" +__date__ = "August 6, 2018" +__version__ = "1.1" +__copyright__ = "No copyright protection, can be used freely" + +# import csv +import sys +import os +import re +import datetime +import pandas as pd +import glob + +# import scipy +# import numpy + +import argparse +from argparse import RawTextHelpFormatter +from ncbr_huse import ( + run_cmd, + run_os_cmd, + un_gzip, + send_update, + err_out, + fasta_count, + fasta_list, +) + + +#################################### +# +# Functions +# +#################################### + +# +# Set up the parameters for the USEARCH command and run it +# + + +#################################### +# +# Main +# +#################################### + + +def main(): + # Usage statement + parseStr = 'Reads a list of files and imports them each into a separate tab in one Excel spreadsheet.\n\n\ + Usage:\n\ + multitext2excel.py -o outfile -d directory -p filepattern -k delimiter -s namesplitter\n\ + Example:\n\ + multitext2excel.py -o MyResults.xlsx -d analysis -p ".txt" -k "\t" -s "."\n' + + parser = argparse.ArgumentParser( + description=parseStr, formatter_class=RawTextHelpFormatter + ) + # parser.add_argument('-i', '--infile', required=True, nargs='?', type=argparse.FileType('r'), default=None, + # help='Input file containing important 
things') + parser.add_argument( + "-o", + "--outfile", + required=True, + nargs="?", + type=argparse.FileType("w"), + default=None, + help="Output file for important results", + ) + parser.add_argument( + "-d", + "--indir", + required=False, + type=str, + action="store", + default=".", + help='Input directory containing data files to import [default="."]', + ) + parser.add_argument( + "-p", + "--pattern", + required=True, + type=str, + help="Pattern used to create list of input data files", + ) + parser.add_argument( + "-k", + "--delimiter", + required=False, + type=str, + default="\t", + help='character delimiter that separates columns in each of the input data files [default="\t"]', + ) + parser.add_argument( + "-s", + "--splitter", + required=False, + type=str, + default=".", + help='character to split input filenames to create output tab names. Cuts everything to the right [default="."]', + ) + parser.add_argument( + "-f", + "--firstsplitter", + required=False, + type=str, + default="", + help='character to split input filenames to create output tab names. 
Cuts everything to the left [default=""]', + ) + + # + # Set up the variables and the log file + # + args = parser.parse_args() + # infile = args.infile + outfile = args.outfile + pattern = args.pattern + delimiter = args.delimiter + splitter = args.splitter + firstsplitter = args.firstsplitter + indir = args.indir + + thedate = str(datetime.datetime.now()).split()[0] + thedate = re.sub("-", "", thedate) + + # Set up the log file + log = open("multitext2excel" + ".log", "a") + log.write("\n" + str(datetime.datetime.now()) + "\n") + log.write(" ".join(sys.argv) + "\n") + log.write("multitext2excel.py version " + __version__ + "\n") + log.flush() + + # Read for each matching file, read in and export to the output file + pattern = "*" + pattern + "*" + writer = pd.ExcelWriter(outfile.name) + for filename in glob.glob(os.path.join(indir, pattern)): + # Extract the output tab name + # sheet_name = os.path.basename(filename).split(splitter)[0] + sheet_name = re.sub(indir + "/", "", filename).split(splitter)[0] + if firstsplitter != "": + sheet_name = sheet_name.split(firstsplitter)[1] + print( + "Writing data from input file: {} to output tab: {}".format( + filename, sheet_name + ) + ) + + # Read in the data + df = pd.read_csv(filename, sep=delimiter, header=0, encoding="unicode_escape") + + # Write out the data + df.to_excel(writer, index=False, sheet_name=sheet_name) + + # Close it up! + writer.save() + + # + # Close out the log file + # + send_update( + "multitext2excel.py successfully completed. 
{} written.".format(outfile.name), + log, + ) + send_update(str(datetime.datetime.now()) + "\n", log) + log.close() + + +if __name__ == "__main__": + main() diff --git a/GSEA/ncbr_huse.py b/src/ccbr_tools/GSEA/ncbr_huse.py similarity index 69% rename from GSEA/ncbr_huse.py rename to src/ccbr_tools/GSEA/ncbr_huse.py index e166b1a..d6d6390 100644 --- a/GSEA/ncbr_huse.py +++ b/src/ccbr_tools/GSEA/ncbr_huse.py @@ -5,12 +5,12 @@ ncbr_huse.py Set of functions supporting the FNL NCBR work - + """ -__author__ = 'Susan Huse' -__version__ = '1.0.0' -__copyright__ = 'none' +__author__ = "Susan Huse" +__version__ = "1.0.0" +__copyright__ = "none" import csv import sys @@ -19,13 +19,14 @@ import datetime import subprocess import pandas as pd - + #################################### -# -# Functions +# +# Functions # #################################### + # # Run any command and send notice to log file. if dorun=F, just notify for testing # @@ -38,6 +39,7 @@ def run_cmd(theCommand, fn, dorun): if dorun: subprocess.check_call(theCommand, stderr=fn) + # # Run an OS command and notify log file # @@ -51,15 +53,16 @@ def run_os_cmd(theCommand, fn, dorun): if dorun: os.system(theCommandStr) + # # gunzip a file # def un_gzip(fname, logfn): # If rerunning and the previous step has already been compressed, # need to uncompress it before you can rerun the command - gzip_name = fname + '.gz' + gzip_name = fname + ".gz" if (not os.path.isfile(fname)) and os.path.isfile(gzip_name): - ungz_cmd = ['gunzip', gzip_name] + ungz_cmd = ["gunzip", gzip_name] run_cmd(ungz_cmd, logfn, True) @@ -73,13 +76,16 @@ def con_db(host_name, db_name, port_number): ## password="password" cnfdata = f.readlines() - cnfdata = [x.strip() for x in cnfdata] - cnfdata = [x.replace('"', '') for x in cnfdata] - user_name = cnfdata[1].replace('user=', '') - password = cnfdata[2].replace('password=', '') - - db = MySQLdb.connect(host=host_name, db=db_name, port=port_number, user=user_name, passwd=password) - return(db) + 
cnfdata = [x.strip() for x in cnfdata] + cnfdata = [x.replace('"', "") for x in cnfdata] + user_name = cnfdata[1].replace("user=", "") + password = cnfdata[2].replace("password=", "") + + db = MySQLdb.connect( + host=host_name, db=db_name, port=port_number, user=user_name, passwd=password + ) + return db + # # Print updates to screen and log file @@ -89,29 +95,29 @@ def send_update(updateStr, log=None, quiet=False): print(updateStr) if log is not None: - log.write(updateStr + '\n') - return(0) + log.write(updateStr + "\n") + return 0 + # # Log error message and exit # -def err_out(errMsg, log = None): +def err_out(errMsg, log=None): if log is not None: log.write(errMsg) sys.exit(errMsg) + # # Pause for user to be ready to continue, use contkey=None to get any input # -def pause_for_input(txt, contkey='y', quitkey='q', log=None): - +def pause_for_input(txt, contkey="y", quitkey="q", log=None): # tally the number of tries - ans_cnt = 0 + ans_cnt = 0 # loop for the user to enter input, give them a few tries while True: - # wait for the input ans = input(txt) @@ -121,23 +127,30 @@ def pause_for_input(txt, contkey='y', quitkey='q', log=None): # if none, just return the input if contkey is None: - return(ans) + return ans # if there is a contkey, then be sure it is correctly typed elif ans == contkey: - return(ans) + return ans else: - # give them additional help and increment the answer count - reminder = "Note: only {} to continue and {} to quit are valid options.\nPlease try again.\n".format(contkey, quitkey) + # give them additional help and increment the answer count + reminder = "Note: only {} to continue and {} to quit are valid options.\nPlease try again.\n".format( + contkey, quitkey + ) if ans_cnt == 0: txt = "\n" + txt + "\n" + reminder # Otherwise 3 strikes and exit from the loop if ans_cnt == 2: - err_out("User failed to continue ({}) or quit ({}) three times in a row. 
Exiting...".format(contkey, quitkey), log) + err_out( + "User failed to continue ({}) or quit ({}) three times in a row. Exiting...".format( + contkey, quitkey + ), + log, + ) - ans_cnt = ans_cnt + 1 + ans_cnt = ans_cnt + 1 # @@ -146,9 +159,10 @@ def pause_for_input(txt, contkey='y', quitkey='q', log=None): def fasta_count(fastaFile): seqcount = 0 for line in open(fastaFile, "r"): - if re.match(">", line): - seqcount += 1 - return(seqcount) + if re.match(">", line): + seqcount += 1 + return seqcount + # # Count sequences in a fasta file @@ -156,7 +170,6 @@ def fasta_count(fastaFile): def fasta_list(fastaFile): seqs = [] for line in open(fastaFile, "r"): - if re.match(">", line): - seqs.append(re.sub(">", "", line.rstrip())) - return(seqs) - + if re.match(">", line): + seqs.append(re.sub(">", "", line.rstrip())) + return seqs diff --git a/src/ccbr_tools/VERSION b/src/ccbr_tools/VERSION new file mode 100644 index 0000000..0d4d124 --- /dev/null +++ b/src/ccbr_tools/VERSION @@ -0,0 +1 @@ +0.1.0-dev diff --git a/src/ccbr_tools/__main__.py b/src/ccbr_tools/__main__.py new file mode 100644 index 0000000..e36da97 --- /dev/null +++ b/src/ccbr_tools/__main__.py @@ -0,0 +1,95 @@ +""" +Entry point for CCBR Tools +""" + +import click +import cffconvert.cli.cli + +from .util import get_project_scripts, get_version, print_citation, repo_base + + +class CustomClickGroup(click.Group): + def format_epilog(self, ctx, formatter): + if self.epilog: + formatter.write_paragraph() + for line in self.epilog.split("\n"): + formatter.write_text(line) + + def list_commands(self, ctx: click.Context): + """Preserve the order of subcommands when printing --help""" + return list(self.commands) + + +all_commands = "All installed tools:\n" + "\n".join( + [f" {cmd}" for cmd in get_project_scripts()] +) + + +@click.group( + cls=CustomClickGroup, + context_settings=dict(help_option_names=["-h", "--help"]), + epilog=all_commands, +) +@click.version_option(get_version(), "-v", "--version", is_flag=True) 
+def cli(): + """ + Utilities for CCBR Bioinformatics Software + + For more options, run: + tool_name [command] --help + + """ + pass + + +@click.command() +@click.argument( + "citation_file", + type=click.Path(exists=True), + required=True, + default=repo_base("CITATION.cff"), +) +@click.option( + "--output-format", + "-f", + default="bibtex", + help="Output format for the citation", + type=cffconvert.cli.cli.options["outputformat"]["type"], +) +def cite(citation_file, output_format): + """ + Print the citation in the desired format + + citation_file : Path to a file in Citation File Format (CFF) [default: the CFF for ccbr_tools] + """ + print_citation(citation_file=citation_file, output_format=output_format) + + +@click.command() +@click.option( + "--debug", + "-d", + help="Print the path to the VERSION file", + type=bool, + default=False, + is_flag=True, +) +def version(debug): + """ + Print the version of ccbr_tools + """ + print(get_version(debug=debug)) + + +cli.add_command(cite) +cli.add_command(version) + + +def main(): + cli() + + +cli(prog_name="ccbr_tools") + +if __name__ == "__main__": + main() diff --git a/src/ccbr_tools/gb2gtf.py b/src/ccbr_tools/gb2gtf.py new file mode 100644 index 0000000..dec0c88 --- /dev/null +++ b/src/ccbr_tools/gb2gtf.py @@ -0,0 +1,141 @@ +# download GenBank file from NCBI and then +# Usage:python gb2gtf.py sequence.gb > sequence.gtf + +import sys +from Bio.Seq import Seq +from Bio.SeqRecord import SeqRecord +from Bio.SeqFeature import SeqFeature, FeatureLocation +from Bio import SeqIO +import Bio + + +def main(): + if check_args(sys.argv): + gb2gtf() + + +def check_args(args): + valid_usage = True + if len(args) < 2 or "-h" in args or "--help" in args: + print("Usage: gb2gtf sequence.gb > sequence.gtf") + valid_usage = False + return valid_usage + + +def gb2gtf(): + args = sys.argv + + # get all sequence records for the specified genbank file + recs = [rec for rec in SeqIO.parse(args[1], "genbank")] + + # print the number of 
sequence records that were extracted + # print(len(recs)) + + # print annotations for each sequence record + # for rec in recs: + # print(rec.annotations) + + # print the CDS sequence feature summary information for each feature in each + # sequence record + for rec in recs: + # print(type(rec)) + seqname = rec.id + # feats = [feat for feat in rec.features if feat.type == "CDS"] + feats = [feat for feat in rec.features] + for feat in feats: + # print(feat) + l = feat.location + start = l.start + end = l.end + if feat.strand == 1: + strand = "+" + else: + strand = "-" + if feat.type == "gene": + gffstring = list() + gffstring.append(seqname) + gffstring.append("RefSeq") + gffstring.append("gene") + gffstring.append(str(start)) + gffstring.append(str(end)) + gffstring.append(".") + gffstring.append(strand) + gffstring.append(".") + q = feat.qualifiers + try: + gene = q["gene"][0] + except: + try: + gene = q["locus_tag"][0] + except: + exit("Something fishy!") + + x = 'gene_name "%s"; gene_id "%s"' % (gene, gene) + gffstring.append(x) + print("\t".join(gffstring) + ";") + # #print(feat.qualifiers.keys()) + # #print(feat.qualifiers.values()) + elif feat.type == "CDS": + # if feat.type=="CDS": + gffstring = list() + gffstring.append(seqname) + gffstring.append("RefSeq") + gffstring.append("transcript") + gffstring.append(str(start)) + gffstring.append(str(end)) + gffstring.append(".") + gffstring.append(strand) + gffstring.append(".") + q = feat.qualifiers + try: + gene = q["gene"][0] + except: + try: + gene = q["locus_tag"][0] + except: + exit("Something fishy!") + x = ( + 'gene_name "%s"; gene_id "%s"; transcript_id "%s"; transcript_name "%s"' + % (gene, gene, gene, gene) + ) + gffstring.append(x) + print("\t".join(gffstring) + ";") + gffstring[2] = "exon" + if isinstance(l, Bio.SeqFeature.CompoundLocation): + parts = l.parts + # lenparts=len(parts) + for i, part in enumerate(parts): + j = i + 1 + start = part.start + end = part.end + gffstring2 = gffstring + 
gffstring2[3] = str(start) + gffstring2[4] = str(end) + y = x + "; exon_number %s" % (str(j)) + gffstring2[8] = y + print("\t".join(gffstring2) + ";") + else: + y = x + "; exon_number 1" + gffstring[8] = y + print("\t".join(gffstring) + ";") + + # print(j,part) + else: + continue + + # else: + + # print(l.start) + # exit() + # for l in feat.location: + # print(l.start) + # print(l.end) + # print(l.strand) + # exit() + # print(type(feat.location)) + # print(feat.strand) + # exit() + + +if __name__ == "__main__": + main() diff --git a/homologfinder/HOM_MouseHumanSequence.rpt b/src/ccbr_tools/homologfinder/HOM_MouseHumanSequence.rpt similarity index 100% rename from homologfinder/HOM_MouseHumanSequence.rpt rename to src/ccbr_tools/homologfinder/HOM_MouseHumanSequence.rpt diff --git a/homologfinder/README.md b/src/ccbr_tools/homologfinder/README.md similarity index 100% rename from homologfinder/README.md rename to src/ccbr_tools/homologfinder/README.md diff --git a/src/ccbr_tools/homologfinder/hf.py b/src/ccbr_tools/homologfinder/hf.py new file mode 100755 index 0000000..a67f561 --- /dev/null +++ b/src/ccbr_tools/homologfinder/hf.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python3 + +""" +About: + hf or HomologFinder finds homologs in human and mouse. 
+ if the input gene or genelist is human, then it returns mouse homolog(s) and vice versa +USAGE: + $ hf -h +Example: + $ hf -g ZNF365 + $ hf -l Wdr53,Zfp365 + $ hf -f genelist.txt +""" + +__version__ = "v1.0.0" +__author__ = "Vishal Koparde" +__email__ = "vishal.koparde@nih.gov" + +import argparse +import io +import pandas as pd +import requests +import sys + + +def exit_w_msg(message): + """Gracefully exit with proper message""" + print("{} : EXITING!!".format(__file__)) + print(message) + sys.exit() + + +def check_help(parser): + """check if usage needs to be printed""" + if "-h" in sys.argv or "--help" in sys.argv or len(sys.argv) == 1: + print(__doc__) + parser.print_help() + parser.exit() + return + + +def collect_args(): + """collect all the cli arguments""" + # create parser + parser = argparse.ArgumentParser( + description="Get Human2Mouse (or Mouse2Human) homolog gene or genelist" + ) + + # add version + parser.add_argument( + "-v", "--version", action="version", version="%(prog)s {}".format(__version__) + ) + + # add joblist + parser.add_argument( + "-g", "--gene", help="single gene name", required=False, type=str + ) + + # add snakemakelog + parser.add_argument( + "-l", "--genelist", help="comma separated gene list", required=False, type=str + ) + + # output file + parser.add_argument( + "-f", + "--genelistfile", + help="genelist in file (one gene per line)", + type=str, + required=False, + ) + + check_help(parser) + + # extract parsed arguments + args = parser.parse_args() + + if ( + (args.gene and args.genelist) + or (args.gene and args.genelistfile) + or (args.genelist and args.genelistfile) + or (args.gene and args.genelist and args.genelistfile) + ): + msg = "Only one can be provided -g or -l or -f" + exit_w_msg(msg) + + return args + + +def process_genelist(gl, lookup): + result = [] + for g in gl: + if g in lookup: + result.extend(lookup[g].split(",")) + return result + + +def process_args(args, lookup): + if args.gene: + r = 
process_genelist([args.gene], lookup) + if args.genelist: + gl = args.genelist + r = process_genelist(gl.split(","), lookup) + if args.genelistfile: + with open(args.genelistfile) as f: + lines = f.readlines() + lines = list(map(lambda x: x.strip(), lines)) + r = process_genelist(lines, lookup) + return r + + +def print_results(result): + for g in result: + print(g) + + +def read_lookup(): + lookup = dict() + # read in lookup table from github + url = "https://raw.githubusercontent.com/CCBR/Tools/master/homologfinder/human_mouse_homolog_lookup.txt" + download = requests.get(url).content + lookupdf = pd.read_csv(io.StringIO(download.decode("utf-8")), sep="\t") + lookupdf.columns = ["geneName", "homologs"] + for index, row in lookupdf.iterrows(): + lookup[row["geneName"]] = row["homologs"] + return lookup + + +def create_homolog_table(rpt_file="HOM_MouseHumanSequence.rpt"): + cols = ["DB Class Key", "Common Organism Name", "Symbol"] + df = pd.read_csv(rpt_file, usecols=cols, sep="\t") + # human-mouse homologs file --> HOM_MouseHumanSequence.rpt + # can be downloaded from http://www.informatics.jax.org/faq/ORTH_dload.shtml + lookup = dict() + lookup2 = dict() + for index, row in df.iterrows(): + if not row["DB Class Key"] in lookup: + lookup[row["DB Class Key"]] = dict() + lookup[row["DB Class Key"]]["mouse, laboratory"] = list() + lookup[row["DB Class Key"]]["human"] = list() + if not row["Common Organism Name"] in lookup[row["DB Class Key"]]: + continue + lookup[row["DB Class Key"]][row["Common Organism Name"]].append(row["Symbol"]) + for k, v in lookup.items(): + # print(",".join(v["mouse, laboratory"]),",".join(v["human"]),sep="\t") + for l in v["mouse, laboratory"]: + if not l in lookup2: + lookup2[l] = list() + lookup2[l].extend(v["human"]) + for l in v["human"]: + if not l in lookup2: + lookup2[l] = list() + lookup2[l].extend(v["mouse, laboratory"]) + + for k, v in lookup2.items(): + print(k, ",".join(v), sep="\t") + + +def main(): + # collect all arguments + 
args = collect_args() + # now that args are correct load in the lookup + lookup = read_lookup() + # process the arguments + result = process_args(args, lookup) + print_results(result) + + +if __name__ == "__main__": + main() diff --git a/homologfinder/human_mouse_homolog_lookup.txt b/src/ccbr_tools/homologfinder/human_mouse_homolog_lookup.txt similarity index 100% rename from homologfinder/human_mouse_homolog_lookup.txt rename to src/ccbr_tools/homologfinder/human_mouse_homolog_lookup.txt diff --git a/Biowulf/intersect b/src/ccbr_tools/intersect.py similarity index 63% rename from Biowulf/intersect rename to src/ccbr_tools/intersect.py index eaea9e3..8f859f1 100644 --- a/Biowulf/intersect +++ b/src/ccbr_tools/intersect.py @@ -2,7 +2,7 @@ ################################################################### # Skyler Kuhn # intersect -# Find the intersect of the two files +# Find the intersect of the two files # Returns the inner join # USAGE: intersect file1 file2 ################################################################### @@ -11,15 +11,14 @@ def indexFile(filename, joinindex, header): - - fh = open(filename, 'r') + fh = open(filename, "r") filedict = {} if header == 1: - firstline = next(fh).strip().replace('"', '').split("\t") + firstline = next(fh).strip().replace('"', "").split("\t") filedict["headerstr"] = firstline for line in fh: - linelist = line.strip().replace('"', '').split("\t") + linelist = line.strip().replace('"', "").split("\t") joinon = linelist[joinindex] filedict[joinon] = linelist @@ -28,29 +27,32 @@ def indexFile(filename, joinindex, header): def intersect(fileDict, file2, joinindex, header): - - fh2 = open(file2, 'r') + fh2 = open(file2, "r") counter = 0 if header == 1: - firstline = next(fh2).strip().replace('"', '').split("\t") - headerline = "\t".join(fileDict["headerstr"]).rstrip("\n") + "\t" + "\t".join(firstline) + firstline = next(fh2).strip().replace('"', "").split("\t") + headerline = ( + 
"\t".join(fileDict["headerstr"]).rstrip("\n") + "\t" + "\t".join(firstline) + ) print(headerline) for line in fh2: - linelist = line.strip().replace('"', '').split("\t") - #print(linelist) + linelist = line.strip().replace('"', "").split("\t") + # print(linelist) joinon = linelist[joinindex] try: fileDict[joinon] except KeyError: continue # joinon key is not in the file1, go to next line in file - + counter += 1 - intersection = "\t".join(fileDict[joinon]).rstrip("\n") + "\t" + "\t".join(linelist) + intersection = ( + "\t".join(fileDict[joinon]).rstrip("\n") + "\t" + "\t".join(linelist) + ) print(intersection) - #print(counter) + # print(counter) fh2.close() @@ -66,11 +68,13 @@ def main(): f2index = int(sys.argv[4]) except IndexError: - exit("INCORRECT USGAE:\nintersect filename1 filename2 f1ColumnIndex F2ColumnIndex\n\t--Ex. intersect file1 file2 0 0") + exit( + "INCORRECT USGAE:\nintersect filename1 filename2 f1ColumnIndex F2ColumnIndex\n\t--Ex. intersect file1 file2 0 0" + ) indexedFile1 = indexFile(file1, f1index, header) intersect(indexedFile1, file2, f2index, header) + if __name__ == "__main__": main() - diff --git a/Biowulf/jobby b/src/ccbr_tools/jobby.py similarity index 100% rename from Biowulf/jobby rename to src/ccbr_tools/jobby.py diff --git a/src/ccbr_tools/jobinfo.py b/src/ccbr_tools/jobinfo.py new file mode 100755 index 0000000..ee7d334 --- /dev/null +++ b/src/ccbr_tools/jobinfo.py @@ -0,0 +1,318 @@ +#!/usr/bin/env python3 + +""" +About: + This wrapper script works only on BIOWULF! + This script usage the "dashboard_cli" utility on biowulf to get HPC usage metadata + for a list of slurm jobids. These slurm jobids can be either provided at command + line or extracted from a snakemake.log file. Using snakemake.log file option together + with --failonly option lists path to the STDERR files for failed jobs. This can be + very useful to debug failed Snakemake workflows. 
+USAGE: + $ jobinfo -h +Example: + $ jobinfo -j 123456,7891011 + $ jobinfo -s /path/to/snakemake.log + $ jobinfo -j 123456,7891011 -o /path/to/report.tsv + $ jobinfo -s /path/to/snakemake.log --failonly +""" + +__version__ = "v1.0.0" +__author__ = "Vishal Koparde" +__email__ = "vishal.koparde@nih.gov" + +import argparse, subprocess, json, os, datetime, time, textwrap, sys +import pandas as pd + +# SHORT_FIELDS used to display on screen +SHORT_FIELDS = "jobid,state,jobname,elapsed_time,timelimit,time_util,cpus,max_cpu_util,mem,max_mem_util,exit_code" +FAILONLY_FIELDS = "jobid,jobname,elapsed_time,timelimit,time_util,cpus,max_cpu_util,mem,max_mem_util,state_reason,eval,exit_code,std_err" +# LONG_FIELDS used to write to output file +LONG_FIELDS = "jobid,jobname,state,state_reason,eval,exit_code,nodelist,partition,qos,submit_time,queued_time,queued_time_seconds,elapsed_time,elapsed_time_seconds,timelimit,timelimit_seconds,user,cpus,cpu_min,cpu_avg,cpu_max,mem,mem_min,mem_avg,mem_max,gres,work_dir,std_out,std_err" +FAILONLY = "FAILED,TIMEOUT" + +# change FAILONLY state .. 
for debugging only +# FAILONLY="TIMEOUT" + + +def exit_w_msg(message): + """Gracefully exit with proper message""" + print("{} : EXITING!!".format(__file__)) + print(message) + sys.exit() + + +def check_help(parser): + """check if usage needs to be printed""" + if "-h" in sys.argv or "--help" in sys.argv or len(sys.argv) == 1: + print(__doc__) + parser.print_help() + parser.exit() + return + + +def check_host(): + if ( + os.environ.get("HOSTNAME") == "biowulf.nih.gov" + or os.environ.get("HOSTNAME") == "helix.nih.gov" + ): + pass + else: + exit_w_msg("This script only works on BIOWULF!") + + +def collect_args(): + # create parser + parser = argparse.ArgumentParser( + description="Get slurm job information using slurm job id or snakemake.log file" + ) + + # add version + parser.add_argument( + "-v", "--version", action="version", version="%(prog)s {}".format(__version__) + ) + + # add joblist + parser.add_argument( + "-j", + "--joblist", + help="comma separated list of jobids. Cannot be used together with -s option.", + required=False, + type=str, + ) + + # add snakemakelog + parser.add_argument( + "-s", + "--snakemakelog", + help="snakemake.log file. Slurm jobids are extracted from here. Cannot be used together with -j option.", + required=False, + type=argparse.FileType("r"), + ) + + # output file + parser.add_argument( + "-o", + "--output", + help="Path to output file. All jobs (all states) and all columns are reported in output file.", + type=str, + required=False, + ) + + # output only failed jobs + parser.add_argument( + "-f", + "--failonly", + help="output FAILED jobs only (onscreen). Path to the STDERR files for failed jobs. 
All jobs are reported with -o option.", + action="store_true", + required=False, + ) + + check_help(parser) + + # extract parsed arguments + args = parser.parse_args() + + if args.output: + args.output = os.path.abspath(args.output) + if not os.access(os.path.dirname(args.output), os.W_OK): + msg = "File is not writable: {}".format(args.output) + exit_w_msg(msg) + + if args.joblist and args.snakemakelog: + exit_w_msg("Either -j or -s (not BOTH) is required!") + + if args.joblist: + jobids = args.joblist + args.joblist = jobids.split(",") + + if ( + args.snakemakelog + ): # if snakemakelog file is given then extract the jobids from it. + cmd = ( + 'grep "external jobid" ' + + args.snakemakelog.name + + ' | awk \'{print $NF}\' | sed "s/\'//g" | sed "s/\.//g"' + ) + p1 = subprocess.run(cmd, capture_output=True, text=True, shell=True) + args.joblist = p1.stdout.strip().split("\n") + + return args + + +def mem2gb(memstr): + if memstr == "0": + return float("0") + value, unit = memstr.split() + if unit == "GB": + return float(value) + elif unit == "MB": + return float(value) / 1024 + elif unit == "KB": + return float(value) / 1024 / 1024 + + +def check_int_set_zero(s): + if s == "": + s = 0 + else: + s = int(s) + return s + + +def time2sec(timestr): + debug = 0 + dayHMSstr_list = timestr.split("-") + if debug == 1: + print(timestr) + if debug == 1: + print(dayHMSstr_list) + if debug == 1: + print(len(dayHMSstr_list)) + if len(dayHMSstr_list) == 2: + day = check_int_set_zero(dayHMSstr_list[0]) + HMSstr = dayHMSstr_list[1] + else: + day = 0 + HMSstr = dayHMSstr_list[0] + HMSstr_list = HMSstr.split(":") + if debug == 1: + print(HMSstr) + if debug == 1: + print(HMSstr_list) + if len(HMSstr_list) == 3: + hour = check_int_set_zero(HMSstr_list[0]) + minutes = check_int_set_zero(HMSstr_list[1]) + sec = check_int_set_zero(HMSstr_list[2]) + elif len(HMSstr_list) == 2: + hour = 0 + minutes = check_int_set_zero(HMSstr_list[0]) + sec = check_int_set_zero(HMSstr_list[1]) + elif 
len(HMSstr_list) == 1: + hour = 0 + minutes = 0 + sec = check_int_set_zero(HMSstr_list[0]) + if debug == 1: + print(day, hour, minutes, sec) + sec += int(day) * 24 * 60 * 60 + if debug == 1: + print(day, hour, minutes, sec) + sec += int(hour) * 60 * 60 + if debug == 1: + print(day, hour, minutes, sec) + sec += int(minutes) * 60 + if debug == 1: + print(day, hour, minutes, sec) + return float(sec) + + +def get_jobinfo(args): + # cmd = '/usr/local/bin/dashboard_cli jobs --joblist ' + ",".join(args.joblist[0:10]) + " --archive --json --fields " + LONG_FIELDS + cmd = ( + "/usr/local/bin/dashboard_cli jobs --joblist " + + ",".join(args.joblist) + + " --archive --json --fields " + + LONG_FIELDS + ) + p1 = subprocess.run(cmd, capture_output=True, text=True, shell=True) + if p1.returncode != 0: + exit_w_msg("dashboard_cli failed!") + p1_json = json.loads(p1.stdout) + p1_table = pd.json_normalize(p1_json) + p1_table["epochtime"] = p1_table.apply( + lambda row: time.mktime( + datetime.datetime.strptime(row.submit_time, "%Y-%m-%dT%H:%M:%S").timetuple() + ), + axis=1, + ) + p1_table = p1_table.sort_values(by=["epochtime"]) + p1_table["max_cpu_util"] = p1_table.apply( + lambda row: ( + "-" + if row["cpu_max"] == "-" + else "%.2f" % (float(row["cpu_max"]) * 100 / int(row["cpus"])) + " %" + ), + axis=1, + ) + p1_table["max_mem_util"] = p1_table.apply( + lambda row: ( + "-" + if row["mem_max"] == "-" + else "%.2f" % (mem2gb(row["mem_max"]) * 100 / mem2gb(row["mem"])) + " %" + ), + axis=1, + ) + p1_table["queued_time_seconds"] = p1_table.apply( + lambda row: "%d" % (int(time2sec(row["queued_time"]))), axis=1 + ) + p1_table["elapsed_time_seconds"] = p1_table.apply( + lambda row: "%d" % (int(time2sec(row["elapsed_time"]))), axis=1 + ) + p1_table["timelimit_seconds"] = p1_table.apply( + lambda row: "%d" % (int(time2sec(row["timelimit"]))), axis=1 + ) + p1_table["time_util"] = p1_table.apply( + lambda row: ( + "%.2f" + % ( + float(row["elapsed_time_seconds"]) + * 100 + / 
float(row["timelimit_seconds"]) + ) + + " %" + if float(row["timelimit_seconds"]) != 0 + else "- %" + ), + axis=1, + ) + if args.output: + try: + if not p1_table.empty: + p1_table.to_csv( + args.output, + sep="\t", + header=True, + index=False, + columns=LONG_FIELDS.split(","), + ) + except: + msg = "File is not writable: {}".format(args.output) + exit_w_msg(msg) + return p1_table + + +def filter_rows(func): + def wrapper(t, args): + if args.failonly: + t = t[t["state"].isin(FAILONLY.split(","))] + func(t, args) + + return wrapper + + +@filter_rows +def print2screen(t, args): + onscreenfields = SHORT_FIELDS + if args.failonly: + onscreenfields = FAILONLY_FIELDS + if t.empty: + print("Good News!! You have ZERO FAILED jobs!") + else: + print( + t.to_string(index=False, justify="left", columns=onscreenfields.split(",")) + ) + + +def main(): + # check host + check_host() + # collect all arguments + args = collect_args() + # query dashboard_cli to get details as a pandas table + t = get_jobinfo(args) + # filter table, print to screen and write to output file + print2screen(t, args) + + +if __name__ == "__main__": + main() diff --git a/src/ccbr_tools/peek.py b/src/ccbr_tools/peek.py new file mode 100755 index 0000000..62b5df1 --- /dev/null +++ b/src/ccbr_tools/peek.py @@ -0,0 +1,124 @@ +#!/usr/local/bin/python +# -*- coding: utf-8 -*- +from __future__ import print_function +from pathlib import Path +import sys + + +def usage(): + """Print usage information and exit program""" + bin_stem = Path(sys.argv[0]).stem + print(f"USAGE: {bin_stem} [buffer]\n") + print("Assumptions:\n\tInput file is tab delimited") + print("\t └── Globbing supported: *.txt\n") + print("Optional:\n\tbuffer = 40 (default)") + print("\t └── Changing buffer will increase/decrease output justification") + sys.exit() + + +def pargs(): + """Basic command-line parser""" + if "-h" in sys.argv or "--help" in sys.argv or len(sys.argv) == 1: + usage() + try: + fname = sys.argv[1] + except IndexError: + 
usage() + return + + +def max_string(data): + """Given a list of strings, finds the maximum string length""" + longest = -1 + for value in data: + if len(value) > longest: + longest = len(value) + return longest + + +def print_header(filename, length): + """Print filenames and divider""" + print("# {}".format(filename)) + print("{}".format("=" * length)) + + +def justify(h, d, n, nr): + """Calculates the spacing for justifying to the right""" + xspaces = n - (h + d) + if nr < 10: + xspaces = xspaces - 2 + else: + xspaces = xspaces - 3 + spacing = xspaces * " " + return spacing + + +def pprint(headlist, data, linelength, fn): + """Re-formats the first two lines of a file so columns are left justified and values are right justified""" + # Print Filename + print_header(fn, linelength) + + # Print NR and justified contents of 1st and 2nd line + for i in range(len(headlist)): + rownumber = i + 1 + + # Attribute name and corresponding value + column = headlist[i].lstrip().rstrip() + if not column: + column = "NULL" + value = data[i].lstrip().rstrip() + + # Calculate spacing for justifying to the right + insert_spaces = justify(len(column), len(value), linelength, rownumber) + print("{} {}{}{}".format(rownumber, column, insert_spaces, value)) + + +def peek(filename, buffer, delim="\t"): + pargs() + + # Getting contents of first line + try: + fh = open(filename, "r") + except IOError as e: + # File does not exist + print("\n{}\nPlease check your filename!\n\n".format(e)) + usage() + + headerlist = fh.readline().split(delim) + fh.close() + + # Getting contents of second line + fh = open(filename, "r") + try: + datalist = fh.readlines()[1].split(delim) + except IndexError: + datalist = ["EMPTY_FIELD"] + fh.close() + + max_attr_length = max_string(datalist) + total_length = max_attr_length + buffer + + # Pretty print data (Right justify results) + pprint(headerlist, datalist, total_length, filename) + print() + + +def main(): + # Checking command-line usage before parsing + pargs() + + try: + 
buffer = int(sys.argv[-1]) + sys.argv.pop(-1) + except IndexError: + buffer = 40 + except ValueError: + buffer = 40 + + # Paring file(s) contents to support globbing + for file in sys.argv[1:]: + peek(file, buffer) + + +if __name__ == "__main__": + main() diff --git a/src/ccbr_tools/pipeline/util.py b/src/ccbr_tools/pipeline/util.py new file mode 100644 index 0000000..f6546aa --- /dev/null +++ b/src/ccbr_tools/pipeline/util.py @@ -0,0 +1,396 @@ +#!/usr/bin/env python3 +# -*- coding: UTF-8 -*- + +# Python standard library +from __future__ import print_function +from shutil import copytree +import sys +import hashlib +import subprocess +import json +import glob +import os +import warnings + + +def scontrol_show(): + """Run scontrol show config and parse the output as a dictionary + @return scontrol_dict : + """ + scontrol_dict = dict() + scontrol_out = subprocess.run( + "scontrol show config", shell=True, capture_output=True, text=True + ).stdout + if len(scontrol_out) > 0: + for line in scontrol_out.split("\n"): + line_split = line.split("=") + if len(line_split) > 1: + scontrol_dict[line_split[0].strip()] = line_split[1].strip() + return scontrol_dict + + +def get_hpcname(): + """Get the HPC name (biowulf, frce, or an empty string) + @return hpcname + """ + scontrol_out = scontrol_show() + hpc = scontrol_out["ClusterName"] if "ClusterName" in scontrol_out.keys() else "" + if hpc == "fnlcr": + hpc = "frce" + return hpc + + +def get_tmp_dir(tmp_dir, outdir, hpc=get_hpcname()): + """Get default temporary directory for biowulf and frce. 
Allow user override.""" + if not tmp_dir: + if hpc == "biowulf": + tmp_dir = "/lscratch/$SLURM_JOBID" + elif hpc == "frce": + tmp_dir = outdir + else: + tmp_dir = None + return tmp_dir + + +def get_genomes_list(repo_base, hpcname=get_hpcname(), error_on_warnings=False): + """Get list of genome annotations available for the current platform + @return genomes_list + """ + return sorted( + list( + get_genomes_dict( + repo_base, hpcname=hpcname, error_on_warnings=error_on_warnings + ).keys() + ) + ) + + +def get_genomes_dict(repo_base, hpcname=get_hpcname(), error_on_warnings=False): + """Get dictionary of genome annotation versions and the paths to the corresponding JSON files + @repo_base: function for getting the base directory of the repository + @return genomes_dict { genome_name: json_file_path } + """ + if error_on_warnings: + warnings.filterwarnings("error") + genomes_dir = repo_base("config", "genomes", hpcname) + if not os.path.exists(genomes_dir): + warnings.warn(f"Folder does not exist: {genomes_dir}") + search_term = os.path.join(genomes_dir, "*.json") + json_files = glob.glob(search_term) + if len(json_files) == 0: + warnings.warn( + f"No Genome+Annotation JSONs found in {genomes_dir}. Please specify a custom genome json file with `--genome`" + ) + genomes_dict = { + os.path.basename(json_file).replace(".json", ""): json_file + for json_file in json_files + } + warnings.resetwarnings() + return genomes_dict + + +def md5sum(filename, first_block_only=False, blocksize=65536): + """Gets the MD5 checksum of a file in a memory-safe manner. + The file is read in blocks/chunks defined by the blocksize parameter. This is + a safer option than reading the entire file into memory if the file is very large. 
+ @param filename : + Input file on local filesystem to find md5 checksum + @param first_block_only : + Calculate md5 checksum of the first block/chunk only + @param blocksize : + Blocksize of reading N chunks of data to reduce memory profile + @return hasher.hexdigest() : + MD5 checksum of the file's contents + """ + hasher = hashlib.md5() + with open(filename, "rb") as fh: + buf = fh.read(blocksize) + if first_block_only: + # Calculate MD5 of first block or chunk of file. + # This is a useful heuristic for when potentially + # calculating an MD5 checksum of thousands or + # millions of files. + hasher.update(buf) + return hasher.hexdigest() + while len(buf) > 0: + # Calculate MD5 checksum of entire file + hasher.update(buf) + buf = fh.read(blocksize) + + return hasher.hexdigest() + + +## copied directly from rna-seek +def check_cache(parser, cache, *args, **kwargs): + """Check if provided SINGULARITY_CACHE is valid. Singularity caches cannot be + shared across users (and must be owned by the user). Singularity strictly enforces + 0700 user permission on the cache directory and will return a non-zero exitcode. + @param parser : + Argparse parser object + @param cache : + Singularity cache directory + @return cache : + If singularity cache dir is valid + """ + if not exists(cache): + # Cache directory does not exist on filesystem + os.makedirs(cache) + elif os.path.isfile(cache): + # Cache directory exists as file, raise error + parser.error( + """\n\t\x1b[6;37;41mFatal: Failed to provide a valid singularity cache!\x1b[0m + The provided --singularity-cache already exists on the filesystem as a file. + Please run {} again with a different --singularity-cache location. 
+ """.format( + sys.argv[0] + ) + ) + elif os.path.isdir(cache): + # Provided cache exists as directory + # Check that the user owns the child cache directory + # May revert to os.getuid() if user id is not sufficient + if ( + exists(os.path.join(cache, "cache")) + and os.stat(os.path.join(cache, "cache")).st_uid != os.getuid() + ): + # User does NOT own the cache directory, raise error + parser.error( + """\n\t\x1b[6;37;41mFatal: Failed to provide a valid singularity cache!\x1b[0m + The provided --singularity-cache already exists on the filesystem with a different owner. + Singularity strictly enforces that the cache directory is not shared across users. + Please run {} again with a different --singularity-cache location. + """.format( + sys.argv[0] + ) + ) + + return cache + + +def permissions(parser, path, *args, **kwargs): + """Checks permissions using os.access() to see whether the user is authorized to access + a file/directory. Checks for existence, readability, writability and executability via: + os.F_OK (tests existence), os.R_OK (tests read), os.W_OK (tests write), os.X_OK (tests exec). + @param parser : + Argparse parser object + @param path : + Name of path to check + @return path : + Returns abs path if it exists and permissions are correct + """ + if not exists(path): + parser.error( + "Path '{}' does not exist! Failed to provide valid input.".format(path) + ) + if not os.access(path, *args, **kwargs): + parser.error( + "Path '{}' exists, but cannot read path due to permissions!".format(path) + ) + + return os.path.abspath(path) + + +def standard_input(parser, path, *args, **kwargs): + """Checks for standard input when provided or permissions using permissions(). 
+ @param parser : + Argparse parser object + @param path : + Name of path to check + @return path : + If path exists and user can read from location + """ + # Checks for standard input + if not sys.stdin.isatty(): + # Standard input provided, set path as an + # empty string to prevent searching of '-' + path = "" + return path + + # Checks for positional arguments as paths + path = permissions(parser, path, *args, **kwargs) + + return path + + +def exists(testpath): + """Checks if file exists on the local filesystem. + @param parser : + argparse parser object + @param testpath : + Name of file/directory to check + @return does_exist : + True when file/directory exists, False when file/directory does not exist + """ + does_exist = True + if not os.path.exists(testpath): + does_exist = False # File or directory does not exist on the filesystem + + return does_exist + + +def ln(files, outdir): + """Creates symlinks for files to an output directory. + @param files list[]: + List of filenames + @param outdir : + Destination or output directory to create symlinks + """ + # Create symlinks for each file in the output directory + for file in files: + ln = os.path.join(outdir, os.path.basename(file)) + if not exists(ln): + os.symlink(os.path.abspath(os.path.realpath(file)), ln) + + +def which(cmd, path=None): + """Checks if an executable is in $PATH + @param cmd : + Name of executable to check + @param path : + Optional list of PATHs to check [default: $PATH] + @return : + True if exe in PATH, False if not in PATH + """ + if path is None: + path = os.environ["PATH"].split(os.pathsep) + + for prefix in path: + filename = os.path.join(prefix, cmd) + executable = os.access(filename, os.X_OK) + is_not_directory = os.path.isfile(filename) + if executable and is_not_directory: + return True + return False + + +def err(*message, **kwargs): + """Prints any provided args to standard error. + kwargs can be provided to modify print functions + behavior. 
+ @param message : + Values printed to standard error + @params kwargs + Key words to modify print function behavior + """ + print(*message, file=sys.stderr, **kwargs) + + +def fatal(*message, **kwargs): + """Prints any provided args to standard error + and exits with an exit code of 1. + @param message : + Values printed to standard error + @params kwargs + Key words to modify print function behavior + """ + err(*message, **kwargs) + sys.exit(1) + + +def require(cmds, suggestions, path=None): + """Enforces an executable is in $PATH + @param cmds list[]: + List of executable names to check + @param suggestions list[]: + Name of module to suggest loading for a given index + in param cmd. + @param path list[]]: + Optional list of PATHs to check [default: $PATH] + """ + error = False + for i in range(len(cmds)): + available = which(cmds[i]) + if not available: + error = True + err( + """\x1b[6;37;41m\n\tFatal: {} is not in $PATH and is required during runtime! + └── Solution: please 'module load {}' and run again!\x1b[0m""".format( + cmds[i], suggestions[i] + ) + ) + + if error: + fatal() + + return + + +def safe_copy(source, target, resources=[]): + """Private function: Given a list paths it will recursively copy each to the + target location. If a target path already exists, it will NOT over-write the + existing paths data. + @param resources : + List of paths to copy over to target location + @params source : + Add a prefix PATH to each resource + @param target : + Target path to copy templates and required resources + """ + + for resource in resources: + destination = os.path.join(target, resource) + if not exists(destination): + # Required resources do not exist + copytree(os.path.join(source, resource), destination) + + +def git_commit_hash(repo_path): + """Gets the git commit hash of the RNA-seek repo. 
+ @param repo_path : + Path to RNA-seek git repo + @return githash : + Latest git commit hash + """ + try: + githash = ( + subprocess.check_output( + ["git", "rev-parse", "HEAD"], stderr=subprocess.STDOUT, cwd=repo_path + ) + .strip() + .decode("utf-8") + ) + # Typecast to fix python3 TypeError (Object of type bytes is not JSON serializable) + # subprocess.check_output() returns a byte string + githash = str(githash) + except Exception as e: + # Github releases are missing the .git directory, + # meaning you cannot get a commit hash, set the + # commit hash to indicate its from a GH release + githash = "github_release" + return githash + + +def join_jsons(templates): + """Joins multiple JSON files to into one data structure + Used to join multiple template JSON files to create a global config dictionary. + @params templates : + List of template JSON files to join together + @return aggregated : + Dictionary containing the contents of all the input JSON files + """ + # Get absolute PATH to templates in rna-seek git repo + repo_path = os.path.dirname(os.path.abspath(__file__)) + aggregated = {} + + for file in templates: + with open(os.path.join(repo_path, file), "r") as fh: + aggregated.update(json.load(fh)) + + return aggregated + + +def check_python_version(): + # version check + # glob.iglob requires 3.11 for using "include_hidden=True" + MIN_PYTHON = (3, 11) + try: + assert sys.version_info >= MIN_PYTHON + print( + "Python version: {0}.{1}.{2}".format( + sys.version_info.major, sys.version_info.minor, sys.version_info.micro + ) + ) + except AssertionError: + exit( + f"{sys.argv[0]} requires Python {'.'.join([str(n) for n in MIN_PYTHON])} or newer" + ) diff --git a/src/ccbr_tools/pyproject.toml b/src/ccbr_tools/pyproject.toml new file mode 100644 index 0000000..e7c293d --- /dev/null +++ b/src/ccbr_tools/pyproject.toml @@ -0,0 +1,119 @@ +[build-system] +requires = [ + "setuptools >= 62.3.0", + "wheel >= 0.29.0", +] +build-backend = 'setuptools.build_meta' + 
+[project] +name = 'ccbr_tools' +dynamic = ['version','readme'] +description = "Utilities for CCBR Bioinformatics Software" +authors = [ + {name = "Kelly Sovacool", email = "kelly.sovacool@nih.gov"}, + {name = "Vishal Koparde", email = "vishal.koparde@nih.gov"}, + {name = "Skyler Kuhn"}, +] +maintainers = [ + {name = "CCR Collaborative Bioinformatics Resource", email = "ccbr@mail.nih.gov"}, +] +license = {file = "LICENSE"} +classifiers = [ + "Environment :: Console", + "Environment :: MacOS X", + "Intended Audience :: Science/Research", + "License :: OSI Approved :: MIT License", + "Natural Language :: English", + "Operating System :: POSIX :: Linux", + "Operating System :: MacOS :: MacOS X", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3.11", + "Topic :: Scientific/Engineering :: Bio-Informatics", +] +requires-python = ">=3.10, <3.12" +dependencies = [ + "biopython", + "cffconvert >= 2.0.0", + "Click >= 8.1.3", + "pandas", + "pyyaml >= 6.0", + "requests" +] + +[project.optional-dependencies] +dev = [ + "black >= 23.10.0", + "jupyter", + "pre-commit" +] +test = [ + "pytest", + "pytest-cov" +] + +[project.scripts] +ccbr_tools = "ccbr_tools.__main__:main" +gb2gtf = "ccbr_tools.gb2gtf:main" +hf = "ccbr_tools.homologfinder.hf:main" +jobby = "ccbr_tools.jobby:main" +jobinfo = "ccbr_tools.jobinfo:main" +intersect = "ccbr_tools.intersect:main" +peek = "ccbr_tools.peek:main" + +[project.urls] +Repository = "https://github.com/CCBR/Tools" + +[tool.numpydoc_validation] +checks = [ + "all", # report on all checks, except the below + "EX01", + "SA01", + "ES01", +] +# remember to use single quotes for regex in TOML +exclude = [ # don't report on objects that match any of these regex + '\.undocumented_method$', + '\.__repr__$', +] +override_SS05 = [ # override SS05 to allow docstrings starting with these words + '^Process ', + '^Assess ', + '^Access ', +] + +[tool.setuptools.package-data] +"*" = 
["LICENSE", "VERSION", "CITATION.cff", "CHANGELOG.md", "pyproject.toml"] + +[tool.setuptools.dynamic] +version = {file = "VERSION"} +readme = {file = "README.md"} + +[tool.setuptools] +script-files = [ + "scripts/add_gene_name_to_count_matrix.R", + "scripts/aggregate_data_tables.R", + "scripts/argparse.bash", + "scripts/cancel_snakemake_jobs.sh", + "scripts/create_hpc_link.sh", + "scripts/extract_value_from_json.py", + "scripts/extract_value_from_yaml.py", + "scripts/filter_bam_by_readids.py", + "scripts/filter_fastq_by_readids_highmem.py", + "scripts/filter_fastq_by_readids_highmem_pe.py", + "scripts/gather_cluster_stats.sh", + "scripts/gather_cluster_stats_biowulf.sh", + "scripts/get_buyin_partition_list.bash", + "scripts/get_slurm_file_with_error.sh", + "scripts/gsea_preranked.sh", + "scripts/karyoploter.R", + "scripts/make_labels_for_pipeliner.sh", + "scripts/rawcounts2normalizedcounts_DESeq2.R", + "scripts/rawcounts2normalizedcounts_limmavoom.R", + "scripts/run_jobby_on_nextflow_log", + "scripts/run_jobby_on_nextflow_log_full_format", + "scripts/run_jobby_on_snakemake_log", + "scripts/run_jobby_on_snakemake_log_full_format", + "scripts/spooker", + "scripts/which_vpn.sh" +] diff --git a/src/ccbr_tools/shell.py b/src/ccbr_tools/shell.py new file mode 100644 index 0000000..5a75b6e --- /dev/null +++ b/src/ccbr_tools/shell.py @@ -0,0 +1,20 @@ +import contextlib +import io +import subprocess + + +def shell_run(command_str): + """Run a shell command and return stdout/stderr""" + out = subprocess.run(command_str, capture_output=True, shell=True, text=True) + return "\n".join([out.stdout, out.stderr]) + + +def exec_in_context(func, *args, **kwargs): + """Execute a function in a context manager to capture stdout/stderr""" + with ( + contextlib.redirect_stdout(io.StringIO()) as out_f, + contextlib.redirect_stderr(io.StringIO()) as err_f, + ): + func(*args, **kwargs) + out_combined = "\n".join([out_f.getvalue(), err_f.getvalue()]) + return out_combined diff --git 
a/src/ccbr_tools/util.py b/src/ccbr_tools/util.py new file mode 100644 index 0000000..536aac9 --- /dev/null +++ b/src/ccbr_tools/util.py @@ -0,0 +1,85 @@ +""" Miscellaneous utility functions """ + +from cffconvert.cli.create_citation import create_citation +from cffconvert.cli.validate_or_write_output import validate_or_write_output +import click +import importlib.resources +import importlib.metadata +import os +import pathlib +from time import localtime, strftime +import tomllib + + +def repo_base(*paths): + """Get the absolute path to a file in the repository + @return abs_path + """ + basedir = pathlib.Path(__file__).absolute().parent + return basedir.joinpath(*paths) + + +def get_version(debug=False): + """Get the current version of the ccbr_tools package + @param debug : print the path to the VERSION file (default: False) + @return version + """ + version_path = repo_base("VERSION") + if debug: + print("VERSION file path:", version_path) + with open(version_path, "r") as infile: + return infile.read().strip().lstrip("v") + + +def get_package_version(pkg_name="ccbr_tools"): + """Get the current version of a package from the metadata + @param pkg_name : name of the package (default: ccbr_tools) + @return version + """ + return importlib.metadata.metadata(pkg_name)["Version"] + + +def get_pyproject_toml(pkg_name="ccbr_tools"): + """Get the contents of the package's pyproject.toml file + @param pkg_name : name of the package (default: ccbr_tools) + @return pyproject_toml + """ + with open(repo_base("pyproject.toml"), "rb") as infile: + toml_dict = tomllib.load(infile) + return toml_dict + + +def get_project_scripts(pkg_name="ccbr_tools"): + """ + Get list of CLI tools in the package + """ + return sorted(get_pyproject_toml(pkg_name=pkg_name)["project"]["scripts"].keys()) + + +def get_external_scripts(pkg_name="ccbr_tools"): + """ + Get list of standalone scripts included in the package + """ + return list(get_pyproject_toml(pkg_name=pkg_name)["tool"]["setuptools"]["script-files"]) + + 
+def print_citation( + citation_file=repo_base("CITATION.cff"), + output_format="bibtex", +): + citation = create_citation(citation_file, None) + # click.echo(citation._implementation.cffobj['message']) + validate_or_write_output(None, output_format, False, citation) + + +def msg(err_message): + tstamp = strftime("[%Y:%m:%d %H:%M:%S] ", localtime()) + click.echo(tstamp + err_message, err=True) + + +def msg_box(splash, errmsg=None): + msg("-" * (len(splash) + 4)) + msg(f"| {splash} |") + msg(("-" * (len(splash) + 4))) + if errmsg: + click.echo("\n" + errmsg, err=True) diff --git a/tests/test_cli.py b/tests/test_cli.py new file mode 100644 index 0000000..ccca48e --- /dev/null +++ b/tests/test_cli.py @@ -0,0 +1,52 @@ +from ccbr_tools.shell import shell_run +from ccbr_tools.pipeline.util import get_hpcname + + +def test_version(): + assert "ccbr_tools, version " in shell_run("ccbr_tools -v") + + +def test_help(): + assert "Utilities for CCBR Bioinformatics Software" in shell_run("ccbr_tools -h") + + +def test_help_cite(): + assert "Print the citation in the desired format" in shell_run("ccbr_tools cite -h") + + +def test_help_gb2gtf(): + assert "Usage: gb2gtf sequence.gb > sequence.gtf" in shell_run("gb2gtf -h") + + +def test_help_hf(): + assert "Get Human2Mouse (or Mouse2Human) homolog gene or genelist" in shell_run( + "hf -h" + ) + + +def test_help_jobby(): + assert "Will take your job(s)... and display their information!" in shell_run( + "jobby -h" + ) + + +def test_help_jobinfo(): + hpc = get_hpcname() + jobinfo_help = shell_run("jobinfo -h") + if hpc != "biowulf": + assert "This script only works on BIOWULF!" 
in jobinfo_help + else: + assert ( + "Get slurm job information using slurm job id or snakemake.log file" + in jobinfo_help + ) + + +def test_help_intersect(): + assert "intersect filename1 filename2 f1ColumnIndex F2ColumnIndex" in shell_run( + "intersect -h" + ) + + +def test_help_peek(): + assert "USAGE: peek [buffer]" in shell_run("peek -h") diff --git a/tests/test_scripts.py b/tests/test_scripts.py new file mode 100644 index 0000000..c397f14 --- /dev/null +++ b/tests/test_scripts.py @@ -0,0 +1,15 @@ +from ccbr_tools.shell import shell_run + + +def test_scripts_help(): + assert "extract value for key from JSON" in shell_run( + "extract_value_from_json.py --help" + ) + + +def test_which_vpn(): + which_vpn = shell_run("which_vpn.sh") + assert ( + "Are you really connected to VPN?? Doesn't look like it!" in which_vpn + or "Your VPN IP is" in which_vpn + ) diff --git a/tests/test_shell.py b/tests/test_shell.py new file mode 100644 index 0000000..4adefcd --- /dev/null +++ b/tests/test_shell.py @@ -0,0 +1,5 @@ +from ccbr_tools.shell import exec_in_context + + +def test_exec(): + assert exec_in_context(print, "hello", "world") == "hello world\n\n" diff --git a/which_vpn.sh b/which_vpn.sh deleted file mode 100644 index 850d102..0000000 --- a/which_vpn.sh +++ /dev/null @@ -1,42 +0,0 @@ -#!/bin/bash -# trying to find out which VPN you are connected to?? - -if [[ "$HOSTNAME" == "biowulf.nih.gov" ]] -then - echo "DO NOT RUN THIS ON BIOWULF HEADNODE! This is script is meant for your laptop." - exit 1 -elif [[ "$HOSTNAME" == "helix.nih.gov" ]] -then - echo "DO NOT RUN THIS ON HELIX! This script is meant for your laptop." - exit 1 -elif [[ "$HOSTNAME =~ cn[0-9]{4}$ ]] -then - echo "DO NOT RUN THIS ON a BIOWULF interactive node! This script is meant for your laptop" - exit 1 -fi - -# get ip -ip=$(ifconfig -a|grep "inet 10."|awk '{print $2}') - -if [[ "$ip" == "" ]] -then - echo "Are you really connected to VPN?? Doesnt look like it!" 
- exit 1 -fi - -echo "Your VPN IP is $ip" - -numbertwo=$(echo $ip|awk -F"." '{print $2}') - -if [[ "$numbertwo" == "247" || "$numbertwo" == "248" ]] -then - echo "You are connected to the FREDERICK VPN!" - exit 0 -elif [[ "$numbertwo" == "242" || "$numbertwo" == "243" ]] -then - echo "You are connected to the BETHESDA VPN!" - exit 0 -else - echo "Sorry, I cannot guess which VPN you are connect to!" - exit 0 -fi
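
Note on `tests/test_shell.py`: the expected value `"hello world\n\n"` follows from how `exec_in_context` joins the captured stdout and stderr with a newline. The following is a minimal sketch of that capture pattern using only the standard library. The name `capture_output` is a hypothetical stand-in chosen for illustration and is not part of `ccbr_tools`.

```python
import contextlib
import io


def capture_output(func, *args, **kwargs):
    """Hypothetical stand-in for exec_in_context (illustration only).

    Runs func while stdout and stderr are redirected into in-memory
    buffers, then returns both buffers joined by a single newline.
    """
    out_buf, err_buf = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out_buf), contextlib.redirect_stderr(err_buf):
        func(*args, **kwargs)
    # stdout comes first, then the separator, then stderr (possibly empty)
    return "\n".join([out_buf.getvalue(), err_buf.getvalue()])


# print() writes "hello world\n" to stdout and nothing to stderr,
# so the joined result is "hello world\n" + "\n" + "".
captured = capture_output(print, "hello", "world")
```

Because stderr is empty in this case, the second newline comes from the join separator rather than from `print`, which is why the test expects two trailing newlines.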