Glossary & Acronyms
A
- action: automate a workflow in the context of CI/CD, see GitHub Actions
- agile: development methodology for organizing a team to complete tasks organized over short periods called ‘sprints’
- allocation: portion of a resource assigned to a particular recipient, typical unit is a core or node hour
- Anaconda: open source data science platform. Anaconda.com
- application: also called an ‘app’, software designed to help the user perform a specific task
- awesome: a curated set of lists that provide insight into awesome software projects on GitHub
- AVU: Attribute-Value-Unit, the three components of an iRODS metadata entry
B
- beta: \(\beta\), a software version which is not yet ready for publication but is being tested
- bash: Bash is the GNU Project’s shell, the Bourne-Again Shell
- biocontainer: a community-driven project that provides the infrastructure and basic guidelines to create, manage and distribute bioinformatics packages (e.g. conda) and containers (e.g. docker, singularity)
- bioconda: a channel for the conda package manager specializing in bioinformatics software
C
- CLI: (1) the UNIX shell command line interface, most typically BASH (2) the CyVerse Learning Institute
- command: a set of instructions sent to the computer, typically in a typed interface
- conda: an installation type of the Anaconda data science platform. Command line application for managing packages and environments
- container: virtualization of an operating system run within an isolated user space
- Continuous Integration: (CI) is testing automation to check that the application is not broken whenever new commits are integrated into the main branch
- Continuous Delivery: (CD) is an extension of ‘continuous integration’ to make sure that you can release new changes in a sustainable way
- Continuous Deployment: a step further than ‘continuous delivery’, every change that passes all stages of your production pipeline is released
- Continuous Development: a process for iterative software development and is an umbrella over several other processes including ‘continuous integration’, ‘continuous testing’, ‘continuous delivery’ and ‘continuous deployment’
- Continuous Testing: the practice of running automated tests throughout the software development process
- CRAN: The Comprehensive R Archive Network
- CyVerse tool: Software program that is integrated into the back end of the DE for use in DE apps
- CyVerse app: graphic interface of a tool made available for use in the DE
D
- Debian: a free OS, base of other Linux distributions such as Ubuntu
- Development: the environment on your computer where you write code
- DevOps: Software *Dev*elopment and information technology *Op*erations, techniques for shortening the time to change software in relation to CI/CD
- Discovery Environment (DE): a data science workbench for running executable, interactive, and high throughput applications in CyVerse
- distribution: abbreviated as ‘distro’, an operating system made from a software collection based upon the Linux kernel
- Docker: an open source software platform to create, deploy and manage virtualized application containers on a common operating system (OS), with an ecosystem of allied tools. A program that runs containers and images and handles their life cycle
- DockerHub: an official registry of Docker container images, operated by Docker. DockerHub
- DOI: a digital object identifier. A persistent identifier, managed by doi.org
- Dockerfile: a text document that contains all the commands you would normally execute manually in order to build a Docker image. Docker can build images automatically by reading the instructions from a Dockerfile
E
- environment: software that includes operating system, database system, specific tools for analysis
- entrypoint: In a Dockerfile, an ENTRYPOINT is an optional definition for the first part of the command to be run
F
- FOSS: (1) Free and Open Source Software, (2) Foundational Open Science Skills - this class!
- function: a named section of a program that performs a specific task
G
- git: a version control system
- gitter: a GitHub-based messaging service that uses markdown gitter.im
- GitHub: a website for hosting git repositories – owned by Microsoft GitHub
- GitLab: a website for hosting git repositories GitLab
- GitOps: using the git framework as a means of deploying infrastructure on the cloud using Kubernetes
- GPU: graphics processing unit
- GUI: graphical user interface
H
- hack: a quick job that produces what is needed, but not well
- HPC: high performance computing, for large synchronous computation
- HTC: high throughput computing, for many parallel tasks
I
- IaaS: Infrastructure as a Service, online services that provide APIs
- iCommands: command line application for accessing iRODS Data Store
- IDE: integrated development environment, typically a graphical interface for working with a programming language or its packages
- instance: a single virtual machine
- image: self-contained, read-only ‘snapshot’ of your applications and packages, with all their dependencies
- iRODS: an open source integrated Rule-Oriented Data Management System, iRODS.org
J
- Java: programming language, class-based, object-oriented
- JavaScript: programming language
- JSON: JavaScript Object Notation, data interchange format that uses human-readable text
- Jupyter(Hub,Lab,Notebooks): an IDE, originally the IPython Notebook, that operates in the browser Project Jupyter
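
As a rough illustration of the JSON entry above, the following minimal sketch uses Python's standard `json` module to turn a small data structure into human-readable text and back. The `record` contents are hypothetical, invented only for this example.

```python
import json

# Hypothetical record, invented for illustration only.
record = {"name": "example-app", "tags": ["docker", "hpc"], "cores": 4}

# Serialize the Python dictionary to human-readable JSON text.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

# Parse the JSON text back into Python objects.
restored = json.loads(text)
```

The same round trip applies to other text-based interchange formats in this glossary, such as XML and YAML, although those need different parsers.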
K
- kernel: central component of most operating systems (OS)
- Kubernetes: an open source container orchestration platform created by Google. Kubernetes is often referred to as K8s
L
- lib: a UNIX library
- linux: open source Unix-like operating system
M
- makefile: a file containing a set of directives used by a make build automation tool
- markdown: a lightweight markup language with plain text formatting syntax
- metadata: data about data, useful for searching and querying
- multi-thread: a process which runs on more than one CPU or GPU core at the same time
- master node: responsible for deciding what runs on all of the cluster’s nodes. Can include scheduling workloads, like containerized applications, and managing the workloads’ lifecycle, scaling, and upgrades. The master also manages network and storage resources for those workloads
- Mac OS X: Apple’s popular desktop OS
N
- node: a computer, typically a server with 1 or 2 processors (each with many cores and threads) in a cloud or HPC center
O
- ontology: formal naming and structural hierarchy used to describe data, also called a knowledge graph
- organization: a group, in the context of GitHub a place where developers contribute code to repositories
- Operating System (OS): software that manages computer hardware, software resources, and provides common services for computer programs
- Open Science Grid (OSG): national, distributed computing partnership for data-intensive research opensciencegrid.org
- ORCID: Open Researcher and Contributor ID (ORCiD), a persistent digital identifier that distinguishes you from every other researcher
P
- PaaS: Platform as a Service, run and manage applications in the cloud without the complexity of developing the platform yourself
- package: an app designed for a particular language
- package manager: a collection of software tools that automates the process of installing, upgrading, configuring, and removing computer programs for a computer’s operating system in a consistent manner
- Production: environment where users access the final code after all of the updates and testing
- Python: interpreted, high-level, general-purpose programming language Python.org
Q
- QUAY.io: private Docker registry QUAY.io
R
- R: data science programming language R Project
- recipe file: a file with installation scripts used for building software such as containers, e.g. Dockerfile
- registry: a storage and content delivery system, such as that used by Docker
- remote desktop: a VM with a graphic user interface accessed via a browser
- repo(sitory): a directory structure for hosting code and data
- RST: reStructuredText, a plain text markup format similar to markdown
- ReadTheDocs: a web service for rendering documentation (that this website uses) readthedocs.org and readthedocs.com
- root: the administrative user on a Linux system - use your powers wisely
S
- SaaS: Software as a Service, web based platform for using software
- schema: a metadata standard for labeling, tagging or coding for recording & cataloging information or structuring descriptive records. see schema.org
- scrum: daily set of tasks and evaluations as part of a sprint
- shell: a command line interface program that runs other programs (may be complex, technical programs or very simple programs such as making a directory). These simple, stand-alone programs are called commands
- Singularity: container software, used widely on HPC, created by Sylabs
- SLACK: Searchable Log of All Conversation and Knowledge, a team communication tool slack.com
- sprint: set period of time during which specific work has to be completed and made ready for review
- Singularity def file: (definition file) recipe for building a Singularity container
- Stage: environment that is as similar to the production environment as can be for final testing
T
- tar: software utility for collecting many files into one archive file, often referred to as a tarball
- tensor: algebraic object that describes a linear mapping from one set of algebraic objects to another
- terminal: a windowed emulator for directly entering commands to a computer
- thread: a CPU process or a series of linked messages in a discussion board
- tool: In the context of CyVerse Discovery Environment, a Docker Container
- TPU: tensor processing unit
- Travis: Travis-CI, a continuous integration software
U
- Ubuntu: most popular Linux OS distribution, based on Debian
- UNIX: operating system
- user: the profile under which applications are started and run; root is the most powerful system administrator
V
- VICE: Visual Interactive Computing Environment - CyVerse Data Science Workbench
- virtual machine: a software computer that, like a physical computer, runs an operating system and applications
W
- waterfall: software development broken into linear sequential phases, similar to a Gantt chart
- webGL: JavaScript API for rendering interactive 2D and 3D graphics within any compatible web browser without the use of plug-ins
- Windows: Microsoft’s most popular desktop OS
- workspace: (vs. repo)
- worker node: A cluster typically has one or more nodes, which are the worker machines that run your containerized applications and other workloads. Each node is managed from the master, which receives updates on each node’s self-reported status.
X
- XML: Extensible Markup Language, data interchange format that uses human-readable text
Y
- YAML: YAML Ain’t Markup Language, data interchange format that uses human-readable text
Z
- ZenHub: team collaboration solution built directly into GitHub that uses kanban style boards
- Zenodo: general-purpose open-access repository developed under the European OpenAIRE program and operated by CERN
- zip: a compressed file format
- zsh: Z-Shell, now the default shell on new versions of Mac OS X