CI/CD

Building your own software, data management, or workflow management systems requires a significant amount of interpersonal management, as well as tracking of development.

Software engineers, having long suffered under the burden of disorganization and poor communication with clients, have come up with frameworks for developing software and sharing it with their users.

While data science applications have a different audience and intended result, the organizational practices of software developers are valuable tools to consider integrating into your open science lab group.

Frequently Used Terms

  • Continuous Integration (CI): automated testing that checks the application is not broken whenever new commits are integrated into the main branch
  • Continuous Delivery (CD): an extension of ‘continuous integration’ that makes sure you can release new changes in a sustainable way
  • Continuous Deployment: a step further than ‘continuous delivery’; every change that passes all stages of your production pipeline is released
  • Continuous Development: a process for iterative software development; an umbrella term covering several other processes, including ‘continuous integration’, ‘continuous testing’, ‘continuous delivery’ and ‘continuous deployment’
  • Continuous Testing: the practice of running automated tests throughout software development
  • Development: the environment on your computer where you write code
  • DevOps: Development and information technology Operations; the set of practices surrounding CI/CD
  • Production: the environment where users access the final code after all of the updates and testing
  • Stage: an environment kept as similar to the production environment as possible, used for final testing
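
The difference between continuous delivery and continuous deployment comes down to whether a passing change is released automatically. As a rough Python sketch (the stage functions, the `change` dictionary keys, and the `run_pipeline` helper are all hypothetical names made up for illustration, not the API of any real CI tool), the two could be modeled like this:

```python
from typing import Callable, Dict, List


# Hypothetical checks for illustration; a real pipeline would run test suites.
def unit_tests(change: Dict[str, bool]) -> bool:
    return change.get("tests_pass", False)


def staging_checks(change: Dict[str, bool]) -> bool:
    return change.get("works_on_stage", False)


STAGES: List[Callable[[Dict[str, bool]], bool]] = [unit_tests, staging_checks]


def run_pipeline(change: Dict[str, bool], auto_release: bool,
                 approved: bool = False) -> str:
    """Push one change through every stage and report the outcome.

    auto_release=True models continuous deployment: a change that passes
    all stages is released. auto_release=False models continuous delivery:
    a passing change is ready to release but waits for a person to approve.
    """
    for stage in STAGES:
        if not stage(change):
            return f"rejected at {stage.__name__}"
    if auto_release or approved:
        return "released to production"
    return "ready for release (awaiting approval)"


change = {"tests_pass": True, "works_on_stage": True}
print(run_pipeline(change, auto_release=False))  # ready for release (awaiting approval)
print(run_pipeline(change, auto_release=True))   # released to production
```

In both modes a change that fails any stage never reaches production; the only difference is the final release step.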

Continuous Development

The software developer concept of ‘continuous delivery’ can be applied to your data science projects and lab.

As we’ve discussed, version control is an important component of modern software development. Critically, version control can also be used in data science applications and for research project management. There are two dominant forms of project management for continuous delivery in open source software: Waterfall and Agile Scrum.

https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment

Agile

Agile development practices involve organizing a team around short-term (1-2 week long) ‘sprints’. Sprints are organized by a scrum master. Team members are assigned tasks and evaluate their results during sprint reviews and planning sessions.

Waterfall

Similar to the common Gantt chart, a waterfall model is a breakdown of project activities into linear sequential phases, where each phase depends on the deliverables of the previous one and corresponds to a specialisation of tasks.

Note

In this workshop, we’re working with GitHub, but there are other services, like GitLab or Bitbucket, which might fit your needs better.

Continuous Integration

Doing reproducible science requires you to host your code and the versioned software used to complete the analysis, in addition to the actual data. GitHub or GitLab could become the central point supporting your data science lab.

Powerful uses of GitHub include integration with other web services, like container registries (DockerHub), websites (ReadTheDocs, GitHub Pages https://pages.github.com/), continuous integration (CircleCI, Jenkins, Travis), and workflow managers (GitHub Actions).

Continuous Integration (CI) is the practice of automatically testing a code repository whenever changes are integrated (typically a few times a day) to catch changes which may cause failures.

CI can be integrated into either scientific programming workflows or code development.
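
As a rough illustration for a scientific programming workflow, a CI service would run a test file like the one below on every commit and mark the build as failed if any assertion fails. The `normalize` function and both tests are hypothetical examples, not part of the workshop materials; the `test_` prefix follows the naming convention that pytest uses to discover tests.

```python
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Scale values so they sum to 1.0 (a hypothetical analysis step)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one() -> None:
    result = normalize([2.0, 3.0, 5.0])
    # Compare with a tolerance because of floating-point rounding.
    assert abs(sum(result) - 1.0) < 1e-9
    assert abs(result[0] - 0.2) < 1e-9


def test_normalize_rejects_zero_total() -> None:
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return  # the expected error was raised
    raise AssertionError("expected ValueError for an all-zero input")


if __name__ == "__main__":
    # A CI job could run this file directly, or use a test runner instead.
    test_normalize_sums_to_one()
    test_normalize_rejects_zero_total()
    print("all tests passed")
```

If a later commit breaks `normalize`, the tests fail and the CI service reports the failure before the change reaches other users of the code.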

The most popular CI tools are:

  • Travis CI - fast, easy to set up, cloud based
  • CircleCI - fast, easy to set up, cloud based
  • Jenkins - free, can be hosted internally (requires server), highly customizable (plugins)

When to use CI?

  • building or hosting services for a community
  • developing versioned copies of containers for public consumption
  • DevOps + Data Science

Travis CI

Setup

CircleCI

Setup

Jenkins

Jenkins is a bit harder to set up because you need a dedicated server.

Setup

GitHub Actions

GitHub now offers ‘Actions’, which serve as an integrated CI service for your repositories.

Badges

Status badges can be embedded in a README.md. Badges let you show the state of code or documentation.

You can browse a wide range of badges on Shields.io.

You can pass the style query parameter in a badge URL (for example, ?style=flat-square) to get custom-styled badges, the same as you would for Shields.io. If no style is passed, flat is used as the default.

STYLE            BADGE
flat             Flat Badge
flat-square      Flat-Square Badge
for-the-badge    Badge
plastic          Plastic Badge
social           Social Badge
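
To show how a style value ends up in a badge, here is a small Python sketch that builds the Markdown for a README.md badge. It assumes the Shields.io static badge URL layout (`/badge/<label>-<message>-<color>`); the `badge_markdown` helper and the example label, message, and color are hypothetical and only for illustration.

```python
from urllib.parse import quote, urlencode

# Style names taken from the table above.
STYLES = ("flat", "flat-square", "for-the-badge", "plastic", "social")


def _escape(text: str) -> str:
    # Assumption: static badge paths use "-" as a separator and "_" as a
    # space, so literal dashes and underscores in the text are doubled.
    return quote(text.replace("-", "--").replace("_", "__"), safe="")


def badge_markdown(label: str, message: str, color: str,
                   style: str = "flat") -> str:
    """Return a Markdown image link for a static badge in the given style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    url = f"https://img.shields.io/badge/{_escape(label)}-{_escape(message)}-{color}"
    url += "?" + urlencode({"style": style})  # adds ?style=<style>
    return f"![{label}]({url})"


print(badge_markdown("build", "passing", "brightgreen", style="flat-square"))
# ![build](https://img.shields.io/badge/build-passing-brightgreen?style=flat-square)
```

The printed line can be pasted into a README.md. Badges that report live build status are normally generated by the CI service itself rather than written by hand.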

Self paced

Circle vs Jenkins vs Travis

