Introducing ReactiveInflux: a non-blocking InfluxDB driver for Scala and Java with Apache Spark support

https://github.com/pygmalios/reactiveinflux

I am excited to announce the very first release of ReactiveInflux, developed at Pygmalios. InfluxDB has been missing a non-blocking driver for both Scala and Java. Immutability, testability and extensibility are key features of ReactiveInflux. Coming with support for Apache Spark, it is the weapon of choice.

Internally it uses the Play Framework WS API, a rich asynchronous HTTP client built on top of Async Http Client.

Features

  • asynchronous (non-blocking) interface for Scala
  • synchronous (blocking) interface for Scala and Java
  • supports both Spark and Spark Streaming
  • immutability
  • testability
  • extensibility

Compatibility

  • InfluxDB 0.11, 0.10 and 0.9 (maybe even older too)
  • Scala 2.11 and 2.10
  • Java 7 and above
  • Apache Spark 1.4 and above

Scala asynchronous (non-blocking) example

val result = withInfluxDb(new URI("http://localhost:8086/"), "example1") { db =>
  db.create().flatMap { _ =>
    val point = Point(
      time        = DateTime.now(),
      measurement = "measurement1",
      tags        = Map("t1" -> "A", "t2" -> "B"),
      fields      = Map(
        "f1" -> 10.3,
        "f2" -> "x",
        "f3" -> -1,
        "f4" -> true)
    )
    db.write(point).flatMap { _ =>
      db.query("SELECT * FROM measurement1").flatMap { queryResult =>
        println(queryResult.row.mkString)
        db.drop()
      }
    }
  }
}

Scala synchronous (blocking) example

implicit val awaitAtMost = 10.seconds
syncInfluxDb(new URI("http://localhost:8086/"), "example1") { db =>
  db.create()

  val point = Point(
    time        = DateTime.now(),
    measurement = "measurement1",
    tags        = Map("t1" -> "A", "t2" -> "B"),
    fields      = Map(
      "f1" -> 10.3,
      "f2" -> "x",
      "f3" -> -1,
      "f4" -> true)
  )
  db.write(point)

  val queryResult = db.query("SELECT * FROM measurement1")
  println(queryResult.row.mkString)

  db.drop()
}

Java synchronous (blocking) example

// Use Influx at the provided URL
ReactiveInfluxConfig config = new JavaReactiveInfluxConfig(
  new URI("http://localhost:8086/"));
long awaitAtMostMillis = 30000;
try (SyncReactiveInflux reactiveInflux = new JavaSyncReactiveInflux(
  config, awaitAtMostMillis)) {
    SyncReactiveInfluxDb db = reactiveInflux.database("example1");
    db.create();

    Map<String, String> tags = new HashMap<>();
    tags.put("t1", "A");
    tags.put("t2", "B");

    Map<String, Object> fields = new HashMap<>();
    fields.put("f1", 10.3);
    fields.put("f2", "x");
    fields.put("f3", -1);
    fields.put("f4", true);

    Point point = new JavaPoint(
        DateTime.now(),
        "measurement1",
        tags,
        fields
    );
    db.write(point);

    QueryResult queryResult = db.query("SELECT * FROM measurement1");
    System.out.println(queryResult.getRow().mkString());

    db.drop();
}

Apache Spark Scala example

val point1 = Point(
  time        = DateTime.now(),
  measurement = "measurement1",
  tags        = Map(
    "tagKey1" -> "tagValue1",
    "tagKey2" -> "tagValue2"),
  fields      = Map(
    "fieldKey1" -> "fieldValue1",
    "fieldKey2" -> 10.7)
)
sc.parallelize(Seq(point1)).saveToInflux()

Apache Spark streaming Scala example

val point1 = Point(
  time        = DateTime.now(),
  measurement = "measurement1",
  tags        = Map(
    "tagKey1" -> "tagValue1",
    "tagKey2" -> "tagValue2"),
  fields      = Map(
    "fieldKey1" -> "fieldValue1",
    "fieldKey2" -> 10.7)
)
val queue = new mutable.Queue[RDD[Point]]
queue.enqueue(ssc.sparkContext.parallelize(Seq(point1)))
ssc.queueStream(queue).saveToInflux()

Apache Spark Java example

...
SparkInflux sparkInflux = new SparkInflux("example", 1000);
sparkInflux.saveToInflux(sc.parallelize(Collections.singletonList(point)));

Apache Spark streaming Java example

...
SparkInflux sparkInflux = new SparkInflux("example", 1000);
Queue<JavaRDD<Point>> queue = new LinkedList<>();
queue.add(ssc.sparkContext().parallelize(Collections.singletonList(point)));
sparkInflux.saveToInflux(ssc.queueStream(queue));

Credit to Pygmalios

Pygmalios is a top-tech startup based in Bratislava, Slovakia, investing in cutting-edge technologies to ensure rapid growth in the domain of real-time predictive retail analytics.

Fighting NotSerializableException in Apache Spark

Using a Spark context in a class constructor can cause serialization issues; moving the logic and variables to a member method avoids some of these problems. There are many reasons why you can get the nasty SparkException: Task not serializable. StackOverflow is full of answers, but this one was not so obvious. At least not for me.

I had a simple Spark application which created a direct stream to Kafka, did some filtering and then saved the results to Cassandra. When I ran it, I got an exception saying that the filtering task could not be serialized. Check the code and try to tell what's wrong with it:

import akka.actor._
// Imports needed by the snippet below (Spark Streaming, the Kafka integration
// and the Spark-Cassandra connector)
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

class MyActor(ssc: StreamingContext) extends Actor {
  // Create direct stream to Kafka
  val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, ...)

  // Save raw data to Cassandra
  kafkaStream.saveToCassandra("cassandraKeyspace", "cassandraTableRaw")

  // Get some data from another Cassandra table
  val someTable = ssc.sparkContext.cassandraTable[SomeTable]("cassandraKeyspace", "someTable")

  // Filter and save data to Cassandra
  kafkaStream
    .filter { message =>
      // Whatever logic can be here, the point is that "someTable" is used
      someTable.filter(_.message == message).count > 42
    }
    .saveToCassandra("cassandraKeyspace", "cassandraTableAgg")

  def receive = Actor.emptyBehavior
}

OK. Do you see the someTable variable inside the filter function? That's the cause of the problem. It is an RDD, which is of course serializable by definition. At first I thought that the concrete implementation was for some reason not serializable, but that was the wrong way of thinking too.

To whom does the variable belong? I looked at it as a “local” variable inside the class constructor. But it’s not. The someTable variable is a public member of the MyActor class! It belongs to the class, which is not serializable. (Side note: we don’t want Akka actors to be serializable because it doesn’t make sense to send actors over the wire.)

That explains everything. Spark needs to serialize the whole closure, and the actor instance is a part of it. Let’s just put the whole logic inside a method. That makes all the variables method-local, so the actor doesn’t have to be serialized anymore.

import akka.actor._

class MyActor(ssc: StreamingContext) extends Actor {
  def init(): Unit = {
    // Create direct stream to Kafka ... the same code as before, only inside this method
    val kafkaStream = ...
    ...
  }

  init()

  def receive = Actor.emptyBehavior
}

How simple. You’re welcome.

Exclude log4j to use slf4j with logback in a Gradle project

The goal is to remove log4j from all transitive dependencies in a Gradle project and replace it with slf4j and logback. I had to write this down after once again spending an hour trying to set up proper logging. Nothing complicated; you just need to know what to do.

Add the following to your build.gradle file:

dependencies {
    compile "ch.qos.logback:logback-classic:1.1.3"
    compile "org.slf4j:log4j-over-slf4j:1.7.13"
}

configurations.all {
    exclude group: "org.slf4j", module: "slf4j-log4j12"
    exclude group: "log4j", module: "log4j"
}

That’s it. If it still doesn’t work as expected, enable debugging in logback.xml and dig deeper. Good luck.

Remote Monitoring of Apache Cassandra running in Docker via JMX using Datadog

This is a step-by-step guide on how to monitor an Apache Cassandra database running as a Docker container using the cloud monitoring service Datadog.

1. Create your own Docker image of Cassandra

If you haven’t done it already, create a new Git repository and add two files there:

  • Dockerfile
  • jmxremote.password

Dockerfile:

FROM cassandra:latest

# We need this to enable JMX monitoring for Datadog agent
COPY ./jmxremote.password /etc/cassandra/jmxremote.password
RUN chmod 400 /etc/cassandra/jmxremote.password

COPY ./jmxremote.password /etc/java-8-openjdk/management/jmxremote.password

jmxremote.password:

monitorRole QED

With this we allow a user named “monitorRole” with the password “QED” to connect to Cassandra via JMX.

2. Run Cassandra Docker image with additional parameters

Run the Docker image created in the previous step with two additional environment variables:

  • JVM_OPTS=-Djava.rmi.server.hostname=[HERE GOES HOSTNAME OF YOUR CASSANDRA]
  • LOCAL_JMX=no

By default, Cassandra allows local JMX connections only.
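
For illustration only, the run command could look like the sketch below. The image name, hostname and published ports are placeholders for your own values; the two environment variables are the ones listed above.

# Sketch only: replace the hostname, ports and image name with your own values
docker run -d --name cassandra \
  -e JVM_OPTS="-Djava.rmi.server.hostname=cassandra.example.com" \
  -e LOCAL_JMX=no \
  -p 9042:9042 -p 7199:7199 \
  my-registry/my-cassandra:latest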

3. Create your own Docker image of Datadog agent

Create a new Git repository and put two files there:

  • Dockerfile
  • cassandra.yaml

Dockerfile:

# Agent running a Cassandra monitoring
FROM datadog/docker-dd-agent

# Install JMXFetch dependencies
RUN apt-get update \
&& apt-get install openjdk-7-jre-headless -qq --no-install-recommends

# Add Cassandra check configuration
ADD cassandra.yaml /etc/dd-agent/conf.d/cassandra.yaml

cassandra.yaml:

instances:
  - host: [HERE GOES HOSTNAME OF YOUR CASSANDRA]
    port: [HERE GOES JMX PORT OF YOUR CASSANDRA, TYPICALLY 7199]
    cassandra_aliasing: true
    user: monitorRole
    password: QED
    #name: cassandra_instance
    #trust_store_path: /path/to/trustStore.jks # Optional, should be set if ssl is enabled
    #trust_store_password: password
    #java_bin_path: /path/to/java #Optional, should be set if the agent cannot find your java executable

# List of metrics to be collected by the integration
# Visit http://docs.datadoghq.com/integrations/java/ to customize it
init_config:
  # List of metrics to be collected by the integration
  # Read http://docs.datadoghq.com/integrations/java/ to learn how to customize it
  conf:
    - include:
        domain: org.apache.cassandra.metrics
        type: ClientRequest
        scope:
          - Read
          - Write
        name:
          - Latency
          - Timeouts
          - Unavailables
        attribute:
          - Count
          - OneMinuteRate
    - include:
        domain: org.apache.cassandra.metrics
        type: ClientRequest
        scope:
          - Read
          - Write
        name:
          - TotalLatency
    - include:
        domain: org.apache.cassandra.metrics
        type: Storage
        name:
          - Load
          - Exceptions
    - include:
        domain: org.apache.cassandra.metrics
        type: ColumnFamily
        name:
          - TotalDiskSpaceUsed
          - BloomFilterDiskSpaceUsed
          - BloomFilterFalsePositives
          - BloomFilterFalseRatio
          - CompressionRatio
          - LiveDiskSpaceUsed
          - LiveSSTableCount
          - MaxRowSize
          - MeanRowSize
          - MemtableColumnsCount
          - MemtableLiveDataSize
          - MemtableSwitchCount
          - MinRowSize
      exclude:
        keyspace:
          - system
          - system_auth
          - system_distributed
          - system_traces
    - include:
        domain: org.apache.cassandra.metrics
        type: Cache
        name:
          - Capacity
          - Size
        attribute:
          - Value
    - include:
        domain: org.apache.cassandra.metrics
        type: Cache
        name:
          - Hits
          - Requests
        attribute:
          - Count
    - include:
        domain: org.apache.cassandra.metrics
        type: ThreadPools
        path: request
        name:
          - ActiveTasks
          - CompletedTasks
          - PendingTasks
          - CurrentlyBlockedTasks
    - include:
        domain: org.apache.cassandra.db
        attribute:
          - UpdateInterval

The cassandra.yaml file contains the connection information for the Datadog agent and also the list of metrics to collect.

4. Run the Datadog agent

It probably makes sense to run the Datadog Docker image on the same machine as Cassandra so that it also collects metrics about the same hardware, but I am not completely sure about this part.
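
As a sketch only: the image name below is a placeholder for the image built in the previous step, and the API_KEY environment variable (expected by the docker-dd-agent base image) must hold your Datadog API key.

# Sketch only: placeholder image name; API_KEY comes from your Datadog account
docker run -d --name dd-agent \
  -e API_KEY=<your Datadog API key> \
  my-registry/dd-agent-cassandra:latest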

5. Enable Cassandra integration in Datadog

To start collecting data you have to install the Cassandra integration in Datadog. A quick check is to visualise the cassandra.latency.one_minute_rate metric, which represents the number of read/write requests.

Integration testing with Gradle

Unit testing works out of the box with Gradle, but if you would like to have a separate set of integration tests, you need to do a small exercise. Actually, they don’t have to be integration tests at all. This guide shows you how to configure Gradle to run any kind of tests independently from the others. I will use Scala here, but the same works for any JVM language.

The goal

We are about to define a new Gradle task named itest which will run only the tests implemented in a specific folder, “src/itest/scala”. The standard built-in test task will keep working without any change, running only the tests in the “src/test/scala” directory.

Standard Java/Scala Project

We will start with a standard Gradle Java or Scala project. The programming language doesn’t matter here. Typically the directory structure looks like this:

<project root>
  + src
    + main
      + scala
    + test
      + scala
  - build.gradle

The main source code (the code being tested) resides in “src/main/scala” and all unit tests are in “src/test/scala”.

Where to put integration test classes and how to name them?

We already know where our unit tests are. A good habit is to name them after the class they test, followed by a “Test” or “Spec” suffix. For example, if the tested class is named “Miracle”, then its unit tests should go into a class named “MiracleSpec” (or MiracleTest if you like). It’s just a convention, nothing more.

We will use the same principle for integration tests, but we will put them inside the “src/itest/scala” directory and use an “ITest” or “ISpec” suffix. This is also just a convention, but it allows us to run them separately from unit tests.

Why a special directory and also a special name suffix?

I recommend putting integration tests physically in a different directory and also using a different naming pattern, so that you can distinguish these tests from the rest of your code in other situations as well.

For example, suppose you package the whole application into one big fat JAR and want to run the integration tests only. How would you do that? Some test runners support filtering by class or file name only, so you would use a “*ISpec” pattern to achieve it.

Another example: it is very convenient to right-click a directory in your favourite IDE (IntelliJ IDEA, for example) and run all tests inside it. IDEA also allows you to run tests by providing a class name pattern, which is why I like to use different suffixes for integration and unit tests.

Example project structure

Imagine a Scala project with one implementation class named Fujara (an awesome Slovak musical instrument). Its unit tests are in the FujaraSpec class and its integration tests in FujaraISpec. Often we need some data for the integration tests (itest-data.xml) or a logging configuration (logback-test.xml) different from the main application logging configuration.

<project root>
  + src
    + itest
      + resources
        + com
          + buransky
            - itest-data.xml
        - logback-test.xml
      + scala
        + com
          + buransky
            - FujaraISpec.scala
    + main
      + resources
        - logback.xml
      + scala
        + com
          + buransky
            - Fujara.scala
    + test
      + scala
        + com
          + buransky
            - FujaraSpec.scala
  - build.gradle

The build.gradle

I am using Gradle 2.4, but this solution has worked for older versions too. I am not going to provide the complete build script, only the parts relevant to this topic.

configurations {
  itestCompile.extendsFrom testCompile
  itestRuntime.extendsFrom testRuntime
}

sourceSets {
  itest {
    compileClasspath += main.output + test.output
    runtimeClasspath += main.output + test.output

    // You can add other directories to the classpath like this:
    //runtimeClasspath += files('src/itest/resources/com/buransky')

    // Use "java" if you don't use Scala as a programming language
    scala.srcDir file('src/itest/scala')
  }

  // This is just to trick IntelliJ IDEA to add integration test
  // resources to classpath when running integration tests from
  // the IDE. It's not a good solution but I don't know about
  // a better one.
  test {
    resources.srcDir file('src/itest/resources')
  }
}

task itest(type: Test) {
  testClassesDir = sourceSets.itest.output.classesDir
  classpath = sourceSets.itest.runtimeClasspath

  // This is not needed, but I like to see which tests have run
  testLogging {
    events "passed", "skipped", "failed"
  }
}

Run integration tests

Now we should be able to run the integration tests simply by executing the “gradle itest” task. In our example it should run FujaraISpec only. To run the unit tests in FujaraSpec, execute “gradle test”.

Define other test types

If you would like to use the same principle for functional tests, performance tests, acceptance tests, or whatever tests, just copy & paste the code above and replace “itest” with “ftest”, “ptest”, “atest”, “xtest”, … as sketched below.
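
For example, a functional test setup named “ftest” would be a literal copy of the blocks above with the name replaced (shown here without the IntelliJ IDEA workaround):

configurations {
  ftestCompile.extendsFrom testCompile
  ftestRuntime.extendsFrom testRuntime
}

sourceSets {
  ftest {
    compileClasspath += main.output + test.output
    runtimeClasspath += main.output + test.output
    scala.srcDir file('src/ftest/scala')
  }
}

task ftest(type: Test) {
  testClassesDir = sourceSets.ftest.output.classesDir
  classpath = sourceSets.ftest.runtimeClasspath
}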

Build and release Scala/Java Gradle project in GitLab using Jenkins to Artifactory

I am going to show in detail how to regularly build your project and then how to make a release build. It involves the cooperation of a number of tools which I found tricky to set up properly; that’s why I wrote this.

The goal

I am about to show you how to achieve the two following scenarios. The first one is a regular development (non-release) build:

  1. Implement something, commit and push it to GitLab.
  2. Trigger Jenkins build by a web hook from GitLab.
  3. Build, test, assemble and then publish binary JAR to Artifactory repository.

The second and more interesting scenario is building a release version:

  1. Run a parametric Jenkins build that uses the Gradle release plugin to:
    1. Verify that the project meets certain criteria to be released.
    2. Create Git tag with the release version number.
    3. Modify Gradle project version to allow further development.
    4. Commit this change and push it to GitLab.
  2. Trigger another generic parametric Jenkins build to publish release artifact(s) to Artifactory.

The situation

I will demonstrate the process on a real Scala project which I build using Gradle. The build server is Jenkins. Binary artifacts are published to a server running the free version of Artifactory. The version control system is the free community edition of GitLab. I am sure that you can follow this guide for any Java application as well. For clarity, let’s assume your URLs are the following:

  • GitLab repository (SSH) = git@gitlab.local:com.buransky/release-example.git
  • Jenkins server = http://jenkins/
  • Artifactory server = http://artifactory/

Project structure

Nothing special is needed. I use a common directory structure:

<project root>
  + build (build output)
  + gradle (Gradle wrapper)
  + src (source code)
    + main
      + scala
    + test
      + scala
  - build.gradle
  - gradle.properties
  - gradlew
  - gradlew.bat
  - settings.gradle

Gradle project

I use the Gradle wrapper, which is just a convenient tool that downloads and installs Gradle itself if it is not installed on the machine. It is not required, but you need to have these three files:

settings.gradle – common Gradle settings for multi-projects, not really required for us

rootProject.name = name

gradle.properties – contains group name, project name and version

group=com.buransky
name=release-example
version=1.0.0-SNAPSHOT

build.gradle – the main Gradle project definition

buildscript {
  repositories {
    mavenCentral()
    maven { url 'http://repo.spring.io/plugins-release' }
  }
  ...
}

plugins {
  id 'scala'
  id 'maven'
  id 'net.researchgate.release' version '2.1.2'
}

group = group
version = version

...

release {
  preTagCommitMessage = '[Release]: '
  tagCommitMessage = '[Release]: creating tag '
  newVersionCommitMessage = '[Release]: new snapshot version '
  tagTemplate = 'v${version}'
}

Add the following to also generate a JAR file with sources:

task sourcesJar(type: Jar, dependsOn: classes) {
  classifier = 'sources'
  from sourceSets.main.allSource
}

artifacts {
  archives sourcesJar
  archives jar
}

Let’s test it. Run this from shell:

$ gradle assemble
:compileJava
:compileScala
:processResources
:classes
:jar
:sourcesJar
:assemble

BUILD SUCCESSFUL

Now you should have two JAR files in the build/libs directory:

  • release-example-1.0.0-SNAPSHOT.jar
  • release-example-1.0.0-SNAPSHOT-sources.jar

OK, so if this works, let’s try to release it:

$ gradle release
:release
:release-example:createScmAdapter
:release-example:initScmAdapter
:release-example:checkCommitNeeded
:release-example:checkUpdateNeeded
:release-example:unSnapshotVersion
> Building 0% > :release > :release-example:confirmReleaseVersion
??> This release version: [1.0.0]
:release-example:confirmReleaseVersion
:release-example:checkSnapshotDependencies
:release-example:runBuildTasks
:release-example:beforeReleaseBuild UP-TO-DATE
:release-example:compileJava UP-TO-DATE
:release-example:compileScala
:release-example:processResources UP-TO-DATE
:release-example:classes
:release-example:jar
:release-example:assemble
:release-example:compileTestJava UP-TO-DATE
:release-example:compileTestScala
:release-example:processTestResources
:release-example:testClasses
:release-example:test
:release-example:check
:release-example:build
:release-example:afterReleaseBuild UP-TO-DATE
:release-example:preTagCommit
:release-example:createReleaseTag
> Building 0% > :release > :release-example:updateVersion
??> Enter the next version (current one released as [1.0.0]): [1.0.1-SNAPSHOT]
:release-example:updateVersion
:release-example:commitNewVersion

BUILD SUCCESSFUL

Because I haven’t run the release task with required parameters, the build is interactive and asks me first to enter (or confirm) release version, which is 1.0.0. And then later it asks me again to enter next working version which the plugin automatically proposed to be 1.0.1-SNAPSHOT. I haven’t entered anything, I just confirmed default values by pressing enter.

Take a look at the Git history and you should see a tag named v1.0.0 in your local repository and also in GitLab. Also open the gradle.properties file and you should see that the version has been changed to version=1.0.1-SNAPSHOT.

The release task requires a lot of things: for example, your working directory must not contain uncommitted changes, all your project dependencies must be release versions (they cannot be snapshots), and your current branch must be master. You must also have permission to push to the master branch in GitLab, because the release plugin will do a git push.

Setup Artifactory

There is nothing special to do on the Artifactory side. I assume that it is up and running at, let’s say, http://artifactory/. Of course your URL is probably different. The default installation already has two repositories that we will publish to:

  • libs-release-local
  • libs-snapshot-local

Jenkins Artifactory plugin

This plugin integrates Jenkins with Artifactory and enables publishing artifacts from Jenkins builds. Install the plugin, go to the Jenkins configuration, add a new Artifactory server in the Artifactory section and set up the following:

  • URL = http://artifactory/ (yours is different)
  • Default Deployer Credentials
    • provide user name and password for an existing Artifactory user who has permissions to deploy

Click the Test connection button to be sure that this part is working.

Continuous integration Jenkins build

This is the build which runs after every commit to the master branch is pushed to GitLab. Create it as a new freestyle project and give it a name of your fancy. Here is the list of steps and settings for this build:

  • Source Code Management – Git
    • Repository URL = git@gitlab.local:com.buransky/release-example.git (yours is different)
    • Credentials = none (at least I don’t need it)
    • Branches to build, branch specifier = */master
  • Build Triggers
    • Poll SCM (this is required so that the webhook from GitLab works)
  • Build Environment
    • Gradle-Artifactory integration (requires Artifactory plugin)
  • Artifactory Configuration
    • Artifactory server = http://artifactory/ (yours is different)
    • Publishing repository = libs-snapshot-local (we are going to publish snapshots)
    • Capture and publish build info
    • Publish artifacts to Artifactory
      • Publish Maven descriptors
    • Use Maven compatible patterns
      • Ivy pattern = [organisation]/[module]/ivy-[revision].xml
      • Artifact pattern = [organisation]/[module]/[revision]/[artifact]-[revision](-[classifier]).[ext]
  • Build – Invoke Gradle script
    • Use Gradle wrapper
    • From Root Build Script Dir
    • Tasks = clean test

Run the build and then go to Artifactory to check whether the snapshot has been successfully published. I use the tree browser to navigate to libs-snapshot-local / com / buransky / release-example / 1.0.1-SNAPSHOT. There you should find:

  • binary JARs
  • source JARs
  • POM files

Every time you run this build, three new files are added here. You can configure Artifactory to delete old snapshots to save space; I keep only the 5 latest ones.

Trigger Jenkins build from GitLab

We are too lazy to run the continuous integration Jenkins build we have just created manually. We can configure GitLab to do it for us automatically after each push. Go to your GitLab project settings, Web Hooks section, enter the following and then click the Add Web Hook button:

  • URL = http://jenkins/git/notifyCommit?url=git@gitlab.local:com.buransky/release-example.git
    • Hey! Think. Your URL is different, but the pattern should be the same.
  • Trigger = Push events

If you test this hook by clicking the Test Hook button, you may be surprised that no build is triggered. The reason is usually that the mechanism is quite intelligent: if there are no new commits, the build is not run. So make a change in your source code, commit it, push it, and then the Jenkins build should be triggered.

Have a break, make yourself a coffee

This has already been a lot of work. We are able to do a lot of stuff now; the servers work and talk to each other. I expect that you may need to set up SSH between the individual machines, but that’s out of the scope of this rant. Ready to continue? Let’s release this sh*t.

Generic Jenkins build to publish a release to Artifactory

We are about to create a parametric Jenkins build which checks out the release revision from Git, builds it and deploys the artifacts to Artifactory. This build is generic so that it can be reused by individual projects. Let’s start with a new freestyle Jenkins project and then set the following:

  • Project name = Publish release to Artifactory
  • This build is parameterized
    • String parameter
      • Name = GIT_REPOSITORY_URL
    • Git parameter
      • Name = GIT_RELEASE_TAG
      • Parameter type = Tag
      • Tag filter = *
    • String parameter
      • Name = GRADLE_TASKS
      • Default value = clean assemble
  • Source Code Management – Git
    • Repository URL = $GIT_REPOSITORY_URL
    • Branches to build, Branch Specifier = */tags/${GIT_RELEASE_TAG}
  • Build Environment
    • Delete workspace before build starts
    • Gradle-Artifactory Integration
  • Artifactory Configuration
    • Artifactory server = http://artifactory/ (yours is different)
    • Publishing repository = libs-release-local (we are going to publish a release)
    • Capture and publish build info
    • Publish artifacts to Artifactory
      • Publish Maven descriptors
    • Use Maven compatible patterns
      • Ivy pattern = [organisation]/[module]/ivy-[revision].xml
      • Artifact pattern = [organisation]/[module]/[revision]/[artifact]-[revision](-[classifier]).[ext]
  • Build – Invoke Gradle script
    • Use Gradle wrapper
    • From Root Build Script Dir
    • Tasks = $GRADLE_TASKS

Generic Jenkins build to release a Gradle project

We also need a reusable parametric Jenkins build which runs the Gradle release plugin with the provided parameters and then triggers the generic publish build we have already created.

  • Project name = Release Gradle project
  • This build is parameterized
    • String parameter
      • Name = GIT_REPOSITORY_URL
    • String parameter
      • Name = RELEASE_VERSION
    • String parameter
      • Name = NEW_VERSION
  • Source Code Management – Git
    • Repository URL = $GIT_REPOSITORY_URL
    • Branches to build, Branch Specifier = */master
  • Additional Behaviours
    • Check out to specific local branch
      • Branch name = master
  • Build – Invoke Gradle script
    • Use Gradle wrapper
    • From Root Build Script Dir
    • Switches = -Prelease.useAutomaticVersion=true -PreleaseVersion=$RELEASE_VERSION -PnewVersion=$NEW_VERSION
    • Tasks = release
  • Trigger/call builds on another project (requires Parameterized Trigger plugin)
    • Projects to build = Publish release to Artifactory
    • Predefined parameters
      • GIT_RELEASE_TAG=v$RELEASE_VERSION
      • GIT_REPOSITORY_URL=$GIT_REPOSITORY_URL

Final release build

Now we are finally ready to create a build for our project which will create a release. It will do nothing but call the previously created generic builds. For the last time, create a new freestyle Jenkins project and then:

  • Project name = Example release
  • This build is parameterized
    • String parameter
      • Name = RELEASE_VERSION
    • String parameter
      • Name = NEW_VERSION
  • Prepare an environment for the run
    • Keep Jenkins Environment Variables
    • Keep Jenkins Build Variables
    • Properties Content
      • GIT_REPOSITORY_URL=git@gitlab.local:com.buransky/release-example.git
  • Source Code Management – Git
    • Use SCM from another project
      • Template Project = Release Gradle project
  • Build Environment
    • Delete workspace before build starts
  • Build
    • Use builders from another project
      • Template Project = Release Gradle project

 

Let’s try to release our example project. If you followed my steps then the project should be currently in version 1.0.1-SNAPSHOT. Will release version 1.0.1 and advance current project version to the next development version which will be 1.0.2-SNAPSHOT. So simply run the Example release build and set:

  • RELEASE_VERSION = 1.0.1
  • NEW_VERSION = 1.0.2-SNAPSHOT

Tools used

Jenkins plugins (thanks Andreas Mack)

Conclusion

I am sure there must be some mistakes in this guide, and maybe I also forgot to mention a critical step. Let me know if you experience any problems and I’ll try to fix it. It works on my machine, so there must be a way to make it work on yours.

Publish JAR artifact using Gradle to Artifactory

So I have wasted (invested) a day or two just to find out how to publish a JAR to a locally running Artifactory server using Gradle. I used the Gradle Artifactory plugin to do the publishing. I was lost in an endless loop of including various versions of various plugins and executing all sorts of tasks. Yes, I did read the documentation first. It’s just wrong. Perhaps it has got better in the meantime.

Executing the following uploaded the build info only; no artifact (JAR) was published:

$ gradle artifactoryPublish
:artifactoryPublish
Deploying build info to: http://localhost:8081/artifactory/api/build
Build successfully deployed. Browse it in Artifactory under http://localhost:8081/artifactory/webapp/builds/scala-gradle-artifactory/1408198981123/2014-08-16T16:23:00.927+0200/

BUILD SUCCESSFUL

Total time: 4.681 secs

This guy saved me; I wanted to kiss him: StackOverflow – upload artifact to artifactory using gradle

I assume that you already have Gradle and Artifactory installed. I had a Scala project, but that doesn’t matter; Java should work just fine. I ran Artifactory locally on port 8081. I also created a new user named devuser who has permission to deploy artifacts.

Long story short, this is my final build.gradle script file:

buildscript {
    repositories {
        maven {
            url 'http://localhost:8081/artifactory/plugins-release'
            credentials {
                username = "${artifactory_user}"
                password = "${artifactory_password}"
            }
            name = "maven-main-cache"
        }
    }
    dependencies {
        classpath "org.jfrog.buildinfo:build-info-extractor-gradle:3.0.1"
    }
}

apply plugin: 'scala'
apply plugin: 'maven-publish'
apply plugin: "com.jfrog.artifactory"

version = '1.0.0-SNAPSHOT'
group = 'com.buransky'

repositories {
    add buildscript.repositories.getByName("maven-main-cache")
}

dependencies {
    compile 'org.scala-lang:scala-library:2.11.2'
}

tasks.withType(ScalaCompile) {
    scalaCompileOptions.useAnt = false
}

artifactory {
    contextUrl = "${artifactory_contextUrl}"
    publish {
        repository {
            repoKey = 'libs-snapshot-local'
            username = "${artifactory_user}"
            password = "${artifactory_password}"
            maven = true

        }       
        defaults {
            publications ('mavenJava')
        }
    }
}

publishing {
    publications {
        mavenJava(MavenPublication) {
            from components.java
        }
    }
}

I have stored the Artifactory context URL and credentials in the ~/.gradle/gradle.properties file, which looks like this:

artifactory_user=devuser
artifactory_password=devuser
artifactory_contextUrl=http://localhost:8081/artifactory

Now when I run the same task again, it does what I wanted: both the Maven POM file and the JAR archive are deployed to Artifactory:

$ gradle artifactoryPublish
:generatePomFileForMavenJavaPublication
:compileJava UP-TO-DATE
:compileScala UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:jar UP-TO-DATE
:artifactoryPublish
Deploying artifact: http://localhost:8081/artifactory/libs-snapshot-local/com/buransky/scala-gradle-artifactory/1.0.0-SNAPSHOT/scala-gradle-artifactory-1.0.0-SNAPSHOT.pom
Deploying artifact: http://localhost:8081/artifactory/libs-snapshot-local/com/buransky/scala-gradle-artifactory/1.0.0-SNAPSHOT/scala-gradle-artifactory-1.0.0-SNAPSHOT.jar
Deploying build info to: http://localhost:8081/artifactory/api/build
Build successfully deployed. Browse it in Artifactory under http://localhost:8081/artifactory/webapp/builds/scala-gradle-artifactory/1408199196550/2014-08-16T16:26:36.232+0200/

BUILD SUCCESSFUL

Total time: 5.807 secs

Happy end: [screenshot of the deployed artifacts in the Artifactory web UI, 2014-08-16]

Scala for-comprehension with concurrently running futures

Can you tell what’s the difference between the following two? If yes, then you’re great and you don’t need to read further.

Version 1:

val milkFuture = future { getMilk() }
val flourFuture = future { getFlour() }

for {
  milk <- milkFuture
  flour <- flourFuture
} yield (milk + flour)

Version 2:

for {
  milk <- future { getMilk() }
  flour <- future { getFlour() }
} yield (milk + flour)

If you got here, you are at least curious. The difference is that the two futures in version 1 can (possibly) run in parallel, while in version 2 they cannot: getFlour() is executed only after getMilk() has completed.

In the first version both futures are created before they are used in the for-comprehension. Once they exist, it is up to the execution context when they run, but nothing prevents them from being executed. I am deliberately not saying that they will run in parallel for sure, because that depends on many factors like thread pool size, execution time, etc. But the point is that they can run in parallel.

The second version looks very similar, but the problem is that the "getFlour()" future is created only after the "getMilk()" future has already completed. Therefore the two futures can never run concurrently, no matter what. Don't forget that a for-comprehension is just syntactic sugar for the methods "map", "flatMap" and "withFilter". There's no magic behind it.
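
To see why, here is roughly what the compiler turns version 2 into, a sketch of the desugaring using the same getMilk() and getFlour() calls:

// Version 2 desugared: the second future is created inside flatMap,
// i.e. only after the milk future has completed
future { getMilk() }.flatMap { milk =>
  future { getFlour() }.map { flour =>
    milk + flour
  }
}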

That's all folks. Happy futures to you.

Init.d shell script for Play framework distributed applications

I wrote a shell script to control Play framework applications packaged using the built-in dist command. Applications packaged this way are zipped standalone distributions; there is no need to have the Play framework installed on the machine they are supposed to run on. Everything needed is inside the package. Inside the zip, in the bin directory, there is an executable shell script named after your application. You can start your application by running this script. That’s all it does, but I want more.

Script setup

Download the script from GitHub and make it executable:
chmod +x ./dist-play-app-initd

Before you run the script, you have to set the values of the NAME, PORT and APP_DIR variables:

  1. NAME – name of the application, must be the same as the name of shell script generated by Play framework to run the app
  2. PORT – port number at which the app should run
  3. APP_DIR – path to directory where you have unzipped the packaged app

Let’s take my side project Jugjane as an example. I ran “play dist” and it has generated “jugjane-1.1-SNAPSHOT.zip” file. If I unzip it, I get single directory named “jugjane-1.1-SNAPSHOT” which I move to “/home/rado/bin/jugjane-1.1-SNAPSHOT“. The shell script generated by Play framework is “/home/rado/bin/jugjane-1.1-SNAPSHOT/bin/jugjane“. I would like to run the application on port 9000. My values would be:

NAME=jugjane
PORT=9000
APP_DIR=/home/rado/bin/jugjane-1.1-SNAPSHOT

Start, stop, restart and check status

Now I can conveniently run my Play application as a daemon. Let’s run it.

Start

To start my Jugjane application I simply run following:

$ ./dist-play-app-initd start
Starting jugjane at port 9000... OK [PID=6564]

Restart


$ ./dist-play-app-initd restart
Stopping jugjane... OK [PID=6564 stopped]
Starting jugjane at port 9000... OK [PID=6677]

Status


$ ./dist-play-app-initd status
Checking jugjane at port 9000... OK [PID=6677 running]

Stop


$ ./dist-play-app-initd stop
Stopping jugjane... OK [PID=6677 stopped]

Start your application when machine starts

This depends on your operating system, but typically you need to copy this script to the /etc/init.d directory and register it so that it runs at boot.
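
On a Debian-based system, for example, the installation could look like the following sketch (the service name jugjane follows the example above; update-rc.d is Debian-specific, other distributions use different tools):

$ sudo cp dist-play-app-initd /etc/init.d/jugjane
$ sudo chmod +x /etc/init.d/jugjane
$ sudo update-rc.d jugjane defaults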

Implementation details

The script uses the RUNNING_PID file generated by the Play framework, which contains the process ID of the application server.

Safe start

After starting the application, the script checks whether the RUNNING_PID file has been created and whether the process is really running. After that it uses the wget utility to issue an HTTP GET request for the root document as yet another check that the server is alive. Of course this assumes that your application serves this document. If you don’t like (or don’t have) wget, I have provided a curl version for your convenience as well.
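
Conceptually, the wget check is just something along these lines (an illustration, not the exact line from the script; port 9000 matches the example configuration above):

$ wget -q -O /dev/null http://localhost:9000/ && echo "jugjane is alive"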

Safe stop

The stop command checks whether the process whose ID is in the RUNNING_PID file really belongs to your application. This is an important check so that we don’t kill an innocent process by accident. Then it sends termination signals to the process, starting with the most gentle ones, until the process dies.

Contribution

I thank my employer Dominion Marine Media for allowing me to share this joy with you. Feel free to contribute.

The best code coverage for Scala

The best code coverage metric for Scala is statement coverage. Simple as that. It suits the typical programming style in Scala best. Scala is a chameleon and can look like anything you wish, but very often several statements are written on a single line and conditional “if” statements are used rarely. In other words, the line coverage and branch coverage metrics are not very helpful.

Java tools

Scala runs on the JVM, so many existing Java tools can be used for Scala as well. But for code coverage it’s a mistake to do so.

One wrong option is to use tools that measure coverage by looking at the bytecode, like JaCoCo. Even though it gives you a coverage number, JaCoCo knows nothing about Scala and therefore doesn’t tell you which piece of code you forgot to cover.

Another misfortune is tools that natively support line and branch coverage metrics only. Cobertura is a standard in the Java world, and the XML coverage report it generates is supported by many tools. Some Scala code coverage tools decided to use the Cobertura XML report format because of its popularity. Sadly, it doesn’t support statement coverage.

Statement coverage

Why? Because a typical Scala statement looks like this (a single line of code):
def f(l: List[Int]) = l.filter(_ > 0).filter(_ < 42).takeWhile(_ != 3).foreach(println(_))

Neither line nor branch coverage works here. When would you consider this single line to be covered by a test? If at least one statement on that line has been called? Maybe. Or all of them? Also maybe.

And where is a branch? Yes, there are statements that are executed conditionally, but the decision logic is hidden in the internal implementation of List. Branch coverage tools are helpless because they don't see this kind of conditional execution.

What we need to know instead is whether individual statements like _ > 0, _ < 42 or println(_) have been executed by an automated test. This is statement coverage.

Scoverage to the rescue!

Luckily there is a tool named Scoverage. It is a plugin for the Scala compiler, and there is also a plugin for SBT. It does exactly what we need: it generates an HTML report and also its own XML report containing detailed information about covered statements.

Scoverage plugin for SonarQube

Recently I implemented a plugin for SonarQube 4 so that statement coverage measurement can become an integral part of your team's continuous integration process and a required quality standard. It allows you to review the overall project statement coverage as well as dig deeper into sub-modules, directories and source code files to see uncovered statements.

Project dashboard with the Scoverage plugin:
[screenshot]

Multi-module project overview:
[screenshot]

Columns with statement coverage, total number of statements and number of covered statements:
[screenshot]

Source code markup with covered and uncovered lines:
[screenshot]