
Machine Learning With CameraX and MLKit

With a single page of code, turn your normal camera app into a machine learning app.

Himanshu Choudhary · Published in The Startup · 8 min read · Jan 4, 2021



Project Overview

Today, we are going to build an Android camera app that detects the objects that appear in the camera preview. In this project, we will be using a custom ML model, or you can use your own if you have one.

Note: The detection of the objects will depend on the camera quality and the trained ML model.

Don’t know about CameraX, the new camera library by Android?

Don’t worry! You can check it out in the blog below.

What are we going to build?

👇🏼

Final App

Which libraries will we be using in this project?

  • MLKit
  • CameraX

Is any prior ML experience required to build this application?

  • No

Let’s start with the dependencies

  • For CameraX, we will require three dependencies:
implementation "androidx.camera:camera-camera2:1.0.0-rc01"
implementation "androidx.camera:camera-lifecycle:1.0.0-rc01"
implementation "androidx.camera:camera-view:1.0.0-alpha20"
  • For MLKit, we only need the custom object detection dependency:
implementation 'com.google.mlkit:object-detection-custom:16.3.0'

Since we will be using a custom model, we need to tell Android Studio not to compress the imported ML model.

For data binding, we have to enable it in the app’s Gradle file and add the kotlin-kapt plugin.

After adding all the dependencies and the plugin, your app’s Gradle file will look like this:

plugins {
    id 'com.android.application'
    id 'kotlin-android'
    id 'kotlin-kapt' // <-- Add this plugin for data binding
}

android {
    ... // This represents the existing code

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }
    kotlinOptions {
        jvmTarget = '1.8'
    }
    aaptOptions {
        noCompress "tflite" // To avoid ML model compression
    }
    dataBinding {
        enabled = true // To enable data binding
    }
}

dependencies {
    .... // This represents the existing dependencies.

    implementation 'com.google.mlkit:object-detection-custom:16.3.0'

    implementation "androidx.camera:camera-camera2:1.0.0-rc01"
    implementation "androidx.camera:camera-lifecycle:1.0.0-rc01"
    implementation "androidx.camera:camera-view:1.0.0-alpha20"
}
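One more piece of setup that the Gradle file doesn’t cover: the app needs the camera permission, declared in AndroidManifest.xml (and, on Android 6.0 and above, also requested at runtime before the preview will show anything):

<uses-permission android:name="android.permission.CAMERA" />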

It’s time to integrate the ML model 😰

We will get our model from TensorFlow Hub, where you can find any number of models for different use cases. In this project, we will be using an image classification model to analyze the image frames.

The model we will be using is published by Google and you can find it here.

With MLKit, we can only use custom models with the tflite and lite extensions. If you use a model with any other extension, it will throw an exception while analyzing the image.

Why do I keep calling the model a custom model?

There is a difference between a model and a custom model. By a model, we mean a normal model that can come from any source; even a third-party library can be treated as a model. MLKit also provides dedicated APIs for such models, for example text recognition and object detection. By a custom model, we mean an actual trained ML model that we supply ourselves, not a library.

Once we have downloaded the model, let’s paste it into the assets folder of the app module (by default, app/src/main/assets).

Let’s Code 🧑‍💻

  • Time to add Data Binding in our MainActivity
<?xml version="1.0" encoding="utf-8"?>
<layout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    tools:context=".MainActivity">

    <RelativeLayout
        android:id="@+id/layout"
        android:layout_width="match_parent"
        android:layout_height="match_parent">

        <androidx.camera.view.PreviewView
            android:id="@+id/previewView"
            android:layout_width="match_parent"
            android:layout_height="match_parent"/>

    </RelativeLayout>

</layout>

Here, PreviewView is the view where we see the preview of our camera.

With the layout tag, we enable data binding for an activity/fragment.
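One small thing the later snippets use without declaring is the binding variable itself. Here is a minimal sketch of how it can be declared, assuming the layout file above is activity_main.xml, so data binding generates an ActivityMainBinding class:

// Assumes the layout file is activity_main.xml, so data binding generates ActivityMainBinding
private lateinit var binding: ActivityMainBinding

It is assigned in onCreate() with DataBindingUtil.setContentView(...), as shown in the onCreate() snippet later.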

  • Add late initialized variables of ObjectDetector and ProcessCameraProvider
private lateinit var objectDetector: ObjectDetector
private lateinit var cameraProviderFuture : ListenableFuture<ProcessCameraProvider>

ObjectDetector is an MLKit class that is used to detect objects in image frames.

ProcessCameraProvider is used to bind the camera with the lifecycle of any LifecycleOwner within an application’s process.

ListenableFuture is a future that accepts completion listeners. Each listener has an associated executor, and it is invoked using this executor once the future’s computation is done. To learn more about futures, click here.

We will initialize cameraProviderFuture in the onCreate() method of our MainActivity.

cameraProviderFuture = ProcessCameraProvider.getInstance(this)

cameraProviderFuture.addListener({
    // get() is used to get the instance of the future.
    val cameraProvider = cameraProviderFuture.get()
    // Here, we will bind the preview
}, ContextCompat.getMainExecutor(this))

Now, we will get our trained ML model from the assets folder to use it with the object detector.

val localModel = LocalModel.Builder()
    .setAssetFilePath("object_labeler.tflite")
    .build()

object_labeler is the name under which we saved the model in the assets folder, and tflite is its extension.

There are some options that we have to provide to the ObjectDetector, which are used as parameters during detection.

val customObjectDetectorOptions =
    CustomObjectDetectorOptions.Builder(localModel)
        .setDetectorMode(CustomObjectDetectorOptions.STREAM_MODE)
        .enableClassification()
        .setClassificationConfidenceThreshold(0.5f)
        .setMaxPerObjectLabelCount(3)
        .build()
  1. We pass the localModel that we created by loading the model from the assets folder.
  2. CustomObjectDetectorOptions.STREAM_MODE tells the object detector that we will be detecting objects from a live stream/live camera/video.
  3. enableClassification() is used to get the classification labels that the model provides for each detected object.
  4. setClassificationConfidenceThreshold(0.5f) sets a 50% margin for the object detector, which means it will only pass a detected object to us if it is more than 50% sure about it.
  5. setMaxPerObjectLabelCount(3) limits the number of labels returned for each detected object to three.

Now, we will initialize the objectDetector by passing the customObjectDetectorOptions to it.

objectDetector =
ObjectDetection.getClient(customObjectDetectorOptions)

All of the above code goes into the onCreate() method of the MainActivity, which now looks like this:

override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)
    binding = DataBindingUtil.setContentView(this, R.layout.activity_main) // data binding

    cameraProviderFuture = ProcessCameraProvider.getInstance(this)

    cameraProviderFuture.addListener({
        val cameraProvider = cameraProviderFuture.get()
        // Here, we will bind the preview
    }, ContextCompat.getMainExecutor(this))

    val localModel = LocalModel.Builder()
        .setAssetFilePath("object_labeler.tflite")
        .build()

    val customObjectDetectorOptions =
        CustomObjectDetectorOptions.Builder(localModel)
            .setDetectorMode(CustomObjectDetectorOptions.STREAM_MODE)
            .enableClassification()
            .setClassificationConfidenceThreshold(0.5f)
            .setMaxPerObjectLabelCount(3)
            .build()

    objectDetector =
        ObjectDetection.getClient(customObjectDetectorOptions)
}

Let’s bind the camera preview with the lifecycle

  • Make a CameraX Preview, which we will connect to the PreviewView defined in the XML file through its surfaceProvider.
val preview : Preview = Preview.Builder().build()
preview.setSurfaceProvider(binding.previewView.surfaceProvider)

Using a CameraSelector, we will define some requirements for the camera, such as which lens to use and any filters (not required today).

val cameraSelector : CameraSelector = CameraSelector.Builder()
    .requireLensFacing(CameraSelector.LENS_FACING_BACK)
    .build()

Before detecting any objects, we need an image to detect them in. To get that image, we will use ImageAnalysis.

val point = Point()
display?.getRealSize(point) // fills point with the device's screen size
val imageAnalysis = ImageAnalysis.Builder()
    .setTargetResolution(Size(point.x, point.y))
    .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
    .build()
  • With setTargetResolution, we are passing the device’s screen resolution.
  • ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST is used in the case of a live stream or a video. With this strategy, we allow the image analyzer to skip some image frames when the camera moves too fast; if we didn’t skip those frames, the app would look like it is lagging.

Now, we will set an analyzer on the ImageAnalysis object. It will hand us an ImageProxy, from which we will get the image used to detect the objects.

Setting an analyzer signals the camera that it should begin sending data. The executor we pass is the executor on which the analyzer will run.

imageAnalysis.setAnalyzer(ContextCompat.getMainExecutor(this), { imageProxy ->
    val rotationDegrees = imageProxy.imageInfo.rotationDegrees
    val image = imageProxy.image
    if (image != null) {
        val inputImage = InputImage.fromMediaImage(image, rotationDegrees)
        objectDetector
            .process(inputImage)
            .addOnFailureListener {
                imageProxy.close()
            }.addOnSuccessListener { objects ->
                // Here, we get a list of objects which are detected.
                imageProxy.close()
            }
    }
})

First, we get the rotation degrees from the imageInfo and the image from the imageProxy.

After feeding the image and rotationDegrees to InputImage, we get the input image that will be processed by the ObjectDetector.

And once it is processed successfully, we will get the result of detected objects in the onSuccessListener. In case of error, we will get an exception in the onFailureListener.

Note: We have to close the imageProxy after the result is received, whether it is a success or a failure. Closing it allows the next image to be processed.

The last step is binding the cameraProvider to the lifecycle of the LifecycleOwner.

cameraProvider.bindToLifecycle(this as LifecycleOwner, cameraSelector, imageAnalysis, preview)

The cameraProvider is passed from the cameraProviderFuture which we are listening to in the onCreate() method.

override fun onCreate(savedInstanceState: Bundle?) {
    ....
    cameraProviderFuture.addListener({
        val cameraProvider = cameraProviderFuture.get()
        bindPreview(cameraProvider) // Call the function here
    }, ContextCompat.getMainExecutor(this))

    ....
}

@SuppressLint("UnsafeExperimentalUsageError")
private fun bindPreview(cameraProvider : ProcessCameraProvider) {
    val preview : Preview = Preview.Builder().build()
    preview.setSurfaceProvider(binding.previewView.surfaceProvider)

    val cameraSelector : CameraSelector = CameraSelector.Builder()
        .requireLensFacing(CameraSelector.LENS_FACING_BACK)
        .build()

    val point = Point()
    display?.getRealSize(point) // fills point with the device's screen size
    val imageAnalysis = ImageAnalysis.Builder()
        .setTargetResolution(Size(point.x, point.y))
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()

    imageAnalysis.setAnalyzer(ContextCompat.getMainExecutor(this), { imageProxy ->
        val rotationDegrees = imageProxy.imageInfo.rotationDegrees
        val image = imageProxy.image
        if (image != null) {
            val inputImage = InputImage.fromMediaImage(image, rotationDegrees)
            objectDetector
                .process(inputImage)
                .addOnFailureListener {
                    imageProxy.close()
                }.addOnSuccessListener { objects ->
                    // Here, we get a list of objects which are detected.
                    imageProxy.close()
                }
        }
    })

    cameraProvider.bindToLifecycle(this as LifecycleOwner, cameraSelector, imageAnalysis, preview)
}

The UnsafeExperimentalUsageError suppression is needed because of the image we get from the imageProxy. The imageProxy is just a wrapper over the image, and it is not guaranteed to return an image when imageProxy.image is called.

Since CameraX is still in beta, hopefully this will be resolved by the time it reaches a stable release.

Now we have detected the objects, but only our app knows that. For users to see them, we will have to draw something around them, like a rectangle.

When an object is detected, apart from its name/label, it also provides us with a boundingBox. The boundingBox is a Rect that gives the coordinates of the detected object.
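As a quick reference, here is a minimal sketch of what each detected object exposes inside onSuccessListener (the Log tag and message are just for illustration):

for (detectedObject in objects) {
    val box = detectedObject.boundingBox            // Rect with the object's coordinates in the analyzed image
    val label = detectedObject.labels.firstOrNull() // null if no label passed the confidence threshold
    Log.d("ObjectDetection", "${label?.text} (${label?.confidence}) at $box")
}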

We know the coordinates, so let’s draw a rectangle around that object.

We will create a Draw class for drawing the dynamic rectangle view around the object.

class Draw(context: Context?, var rect: Rect, var text: String) : View(context) {

    lateinit var paint: Paint
    lateinit var textPaint: Paint

    init {
        init()
    }

    private fun init() {
        paint = Paint()
        paint.color = Color.RED
        paint.strokeWidth = 20f
        paint.style = Paint.Style.STROKE

        textPaint = Paint()
        textPaint.color = Color.RED
        textPaint.style = Paint.Style.FILL
        textPaint.textSize = 80f
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        // Draw the label at the center of the bounding box
        canvas.drawText(text, rect.centerX().toFloat(), rect.centerY().toFloat(), textPaint)
        // Draw the bounding box itself
        canvas.drawRect(rect.left.toFloat(), rect.top.toFloat(), rect.right.toFloat(), rect.bottom.toFloat(), paint)
    }
}

We pass a Rect and a String, along with a Context, to the Draw class.

  • rect contains the coordinates of the detected object
  • text is the label associated with that object; it could be its name or type

In the Draw class, we create two Paint variables: one for drawing the rectangle boundary and another for drawing the text at the center of that rectangle.

Let’s draw the rectangle

objectDetector
    .process(inputImage)
    .addOnFailureListener {
        imageProxy.close()
    }.addOnSuccessListener { objects ->
        for (detectedObject in objects) {
            // Remove the previously drawn rectangle before adding the new one
            if (binding.layout.childCount > 1) binding.layout.removeViewAt(1)
            val element = Draw(this, detectedObject.boundingBox, detectedObject.labels.firstOrNull()?.text ?: "Undefined")
            binding.layout.addView(element, 1)
        }
        imageProxy.close()
    }

In onSuccessListener,

  • First, we check whether a rectangle is already drawn on the screen by checking the child count; if the child count is greater than 1, a rectangle view is already there.
  • If the view is already drawn, we remove it.
  • Then we add the new rectangle view to the parent layout.

Everything is done.

Now, let’s run the app.

If this blog helped you in some way, hit the clap icon 👏🏼 on your left.

If there is something I am missing, comment below.

Happy coding !! See you next time😉

Still having some doubts?

Check out the video!!

