Atlassian Codegeist 2020 Demo

Follow our entry on DevPost and leave a shout-out there if you find it mildly useful!

In-depth Walkthroughs

Question Answering

Permissions & Restrictions

Federated Search

Computer Vision for Images

Inspiration

I personally use Jira and Confluence at work every day, and I’m aware of some of their limitations when it comes to search. It also happens that I’m obsessed with Natural Language Understanding; specifically, I have been playing around with Question Answering (QA) for more than two years. Initially I implemented this add-on with a simple CQL query and QA. Then I asked myself: would I install this? The honest answer was no. It would be useful, but not useful enough to justify a spot in my add-on list. At some point I recalled Elon Musk saying this in an interview:

"If you're entering anything where there's an existing marketplace, against large, entrenched competitors, then your product or service needs to be much better than theirs ... It can't just be slightly better. It's got to be a lot better."

In the other corner of my brain, Eric Schmidt goes:

“You often hear people talk about search as a solved problem. But we are nowhere near close.”

My only intention was to incorporate QA into Confluence, but since I was already here I thought I might as well take a stab at search itself. Search is a hard problem. It is an incredibly hard problem at World Wide Web scale, but much less so in a safe, contained, well-structured and well-intentioned environment like Confluence. And that’s what all this is about.

What it does

Semantica indexes information from Confluence & Jira and makes it easily accessible through various forms of search.

How we built it

We spent more than a year intensively researching the Question Answering system, which clearly came in handy. Years of prior experience actively studying and researching state-of-the-art machine learning models also helped us quickly deploy models for the People Also Ask and Reverse Image Search features.

We use Elasticsearch as our primary search engine, and MongoDB and Node.js with Atlassian Connect to glue the various microservices together. We use TensorFlow extensively to train and deploy models and, it goes without saying, Python for all our ML workloads. Inter-service communication is a messy combination of gRPC and REST, Redis serves as our cache, and the frontend uses jQuery. I chose jQuery over something like React because React would have slowed me down even further; I already had plenty of things to learn in great, painful detail.
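Caching matters most for permissions: building a user's ACL takes a couple hundred calls to Atlassian (see the tips at the end). The sketch below shows a cache-aside pattern for that case. It assumes a hypothetical `fetch_acl` callable that makes those calls, and it models an ACL as the set of space keys a user can view, which is a simplification because Confluence also has page-level restrictions. An in-memory dict with expiry times stands in for Redis; with real Redis you would use `SET key value EX ttl` and `GET` instead. None of the names below come from Semantica's code.

```python
import time
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


class AclCache:
    """Cache-aside store for per-user ACLs (hypothetical; a dict stands in for Redis)."""

    def __init__(
        self,
        fetch_acl: Callable[[str], FrozenSet[str]],
        ttl_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_acl = fetch_acl  # expensive: many calls to Atlassian
        self._ttl = ttl_seconds
        self._clock = clock
        # user_id -> (expiry timestamp, set of viewable space keys)
        self._entries: Dict[str, Tuple[float, FrozenSet[str]]] = {}

    def get(self, user_id: str) -> FrozenSet[str]:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no API calls
        acl = self._fetch_acl(user_id)  # miss or expired: rebuild the ACL
        self._entries[user_id] = (now + self._ttl, acl)
        return acl

    def clear(self, user_id: Optional[str] = None) -> None:
        """Drop one user's ACL, or everything (like clearing the cache after permission changes)."""
        if user_id is None:
            self._entries.clear()
        else:
            self._entries.pop(user_id, None)


def filter_results(results: List[dict], acl: FrozenSet[str]) -> List[dict]:
    """Keep only results whose (hypothetical) 'space' field is in the user's ACL."""
    return [r for r in results if r["space"] in acl]


if __name__ == "__main__":
    calls: List[str] = []

    def fake_fetch(user_id: str) -> FrozenSet[str]:
        calls.append(user_id)  # count how often the expensive path runs
        return frozenset({"ENG", "HR"})

    cache = AclCache(fake_fetch)
    results = [{"title": "Onboarding", "space": "HR"}, {"title": "Payroll", "space": "FIN"}]
    print([r["title"] for r in filter_results(results, cache.get("alice"))])  # ['Onboarding']
    cache.get("alice")
    print(len(calls))  # 1
```

The trade-off is staleness: until an entry expires or is cleared, permission changes are not reflected in search results, which is why the cache needs a manual clear option.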

We run all our workloads inside a single Kubernetes cluster on Google Cloud. Kubernetes lets us dynamically scale ridiculously expensive GPU instances down to zero when they are not actively in use. On top of that, we use preemptible instances to reduce our operating costs even further. We mostly use TPUs for training and GPUs for inference.

Challenges we ran into

Accomplishments that we're proud of

We worked on training a machine learning model that automatically builds a knowledge graph from raw text by extracting the relationships between entities in a paragraph. To our surprise, it performed relatively well! For example, given the Wikipedia page for Google as input, the model can generate (subject, predicate, object) triplets like these:

    Google, subsidiaryOf, Alphabet
    Google, foundedOn, September 4, 1998
    Google, foundedBy, Larry Page
    Google, foundedBy, Sergey Brin

We didn’t manage to integrate it with the rest of our system in time to feature it in our demos, but I’m super excited about it!
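Triplets like the ones above already form a small knowledge graph. The sketch below stores them and answers simple lookups such as "who founded Google?". It assumes a plain in-memory index keyed by (subject, predicate) and by object. The class and method names are hypothetical, and this is neither the extraction model nor the storage we built.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


class KnowledgeGraph:
    """Tiny in-memory triple store indexed by (subject, predicate) and by object."""

    def __init__(self) -> None:
        self._by_sp: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
        self._by_object: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._by_sp[(subject, predicate)].append(obj)
        self._by_object[obj].append((subject, predicate))

    def objects(self, subject: str, predicate: str) -> List[str]:
        """Outgoing edges, e.g. objects('Google', 'foundedBy') lists the founders."""
        # dict.get does not trigger defaultdict's factory, so lookups never add keys
        return list(self._by_sp.get((subject, predicate), []))

    def incoming(self, obj: str) -> List[Tuple[str, str]]:
        """All (subject, predicate) pairs that point at obj."""
        return list(self._by_object.get(obj, []))


triplets: List[Triplet] = [
    ("Google", "subsidiaryOf", "Alphabet"),
    ("Google", "foundedOn", "September 4, 1998"),
    ("Google", "foundedBy", "Larry Page"),
    ("Google", "foundedBy", "Sergey Brin"),
]

kg = KnowledgeGraph()
for s, p, o in triplets:
    kg.add(s, p, o)

print(kg.objects("Google", "foundedBy"))  # ['Larry Page', 'Sergey Brin']
print(kg.incoming("Alphabet"))  # [('Google', 'subsidiaryOf')]
```

Indexing both directions keeps forward lookups ("who founded X?") and reverse lookups ("what is a subsidiary of Alphabet?") to a single dictionary access each.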

What we learned

The whole is greater than the sum of its parts. Each of these features can seem incremental on its own, but put together they are truly impressive. And hopefully useful.

What's next for Semantica

And if there's any actual commercial interest:

Don't take our word for it, try it yourself!

Install on Confluence Install on Jira

A couple of tips:

  • Once you install the add-on, the crawler runs in the background to scrape your content; search results might not be accurate until this process completes.
  • Search for a question expressed in natural language ("Why is the sky blue?") to trigger our Question Answering system.
  • Create frequently-asked-questions pages with "FAQ" in their title to trigger the People Also Ask system.
  • Clear your cache on the Search Settings page after making permission changes, as building an ACL currently takes a couple hundred calls to Atlassian.
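The first two triggers in the tips can be approximated with simple rules. The sketch below is a rough version under stated assumptions: a query counts as a question if it ends with "?" or starts with a common question word, and a page counts as an FAQ page if its title contains "FAQ" in any case. The function names, word list, and backend labels are hypothetical, and the real system may decide differently, for example with a trained classifier.

```python
import re

# Hypothetical list of words that typically open an English question.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "can", "does", "do", "should",
}


def looks_like_question(query: str) -> bool:
    """Heuristic: the query ends with '?' or starts with a question word."""
    text = query.strip().lower()
    if not text:
        return False
    if text.endswith("?"):
        return True
    first_word = re.split(r"\W+", text, maxsplit=1)[0]
    return first_word in QUESTION_WORDS


def is_faq_page(title: str) -> bool:
    """Mirrors the tip literally: 'FAQ' appears somewhere in the title, any case."""
    return "faq" in title.lower()


def route(query: str) -> str:
    """Pick a (hypothetical) backend label for a search query."""
    return "question_answering" if looks_like_question(query) else "keyword_search"


print(route("Why is the sky blue?"))  # question_answering
print(route("sprint retro template"))  # keyword_search
print(is_faq_page("Onboarding FAQ"))  # True
```

Rules like these are cheap and predictable but will misfire on short imperative queries ("how-to guides"), which is one reason a learned classifier can be worth the extra cost.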

Our machine learning models run on preemptible GPUs and scale down to zero instances when not actively in use to reduce cost. As a result, some services might experience sporadic degradation.

Search on.