I use Jira and Confluence at work on a daily basis, and I’m aware of some of their limitations when it comes to search. It also happens that I’m obsessed with Natural Language Understanding; specifically, I have been playing around with Question Answering (QA) for more than two years. Initially I implemented this add-on with a simple CQL query and QA. I asked myself whether I would install it. The answer was an honest no. It would be useful, but not enough to justify its position in my add-on list. At some point, I recalled Elon Musk saying this in an interview:
"If you're entering anything where there's an existing marketplace, against large, entrenched competitors, then your product or service needs to be much better than theirs ... It can't just be slightly better. It's got to be a lot better."
In the other corner of my brain, Eric Schmidt goes:
“You often hear people talk about search as a solved problem. But we are nowhere near close.”
My only intention was to incorporate QA into Confluence, but since I was already here I thought I might as well take a stab at search itself. Search is a hard problem. It is an incredibly hard problem if you want to do it at World Wide Web scale, but not as much if you’re in a safe, contained, well-structured and well-intentioned environment like Confluence. And that’s what all this is about.
It indexes information from Confluence & Jira and makes it easily accessible through various forms of search.
We spent more than a year intensively researching the Question Answering system, which clearly came in handy. Our years of prior experience studying and researching state-of-the-art machine learning models also helped us quickly deploy models for the People Also Ask and Reverse Image Search features.
We use Elasticsearch as our primary search engine, and MongoDB and Node.js with Atlassian Connect to glue the various microservices together. We use TensorFlow extensively to train and deploy models. It goes without saying that we primarily use Python for all our ML workloads. The rest is a messy combination of gRPC & REST for inter-service communication, Redis for caching and jQuery for the frontend. I chose jQuery instead of something like React, as learning React would have slowed me down even further; I already had a lot of things to learn in great, painful detail.
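As a rough illustration of what a full-text query against Elasticsearch can look like, here is a minimal sketch that only builds the JSON request body as a plain Python dict. The field names (`title`, `body`, `space_key`) and the helper `build_search_body` are assumptions made up for this example, not the add-on's actual index schema or code.

```python
import json


def build_search_body(text, spaces=None, size=10):
    """Build an Elasticsearch Query DSL body for a simple full-text search.

    The field names used here are hypothetical placeholders.
    """
    query = {
        "bool": {
            "must": [
                # Match the text against title and body, boosting title hits.
                {"multi_match": {"query": text, "fields": ["title^2", "body"]}}
            ]
        }
    }
    if spaces:
        # Optionally restrict results to specific (hypothetical) space keys.
        query["bool"]["filter"] = [{"terms": {"space_key": spaces}}]
    return {"size": size, "query": query}


if __name__ == "__main__":
    body = build_search_body("deployment checklist", spaces=["ENG"])
    print(json.dumps(body, indent=2))
```

The resulting dict could be sent to a search endpoint by whichever client the service uses; keeping the builder free of network calls makes it easy to unit test.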
We run all our workloads inside a single Kubernetes cluster on Google Cloud. Kubernetes allows us to dynamically scale ridiculously expensive GPU instances down to zero when they are not being actively used. On top of that, we use preemptible instances to reduce our operating costs even further. We mostly use TPUs for training and GPUs for inference.
We worked on training a machine learning model that automatically builds a Knowledge Graph from raw text. It extracts relationships between the various entities in a paragraph. To our surprise, it performed relatively well! For example, given the Wikipedia page for Google as input, the model can generate (subject, predicate, object) triples like the ones below:
Google, subsidiaryOf, Alphabet
Google, foundedOn, September 4, 1998
Google, foundedBy, Larry Page
Google, foundedBy, Sergey Brin
We never got it interfaced with the rest of our system in time to feature in our demos. I’m super excited about this!
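To make the triple format concrete, here is a minimal sketch of how triples like the Google example could be held in memory and queried. It does not show the extraction model itself; the `Triple` type and the `build_graph` and `lookup` helpers are hypothetical names introduced only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The example triples from the Google page, written out by hand.
TRIPLES = [
    Triple("Google", "subsidiaryOf", "Alphabet"),
    Triple("Google", "foundedOn", "September 4, 1998"),
    Triple("Google", "foundedBy", "Larry Page"),
    Triple("Google", "foundedBy", "Sergey Brin"),
]


def build_graph(triples):
    """Index triples as graph[subject][predicate] -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for t in triples:
        graph[t.subject][t.predicate].append(t.obj)
    return graph


def lookup(graph, subject, predicate):
    """Return all objects for (subject, predicate), or an empty list."""
    return list(graph.get(subject, {}).get(predicate, []))


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(lookup(graph, "Google", "foundedBy"))
```

A nested dictionary keeps lookups by subject and predicate cheap, which is the access pattern a question like "Who founded Google?" would need; a real deployment would more likely use a graph or document store.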
The whole is greater than the sum of its parts. Each of these features can seem incremental on its own, but put together, they are truly impressive. And hopefully useful.
And if there's any actual commercial interest:
Couple of tips:
Our machine learning models run on preemptible GPUs and scale down to zero instances when not actively in use, to reduce cost. As a result, some services might experience sporadic degradation.