Tools, Tech, and Security to Apply in a Machine Learning Startup
Based on an interview with Nicolas Joseph, Vice President of Engineering at Datalogue Inc.
Nicolas leads engineering at Datalogue, a machine learning startup building a platform that lets large enterprises clean up their data automatically. He formerly worked in computer security at the Department of Defense in France and has previously founded his own company. In our previous article, we talked a lot about teams, people, and collaboration. This post is devoted to the technical side of what Nicolas does.
Machine learning at the core
According to Nicolas, the core technology behind Datalogue is the capacity to train neural networks at scale on customer data using Datalogue's own infrastructure. Each customer ends up with neural networks tailored to their data, which helps them understand that data better.
“We have a very generalizable architecture that we put on our platform and that we are able to train on different types of data and specialize, depending on the use cases that our customers have.”
Nicolas says they do NLP for their customers, using either CNNs or RNNs depending on the task. Their system combines an internal streaming engine with inference, and this combination lets it work on data at scale. To make sure that their software runs well on the hardware, Nicolas and his team partner closely with media and some global companies such as Dell and HP. Such collaboration allows them to optimize the bandwidth to ensure maximum performance. In fact, they have achieved very good results in this field:
“A machine learning model is actually much faster than a human. One of our customers reported to us that our model was 30 percent more accurate than humans.”
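The pairing of a streaming engine with inference can be pictured with a small Python sketch. This is not Datalogue's implementation: the function names, the batch size, and the toy "model" (a simple rule standing in for a trained network) are all assumptions made for illustration. The point is only that records are pulled from a stream in batches and classified as they arrive, so the whole dataset never has to sit in memory.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(records: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield fixed-size batches from a possibly unbounded stream of records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_inference(
    records: Iterable[str],
    model: Callable[[List[str]], List[str]],
    batch_size: int = 2,
) -> Iterator[Tuple[str, str]]:
    """Run the model batch by batch and yield (record, label) pairs lazily."""
    for batch in batched(records, batch_size):
        yield from zip(batch, model(batch))


def toy_model(batch: List[str]) -> List[str]:
    # Hypothetical stand-in for a trained network: labels a value as an
    # email address if it contains "@", otherwise as plain text.
    return ["email" if "@" in value else "text" for value in batch]


results = list(stream_inference(["a@b.com", "hello", "x@y.org"], toy_model))
```

Because both functions are generators, swapping the toy rule for a real model call changes nothing about how the stream is consumed.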
Tools to enhance workflows
Among the tools that Nicolas and his team prefer, some are well known and others are less common but still quite useful. One of the latter, Clubhouse, is a story tracking system. Nicolas likes that it gives him burndown charts on epics and the ability to define his own milestones.
“It gives you a prediction based on your private burndown chart like when you will be done with a milestone. It’s a really nice tool. Before that, we were using GitHub issues, but because of the distributed nature of the project, aggregating everything into a central thing was much better.”
One of a tech leader’s responsibilities is to review the code written by colleagues. For that purpose, Datalogue uses GitHub. However, Nicolas doesn’t like the GitHub review page. In his opinion, it’s not the best way to do code reviews because it lacks context awareness.
“You don’t really know how people organize the code, for example, in terms of directories. It removes some context you can click through on, like functions or definitions to actually know what’s happening.”
One tool Nicolas wishes existed is an intelligent code review tool with powerful features, such as the ability to click through the code and comment on it. Such a tool would make his life much easier, Nicolas admits. He also wouldn’t mind using a tool with automated code review capabilities.
“Computers need to be able to write code to be able to review code properly; we’re not there yet. So, until that’s the case, I think we still need humans to do it.”
The gold standard of security
Nicolas has rich experience in security, as his previous job was at the Department of Defense in France. Whenever something sensitive is involved, Nicolas and his team try to be proactive and think about how someone could take advantage of it. In terms of culture, he says, one should always refer to the standards.
“‘Hey, we have a problem. We’re not the first one to have that problem.’ Don’t ship your own cryptography thing; that’s the worst idea ever. Just look for like prebuilt modules, how people have solved this problem before, and just apply the solution. I’ve never faced a security problem that hasn’t been solved by someone else before.”
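As an illustration of leaning on prebuilt modules rather than inventing a scheme, here is a minimal Python sketch of password storage built only from the standard library's `hashlib`, `hmac`, and `secrets` modules. The interview does not say which algorithm or library Datalogue uses; PBKDF2, the iteration count, and the function names below are assumptions chosen for the example.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

# Illustrative work factor, not a recommendation from the interview.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Derive a salted hash with PBKDF2-HMAC-SHA256 from the standard library."""
    salt = secrets.token_bytes(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the hash and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)
```

Only the salt and the derived digest are stored; the password itself never is. Every primitive here is an existing, widely reviewed building block, which is the spirit of the advice above.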
The other side of the coin is compliance. Datalogue starts talking about security and each client’s own requirements at the sales stage. The team negotiates specific security aspects with the client’s security representatives and integrates new requirements as it sees more and more security questionnaires.
To ensure security, Datalogue uses password encryption, OpenID authentication integration, CSRF protections, and personal identification tokens for users of its SDKs, as well as HTTPS, SSL, encryption in transit, and other security must-haves. Additionally, shipping the product on premises puts Nicolas and his company in a privileged position: the product sits behind the customer’s firewall, on the customer’s infrastructure, so it is usually protected by the security measures the customer already has in place.
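Per-user tokens of the kind mentioned above, whether for SDK access or CSRF protection, are commonly generated from a cryptographically secure random source and checked with a constant-time comparison. The Python sketch below shows one way to do that with the standard library. The function names and the choice to store only a SHA-256 hash of the token are assumptions for illustration, not a description of Datalogue's system.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def issue_token() -> Tuple[str, str]:
    """Create a random token and the hash the server would keep on file."""
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe text
    stored_hash = hashlib.sha256(token.encode("ascii")).hexdigest()
    return token, stored_hash


def token_matches(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare it in constant time."""
    presented_hash = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(presented_hash, stored_hash)
```

The token goes to the user once, and the server keeps only its hash, so a leaked database does not directly expose usable tokens.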
“In my mind, another important part is how you ingrain security in the company. We’ve enacted a company password manager. We ask people or everyone to use 2FA on all the services they use for the company. Making that effort and setting those policies in place is a way for people to really be cognizant about this and feel and think about security every day.”