on April 18, 2019
by Hugo Mougard
Harder than Computer Vision, NLP
3 sub-problems
2 projects
Some gory details
gitbase
bblfsh
Types shared across languages
apollo
tmsc & snippet-ranger
id2vec
ml
3 initial targets
Goal: automate formatting
Must explain false positives
OR
Must not have false positives
Unsupervised learning with explainable rules
Reproduction task: ~94.3% precision
Tweak of the general algo
→ helps to learn some problems
Locality is key
Need to handle sequences of variable lengths
Large vocabularies. To handle them:
You shall know a word by the company it keeps
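A minimal sketch of that idea applied to identifiers, assuming "company" means appearing in the same scope. The function name and the toy scopes are hypothetical and not taken from id2vec; this only illustrates the kind of co-occurrence counts an embedding could be trained on.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(scopes):
    """Count how often two identifiers appear together in the same scope."""
    counts = Counter()
    for identifiers in scopes:
        # Sort the unique names so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(identifiers)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical toy data: each inner list is the identifiers of one scope.
scopes = [
    ["file", "path", "open"],
    ["file", "path", "close"],
    ["image", "path", "open"],
]
counts = cooccurrence_counts(scopes)
```

Identifiers that keep similar company (here "file" and "image", which both appear with "path" and "open") end up with similar rows in the count matrix.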
Which tool will be efficient?
Code = two separate channels:
Ongoing, early stage. Goal: use expressive models
Predict formatting characters before each leaf of the AST
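A rough sketch of how such training pairs could be extracted, under two stated assumptions: Python's standard tokenizer stands in for the leaves of the AST, and only the whitespace on the same line is captured (line breaks are ignored). The function name is hypothetical; this is not the project's implementation.

```python
import io
import tokenize

# Token types that only carry layout; they are skipped, not treated as leaves.
LAYOUT_TYPES = {
    tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE,
    tokenize.NL, tokenize.ENDMARKER,
}


def formatting_before_leaves(source):
    """Return (whitespace_before, token_text) pairs for each token."""
    lines = source.splitlines(keepends=True)
    pairs = []
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in LAYOUT_TYPES:
            continue
        row, col = tok.start
        if row == prev_row:
            # Same line: whatever sits between the previous token and this one.
            gap = lines[row - 1][prev_col:col]
        else:
            # New line: the indentation in front of the token.
            gap = lines[row - 1][:col]
        pairs.append((gap, tok.string))
        prev_row, prev_col = tok.end
    return pairs


pairs = formatting_before_leaves("x = 1\nif x:\n    y=x+1\n")
```

Each pair is one training example: the model sees the token and its context, and the target is the formatting string in front of it.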
For the next seminar!
Thank you for your attention!
Questions & Discussion
Hugo Mougard <hugo@sourced.tech>