90% accuracy claim. #1

Open

varadhbhatnagar opened this issue Sep 16, 2020 · 6 comments

Comments

varadhbhatnagar commented Sep 16, 2020

It is mentioned on the website (https://codist-ai.com/) that this model gives 90% accuracy.
Can you elaborate on what exactly this accuracy is and how it is measured?

rcshubhadeep (Contributor) commented Sep 16, 2020

Hello @varadhbhatnagar, we pre-trained on a corpus of over 6M lines of logical Python code (we injected special tokens such as (indent), (dedent), etc. to preserve the logical structure of the code). We then fine-tuned the model on a binary classification problem: the model is shown a pair of token sequences, the first drawn from the code and the second from the comments, and the task is to predict whether they match. We fine-tuned for this task on about 35K pairs. On that task, the training F1 score reaches 90%.

Hope this answers your question.
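
For readers who want a concrete picture of this setup, here is a minimal sketch of pair-wise code/docstring classification with a BERT-style encoder. The checkpoint name and the Hugging Face transformers API used here are illustrative assumptions, not the project's released model or training code.

```python
# Minimal sketch (assumed setup, not the project's code): a BERT-style
# encoder classifies a (code, docstring) pair as match / no-match.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder checkpoint; the project's actual pre-trained weights differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # two labels: no-match (0) / match (1)
)

code = "def add(a, b):\n    return a + b"
docstring = "Add two numbers and return the result."

# Encode the pair as one sequence: [CLS] code [SEP] docstring [SEP]
inputs = tokenizer(code, docstring, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(f"P(match) = {torch.softmax(logits, dim=-1)[0, 1].item():.3f}")
```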

varadhbhatnagar (Author) commented Sep 16, 2020

Thanks. Is there a paper associated with this project? And is it related to Microsoft's CodeBERT in any way?

rcshubhadeep (Contributor)

It is not related to MS CodeBERT (apart from sharing a similar name). The methodology we followed is inspired by the CuBERT paper, with our own methods and ideas blended into it. We have not published a paper yet, but the model is open-sourced for everyone to use.

rcshubhadeep (Contributor)

Thanks for asking the questions 👍

varadhbhatnagar (Author)

I wanted to get an idea of the method complexity this model can handle. For training and testing, did you use simple methods similar to the files in the test_files directory?

rcshubhadeep (Contributor)

Hi,

We fine-tuned this model on the task using the py150k dataset.

To clarify: that dataset contains 150K Python files. We used our open-source library tree-hugger to mine those files and build a dataset of (method, docstring) tuples. We then swapped about 50% of the docstrings and marked those pairs as the negative class, while the rest are positive, and used the pre-trained model for fine-tuning on this task. A sketch of that swapping step follows below.
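
A minimal sketch of that negative-sampling step, assuming the mined pairs are already in memory as (method_source, docstring) tuples; the function name and structure here are hypothetical, not the project's actual pipeline:

```python
# Hypothetical sketch of the described negative sampling: swap ~50% of
# docstrings between methods to create mismatched (negative) pairs.
import random

def build_pairs(examples, neg_fraction=0.5, seed=42):
    """examples: list of (method_source, docstring) tuples mined from py150k."""
    rng = random.Random(seed)
    n = len(examples)
    neg_idx = set(rng.sample(range(n), int(n * neg_fraction)))
    docstrings = [d for _, d in examples]
    dataset = []
    for i, (code, doc) in enumerate(examples):
        if i in neg_idx:
            # Pair the method with a docstring from a different, random method.
            j = rng.randrange(n)
            while j == i:
                j = rng.randrange(n)
            dataset.append((code, docstrings[j], 0))  # 0 = mismatch
        else:
            dataset.append((code, doc, 1))            # 1 = match
    return dataset
```

Swapping docstrings between real methods is a cheap way to get plausible negatives: the swapped docstring is genuine prose about code, just not about this method.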
