On SWE-Bench Verified, the model achieved a score of 70.6%. This performance is notably competitive when placed alongside significantly larger models; it outpaces DeepSeek-V3.2, which scores 70.2%, ...
Rachel Williams has been an editor for nearly two decades. She has spent the last five years working on small business content to help entrepreneurs start and grow their businesses. She’s well-versed ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results