
items: 47250423


id: 47250423
type: comment
by: simonw
time: 1772643618 (2026-03-04 17:00:18 UTC)
dead: 0
parent: 47250369
deleted: 0
kids: [47252645]
cached_at: 1774123218453
text:

It's the number of active parameters for a Mixture of Experts (misleading name IMO) model.

Qwen3.5-35B-A3B means that the model itself consists of 35 billion floating point numbers (parameters), all of which are loaded into memory at once. That is very roughly 35GB of data at 8 bits per parameter, or about 70GB at 16-bit precision.

But... on any given pass through the model, only 3 billion of those parameters are "active", i.e. have matrix arithmetic applied to them.

This speeds up inference considerably because the computer has to do fewer operations for each token that is processed. It still needs the full amount of memory, though, because the 3B active parameters it uses are likely to be different for every token.
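The size and speed argument in the comment comes down to two multiplications: memory scales with the total parameter count, compute scales with the active count. The sketch below makes that arithmetic explicit. It assumes weights dominate memory (KV cache, activations and runtime overhead are ignored) and uses the common rule of thumb of about 2 FLOPs (one multiply, one add) per used parameter per token; the helper names are illustrative, not from any library.

```python
# Rough memory vs. compute estimates for a Mixture of Experts model.
# Assumptions: weights dominate memory; ~2 FLOPs per active parameter per token.

BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """All parameters must be resident, so memory scales with the TOTAL count."""
    return total_params * bytes_per_param / 1e9


def flops_per_token(active_params: float) -> float:
    """Compute scales with the ACTIVE count: ~2 FLOPs per active parameter."""
    return 2 * active_params


total, active = 35e9, 3e9  # read off the name Qwen3.5-35B-A3B

for precision, bpp in BYTES_PER_PARAM.items():
    print(f"{precision:>9}: ~{weight_memory_gb(total, bpp):.1f} GB of weights")

dense = flops_per_token(total)   # if every parameter were used per token
moe = flops_per_token(active)    # only the active 3B are used per token
print(f"per-token FLOPs: {moe:.1e} (MoE) vs {dense:.1e} (dense), "
      f"~{dense / moe:.1f}x fewer")
```

Running it shows the trade-off the comment describes: roughly 35GB of weights at 8 bits (70GB at 16 bits) must fit in memory, while each token costs about as much arithmetic as a 3B dense model.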
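Why the "active" subset changes from token to token can be seen in a toy top-k router. This is a minimal pure-Python sketch with hypothetical sizes (4-dimensional tokens, 8 experts, 2 chosen per token), not Qwen's actual architecture: every expert matrix is allocated up front, but each token's forward pass only multiplies through the experts its router scores highest.

```python
import math
import random

# Toy Mixture of Experts layer (hypothetical sizes, not a real model config).
random.seed(0)
D_MODEL, N_EXPERTS, TOP_K = 4, 8, 2


def rand_matrix(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# All experts live in memory: this is the "total parameters" side.
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]
router = rand_matrix(N_EXPERTS, D_MODEL)  # one score per expert for a token


def moe_forward(token):
    """Return (output, chosen expert ids) for one token vector."""
    scores = matvec(router, token)
    top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
    # Softmax over the selected experts' scores gives the mixing weights.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    out = [0.0] * D_MODEL
    for i in top:  # only TOP_K of N_EXPERTS matrices are used: the "active" part
        y = matvec(experts[i], token)
        out = [o + (exps[i] / z) * yi for o, yi in zip(out, y)]
    return out, top


for t in range(3):
    token = [random.gauss(0, 1) for _ in range(D_MODEL)]
    _, chosen = moe_forward(token)
    print(f"token {t}: experts {sorted(chosen)}")
```

Different tokens generally route to different experts, which is why all of them have to stay loaded even though only TOP_K are computed per token.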