Things need not go the way we plan. I planned to be a founder and CEO by 40. From my set of “friends, family, and fools”, I looked hard for the fools, but in vain. The money from friends and family is never enough to run a business. I planned to drive the best cars. I planned to be the most remarkable data scientist in the world. I planned to have money! 

I planned to understand how and why large language models (LLMs) work, and to find out when a model qualifies as a small language model (SLM). Surprisingly, there is no standard definition, and the boundary between an SLM and an LLM is quite blurred. At one point, a 400 million parameter model was an LLM; today, we consider a model of that size an SLM. I suppose I can even call a 3 billion parameter model an SLM these days, and it would not be wrong. This means we are raising the bar quite high, and quite fast, on what we expect these models to deliver to us. Talk about raising expectation bars in quick commerce.

“Give me 10 minutes. I am in the middle of something. I will call you back.” You get the call after two and a half days. “We have almost reached. We will be there in 10 minutes. You guys start ordering the starters.” They reach after 50 minutes. “It will hardly take me 10 minutes to build the application end to end.” The message is just that building the application is not very difficult. Therefore, the true definition of 10 minutes is “You mind your own business. I will mind mine.”

Getting back to SLMs and parameters, I recently learned that we count only the weights of the ANN (Artificial Neural Network) to arrive at the number of model parameters. The earlier school of thought was to count the biases as well. Hence, you are not wrong to take either approach. It goes without saying that SLMs are the models of choice when we want to run them on consumer hardware or edge devices for inferencing. Note that I am still not talking about “training” SLMs; I am talking about “inferencing” using pretrained SLMs.
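The difference between the two counting conventions is easy to sketch. Here is a minimal Python illustration for a hypothetical fully connected network; the layer sizes are made up for illustration, and real transformer parameter counts of course also cover embedding and attention matrices:

```python
# Sketch of the two parameter-counting conventions for a dense network.
# Layer sizes below are hypothetical, chosen only to keep the arithmetic small.

def count_params(layer_sizes, include_biases=False):
    """Count parameters of a fully connected network with the given widths."""
    # Each pair of adjacent layers contributes an (n_in x n_out) weight matrix.
    weights = sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    # The older convention also counts one bias per non-input neuron.
    biases = sum(layer_sizes[1:]) if include_biases else 0
    return weights + biases

sizes = [4, 8, 2]  # hypothetical tiny network: 4 inputs, 8 hidden, 2 outputs
weights_only = count_params(sizes)                       # 4*8 + 8*2 = 48
with_biases = count_params(sizes, include_biases=True)   # 48 + 8 + 2 = 58
```

Either number is a defensible “parameter count”; for large models the biases are a negligible fraction, which is why the two conventions rarely disagree in practice.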

The smaller size of an SLM means that hosting and inferencing resource requirements, and hence costs, come down, and the model can be easily deployed in resource-constrained environments. The reduction in quality is less than proportional to the decrease in size, so we don’t lose much, and speed increases. SLMs are usually created with a focus on specific tasks or a specific domain. This is achieved by ensuring that the training dataset used to build the SLM is of high quality and relevant to the intended purpose. Compression techniques such as pruning and quantisation are also utilised to bring the size down.
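As a toy illustration of the quantisation idea mentioned above, here is a sketch of symmetric 8-bit quantisation in plain Python. The weight values and the single shared scale are illustrative only; real implementations quantise per tensor or per channel over billions of weights:

```python
# Minimal sketch of symmetric int8 quantisation: store floats as small
# integers plus one scale factor, trading a little precision for size.

def quantise_int8(values):
    """Map floats onto the int8 range [-127, 127] using one symmetric scale."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # guard against all-zero input
    return [round(v / scale) for v in values], scale

def dequantise(q_values, scale):
    """Recover approximate floats from the integer representation."""
    return [q * scale for q in q_values]

weights = [0.8, -1.27, 0.05, 0.0]     # hypothetical weight values
q, s = quantise_int8(weights)
approx = dequantise(q, s)             # close to the originals, within one scale step
```

Each weight now needs one byte instead of four (for float32), at the cost of a small rounding error bounded by the scale; that is the essence of the size-versus-quality trade the article describes.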

As you would guess, an SLM will not have the same generalisation capabilities as an LLM. Hence, we avoid using SLMs for diverse topics. My only concern is that, in chasing specialisation, we must not shrink an SLM so much that it stops being a language model. After all, interpreting and generating human language is the hallmark of all these models. Unlike their elder brothers, SLMs might not handle complex user queries or long contexts well.

“Papa, Papa, Papa!” (screamed thrice within 2 seconds) before I could respond with “Haan Kanna” (“Yes, dear”). “Papa, inge pare (‘look here’). Fire Engine and Doctor Bus damaal (‘fell down’). You rescue the Fire Engine while I rescue the Doctor Bus.” The first thing that crossed my mind listening to the rescue operation was “Koi dushman thhes lagaaye, to meet jiyaa bahalaaye / Manmeet jo ghaav lagaaye, use kaun mitaaye?” (“If an enemy wounds you, a friend can console the heart; but if the heart’s own friend wounds you, who can heal that?”) However, I said, “Don’t disturb me, please. You rescue both. I am planning something.”

To end this article in the same spirit, I advise all sincere planners to keep the following two lines handy, just in case: “Hum se mat poochho kaise, mandir tootaa sapanon kaa / Logon kee baat naheen hai, ye kissaa hai apanon kaa.” (“Don’t ask me how the temple of my dreams came crashing down; this is not a tale about others, this is a tale of my own.”)

Disclaimer

Views expressed above are the author's own.
