Table 3. Measured time from the user's input until the model runs and outputs a response, for each of the two models.

| LLMs | Device | Measured Time |
|---|---|---|
| Inference Engine: Phi-3.5 (3.82GB) | PC | Response time from user’s question: 10,809 ms; Encoding user’s input: 10 ms; Decoding LLM’s output: 13 ms |
| | Laptop | Response time from user’s question: 13,358 ms; Encoding user’s input: 12 ms; Decoding LLM’s output: 13 ms |
| OpenAI API: GPT-4o | PC | Response time from user’s question: 913 ms; “openai-processing-ms”: 354 ms |
| | Laptop (LAN) | Response time from user’s question: 990 ms; “openai-processing-ms”: 343 ms |
| | Laptop (Wi-Fi) | Response time from user’s question: 1,498 ms; “openai-processing-ms”: 380 ms |
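
The GPT-4o rows report both the end-to-end response time and the “openai-processing-ms” value, so the part of each response time that falls outside that value can be computed by subtraction. The following is a minimal sketch, assuming that “openai-processing-ms” reflects server-side processing and that the remainder covers network transfer and client-side overhead; the names `GPT4O_RUNS`, `overhead_ms`, and `overhead_table` are hypothetical and do not come from the source.

```python
# Values copied from the GPT-4o rows of Table 3, in milliseconds:
# (end-to-end response time, "openai-processing-ms").
GPT4O_RUNS = {
    "PC": (913, 354),
    "Laptop (LAN)": (990, 343),
    "Laptop (Wi-Fi)": (1498, 380),
}


def overhead_ms(total_ms, processing_ms):
    """Return the time not accounted for by "openai-processing-ms".

    Assumption: the remainder is network transfer plus client overhead.
    """
    return total_ms - processing_ms


def overhead_table(runs):
    """Apply overhead_ms to every (total, processing) pair in runs."""
    return {name: overhead_ms(total, proc) for name, (total, proc) in runs.items()}


if __name__ == "__main__":
    for name, ms in overhead_table(GPT4O_RUNS).items():
        print(f"{name}: {ms} ms outside server-side processing")
```

Under that assumption, the reported processing times stay within a narrow range (343 to 380 ms) across the three setups, so most of the difference between the LAN and Wi-Fi measurements lies in the remainder.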