Statistics variables and commands
12-11-2023, 08:39 PM
Post: #1
Hello!

What is the way to calculate the
MEAN, MODE and MEDIAN of a given list
from inside a program?

Carlos

Carlos - Brazil
Time Zone: GMT -3
http://area48.com
12-11-2023, 10:55 PM (This post was last modified: 12-11-2023 11:06 PM by StephenG1CMZ.)
Post: #2
RE: Statistics variables and commands
For which calculator or programming language?
Here are some of my examples for the HP Prime PPL:
various means and medians:
https://www.hpmuseum.org/forum/thread-98...ight=means
mode:
https://www.hpmuseum.org/forum/thread-94...t=list+api

Stephen Lewkowicz (G1CMZ)
https://my.numworks.com/python/steveg1cmz
12-12-2023, 04:58 PM
Post: #3
RE: Statistics variables and commands
I'm guessing Carlos may have intended this to apply to a 50g, since that is in his hand in his avatar. :) Hopefully he will clarify. I'll assume that is indeed the case until we hear otherwise.

There's a built-in MEAN command, but that operates on a matrix that must be stored in the special global variable ΣDAT. Since Carlos mentioned that his data is already in list form, it's easy enough to do this manually with no dependence on the built-in statistical variables.

The following assumes that the input is always a list of numeric values, with at least one element in the list. The result for an empty list is undefined, and an empty list will cause errors in the calculations of mean and median.

Here's one way to find the arithmetic mean:
Code:
\<<
  DUP SIZE
  0 ROT +
  \GSLIST
  SWAP /
\>>

The only extra consideration needed with the above is that ΣLIST (written as \GSLIST in the ASCII listing) throws an error when given a list with only 1 element. I worked around this by adding a 0 element, which ensures that there will be at least 2 elements in the list without changing the sum (we already ruled out an empty list as invalid).
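For anyone who wants to cross-check results on a PC, here is a minimal Python sketch of the same steps. It is not 50g code; the function name `mean` and the comments mapping lines to RPL commands are my own assumptions. Python's `sum` has no trouble with a single element, so the zero padding is kept only to mirror the RPL program.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers (mirrors the RPL steps)."""
    if not values:
        # The RPL version is undefined for an empty list; here we fail loudly.
        raise ValueError("mean of an empty list is undefined")
    count = len(values)       # DUP SIZE
    padded = [0] + values     # 0 ROT +   (prepend a zero element)
    total = sum(padded)       # ΣLIST     (the extra zero does not change the sum)
    return total / count      # SWAP /


print(mean([5, 1, 3, 1, 5, 1, 3, 1, 3, 3, 4, 2]))
print(mean([7]))  # 7.0, the single-element case that motivated the padding
```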

Calculating the median is a bit more interesting, only in that the final result depends on whether the element count is even or odd. Here's one way to compute the median:
Code:
\<<
  SORT
  DUP SIZE 2 /
  IF
    DUP FP
  THEN
    GET
  ELSE
    GETI
    UNROT GET
    + 2 /
  END
\>>

Perhaps the most important concept in the above approach is in determining the list size/2. That intermediate result can then be used in multiple ways. Namely:
  • The presence of a fractional part implies an odd number of list elements (note the "IF" clause).
  • When used as a list index, the 50g will round that number up since the fractional part would always be 0.5 for an odd number of list elements.
  • The lack of a fractional part in that result signals an even number of list elements, and gives us the first of the two indices of values to be averaged for the result.
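The three points above can be sketched in Python as a desktop cross-check. This is only a model of the logic, not 50g code: the name `median` is my own, and `math.ceil` stands in for the index rounding described above (Python's built-in `round` would turn 2.5 into 2, so it is not a suitable substitute).

```python
import math


def median(values):
    """Median of a non-empty list, following the size/2 trick from the RPL program."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)      # SORT
    half = len(ordered) / 2       # DUP SIZE 2 /
    if half % 1:                  # DUP FP: a fractional part means an odd count
        index = math.ceil(half)   # x.5 rounded up gives the middle position
        return ordered[index - 1] # RPL lists are 1-based, Python lists are 0-based
    lower = ordered[int(half) - 1]  # GETI: element at position size/2
    upper = ordered[int(half)]      # GET:  element at position size/2 + 1
    return (lower + upper) / 2      # + 2 /


print(median([5, 1, 3]))                             # 3
print(median([5, 1, 3, 1, 5, 1, 3, 1, 3, 3, 4, 2]))  # 3.0
```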

Determining the mode of a list of numbers is definitely more complex than the mean or median. While it can certainly be done with standard RPL commands, I'm inclined to make use of some specific functions from the ListExt library that make short work of this:
Code:
\<<
  SORT
  LRPCT
  EVAL
  KSORT
  DUP REV HEAD
  MPOS
  LPICK
\>>

The result of this program is always a list, since there can be more than one value that meets the criteria. The list elements are the mode values for the given input.

I'm reasonably certain that there are no more than one or two people who could make sense of the above (including myself :)). Using the following input list as an example, here's an explanation of the steps:

Given { 5 1 3 1 5 1 3 1 3 3 4 2 } as input

SORT sorts the given list of numbers in increasing order
{ 1 1 1 1 2 3 3 3 3 4 5 5 }

LRPCT (List RePeat CounT) converts the sorted list to a list containing 2 sublists: the unique list elements, and a corresponding list containing the counts of each element encountered.
{ { 1 2 3 4 5 } { 4. 1. 4. 1. 2. } }

EVAL simply explodes the above list of sublists into two separate lists to be left on the stack.
{ 1 2 3 4 5 }
{ 4. 1. 4. 1. 2. }

KSORT (Key SORT) sorts both lists on the stack using the list of elements in stack level 1 as the keys.
{ 2 4 5 1 3 }
{ 1. 1. 2. 4. 4. }

DUP REV HEAD obtains the final element from the stack level 1 list. Because the counts are now sorted in increasing order, that element is the highest count.
{ 2 4 5 1 3 }
{ 1. 1. 2. 4. 4. }
4.

MPOS returns a list of every position at which the stack level 1 item occurs in the list in stack level 2. So this gives us the positions of all elements with the (same) highest count.
{ 2 4 5 1 3 }
{ 4. 5. }

LPICK simply extracts the elements in the stack level 2 list that are in the positions indicated in the stack level 1 list.
{ 1 3 }

So the mode of the given input list is a list containing the elements 1 and 3, since both of those appear the maximum count of times.
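To double-check the walkthrough away from the calculator, here is a rough Python model of each step. The helper names `lrpct`, `ksort`, `mpos` and `lpick` are hypothetical stand-ins written from the descriptions above, not the ListExt implementations. Two things are assumptions on my part: that ties in the key sort keep their original order (which reproduces the { 2 4 5 1 3 } shown above), and that an empty input returns an empty list.

```python
def lrpct(sorted_values):
    """Model of LRPCT: unique elements of a sorted list and their repeat counts."""
    uniques, counts = [], []
    for value in sorted_values:
        if uniques and uniques[-1] == value:
            counts[-1] += 1
        else:
            uniques.append(value)
            counts.append(1)
    return uniques, counts


def ksort(items, keys):
    """Model of KSORT: sort both lists by ascending key, keeping pairs aligned."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])  # stable for ties
    return [items[i] for i in order], [keys[i] for i in order]


def mpos(values, target):
    """Model of MPOS: 1-based positions of every occurrence of target."""
    return [i + 1 for i, value in enumerate(values) if value == target]


def lpick(values, positions):
    """Model of LPICK: elements found at the given 1-based positions."""
    return [values[p - 1] for p in positions]


def mode(values):
    if not values:
        return []  # assumption: the post does not say what the RPL version does here
    uniques, counts = lrpct(sorted(values))       # SORT LRPCT EVAL
    uniques, counts = ksort(uniques, counts)      # KSORT
    highest = counts[-1]                          # DUP REV HEAD
    return lpick(uniques, mpos(counts, highest))  # MPOS LPICK


print(mode([5, 1, 3, 1, 5, 1, 3, 1, 3, 3, 4, 2]))  # [1, 3]
```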
12-12-2023, 05:36 PM
Post: #4
RE: Statistics variables and commands
...and of course I realized an even shorter way to compute the mode right after I posted that novel:
Code:
\<<
  SORT
  LRPCT
  EVAL
  DUP LMAX
  MPOS
  LPICK
\>>

You can probably figure out what LMAX does without explanation. :)
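As a PC-side sanity check of the shorter version, here is a small Python sketch under the same caveats: it models the steps rather than running ListExt, the name `mode_lmax` is my own, and returning an empty list for empty input is an assumption. Since the key sort is gone, the modes come out in increasing order of value.

```python
def mode_lmax(values):
    """Mode(s) of a list, taking the maximum count directly instead of sorting by count."""
    if not values:
        return []  # assumption: behaviour for empty input is not stated in the post
    uniques = sorted(set(values))                 # SORT LRPCT: the unique elements...
    counts = [values.count(u) for u in uniques]   # ...and how often each one occurs
    highest = max(counts)                         # DUP LMAX
    # MPOS: positions holding the highest count (0-based here, 1-based in RPL)
    positions = [i for i, c in enumerate(counts) if c == highest]
    return [uniques[i] for i in positions]        # LPICK


print(mode_lmax([5, 1, 3, 1, 5, 1, 3, 1, 3, 3, 4, 2]))  # [1, 3]
```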