Work-In-Progress

armonika

Encoding/decoding/handling of variable length numbers in bit addressable memory

Instead of an ad with tracking…
Like to donate some appreciation for the use or inspiration this gives you?

Welcome to the Wonderful World of self describing streaming numbers

When everything is relative and you missed the origin

variable length/serial bus vs. fixed width/parallel.

advantage of serial/small footprint with self terminiating encoding.

The terminator marks either that “0” or “1” would continue indefinitely.

3 components,

the streaming i/o port
simple arithmetic and logic.
validation

implelemted as pull-event driven.

serial has the added advantage of a smaller footprint.

Encodes a delimiter in the data stream. That is N-consecutive same values. To allow that pattern to occur in data, the third bit

Encoding/decoding/handling of variable length numbers in bit addressable memory.

Addressing data in memory requires two components: location (address) and size (bit count).

With fixed width memory (multiple of power-of-2 bytes) the maximum storage size is determined at compile-time whereas with variable-width memory the unrestricted storage size is determined at run-time.

This project is an experimental environment for storing data in bit-addressable memory. It is intended to bridge the worlds of the xtools and untangle projects.

The proposed encoding has these features:

Streaming.
No maximum size.
Simple and fast encoding/decoding.
Simple and fast length detection.
Simple functional implementation of bitwise and integer arithmetic operators.
Simple hardware implementation.

@date 2020-07-04 02:28:54

This encoding gives two major benefits.

There is no maximum to the amount of storage you want to address.
Every value has an infinite number of leading “0” or “1”’s. Overflows will never occur because storage and encoding is variable-width. Being variable-width, strings no longer need a null terminator. end-of-sequence is embedded in the encoding of the value. The only thing needed to know is where the sequence ends, not how. Encounding is immune for “Y2K-bug” and “8-bit/64-bit compatible” issues.

Encoding/decoding scheme

When you lost track of how many bits there are in an int…

    After n consecutive "0", force emit (on encoding) a "1" which is (force ignored) during decoding.
    n+1 consecutive "0" indicitates leading-zeros/end-of-number.

untangle in unfamiliar with the concept “two-complement” whereas xtools assumed its presence. Negative Two-complement numbers have an infinite number of leading “1”. Extend the encoding that:

    After n consecutive "1", force emit (on encoding) a "0" which is (force ignored) during decoding.
    n+1 consecutive "1" indicitates leading-ones/end-of-number.

Examples

The following tables demonstrates the encoding with nthe numbers 0-19.

The binary input is read right-to-left, the string output is read left-to-right.

Input binary	Output N=2	output N=3
—-<	>——-	>——-
00000	000	0000
00001	1000	10000
00010	01000	010000
00011	11000	110000
00100	0011000	0010000
00101	101000	1010000
00110	011000	0110000
00111	1101000	1110000
01000	00101000	000110000
01001	10011000	10010000
01010	0101000	01010000
01011	110011000	11010000
01100	001101000	00110000
01101	1011000	10110000
01110	01101000	01110000
01111	11011000	111010000
10000	0010011000	0001010000
10001	100101000	1000110000
10010	010011000	010010000
10011	1100101000	110010000

The first number that does not fit in 32 bits:

for N=2. 43691 (0x0000aaab) which is encoded as “110011001100110011001100110011000”.
for N=3. 629144 (0x00099998) which is encoded as “110011001100110011001100110011000”.
for N=4. 2098063 (0x0020038f) which is encoded as “111100001111000010000100001100000”.
for N=5. 2220512 (0x0021e1e0) which is encoded as “000001111100000111110000011000000”.

For unsigned encoding only the number 15 is different: N=2, signed “11011000”, unsigned “111010000”
N=2, signed “111010000”, unsigned “11110000”

Unsigned

Although the encoding supports variable-length negative numbers, this functionality is not always used. Most procedural languages with 32/64 bits wordsize all work in modulo 2^32 (or 64) numerical space. “-1” is not “minus 1” which is negative, “-1” is actually “-1 modulo 2^32 (or 64)” which is unsigned and therefor positive.

An additional implementation is included for unsigned variable-length numbers.

Frequency count

Lowest N is simplest implementation, higher N is greater storage efficiency. To aid in making a choice the following table contains frequency counts for encoded lengths with different settings of N. Counted are all the values in range 0-65535.

length	N = 2	N = 3	N = 4	N = 5	notes
3	1
4	1	1
5	2	1	1
6	2	2	1	1
7	4	4	2	1
8	6	6	4	2
9	10	12	8	4
10	16	22	14	8
11	26	40	28	16
12	42	74	54	30
13	68	136	104	60
14	110	250	200	118
15	178	460	386	232
16	288	846	744	456	<— maximum length with binary encoding
17	466	1556	1434	896
18	754	2862	2764	1762
19	1220	5264	5328	3464
20	1972	9682	10270	6810
21	3162	14614	19796	13388
22	4924	15076	16940	26320
23	7176	9836	6384	10606
24	9370	3864	1022	1324
25	10540	842	52	38
26	9900	84
27	7570	2
28	4600
29	2160
30	754
31	184
32	28
33	2

Using N=2 are clearly not efficient and using N>2 does not show significant differences.

Taking N=3 as default has an average of ~22 bits which is ~40% more storage than 16-bits binary encoded.

With unsigned encoding:

length	N = 2	N = 3	N = 4	N = 5	notes
3	1
4	1	1
5	2	1	1
6	3	2	1	1
7	6	4	2	1
8	11	7	4	2
9	20	14	8	4
10	37	27	15	8
11	68	52	30	16
12	125	100	59	31
13	230	193	116	62
14	423	372	228	123
15	778	717	448	244
16	1431	1382	881	484	<— maximum length with binary encoding
17	2632	2664	1732	960
18	4841	5135	3405	1904
19	8904	9898	6694	3777
20	13793	19079	13160	7492
21	15106	17263	25872	14861
22	10812	7178	11215	29478
23	4846	1351	1602	5816
24	1281	95	63	271
25	176	1		1
26	9

The average encoding length is somewhere between 20 and 22. With N=2 the encoding has possibly the simplest implementation with a high yield.

Add/Subtract

@date 2020-07-03 01:39:28

Considering that add/subtract are the most basic arithmetic operator The only moment that raises an issue is when the result of the subtract is negative. Negative also reflects an active carry-out.

The core logic of add/subtract is identical.

Per bit:

emit = carryin ^ left ^ right;
carryout = carryin ? left right : left & right;

For ADD the initial carry-in is “0” For SUB the initial carry-in is “1” and right is bit-inverted. Implies that all subtracts can be rewritten as additions. With no subtract functionality being used, removed the use of an active carry-out.

Source code

Grab one of the tarballs at https://github.com/RockingShip/smile/releases or checkout the latest code:

  git clone https://github.com/RockingShip/armonika.git

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the GNU General Public License v3 - see the LICENSE.txt file for details