Selecting from Arrays
In an array-oriented language, perhaps it's no surprise that there are umpteen ways to select values from arrays. There are also many ways to modify or assign values within arrays.
The exact terminology can vary between array languages, and even APLers use these words interchangeably sometimes. However, on this page we will say that:
- Scalars (0-cells) are the things returned by indexing expressions
- Elements (or items) are the arrays inside of scalars. For a simple scalar this is the same thing! Remember enclosing and disclosing scalars before?
These notes summarise the different constructs available. There is also a Dyalog webinar dedicated to selecting from arrays.
Square bracket indexing
This is the type of indexing we have been using so far. For vectors, it is very intuitive:
'LE CHAT'[7 5 2 3 4 6 7]
THE CAT
For higher rank arrays, we can return rectangular sub-arrays by separating the indices into each axis by semicolons:
(2 3 4⍴⎕A)[1 2;1 3;1 4] ⍝ The corner elements of the cuboid
AD
IL
MP
UX
- What happens if you omit an axis? For example, array[3;4 5;;]?
- What happens if you use too many or too few semicolons?
Squad (A.K.A. "Functional") indexing
Square-bracket indexing requires you to know the exact rank of the array and have the correct number of semicolons in your indexing expression. You might also notice that it is a special or anomalous syntax.
There is also an index function ⍺⌷⍵ which has two distinctions:
- It is a function with the same syntax as other functions
- It applies to any rank array by automatically filling in less-major cells (those cells defined by trailing axes)
(1 2)(2 3)⌷(2 3 4⍴⎕A)
EFGH
IJKL
QRST
UVWX
(2 3 4⍴⎕A)[1 2;2 3;]
EFGH
IJKL
QRST
UVWX
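As a small illustration of the second point, a left argument shorter than the rank of the array selects whole cells along the trailing axes. This is a sketch that assumes the default ⎕IO←1:

```apl
      2⌷2 3 4⍴⎕A        ⍝ one index: the whole second plane
MNOP
QRST
UVWX
      2 3⌷2 3 4⍴⎕A      ⍝ two indices: the third row of the second plane
UVWX
```

The equivalent bracket expressions would need the trailing semicolons spelled out, as in `(2 3 4⍴⎕A)[2;;]` and `(2 3 4⍴⎕A)[2;3;]`.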
Take and drop
We can chop off the edges of an array using take ⍺↑⍵ and drop ⍺↓⍵.
¯1 3 2↑2 3 4⍴⎕A
MN
QR
UV
1 0 ¯2↓2 3 4⍴⎕A
MN
QR
UV
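The same idea may be easier to see on a matrix. In this sketch (the 3 by 4 matrix is only an illustration), a positive number works from the start of an axis and a negative number works from the end:

```apl
      2 ¯2↑3 4⍴⎕A       ⍝ first two rows, last two columns
CD
GH
      1 ¯1↓3 4⍴⎕A       ⍝ without the first row and the last column
EFG
IJK
```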
Note
While similar subarrays can be retrieved using indexing, take or drop, note that take and drop return arrays of the same rank as their argument.
≢⍴1 1↑2 3 4⍴⎕A
3
≢⍴1 1⌷2 3 4⍴⎕A
1
Simple indexing
The selection of rectangular sub-arrays as demonstrated above using square brackets [] and squad ⌷ is also known as simple indexing.
Choose indexing
Simple indexing with square brackets uses scalars or vectors separated by semicolons. If we instead index using square brackets and a nested array of numeric vectors, we can select any collection of scalars:
(2 3 4⍴⎕A)[(1 1 1)(2 1 4)(1 3 4)]
APL
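With choose indexing, the result takes the shape of the index array rather than the shape of the array being indexed. A minimal sketch, assuming ⎕IO←1 and using an illustrative 3 by 3 matrix:

```apl
      (3 3⍴⎕A)[(1 1)(2 2)(3 3)]            ⍝ the diagonal, as a vector
AEI
      (3 3⍴⎕A)[2 2⍴(1 1)(1 3)(3 1)(3 3)]   ⍝ the corners, as a 2 by 2 matrix
AC
GI
```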
An interesting relationship appears between indices into an array and indices into its ravel when ⎕IO←0:
⎕IO←0
(2 3 4⍴⎕A)[↓[0]2 3 4⊤0 15 11]
APL
⎕A⌷⍨⊂2 3 4⊥↑[0](0 0 0)(1 0 3)(0 2 3)
APL
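The same relationship can be followed for a single scalar. In this sketch the name `a` is just an illustration; decode with the shape of the array converts an index into the array to an index into its ravel, and encode goes the other way:

```apl
      ⎕IO←0
      a←2 3 4⍴⎕A
      (⍴a)⊥1 0 3        ⍝ index into a → index into ,a
15
      (,a)[15]
P
      a[⊂1 0 3]
P
      (⍴a)⊤15           ⍝ index into ,a → index into a
1 0 3
```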
Reach indexing
Indexing into an array will retrieve some cell of an array. If it is a nested array, then selecting a scalar will return an enclosed array. Sometimes what you actually want is the item inside of that scalar.
While it is common and perfectly valid to simply use first ⊃⍵ to disclose the contents of a scalar, the pick function ⍺⊃⍵ can be used to retrieve the element directly:
⎕IO←1 ⍝ Restore the default index origin
4⌷'here' 'are' 'some' 'words' ⍝ With ]Boxing on
┌─────┐
│words│
└─────┘
4⊃'here' 'are' 'some' 'words'
words
Reach indexing allows you to pull items from deep within a nested array:
(2 1)(2 2) ⊃ 2 3⍴0 1 2 (2 3⍴'AB' 'CD' 'EF' 'GH' 'IJ' 'KL') 4 5
IJ
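It may help to compare a simple left argument with an enclosed one. In this sketch (assuming ⎕IO←1), each element of the left argument reaches one level deeper, so a simple vector walks through nested vectors, while an enclosed vector is a single multi-axis index:

```apl
      ⎕IO←1
      2 1⊃(1 2 3)(4 5 6)      ⍝ second item, then its first item
4
      (⊂2 1)⊃2 3⍴'ABCDEF'     ⍝ row 2, column 1 of a matrix
D
```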
Select / From
Some APLers find squad-index semantics awkward, and have proposed yet another mechanism, called select or from. It can be defined as:
I←(⊃⍤⊣⌷⊢)⍤0 99
Select provides the best of both simple indexing and choose indexing, allowing you to select arbitrary collections of cells.
Warning
Select is a very general and convenient function, but it is potentially much slower than using the in-built indexing constructs. We provide it here for completeness and your interest.
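A brief sketch of how the definition above behaves, assuming ⎕IO←1 (the name `a` is only an illustration). Each scalar of the left argument is disclosed and used as a squad index into the whole right argument, so the selected cells need not form a rectangular block:

```apl
      ⎕IO←1
      I←(⊃⍤⊣⌷⊢)⍤0 99
      a←2 3 4⍴⎕A
      2 I a                   ⍝ a major cell, like 2⌷a
MNOP
QRST
UVWX
      (1 2)(2 3)(1 1) I a     ⍝ rows taken from different planes
EFGH
UVWX
ABCD
```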
So which type of indexing do I use?
Over time you will learn from experience what is the most appropriate thing to use in different situations. However, here is a rough guide:
Selection type | Selection construct |
---|---|
Arbitrary scalars from a vector | Square bracket simple or compress |
Rectangular subarrays | Simple |
Arbitrary scalars from an array of rank ≥2 | Choose |
Nested arrays | Reach |
Arbitrary collections of cells | Select |
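For the first row of the table, the following sketch (assuming ⎕IO←1, with an illustrative character vector) shows the same scalars selected by indices, by a Boolean mask with compress, and by converting the mask to indices with where:

```apl
      ⎕IO←1
      'HELLO'[2 4 5]
ELO
      0 1 0 1 1/'HELLO'
ELO
      'HELLO'[⍸0 1 0 1 1]
ELO
```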
Problem set 7
Search, sort, slice and select
- Anna, Ben and Charlie are having a competition. They want to see who can eat the most fruit in a week.
fruits ← 4 7⍴'Apples MangoesOrangesBananas'
days ← 7 3⍴'SunMonTueWedThuFriSat'
names ← 3 7⍴'Anna   Ben    Charlie'
⎕RL ← 42 1 ⋄ ate ← ?3 4 7⍴3
What is ⎕RL?
The roll function ?⍵ generates random numbers for each simple scalar number in ⍵. Setting the Random Link system variable ⎕RL lets us generate the same random numbers repeatedly.
- Compute the names of the people who ate the most fruit on Tuesday and Sunday combined.
- Compute the name of the person who ate the most mangoes and bananas combined.
- What is the name of the person who ate the most fruit overall?
Answer
There are many different ways to find these answers. The following are just one set of solutions.
- Anna and Charlie both ate 10 fruits total on Tuesday and Sunday combined. Ben only ate 8 fruits.
d←2 3⍴'TueSun'
total ← +/+/ate[;;days⍳d]
(total=⌈/total)⌿names
Anna
Charlie
- Charlie ate the most mangoes and bananas across the whole week.
f←2 7⍴'MangoesBananas'
total ← +/+/ate[;fruits⍳f;]
(total=⌈/total)⌿names
Charlie
- Anna ate the most fruit overall.
total ← +/+/ate
(total=⌈/total)⌿names
Anna
Any of these totals could have been expressed as a single sum. Either by ravelling submatrices for each person:
total ← +/(,⍤2)ate
Or by merging the last two axes:
total ← +/,[2 3]ate
A discussion comparing these expressions will be added later.
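Until then, a quick check on a small array suggests that the three forms agree. This sketch uses a hypothetical stand-in `a` for `ate` (not the random data above) and assumes ⎕IO←1:

```apl
      ⎕IO←1
      a←2 3 4⍴⍳24       ⍝ hypothetical stand-in for ate
      +/+/a             ⍝ sum rows, then sum the row totals
78 222
      +/(,⍤2)a          ⍝ ravel each matrix, then sum
78 222
      +/,[2 3]a         ⍝ merge the last two axes, then sum
78 222
```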
- Write a function FindWord which accepts a character matrix left argument ⍺ and a character vector right argument ⍵ and returns a Boolean vector where a 1 indicates a row in ⍺ which matches the word ⍵.
fruits←↑'apples' 'mangoes' 'oranges' 'bananas'
fruits FindWord 'apples'
1 0 0 0
fruits FindWord 'oranges'
0 0 1 0
What is ↑?
We created a nested vector of different length character vectors using strand notation. The mix function ↑⍵ is used to turn this from a nested vector of vectors into a flat matrix made of simple character scalars. In order to make the matrix rectangular, shorter vectors are padded with spaces.
' '=↑'apples' 'mangoes' 'oranges' 'bananas'
0 0 0 0 0 0 1
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
Answer
There are many ways to solve this problem. A comparison of different approaches is worthy of a fuller discussion, which will be added later. For now we will simply show a few alternatives:
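FindWord ← {∧/∨/⍺∘.=⍵↑⍨2⌷⍴⍺} ⍝ Outer product: every character of the row occurs somewhere in the padded word
FindWord ← {∨/(⍵↑⍨⊢/⍴⍺)⍷⍺} ⍝ Find the padded word within ⍺
FindWord ← {(⍵↑⍨⊢/⍴⍺)(≡⍤1)⍺} ⍝ Match the padded word against each row
FindWord ← {⍺∧.=⍵↑⍨2⌷⍴⍺} ⍝ Inner product: all characters equal, position by position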
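One difference between these alternatives is worth seeing in action. The outer product version tests whether each character of a row appears anywhere in the padded word, so it can report a match for a word that merely contains the same letters. In this sketch the names `Outer` and `Inner` are hypothetical labels for the first and last definitions above, and ⎕IO←1 is assumed:

```apl
      ⎕IO←1
      fruits←↑'apples' 'mangoes' 'oranges' 'bananas'
      Outer←{∧/∨/⍺∘.=⍵↑⍨2⌷⍴⍺}
      Inner←{⍺∧.=⍵↑⍨2⌷⍴⍺}
      fruits Outer 'bans'     ⍝ every letter of 'bananas' occurs in 'bans'
0 0 0 1
      fruits Inner 'bans'     ⍝ position-by-position comparison finds no match
0 0 0 0
```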
- From the nested 3D array
nest←2 3 4⍴(⍳17),(⊂2 3⍴'ab'(2 3⍴'dyalog'),'defg'),6↑⎕A
use a single selection to obtain:
- The character scalar 'y'
- The numeric scalar 6
Answers
It can be tricky to simplify these to a single use of pick ⍺⊃⍵. Although understanding these selections can help with understanding complicated nested array structures, it is not very common to need to do this in real code.
- The character scalar 'y':
(2 2 2)(1 2)(1 2)⊃nest
y
- The numeric scalar 6:
(⊂1 2 2)⊃nest
6
- What type of indexing is used in the expression grid[⍸grille=' ']?
Answer
Because grille is a matrix, the equality with the space character is also a matrix. The where function ⍸⍵ returns a nested vector of indices, which when used with square brackets forms a choose indexing expression.
- What indexing array can be used to select a simple scalar from itself?
Answer
For choose indexing, an enclosed empty numeric vector:
'a'[⊂⍬]
a
For squad indexing, an empty numeric vector:
⍬⌷'a'
a
For reach indexing, either:
⍬⊃'a'
a
(⊂⍬)⊃'a'
a
- Define n←5 5⍴⍳25 in your workspace.
Using selections, find at least four different ways to set the bottom-right 3 by 3 submatrix in n to 0. For example, (2 2↓n)←0.
Hint
See which primitives may be used in a selective assignment
Answers
Compute the indices:
n[2+⍳3;2+⍳3]←0
Use negative take:
(¯3 ¯3↑n)←0
Use two compressions:
b←2 3/0 1
(b/b⌿n)←0
Positive take after reversals:
(3 3↑⌽⊖n)←0
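One way to convince yourself that these answers are equivalent is to apply each to its own copy of the matrix and compare the results. This is only a sketch, assuming ⎕IO←1; the names `a`, `b` and `c` are illustrative copies of `n`:

```apl
      ⎕IO←1
      n←5 5⍴⍳25
      a←n ⋄ a[2+⍳3;2+⍳3]←0
      b←n ⋄ (¯3 ¯3↑b)←0
      c←n ⋄ (3 3↑⌽⊖c)←0
      (a≡b)∧(b≡c)       ⍝ all three copies are identical
1
      +/,a=0            ⍝ exactly nine elements were set to zero
9
```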
Visit to the museum
Here are some data and questions about visits to a museum.
The section_names are the names of each of the four sections in the museum.
section_names ← 'Bugs' 'Art' 'Fossils' 'Sea Life'
The variable sections is a nested list of text matrices. Each matrix lists the items or creatures which belong to each section.
sections ← ↑¨('Grasshopper' 'Giant Cicada' 'Earth-boring Dung Beetle' 'Scarab Beetle' 'Miyama Stag' 'Giant Stag' 'Brown Cicada' 'Giraffe Stag' 'Horned Dynastid' 'Walking Stick' 'Walking Leaf') ('The Blue Boy by Thomas Gainsborough' ('Rooster and Hen with Hydrangeas by It',(⎕ucs 333),' Jakuch',(⎕ucs 363)) 'The Great Wave off Kanagawa by Hokusai' 'Mona Lisa by Leonardo da Vinci' 'Sunflowers by Vincent van Gogh' 'Still Life with Apples and Oranges by Paul Cézanne' 'Girl with a Pearl Earring by Johannes Vermeer' ('Shak',(⎕ucs 333),'ki dog',(⎕ucs 363),' by Unknown') 'David by Michelangelo di Lodovico Buonarroti Simoni' 'Rosetta Stone by Unknown') ('Amber' 'Ammonite' 'Diplodocus' 'Stegosaur' 'Tyrannosaurus Rex' 'Triceratops') ('Puffer Fish' 'Blue Marlin' 'Ocean Sunfish' 'Acorn Barnacle' 'Mantis Shrimp' 'Octopus' 'Pearl Oyster' 'Scallop' 'Sea Anemone' 'Sea Slug' 'Sea Star' 'Whelk' 'Horseshoe Crab')
The visits table represents 1000 visits to museum sections over a four week period. The four columns represent:
- The section that was visited as an index into the section_names
- The day of the visit in Dyalog Date Number format.
- The arrival time in minutes from midnight. For example, 15:30 is 930 minutes.
- The departure time in minutes from midnight.
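The minutes-from-midnight representation can be converted to and from hours and minutes with decode and encode. A small sketch of the arithmetic:

```apl
      60⊥15 30          ⍝ hours and minutes → minutes from midnight
930
      0 60⊤930          ⍝ minutes from midnight → hours and minutes
15 30
      60⊥10 0           ⍝ 10AM
600
```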
⎕RL←42 ⋄ days←43589+?1000⍴28 ⋄ (arr lïv)←539+?2⍴⊂1000⍴510 ⋄ section←?1000⍴4
visits←(⊂⍋days)⌷section,days,(⊂∘⍋⌷⊢)⍤1⍉↑arr lïv
In the boolean matrix display, each row corresponds to a museum piece and each column corresponds to a day. A 1 indicates days when a particular museum piece was out on display. The order of rows corresponds to the order of pieces in the sections table.
display ← 40 28⍴(9/0),1,(4/0),1,(9/0),1,(3/0),(5/1),0,(10/1),0,(5/1),0,(8/1),0,(8/1),0,(4/1),0 0 1 1 0 1 0 1 0 1 1 0 1 0,(5/1),(3/0),1 0 1 0 1 1,(4/0),1 0 1 1 0 0 1 1 0,(6/1),0 1 0 1 0 0 1 1 0 0 1 1 0 1 0 0 1 1 0,(3/1),(3/0),(4/1),0 1 1 0 1 0 0,(7/1),0 1 0 1 1 0 1 1 0 1 1 0,(3/1),0 1 1 0,(4/1),0,(3/1),0 1 0,(3/1),0 0 1 1,(5/0),1 1 0,(3/1),0 1 0 0 1 1,(3/0),(5/1),0,(9/1),0,(3/1),0 1,(3/0),(5/1),0,(3/1),0,(3/1),(3/0),1 1 0 0 1 0 1,(4/0),1 1 0 1 0 1 0 1 0,(9/1),0,(7/1),0,(3/1),0 0 1 1 0 1 1 0 0 1 0 0 1 0,(5/1),0 1,(3/0),1 1 0 1 0 0,(3/1),0,(4/1),0 0 1 1,(7/0),(3/1),(3/0),1 1,(3/0),1 1 0 1 0 1,(6/0),1 1,(4/0),1 0 1 1,(5/0),1 0 1 0 1,(6/0),(3/1),(9/0),1 1,(3/0),1 0 1 0 1 1,(13/0),1 1,(11/0),1 0 1 1,(4/0),1 0 0,(4/1),0,(12/1),0,(5/1),0 1 0 0 1 1 0,(5/1),0,(4/1),0,(4/1),0 0 1,(5/0),1 1,(3/0),(8/1),0 0 1,(3/0),1,(3/0),1,(3/0),1 0 0 1 0 1 0 1 0 1 0 1 1 0,(3/1),(4/0),(3/1),0,(3/1),0 1 1,(3/0),(4/1),0 1 1 0 1 1,(3/0),1 1 0 1 0 1 0 1,(6/0),1 1,(14/0),(8/1),(4/0),(8/1),0,(3/1),0,(4/1),(6/0),1 0 0 1 1,(3/0),1 1 0 0 1 0 1 0 0 1 0 1,(5/0),1 0 0 1 0 1 0 0 1 1,(3/0),1,(8/0),1 0 1 0,(6/1),0 0,(7/1),0 1 1 0,(3/1),0,(9/1),0,(12/1),0 1 1 0,(9/1),0,(3/1),0 0,(3/1),(3/0),(3/1),0,(3/1),(5/0),(7/1),0 1 0,(5/1),0,(3/1),0 0,(3/1),0 0 1 1 0,(4/1),0 1,(3/0),(3/1),(5/0),1 0 1 1 0 1 0,(3/1),0,(5/1),0,(3/1),0,(4/1),0 1,(4/0),1 0 1 0 0 1 1,(5/0),1,(3/0),1 0 0 1 0 1,(3/0),1 0 1 0 0 1,(4/0),1 0 0 1,(6/0),1,(14/0),1 0 0,(4/1),(3/0),(6/1),0 0 1 0,(3/1),0,(4/1),0,(3/1),0 1 0 1,(3/0),(5/1),(3/0),1 0 0 1 0,(3/1),0 1,(4/0),1 0 1 1,(11/0),1,(15/0),(3/1),(4/0),1,(15/0),(5/1),0 1 0,(8/1),0,(3/1),(4/0),(5/1),0 1,(9/0),1 0 1 1 0 0 1 0 0 1,(4/0),1 0,(4/1),0,(7/1),(3/0),1 0 0 1,(3/0),(3/1),0 1 1
- How many visitors arrived before 10AM?
- What was the most popular section by visit duration?
- Estimate the opening and closing times of each of the sections.
- Which animal being on display corresponded with the highest increase in visit duration for its section?