Data Input/Output
Although we have ⎕IO, "IO" in APL can still refer to input/output.
Hello, World!
If you have seen any kind of computer programming before, you are probably aware of a famous program called "Hello, World!".
Here it is in APL:
⎕←'Hello, World!'
If you have learned programming before, maybe it is strange to have gotten so far in an introductory tutorial without meeting the language's "print" function.
Note
By default, non-assignment expressions output results to the session log. We strongly recommend using ⎕← when you deliberately intend for an expression to print to the session log. You are then able to search for print statements for easier debugging.
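As a small illustration of the recommendation above, the same result can be displayed implicitly, displayed explicitly, or displayed while also being assigned. The name total is an assumption made only for this sketch.

```apl
2+2            ⍝ implicit output: a non-assignment expression displays its result
⎕←2+2          ⍝ explicit output: easy to find later by searching for ⎕←
total←⎕←2+2    ⍝ display the value and also keep it in total
```

Searching the code for ⎕← will find the last two lines, but not the first.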
Back 2 School 4 Maths
The function Test will ask the user n single-digit arithmetic questions, and return the number of correct answers.
∇ points←Test n;answer;solution
[1] points←0
[2] :While n>0
[3] solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10
[4] answer←⎕
[5] points+←answer≡solution
[6] n-←1
[7] :EndWhile
[8] ⎕←'You scored',points,'points'
∇
Test 3
Examine the Test function and try it out. Which line asks for user input?
Note
The ∇ del representation of the Test function above is the vector representation result of ⎕VR'Test' which can be directly input into the session. Copy the Test function above, paste it into the session and press Enter to define the Test function in your workspace.
You will see that it is quite possible to cheat the Test function by entering the same expression that it asks. To be even more sly, simply move the text cursor with the up arrow to the printed problem statement and press Enter.
To ameliorate this, we can verify and fix input with ⎕VFI. Also note the use of quote-quad ⍞.
∇ points←Test2 n;answer;input;solution;valid
[1] points←0
[2] :While n>0
[3] solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10
[4] input←⍞
[5] (valid answer)←⎕VFI input
[6] answer←valid/answer
[7] points+←answer≡,solution
[8] n-←1
[9] :EndWhile
[10] ⎕←'You scored',points,'points'
∇
Test2 3
In this case, "fix" means to define as an APL value in the workspace, as if it had been typed into the session.
Verify and Fix Input ⎕VFI is used when you need to process numeric data from an external source, but it has arrived in a text format. This is very common when working with data from the internet or from files.
You might be tempted to use the Execute function ⍎⍵ but this is very dangerous because it will execute any text as APL code.
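A minimal sketch of the safer approach: a helper that keeps only the items ⎕VFI could convert, so that no input text is ever executed. ParseNums is a hypothetical name introduced for this illustration only.

```apl
⍝ ParseNums is a hypothetical helper name used only in this sketch
ParseNums←{(valid nums)←⎕VFI ⍵ ⋄ valid/nums}   ⍝ keep only the values that converted
ParseNums '12 abc 3.5'                          ⍝ 12 3.5
ParseNums '⎕OFF'                                ⍝ empty result: the text is never executed
```

Passing the same text to ⍎ would instead run it as code, which in the second case would close the session.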
More about ⎕VFI
(valid numbers) ← ⎕VFI text
The Boolean vector valid indicates the locations of elements in numbers which were converted from text.
By default, any valid number representation - including engineering exponential notation XeY and complex numbers of the form xJy - surrounded by spaces is considered valid input. You may provide a list of separator characters as left argument.
⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌───────────┬──────────────────┐
│1 1 1 1 0 0│7 4.3 1000 3J4 0 0│
└───────────┴──────────────────┘
' ,'⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌─────────────┬──────────────────────┐
│1 1 1 1 1 1 0│7 4.3 1000 3J4 5 300 0│
└─────────────┴──────────────────────┘
Convenient text input
Single quotes ' in APL character arrays must be escaped by doubling. It can sometimes be easier to paste input by assigning ⍞:
text←⍞
My great string 'which has some quoted text'
]Repr text
'My great string ''which has some quoted text'' '
Note
The user command ]Repr can generate APL expressions which produce most arrays. In some sense, it is like an inverse to execute ⍎. There is also a utility function ⎕SE.Dyalog.Utils.repObj which can be used in code, but we do not recommend using it in applications; use the primitives to test the properties of arrays, as explained in the sections on error handling.
Convenient text output
Once upon a time, APL was considered an incredible, revolutionary tool for scientists, artists and business people alike to be able to get work done using computers. In a time before spreadsheet software was so ubiquitous, APL terminals offered a way to quickly and easily process data, produce reports and format them for printing.
Take a look at Chapter F of Mastering Dyalog APL for how to use the formatting functionality of ⍕ and ⎕FMT.
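As a brief taste of the two tools (a sketch with made-up numbers, not a substitute for the chapter), dyadic ⍕ takes a field width and a number of decimal places, while ⎕FMT takes a format specification and lays a vector out as a column:

```apl
8 2⍕3.14159 2.71828           ⍝ each number in a field 8 wide with 2 decimal places
⍴8 2⍕3.14159 2.71828          ⍝ 16: a single character vector
'F8.2'⎕FMT 3.14159 2.71828    ⍝ fixed-point fields, one row per number
⍴'F8.2'⎕FMT 3.14159 2.71828   ⍝ 2 8: a character matrix
```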
- It is easy (but inefficient) to round numbers to a specific precision with dyadic format ⍕:
  ⎕←rand←?5⍴0
  0.2225024074 0.3282243862 0.314984696 0.9533625773 0.757200184
  ⍎2⍕rand
  0.22 0.33 0.31 0.95 0.76
  - Write a function equivalent to {⍎⍺⍕⍵} without using ⍎ or ⍕.
  - Why does {⍎⍺⍕⍵} fail for large values of ⍺?
- The following expression formats the current date as YY/MM/DD. Change the expression to produce YYYY-MM-DD.
  'I2,2(</>,ZI2)'⎕FMT 1 3⍴100|3↑⎕TS
- In Dyalog version 18.0, 1200⌶ (twelve hundred eye beam) can convert date times into human readable formats according to some specification. For example:
  'Dddd Mmmm Doo YYYY'(1200⌶)1⎕dt⊂3↑⎕ts
  ┌──────────────────────────┐
  │Wednesday August 12th 2020│
  └──────────────────────────┘
  Write a function DTFMT to generate a similar output using a 3-element vector like 3↑⎕TS. That is,
  - Full day name
  - Full month name
  - Ordinal day number (1st, 2nd, 3rd, 4th etc.)
  - Full year number
  DTFMT 2020 8 12
  ┌──────────────────────────┐
  │Wednesday August 12th 2020│
  └──────────────────────────┘
Native Files
The term "Native Files" refers to any type of file on a hard disk. These can be text or media files, or even executable files. Usually we are interested in various kinds of text files.
⎕CSV
Comma separated values are a very common and convenient data format. While we encourage you to read the documentation for a full description, here is an overview of features of ⎕CSV:
- Read data from and write data to files directly
data ← ⎕CSV '/path/to/file.csv'
- Separate the header (first row) from the rest of the data
(data header) ← ⎕CSV '/path/to/file.csv' ⍬ ⍬ 1
- Treat specific columns of input as numeric or text, depending on the options provided. The 4 here indicates to convert numeric values if possible, else keep the value as text.
numeric_if_possible ← ⎕CSV '/path/to/file.csv' ⍬ 4
- Use a separator other than commas, using the "Separator" variant option, for example using tabs (⎕UCS 9) for Tab Separated Values (.tsv).
tsv ← ⎕CSV⍠'Separator' (⎕UCS 9)⊢'/path/to/file.csv'
- Read data chunks at a time so as to not fill the workspace, using the "Records" variant option.
path ← '/path/to/file.csv'     ⍝ The file path as simple character vector
ReadCSV10←⎕CSV⍠'Records' 10    ⍝ A function to read CSV 10 records at a time
tn←path ⎕NTIE 0                ⍝ Tie the file - this locks it from use by other applications
first10 ← ReadCSV10 tn         ⍝ Read the first 10 records (rows)
second10 ← ReadCSV10 tn        ⍝ Read the next 10
≢¨first10 second10
10 10
first10 second10
┌──────────┬──────────┐
│┌──┬─────┐│┌──┬─────┐│
││1 │JQZUK│││11│DECJM││
│├──┼─────┤│├──┼─────┤│
││2 │ANPYW│││12│PXPGL││
│├──┼─────┤│├──┼─────┤│
││3 │WYVSR│││13│SYSCN││
│├──┼─────┤│├──┼─────┤│
││4 │ZOGOX│││14│EKDPS││
│├──┼─────┤│├──┼─────┤│
││5 │CXKRS│││15│XCOHA││
│├──┼─────┤│├──┼─────┤│
││6 │BFTYO│││16│RDAHR││
│├──┼─────┤│├──┼─────┤│
││7 │VFLAS│││17│KPUTW││
│├──┼─────┤│├──┼─────┤│
││8 │BAFYD│││18│TPDOD││
│├──┼─────┤│├──┼─────┤│
││9 │XPEBP│││19│BGIVA││
│├──┼─────┤│├──┼─────┤│
││10│UVBFG│││20│IITSO││
│└──┴─────┘│└──┴─────┘│
└──────────┴──────────┘
⎕NUNTIE tn                     ⍝ Don't forget to untie the file after use!
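To experiment with the column type and header options without a file on disk, ⎕CSV can also parse CSV text held in a character vector. The sketch below assumes the 'S' description (source is a simple character vector with embedded line endings) behaves as described in the ⎕CSV documentation; the sample text and the names csv, data and header are made up for this illustration.

```apl
⍝ csv is made-up sample text; 'S' is assumed to mean "source is a simple character vector"
csv←'id,name',(⎕UCS 10),'1,Ann',(⎕UCS 10),'2,Bob'
(data header)←⎕CSV csv 'S' 4 1    ⍝ 4: numeric if possible, 1: first row is a header
header                             ⍝ the column names from the first row
data                               ⍝ the remaining rows as a matrix
```

Check the documentation for your version of Dyalog before relying on this form in an application.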
⎕JSON
JSON is not only a convenient way to represent nested data structures, but also a convenient data representation for the modern web since it is natively handled by JavaScript. ⎕JSON converts between APL arrays (including namespaces) and character vector representations of JSON.
'ns'⎕NS⍬
ns.var←1 2 3
ns.char←'abc'
⎕JSON ns
{"char":"abc","var":[1,2,3]}
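The conversion also works in the other direction. As a short sketch (the JSON text and the name obj are made up for this illustration), importing a JSON array gives an APL vector and importing a JSON object gives a namespace:

```apl
⎕JSON'[1,2,3]'                             ⍝ a JSON array becomes a numeric vector: 1 2 3
obj←⎕JSON'{"name":"abc","list":[1,2,3]}'   ⍝ a JSON object becomes a namespace
obj.name                                   ⍝ abc
obj.list                                   ⍝ 1 2 3
⎕JSON obj                                  ⍝ export the namespace back to JSON text
```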
A JSON object in Dyalog uses dot-syntax to access members. Some JSON object keys are invalid APL names, so Dyalog works around this using special characters:
(⎕JSON'{"$name": "steve", "3var": "what"}').⎕nl-⍳9
┌─────┬─────────┐
│⍙3var│⍙⍙36⍙name│
└─────┴─────────┘
Be aware of incompatible namespaces, although most of the time you will be converting data rather than namespaces.
'ns'⎕NS⍬
ns.fn←{⍵}
⎕JSON ns
DOMAIN ERROR: JSON export: item "fn" of the right argument cannot be
converted (⎕IO=1)
⎕JSON ns
∧
Recall the expression for an empty JSON object.
Using ⎕JSON, we can also display error information in a human-readable format.
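One way this might look, as a sketch: trap a deliberate error inside a dfn and render the diagnostic message object ⎕DMX as indented JSON text using the "Compact" variant option. The division by zero is just a convenient way to provoke an error.

```apl
⍝ The error guard 0:: catches any error; ⎕DMX then describes the error that occurred
report←{0::⎕JSON⍠'Compact' 0⊢⎕DMX ⋄ 1÷0}⍬
report    ⍝ JSON text describing the trapped error
```

The exact fields shown depend on the interpreter version, so no output is reproduced here.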
⎕XML
XML is a format that has fallen out of favour in recent years, but it is still useful to be able to import and export it easily when you need to.
Text Files
Generally, the ⎕N... family of system functions is for reading and writing native files, as described in the documentation. ⎕NGET and ⎕NPUT are useful for reading and writing text files without having to tie and untie them.
(⊂words)⎕NPUT'data/words.txt' ⍝ Write words to a unicode text file
(content encoding newline)←⎕NGET'data/words.txt' ⍝ Read words from a unicode text file
words←(⎕UCS newline)((~∊⍨)⊆⊢)content ⍝ Split words on each new line
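Both functions accept an optional flag after the file name, which can save the splitting step. A minimal round-trip sketch, assuming a hypothetical file name words_demo.txt in the current directory and made-up data:

```apl
file←'words_demo.txt'              ⍝ hypothetical file name, used only in this sketch
lines←'apple' 'banana' 'cherry'    ⍝ made-up data
(⊂lines)⎕NPUT file 1               ⍝ trailing 1: overwrite the file if it already exists
read←⊃⎕NGET file 1                 ⍝ trailing 1: return the content as a vector of lines
lines≡read                         ⍝ 1
⎕NDELETE file                      ⍝ remove the demo file
```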
⎕N...
This is a quick summary. For more details see the Native Files cheat sheet and system functions and variables A-Z in the online documentation.
In the chapter on selecting from arrays there was an example of reading a text file using ⎕NGET. Before Dyalog version 15.0, reading text files required a couple of extra steps. Some ⎕N... native file functions are general and can be used to read and write any type of file. As a simple example, here we tie the file words.txt, read the data and store it in a variable, and finally untie the file.
Note
For multi-user systems, take care to set appropriate file access permissions when using ⎕NCREATE, ⎕NTIE and ⎕NLOCK.
tn←'assets/words.txt'⎕NTIE 0
⎕←10↑words←(⎕UCS 10)(≠⊆⊢)⎕NREAD tn 82(⎕NSIZE tn)0
┌─┬───┬────┬────┬─────┬────┬──────┬────┬──────┬────┐
│A│A's│AA's│AB's│ABM's│AC's│ACTH's│AI's│AIDS's│AM's│
└─┴───┴────┴────┴─────┴────┴──────┴────┴──────┴────┘
⎕NUNTIE⎕NNUMS
⎕MAP
The memory mapping function ⎕MAP associates a file on disk with an APL array in the workspace. This is useful if you are working with data that cannot fit inside the available workspace memory. One approach might be to read the data in chunks and process one chunk at a time (for example, see the "Records" variant option for ⎕CSV). Another approach is to use ⎕MAP.
Component files
If it is only APL systems that need to store data, the most convenient and efficient way to store that data is in APL component files.
Here we will briefly look at the basic usage of component files. A full treatment of component files is provided in Chapter N of Mastering Dyalog APL and more information can be found in the component file documentation.
System functions that deal with component files begin with ⎕F...
Tie and untie
In Dyalog, component files have the extension .dcf (Dyalog Component File) and must be tied and untied.
A component file may be exclusively tied (⎕FTIE) or have a shared tie (⎕FSTIE). With an exclusive tie, no other process may access the file.
tn←'cfile'⎕FCREATE 0 ⍝ The file is exclusively tied
⎕FUNTIE tn ⍝ The file is untied, it can now be used by other applications and processes
The next time we want to use this file, we can use ⎕FTIE instead of ⎕FCREATE. The right argument to these functions specifies a tie number (which can be different each time the file is tied), but with a right argument of 0 the next available tie number is used (component file tie numbers start at 1).
tn←'cfile'⎕FTIE 0 ⍝ The file on disk is cfile.dcf, but this extension is assumed if not specified
The structure of a component file is analogous to a nested vector of arrays. We add new values by appending them to the end of a file.
(3 3⍴⍳9)⎕FAPPEND tn
(↑'Dave' 'Sam' 'Ellie' 'Saif')⎕FAPPEND tn
nested←2 2⍴'this' 0 'that' (1 2 3)
nested ⎕FAPPEND tn
Each array stored in a component file (a component) is referred to by its index in the file (its component number), starting from 1 (not affected by ⎕IO).
⎕FREAD¨tn,¨1 2 3
┌─────┬─────┬────────────┐
│1 2 3│Dave │┌────┬─────┐│
│4 5 6│Sam ││this│0 ││
│7 8 9│Ellie│├────┼─────┤│
│ │Saif ││that│1 2 3││
│ │ │└────┴─────┘│
└─────┴─────┴────────────┘
A component can be replaced by any other array.
'Hello'⎕FREPLACE tn 2
⎕FREAD tn 2
Hello
Use ⎕FSIZE to find the range of components and file size:
⎕FSIZE tn
1 4 1744 1.8446744073709552E19
The elements of ⎕FSIZE are:
[1] The number of the first component
[2] 1 + the number of the last component (that is, where a new component will be if ⎕FAPPEND is used)
[3] The current size of the file in bytes
[4] The file size limit in bytes
Components can be removed from the beginning or end of a component file, with the ⎕FDROP function analogous to ⍺↓⍵.
⎕FDROP tn 1
⎕FDROP tn ¯1
⎕FREAD¨tn,¨1 2 3
FILE INDEX ERROR: cfile.dcf: No such component
⎕FREAD¨tn,¨1 2 3
∧
⎕FREAD tn 2 ⍝ Only component number 2 remains
Hello
After use, don't forget to untie all tied component files using ⎕FUNTIE ⎕FNUMS.
Multi-user access
If you are working on a system through which multiple users need to access the same component files, it is important to become familiar with multi-user access techniques and potential pitfalls. In particular, you will need to use ⎕FSTIE, ⎕FHOLD, ⎕FSTACK and probably ⎕AN.
Multi-user access can mean manual access by actual human users, or automated access by separate computers or processes.
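A minimal sketch of one common pattern, a read-modify-write on a share-tied file protected by ⎕FHOLD. The file name demo_shared and the counter stored in component 1 are assumptions made for this illustration; a real application would also need to consider access permissions and error handling.

```apl
tn←'demo_shared'⎕FCREATE 0       ⍝ hypothetical file name; creates demo_shared.dcf
0 ⎕FAPPEND tn                    ⍝ component 1 holds a counter, initially 0
⎕FUNTIE tn
tn←'demo_shared'⎕FSTIE 0         ⍝ shared tie: other processes may tie the file too
⎕FHOLD tn                        ⍝ wait until no one else holds the file, then hold it
(1+⎕FREAD tn 1)⎕FREPLACE tn 1    ⍝ read, update and write back while holding
⎕FHOLD ⍬                         ⍝ release all holds
⎕←count←⎕FREAD tn 1              ⍝ 1
⎕FUNTIE tn
tn←'demo_shared'⎕FTIE 0          ⍝ exclusive tie before erasing the file
'demo_shared'⎕FERASE tn          ⍝ clean up the demo file
```

Holding the file for the whole read-modify-write step prevents another cooperating process from updating the counter between the read and the replace.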
Downloading data from the internet
The HttpCommand class is built on top of the Conga framework for TCP/IP communications. At the most basic level, it can be used to perform HTTP requests to retrieve data from servers.
]Get HttpCommand
#.HttpCommand
⍴(#.HttpCommand.Get 'https://google.com').Data
14107
Using HttpCommand with ⎕FIX is a way to download APL code from the internet.
For more information, see the online documentation for HttpCommand. Alternatively, there is documentation within the comments of the code for the HttpCommand class; simply use )ed HttpCommand or press Shift+Enter with the text cursor on the name in the session.
Problem set 13
Indian Summer
IndiaRainfall.csv is a file of comma separated values. It is adapted from IndiaRainfallSource.csv to remove incomplete records.
The India Meteorological Department (IMD) has shared this dataset under Govt. Open Data License - India. It can be downloaded from the links above or from the Kaggle data science website.
The data contains the total measured monthly rainfall in millimeters for 30 regions in India from the years 1915 to 2015 inclusive.
- Load the data into the workspace
  By default, ⎕CSV will load all fields as text data:
  ⎕←3↑1 2↓⎕CSV'assets/IndiaRainfall.csv'
  With the following parameters, ⎕CSV will try to interpret all fields as numeric, and fall back to text if that fails. It will also import the first line as a separate array:
  (raindata header)←⎕CSV'assets/IndiaRainfall.csv' ⍬ 4 1
  ⎕←3↑0 2↓raindata
Bonus
Try reading IndiaRainfallSource.csv and removing the missing records for yourself. When data sets contain a very small amount of missing data, sometimes it is appropriate to estimate those values in a process called imputation. Most of the time, it is best to just remove the sections containing missing records.
- What was the total rainfall in Punjab in 1995?
- Which month in which region had the highest rainfall?
- Use a least squares linear fit to estimate the total rainfall in all 30 regions in 2018
- Use a least squares linear fit to estimate the total rainfall in Punjab in 2018
  Hint
  No one would expect you to derive an expression for the least squares linear fit with little APL experience. If you have done it, kudos to you. The expression Mv(⊢⌹1,∘⍪⊣)Nv from APLcart will compute coefficients of a least squares linear fit given a vector of X values Mv and a vector of Y values Nv.
- Inspect the data in IndiaRainfallSource.csv to see how close the true values were to your estimates. Were they within your standard error?
  Hint
  If the error e is a vector of the differences between Y values predicted by the linear fit and the actual Y values
  \[e_i=Y_i^{\text{predicted}}-Y_i^{\text{actual}}\]
  then an estimate for the variance is given by
  \[s^2=\sum_{i=1}^n{{e_i^2}\over{n-2}}\]
  where the standard deviation (standard error) is \(s\).
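To see how the fit expression and the standard error formula from the hints fit together, here is a sketch on a small made-up data set (not the rainfall data). The names x, y, coef, pred, e and s are assumptions introduced for this illustration.

```apl
x←1 2 3 4 5                 ⍝ made-up X values
y←2.1 3.9 6.2 7.8 10.1      ⍝ made-up Y values
coef←x(⊢⌹1,∘⍪⊣)y            ⍝ coefficients: intercept, then slope
pred←(1,⍪x)+.×coef          ⍝ Y values predicted by the fit
e←pred-y                    ⍝ errors, predicted minus actual
s←((+/e*2)÷(≢x)-2)*0.5      ⍝ standard error: square root of the variance estimate
1 6+.×coef                  ⍝ estimate of Y at x=6
```

The matrix 1,⍪x has a column of ones for the intercept and a column of X values for the slope, which is the same matrix that the APLcart expression builds internally.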
MarkDown Sort
Write a program which reads in a markdown file, rearranges the sections by the alphabetical order of their headers, and writes the sorted file to a new file. For extra credit, include a method by which the user can decide whether to overwrite the existing file or provide the name or path to a new file. For example files, feel free to use any of the source files for these course materials.
Fun facts
If you are not very familiar with the workings of modern software, you might be surprised to see how accessible file types are. Many text editors might try to open a wide range of files by interpreting their data as text. In the audio editing program Audacity, native files can be inspected and manipulated as audio waveforms. These are a couple of techniques used in an art style called databending.