Data Input/Output
Although we have ⎕IO, "IO" in APL can still refer to input/output.
This page covers APL tools for reading and writing data to and from files, databases, and the internet. If you are already familiar with Python, R or .NET, you can use one of the external language bridges to bring data into APL via one of these languages. However, in many cases it is simpler and faster to use one of the following tools.
Hello, World!
If you have seen any kind of computer programming before, you are probably aware of a famous program called "Hello, World!".
Here it is in APL:
⎕←'Hello, World!'
If you have learned programming before, it may seem strange to have come this far in an introductory tutorial without meeting the language's "print" function.
Note
By default, non-assignment expressions output their results to the session log. We strongly recommend using ⎕← when you deliberately intend an expression to print to the session log. You can then search for print statements, which makes debugging easier.
Back 2 School 4 Maths
The function Test will ask the user n single-digit arithmetic questions, and return the number of correct answers.
∇ points←Test n;answer;solution
[1] points←0
[2] :While n>0
[3] solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10
[4] answer←⎕
[5] points+←answer≡solution
[6] n-←1
[7] :EndWhile
[8] ⎕←'You scored',points,'points'
∇
Test 3
Examine the Test function and try it out. Which line asks for user input?
Note
The del (∇) representation of the Test function above is the result of ⎕VR'Test' (vector representation), which can be entered directly into the session. Copy the Test function above, paste it into the session and press Enter to define the Test function in your workspace.
You will see that it is quite possible to cheat the Test function by entering the same expression that it displays. To be even more sly, simply move the text cursor with the up arrow to the printed problem statement and press Enter.
To prevent this, we can verify and fix input with ⎕VFI. Also note the use of quote-quad ⍞, which reads the input as text rather than evaluating it.
∇ points←Test2 n;answer;input;solution;valid
[1] points←0
[2] :While n>0
[3] solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10
[4] input←⍞
[5] (valid answer)←⎕VFI input
[6] answer←valid/answer
[7] points+←answer≡,solution
[8] n-←1
[9] :EndWhile
[10] ⎕←'You scored',points,'points'
∇
Test2 3
In this case, "fix" means to define as an APL value in the workspace, as if it had been typed into the session.
Verify and Fix Input ⎕VFI is used when you need to process numeric data from an external source, but it has arrived in a text format. This is very common when working with data from the internet or from files.
You might be tempted to use the Execute function ⍎⍵, but this is very dangerous because it will execute any text as APL code.
More about ⎕VFI
(valid numbers) ← ⎕VFI text
The Boolean vector valid indicates the locations of elements in numbers which were successfully converted from text.
By default, any valid number representation surrounded by spaces is considered valid input, including exponential notation XeY and complex numbers of the form xJy. You may provide a list of separator characters as left argument.
⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌───────────┬──────────────────┐
│1 1 1 1 0 0│7 4.3 1000 3J4 0 0│
└───────────┴──────────────────┘
' ,'⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌─────────────┬──────────────────────┐
│1 1 1 1 1 1 0│7 4.3 1000 3J4 5 300 0│
└─────────────┴──────────────────────┘
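As a small sketch of how the two results fit together, the following session keeps only the fields that converted successfully. The names raw, valid and nums are illustrative and do not appear in the examples above.

```apl
raw←'12 abc 3.5 7x'      ⍝ text as it might arrive from a file or a web request
(valid nums)←⎕VFI raw    ⍝ valid is a Boolean mask; nums holds 0 where conversion failed
valid                    ⍝ 1 0 1 0
valid/nums               ⍝ keep only the converted numbers: 12 3.5
∧/valid                  ⍝ 0, because at least one field was rejected
```

Filtering with the mask, rather than trusting the zeros in nums, avoids confusing a rejected field with a genuine 0 in the input.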
Convenient text input
Single quotes ' in APL character arrays must be escaped by doubling. It can sometimes be easier to paste input by assigning from ⍞:
text←⍞
My great string 'which has some quoted text'
]Repr text
'My great string ''which has some quoted text'' '
Note
The user command ]Repr can generate APL expressions which produce most arrays. In some sense, it is like an inverse to execute ⍎. There is also a utility function ⎕SE.Dyalog.Utils.repObj which can be used in code, but we do not recommend using it in applications; use the primitives to test the properties of arrays, as explained in the sections on error handling.
Convenient text output
Once upon a time, APL was considered an incredible, revolutionary tool for scientists, artists and business people alike to be able to get work done using computers. In a time before spreadsheet software was so ubiquitous, APL terminals offered a way to quickly and easily process data, produce reports and format them for printing.
Take a look at Chapter F of Mastering Dyalog APL for how to use the formatting functionality of ⍕ and ⎕FMT.
- It is easy (but inefficient) to round numbers to a specific precision with dyadic format ⍕:
  ⎕←rand←?5⍴0
  0.2225024074 0.3282243862 0.314984696 0.9533625773 0.757200184
  ⍎2⍕rand
  0.22 0.33 0.31 0.95 0.76
  - Write a function equivalent to {⍎⍺⍕⍵} without using ⍎ or ⍕.
  - Why does {⍎⍺⍕⍵} fail for large values of ⍺?
- The following expression formats the current date as YY/MM/DD. Change the expression to produce YYYY-MM-DD.
  'I2,2(</>,ZI2)'⎕FMT 1 3⍴100|3↑⎕TS
- In Dyalog version 18.0, the experimental 1200⌶ (twelve hundred eye beam) function can convert date times into human readable formats according to some specification. For example:
  'Dddd Mmmm Doo YYYY'(1200⌶)1⎕DT⊂3↑⎕TS
  ┌──────────────────────────┐
  │Wednesday August 12th 2020│
  └──────────────────────────┘
  Write a function DTFMT to generate a similar output using a 3-element vector like 3↑⎕TS. That is,
  - Full day name
  - Full month name
  - Ordinal day number (1st, 2nd, 3rd, 4th, etc.)
  - Full year number
  DTFMT 2020 8 12
  ┌──────────────────────────┐
  │Wednesday August 12th 2020│
  └──────────────────────────┘
Importing code and data while developing
The experimental ]Get user command can be used in the interactive IDE to obtain code and data from the internet or local file system in various formats. For example:
- APL code from files, folders and online repositories like GitHub
- Workspaces and text source shipped with the interpreter, for example dfns and HttpCommand
- Text data including plain text, CSV, XML and JSON
]Get is a development tool intended as a one-stop utility for quickly bringing resources into the workspace while programming. Do not use it at run time, as exact results can vary. Instead, use precisely documented features like ⎕JSON, ⎕CSV, ⎕XML, and ⎕FIX in combination with loading tools like ⎕NGET, HttpCommand, ⎕SE.Link.Import, etc.
Enter ]Get -? into the interactive session to see more information.
Downloading data from the internet
HttpCommand is a utility for making requests to interact with web services. The HttpCommand class is built on top of the Conga framework for TCP/IP communications.
Load HttpCommand into the active workspace.
]Get HttpCommand
#.HttpCommand
Make an HTTP GET request to receive plain text data.
(HttpCommand.Get 'https://catfact.ninja/fact').Data
{"fact":"Cats have about 130,000 hairs per square inch (20,155 hairs per square centimeter).","length":83}
The GetJSON method automatically converts JSON payloads to APL namespaces. Remember to specify the HTTP method ('GET' in the following example).
(HttpCommand.GetJSON 'GET' 'https://catfact.ninja/fact').Data.fact
There are approximately 60,000 hairs per square inch on the back of a cat and about 120,000 per square inch on its underside.
The result of a call to an HttpCommand method is a namespace including information about the request and its response.
r←HttpCommand.Get 'https://catfact.ninja/fact'
r.(HttpStatus HttpMessage)
┌───┬──┐
│200│OK│
└───┴──┘
Using HttpCommand with ⎕FIX is a way to download APL code from the internet.
Native files
The term "Native Files" refers to any type of file on a hard disk. These can be text or media files, or even executable files. Usually we are interested in various kinds of text files.
Text files
Read Text File ⎕NGET documentation
Write Text File ⎕NPUT documentation
Generally, the ⎕N... family of system functions is for reading and writing native files, as described in the documentation. ⎕NGET and ⎕NPUT are useful for reading and writing text files without having to tie and untie them.
(⊂words)⎕NPUT'data/words.txt' ⍝ Write words to a unicode text file
(content encoding newline)←⎕NGET'data/words.txt' ⍝ Read words from a unicode text file
words←⊃⎕NGET'data/words.txt' 1 ⍝ Split words on each new line
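The three expressions above can be combined into a round trip. This is a minimal sketch: the path is hypothetical, and the names path, lines and back are chosen here for illustration.

```apl
path←'/tmp/example_lines.txt'     ⍝ hypothetical path; change it to suit your system
lines←'first' 'second' 'third'    ⍝ a nested vector of character vectors
(⊂lines)⎕NPUT path 1              ⍝ the trailing 1 allows an existing file to be overwritten
back←⊃⎕NGET path 1                ⍝ the trailing 1 returns one character vector per line
lines≡back                        ⍝ compare what was read with what was written
```

Without the trailing 1 on the right of ⎕NPUT, the write is refused if the file already exists, which is a useful safety net outside of throwaway examples.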
⎕CSV
Comma Separated Values documentation
Parsing content from text files using ⎕CSV
The Comma Separated Values system function ⎕CSV can read tabular data from .csv files as APL matrices. Here are some features of ⎕CSV:
- Read data from and write data to files directly
  data ← ⎕CSV '/path/to/file.csv'   ⍝ Read from file.csv
  data ⎕CSV '/path/to/file.csv'     ⍝ Write to file.csv
- Separate the header (first row) from the rest of the data
  (data header) ← ⎕CSV '/path/to/file.csv' ⍬ ⍬ 1
- Import specific columns as numbers or characters, depending on the options provided.
  numeric_if_possible ← ⎕CSV '/path/to/file.csv' ⍬ 4
  The 4 in this example indicates to convert numeric values if possible, else keep the value as text.
- Use a separator other than commas, using the "Separator" variant option, for example using tabs (⎕UCS 9) for Tab Separated Values (.tsv).
  tsv ← ⎕CSV⍠'Separator' (⎕UCS 9)⊢'/path/to/file.csv'
- Read data in chunks so as not to fill the workspace, using the "Records" variant option.
  path ← '/path/to/file.csv'       ⍝ The file path as simple character vector
  ReadCSV10←⎕CSV⍠'Records' 10      ⍝ A function to read CSV 10 records at a time
  tn←path ⎕NTIE 0                  ⍝ Tie the file - this locks it from use by other applications
  first10 ← ReadCSV10 tn           ⍝ Read the first 10 records (rows)
  second10 ← ReadCSV10 tn          ⍝ Read the next 10
  ≢¨first10 second10
  10 10
  first10 second10
  ┌──────────┬──────────┐
  │┌──┬─────┐│┌──┬─────┐│
  ││1 │JQZUK│││11│DECJM││
  │├──┼─────┤│├──┼─────┤│
  ││2 │ANPYW│││12│PXPGL││
  │├──┼─────┤│├──┼─────┤│
  ││3 │WYVSR│││13│SYSCN││
  │├──┼─────┤│├──┼─────┤│
  ││4 │ZOGOX│││14│EKDPS││
  │├──┼─────┤│├──┼─────┤│
  ││5 │CXKRS│││15│XCOHA││
  │├──┼─────┤│├──┼─────┤│
  ││6 │BFTYO│││16│RDAHR││
  │├──┼─────┤│├──┼─────┤│
  ││7 │VFLAS│││17│KPUTW││
  │├──┼─────┤│├──┼─────┤│
  ││8 │BAFYD│││18│TPDOD││
  │├──┼─────┤│├──┼─────┤│
  ││9 │XPEBP│││19│BGIVA││
  │├──┼─────┤│├──┼─────┤│
  ││10│UVBFG│││20│IITSO││
  │└──┴─────┘│└──┴─────┘│
  └──────────┴──────────┘
  ⎕NUNTIE tn                       ⍝ Don't forget to untie the file after use!
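⎕CSV can also parse text that is already in the workspace, for example text returned by a web request. The sketch below assumes the 'S' source description (a simple character vector with embedded line endings) as described in the ⎕CSV documentation; the sample records and the names nl and text are made up for illustration.

```apl
nl←⎕UCS 10                                  ⍝ line feed, used here as the record separator
text←'id,name',nl,'1,JQZUK',nl,'2,ANPYW'    ⍝ made-up CSV text held in a variable
(data header)←⎕CSV text 'S' 4 1             ⍝ 'S' = simple text source; 4 = numeric if possible; 1 = header row
header                                      ⍝ the column names
⊣/data                                      ⍝ the first column, converted to numbers where possible
```

Because ⊣/ picks the leftmost item of each row, the last line does not depend on the index origin ⎕IO.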
⎕JSON
JSON Convert ⎕JSON documentation
⎕JSON Table Support
JavaScript Object Notation (JSON) can be translated to and from APL.
- Lists can be represented as APL vectors
  1⎕JSON (1 2 3)'ABCD'
  [[1,2,3],"ABCD"]
- Objects can be represented as APL namespaces.
  0⎕JSON '{"name":"David", "age": 42}'
  #.[JSON object]
- Both can be represented as a matrix of depth, name, value and type columns somewhat similar to that used by ⎕XML.
  0 (⎕JSON ⎕OPT'Format' 'M')'[{"name":"David", "age": 42}, {"name": "Sandra", "age": 42}]'
  ┌─┬────┬──────┬─┐
  │0│    │      │2│
  ├─┼────┼──────┼─┤
  │1│    │      │1│
  ├─┼────┼──────┼─┤
  │2│name│David │4│
  ├─┼────┼──────┼─┤
  │2│age │42    │3│
  ├─┼────┼──────┼─┤
  │1│    │      │1│
  ├─┼────┼──────┼─┤
  │2│name│Sandra│4│
  ├─┼────┼──────┼─┤
  │2│age │42    │3│
  └─┴────┴──────┴─┘
JSON is not only a convenient way to represent nested data structures, but also a convenient data representation for the modern web since it is natively handled by JavaScript.
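To illustrate the conversion in both directions, here is a short sketch that imports an object, reads and extends it, and exports it again. The member name tags is an illustrative addition, not part of the examples above.

```apl
ns←⎕JSON'{"name":"David","age":42}'   ⍝ import: JSON text becomes a namespace
ns.name                               ⍝ David
ns.age+1                              ⍝ 43
ns.tags←'apl' 'json'                  ⍝ add a member holding a list of strings
⎕JSON ns                              ⍝ export: the namespace becomes JSON text again
1⎕JSON 1 2 3                          ⍝ [1,2,3]
```

When the left argument is omitted, ⎕JSON imports from text and exports from other arrays, so an explicit 0 or 1 is mainly useful for making the intent obvious.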
A JSON object in Dyalog uses dot-syntax to access members. Some JSON object keys are invalid APL names, so Dyalog works around this using special characters:
(⎕JSON'{"$name": "steve", "3var": "what"}').⎕nl-⍳9
┌─────┬─────────┐
│⍙3var│⍙⍙36⍙name│
└─────┴─────────┘
Be aware of incompatible namespaces, although most of the time you will be converting data rather than namespaces.
'ns'⎕NS⍬
ns.fn←{⍵}
⎕JSON ns
DOMAIN ERROR: JSON export: item "fn" of the right argument cannot be
converted (⎕IO=1)
⎕JSON ns
∧
Recall the expression for an empty JSON object.
Using ⎕JSON, we can also display error information in a human-readable format.
⎕XML
XML Convert ⎕XML documentation
⎕XML converts between XML character vectors and nested matrices with columns for node depth, tag name, value, attribute key/value pairs and markup description.
⎕XML'<name born="1920">Ken</name><name born="1925">Jean</name>'
┌─┬────┬────┬───────────┬─┐
│0│name│Ken │┌────┬────┐│5│
│ │    │    ││born│1920││ │
│ │    │    │└────┴────┘│ │
├─┼────┼────┼───────────┼─┤
│0│name│Jean│┌────┬────┐│5│
│ │    │    ││born│1925││ │
│ │    │    │└────┴────┘│ │
└─┴────┴────┴───────────┴─┘
Binary files and other arbitrary file types
Native Files Cheat Sheet
System Functions Categorised
In the chapter on selecting from arrays there was an example of reading a text file using ⎕NGET. Before Dyalog version 15.0, reading text files required a couple of extra steps. Some ⎕N... native file functions are general and can be used to read and write any type of file. As a simple example, here we tie the file words.txt, read the data and store it in a variable, and finally untie the file.
Note
For multi-user systems, take care to set appropriate file access permissions when using ⎕NCREATE, ⎕NTIE and ⎕NLOCK.
tn←'assets/words.txt'⎕NTIE 0
⎕←10↑words←(⎕UCS 10)(≠⊆⊢)⎕NREAD tn 82(⎕NSIZE tn)0
┌─┬───┬────┬────┬─────┬────┬──────┬────┬──────┬────┐
│A│A's│AA's│AB's│ABM's│AC's│ACTH's│AI's│AIDS's│AM's│
└─┴───┴────┴────┴─────┴────┴──────┴────┴──────┴────┘
⎕NUNTIE⎕NNUMS
⎕MAP
The memory mapping function ⎕MAP allows you to treat a file on disk as if it were a variable in the workspace. This is useful if you are working with data that cannot fit inside the available workspace memory. One approach might be to read the data in chunks and process one chunk at a time (for example, see the "Records" variant option for ⎕CSV). Another approach is to use ⎕MAP.
text ← 80 ¯1 ⎕MAP '/path/to/file.txt'
You must specify the type according to the Data Representation ⎕DR of the data to be read.
APL Component files
Chapter N of Mastering Dyalog APL
Component File documentation
If it is only APL systems that need to store data, the most convenient and efficient way to store that data is in APL component files.
System functions that deal with component files begin with ⎕F....
Tie and untie
In Dyalog, component files have the extension .dcf (Dyalog Component File) and must be tied and untied.
A component file may be exclusively tied (⎕FTIE) or have a shared tie (⎕FSTIE). With an exclusive tie, no other process may access the file.
tn←'cfile'⎕FCREATE 0 ⍝ The file is exclusively tied
⎕FUNTIE tn ⍝ The file is untied, it can now be used by other applications and processes
The next time we want to use this file, we can use ⎕FTIE instead of ⎕FCREATE. The right argument to these functions specifies a tie number (which can be different each time the file is tied). With a right argument of 0, the next available tie number is used (component file tie numbers start at 1).
tn←'cfile'⎕FTIE 0 ⍝ The file on disk is cfile.dcf, but this extension is assumed if not specified
The structure of a component file is analogous to a nested vector of arrays. We add new values by appending them to the end of a file.
(3 3⍴⍳9)⎕FAPPEND tn
(↑'Dave' 'Sam' 'Ellie' 'Saif')⎕FAPPEND tn
nested←2 2⍴'this' 0 'that' (1 2 3)
nested ⎕FAPPEND tn
Each array stored in a component file (a component) is referred to by its index in the file (its component number), starting from 1 (not affected by ⎕IO).
⎕FREAD¨tn,¨1 2 3
┌─────┬─────┬────────────┐
│1 2 3│Dave │┌────┬─────┐│
│4 5 6│Sam  ││this│0    ││
│7 8 9│Ellie│├────┼─────┤│
│     │Saif ││that│1 2 3││
│     │     │└────┴─────┘│
└─────┴─────┴────────────┘
A component can be replaced by any other array.
'Hello'⎕FREPLACE tn 2
⎕FREAD tn 2
Hello
Use ⎕FSIZE to find the range of components and file size:
⎕FSIZE tn
1 4 1744 1.8446744073709552E19
The elements of ⎕FSIZE are:
[1] The number of the first component
[2] 1 + the number of the last component (that is, where a new component will be if ⎕FAPPEND is used)
[3] The current size of the file in bytes
[4] The file size limit in bytes
Components can be removed from the beginning or end of a component file with the ⎕FDROP function, which is analogous to ⍺↓⍵.
⎕FDROP tn 1
⎕FDROP tn ¯1
⎕FREAD¨tn,¨1 2 3
FILE INDEX ERROR: cfile.dcf: No such component
⎕FREAD¨tn,¨1 2 3
∧
⎕FREAD tn 2 ⍝ Only component number 2 remains
Hello
After use, don't forget to untie all tied component files using ⎕FUNTIE ⎕FNUMS.
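Putting the steps of this section together, a complete life cycle for a scratch component file might look like the following sketch. The file name is hypothetical, and the names name, ids, first and next are chosen here for illustration.

```apl
name←'/tmp/demo_cfile'                 ⍝ hypothetical file name; the .dcf extension is assumed
tn←name ⎕FCREATE 0                     ⍝ create and exclusively tie (fails if the file already exists)
ids←{⍵ ⎕FAPPEND tn}¨'one' 'two' 'three' ⍝ append three components; ids is 1 2 3
(first next)←2↑⎕FSIZE tn               ⍝ first component number, and 1 + the last
⎕FREAD¨tn,¨first+(⍳next-first)-⎕IO     ⍝ read every component, whatever ⎕IO is
name ⎕FERASE tn                        ⍝ delete the scratch file, which also unties it
```

Taking the component range from ⎕FSIZE rather than hard-coding 1 2 3 keeps the read correct after components have been dropped from the start of the file.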
Multi-user access
If you are working on a system through which multiple users need to access the same component files, it is important to become familiar with multi-user access techniques and potential pitfalls. In particular, you will need to use ⎕FSTIE, ⎕FHOLD, ⎕FSTACK and probably ⎕AN.
Multi-user access can mean manual access by actual human users, or automated access by separate computers or processes.
SQL Databases
SQAPL ships with Dyalog and can be used out-of-the-box provided that a database is installed and a corresponding ODBC data source has been set up.
'SQA'⎕CY'sqapl'
SQA.Connect cid odbc_datasource_name sql_password sql_user
SQA.Do cid 'USE my_database'
SQA.Do cid 'SELECT * FROM my_table'
Some freely available ODBC drivers allow you to connect to databases and are sufficient for most use cases, such as the MySQL ODBC Connector or the MariaDB ODBC Connector. If you cannot find one which works for your particular hardware and software, Dyalog resells Progress DataDirect ODBC drivers, but these require a different version of SQAPL which is licensed separately. Contact Dyalog sales if you require the use of Progress DataDirect ODBC drivers.
Problem set 13
Indian Summer
IndiaRainfall.csv is a file of comma separated values. It is adapted from IndiaRainfallSource.csv to remove incomplete records.
The India Meteorological Department (IMD) has shared this dataset under the Govt. Open Data License - India. It can be downloaded from the links above or from the Kaggle data science website.
The data contains the total measured monthly rainfall in millimeters for 30 regions in India from the years 1915 to 2015 inclusive.
- Load the data into the workspace
  By default, ⎕CSV will load all fields as text data:
  ⎕←3↑1 2↓⎕CSV'assets/IndiaRainfall.csv'
  With the following parameters, ⎕CSV will try to interpret all fields as numeric, and fall back to text if that fails. It will also import the first line as a separate array:
  (raindata header)←⎕CSV'assets/IndiaRainfall.csv' ⍬ 4 1
  ⎕←3↑0 2↓raindata
  Bonus
  Try reading IndiaRainfallSource.csv and removing the missing records for yourself. When data sets contain a very small amount of missing data, sometimes it is appropriate to estimate those values in a process called imputation. Most of the time, it is best to just remove the sections containing missing records.
- What was the total rainfall in Punjab in 1995?
- Which month in which region had the highest rainfall?
- Use a least squares linear fit to estimate the total rainfall in all 30 regions in 2018
- Use a least squares linear fit to estimate the total rainfall in Punjab in 2018
  Hint
  No one would expect you to derive an expression for the least squares linear fit with little APL experience. If you have done it, kudos to you. The expression Mv(⊢⌹1,∘⍪⊣)Nv from APLcart will compute coefficients of a least squares linear fit given a vector of X values Mv and a vector of Y values Nv.
- Inspect the data in IndiaRainfallSource.csv to see how close the true values were to your estimates. Were they within your standard error?
  Hint
  If the error e is a vector of the differences between Y values predicted by the linear fit and the actual Y values
  \[e_i=Y_i^{\text{predicted}}-Y_i^{\text{actual}}\]
  then an estimate for the variance is given by
  \[s^2=\frac{1}{n-2}\sum_{i=1}^{n}e_i^2\]
  where the standard deviation (standard error) is \(s\).
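As an illustration of the two hints on made-up numbers (not the rainfall data), the following sketch fits a line and estimates the standard error. The vectors x and y are invented for this example, and the expression for coeffs is the APLcart one quoted in the hint.

```apl
x←1 2 3 4 5                  ⍝ made-up X values
y←3.1 4.9 7.2 8.8 11.0       ⍝ made-up Y values, roughly 2x+1
coeffs←x(⊢⌹1,∘⍪⊣)y           ⍝ intercept and slope, approximately 1.09 1.97
predicted←(1,⍪x)+.×coeffs    ⍝ Y values predicted by the fit
e←predicted-y                ⍝ errors, as defined in the hint
s←((+/e*2)÷(≢x)-2)*0.5       ⍝ standard error with n-2 in the denominator
```

The same matrix 1,⍪x that is used to compute the coefficients is reused to evaluate the fitted line, so a prediction for a new X value is just (1,newx)+.×coeffs.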
MarkDown Sort
Write a program which reads in a markdown file, rearranges the sections by the alphabetical order of their headers, and writes the sorted file to a new file. For extra credit, include a method by which the user can decide whether to overwrite the existing file or provide the name or path to a new file. For example files, feel free to use any of the source files for these course materials.
Fun facts
If you are not very familiar with the workings of modern software, you might be surprised to see how accessible file types are. Many text editors might try to open a wide range of files by interpreting their data as text. In the audio editing program Audacity, native files can be inspected and manipulated as audio waveforms. These are a couple of techniques used in an art style called databending.