
Data Input/Output

Although we have ⎕IO, "IO" in APL can still refer to input/output.

This page covers APL tools for reading and writing data to and from files, databases, and the internet. If you are already familiar with Python, R or .NET, you can use one of the external language bridges to bring data into APL via those languages. However, in many cases it is simpler and faster to use one of the following tools.

Hello, World!

If you have seen any kind of computer programming before, you are probably aware of a famous program called "Hello, World!".

Here it is in APL:

      ⎕←'Hello, World!'

If you have learned programming before, maybe it is strange to have gotten so far in an introductory tutorial without meeting the language's "print" function.

Note

By default, non-assignment expressions output results to the session log. We strongly recommend using ⎕← when you deliberately intend for an expression to print to the session log. You are then able to search for print statements for easier debugging.

Back 2 School 4 Maths

The function Test will ask the user n single-digit arithmetic questions, and return the number of correct answers.

     ∇ points←Test n;answer;solution          
[1]    points←0                               
[2]    :While n>0                             
[3]        solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10
[4]        answer←⎕                           
[5]        points+←answer≡solution            
[6]        n-←1                               
[7]    :EndWhile                              
[8]    ⎕←'You scored',points,'points'         
     ∇  

      Test 3

Examine the Test function and try it out. Which line asks for user input?

Note

The del representation of the Test function above is the vector representation result of ⎕VR'Test' which can be directly input into the session. Copy the Test function above, paste it into the session and press Enter to define the Test function in your workspace.

You will see that it is quite possible to cheat the Test function by entering the same expression that it asks. To be even more sly, simply move the text cursor with the up arrow to the printed problem statement and press Enter.

To ameliorate this, we can verify and fix input with ⎕VFI. Also note the use of quote-quad ⍞.

     ∇ points←Test2 n;answer;input;solution;valid
[1]    points←0                                  
[2]    :While n>0                                
[3]        solution←⍎⎕←⍕(?10),('+-×÷'[?4]),?10   
[4]        input←⍞                               
[5]        (valid answer)←⎕VFI input             
[6]        answer←valid/answer                   
[7]        points+←answer≡,solution              
[8]        n-←1                                  
[9]    :EndWhile                                 
[10]   ⎕←'You scored',points,'points'            
     ∇  

      Test2 3

In this case, "fix" means to define as an APL value in the workspace, as if it had been typed into the session.

Verify and Fix Input ⎕VFI is used when you need to process numeric data from an external source, but it has arrived in a text format. This is very common when working with data from the internet or from files.

You might be tempted to use the Execute function ⍎⍵ but this is very dangerous because it will execute any text as APL code.

More about ⎕VFI

(valid numbers) ← ⎕VFI text

The Boolean vector valid indicates the locations of elements in numbers which were converted from text.

By default, any valid number representation - including engineering exponential notation XeY and complex numbers of the form xJy - surrounded by spaces is considered valid input. You may provide a list of separator characters as left argument.

    ⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌───────────┬──────────────────┐
│1 1 1 1 0 0│7 4.3 1000 3J4 0 0│
└───────────┴──────────────────┘
    ' ,'⎕VFI'7 4.3 1e3 3j4 5,300 ok'
┌─────────────┬──────────────────────┐
│1 1 1 1 1 1 0│7 4.3 1000 3J4 5 300 0│
└─────────────┴──────────────────────┘
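A common follow-up step is to keep only the fields that were converted successfully. The sketch below wraps this in a small dfn; the name Nums is a hypothetical helper introduced for illustration, not something provided by Dyalog.

```apl
      Nums←{(valid nums)←⎕VFI ⍵ ⋄ valid/nums}   ⍝ hypothetical helper: keep only the valid numbers
      Nums '7 4.3 ok 12 1e3'
7 4.3 12 1000
```

Unlike ⍎, this never runs the text as code: the invalid field ok is simply dropped.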

Convenient text input

Single quotes ' in APL character arrays must be escaped by doubling. It can sometimes be easier to paste input by assigning ⍞:

      text←⍞
My great string 'which has some quoted text' 


      ]Repr text
'My great string ''which has some quoted text'' '

Note

The user command ]Repr can generate APL expressions which produce most arrays. In some sense, it is like an inverse to execute ⍎. There is also a utility function ⎕SE.Dyalog.Utils.repObj which can be used in code, but we do not recommend using it in applications; use the primitives to test the properties of arrays, as explained in the sections on error handling.

Convenient text output

Once upon a time, APL was considered an incredible, revolutionary tool for scientists, artists and business people alike to be able to get work done using computers. In a time before spreadsheet software was so ubiquitous, APL terminals offered a way to quickly and easily process data, produce reports and format them for printing.

Take a look at Chapter F of Mastering Dyalog APL for how to use the formatting functionality of ⍕ and ⎕FMT.
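As a brief taste before the exercises, the two expressions below format the same pair of numbers to two decimal places in a field eight characters wide, first with dyadic ⍕ and then with ⎕FMT. The numbers are arbitrary values chosen for illustration.

```apl
      8 2⍕3.14159 2.71828           ⍝ width 8, 2 decimals: the result is a character vector
    3.14    2.72
      'F8.2'⎕FMT 3.14159 2.71828    ⍝ a vector is treated as a column: the result is a 2-row matrix
    3.14
    2.72
```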

  1. It is easy (but inefficient) to round numbers to a specific precision with dyadic format ⍕:

          ⎕←rand←?5⍴0
    0.2225024074 0.3282243862 0.314984696 0.9533625773 0.757200184
          ⍎2⍕rand
    0.22 0.33 0.31 0.95 0.76

    1. Write a function equivalent to {⍎⍺⍕⍵} without using ⍎ or ⍕.
    2. Why does {⍎⍺⍕⍵} fail for large values of ⍺?
  2. The following expression formats the current date as YY/MM/DD.

    'I2,2(</>,ZI2)'⎕FMT 1 3⍴100|3↑⎕TS
    Change the expression to produce YYYY-MM-DD.

  3. In Dyalog version 18.0, the experimental 1200⌶ (twelve hundred eye beam) function can convert date times into human readable formats according to some specification. For example:

          'Dddd Mmmm Doo YYYY'(1200⌶)1⎕DT⊂3↑⎕TS
    ┌──────────────────────────┐
    │Wednesday August 12th 2020│
    └──────────────────────────┘

    Write a function DTFMT to generate a similar output using a 3-element vector like 3↑⎕TS. That is,

    • Full day name
    • Full month name
    • Ordinal day number (1st, 2nd, 3rd, 4th etc.)
    • Full year number

          DTFMT 2020 8 12
    ┌──────────────────────────┐
    │Wednesday August 12th 2020│
    └──────────────────────────┘

Importing code and data while developing

The experimental ]Get user command can be used in the interactive IDE to obtain code and data from the internet or local file system in various formats. For example:

  • APL code from files, folders and online repositories like GitHub
  • Workspaces and text source shipped with the interpreter, for example dfns and HttpCommand
  • Text data including plain text, CSV, XML and JSON

]Get is a development tool intended as a one-stop utility for quickly bringing resources into the workspace while programming. Do not use it at run time, as exact results can vary. Instead, use precisely documented features like ⎕JSON, ⎕CSV, ⎕XML, and ⎕FIX in combination with loading tools like ⎕NGET, HttpCommand, ⎕SE.Link.Import, etc.

Enter ]Get -? into the interactive session to see more information.

Downloading data from the internet

HttpCommand User Guide

HttpCommand is a utility for making requests to interact with web services. The HttpCommand class is built on top of the Conga framework for TCP/IP communications.

Load HttpCommand into the active workspace.

      ]Get HttpCommand
#.HttpCommand

Make an HTTP GET request to receive plain text data.

      (HttpCommand.Get 'https://catfact.ninja/fact').Data
{"fact":"Cats have about 130,000 hairs per square inch (20,155 hairs per square centimeter).","length":83}

The GetJSON method automatically converts JSON payloads to APL namespaces. Remember to specify the HTTP method ('GET' in the following example).

      (HttpCommand.GetJSON 'GET' 'https://catfact.ninja/fact').Data.fact
There are approximately 60,000 hairs per square inch on the back of a cat and about 120,000 per square inch on its underside.

The result of a call to an HttpCommand method is a namespace including information about the request and its response.

      r←HttpCommand.Get 'https://catfact.ninja/fact'
      r.(HttpStatus HttpMessage)
┌───┬──┐
│200│OK│
└───┴──┘

Using HttpCommand with ⎕FIX is a way to download APL code from the internet.
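A minimal sketch of that idea follows. The URL is a hypothetical placeholder, and the names url, r and lines are introduced only for this example. The response text is split into lines with the same (≠⊆⊢) idiom used elsewhere on this page before being passed to ⎕FIX; note that this idiom also drops empty lines, which is normally harmless for source code. Only fix code from sources you trust.

```apl
      url←'https://example.com/path/to/Utils.apln'   ⍝ hypothetical URL of an APL source file
      r←HttpCommand.Get url
      r.(rc HttpStatus)                               ⍝ 0 200 indicates a successful request
      lines←(⎕UCS 10)(≠⊆⊢)r.Data~⎕UCS 13              ⍝ remove carriage returns, split on line feeds
      ⎕FIX lines                                      ⍝ define the downloaded code in the workspace
```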

Native files

The term "Native Files" refers to any type of file on a hard disk. These can be text or media files, or even executable files. Usually we are interested in various kinds of text files.

Text files

Read Text File ⎕NGET documentation
Write Text File ⎕NPUT documentation

Generally, the ⎕N... family of system functions are for reading and writing native files as described in the documentation. ⎕NGET and ⎕NPUT are useful for reading and writing text files without having to tie and untie them.

(⊂words)⎕NPUT'data/words.txt'                      ⍝ Write words to a unicode text file
(content encoding newline)←⎕NGET'data/words.txt'   ⍝ Read words from a unicode text file
words←⊃⎕NGET'data/words.txt' 1                     ⍝ Split words on each new line 
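To try these out without an existing data folder, the following round trip writes a small nested vector to a file and reads it back. The file name words_demo.txt and the sample words are assumptions made for this sketch.

```apl
      words←'apple' 'banana' 'cherry'
      path←'words_demo.txt'            ⍝ hypothetical file in the current directory
      (⊂words)⎕NPUT path 1             ⍝ the trailing 1 allows an existing file to be overwritten
      words≡⊃⎕NGET path 1              ⍝ read back as one character vector per line
1
      ⎕NDELETE path                    ⍝ tidy up
```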

CSV

Comma Separated Values documentation
Parsing content from text files using ⎕CSV

The Comma Separated Values system function ⎕CSV can read tabular data from .csv files as APL matrices. Here are some features of ⎕CSV:

  • Read data from and write data to files directly
    data ← ⎕CSV '/path/to/file.csv'   ⍝ Read from file.csv
    data ⎕CSV '/path/to/file.csv'     ⍝ Write to file.csv
  • Separate the header (first row) from the rest of the data
    (data header) ← ⎕CSV '/path/to/file.csv' ⍬ ⍬ 1
  • Import specific columns as numbers or characters, depending on the options provided.

    numeric_if_possible ← ⎕CSV '/path/to/file.csv' ⍬ 4

    The 4 in this example indicates to convert numeric values if possible, else keep the value as text.

  • Use a separator other than commas, using the "Separator" variant option, for example using tabs (⎕UCS 9) for Tab Separated Values (.tsv).

    tsv ← ⎕CSV⍠'Separator' (⎕UCS 9)⊢'/path/to/file.csv'

  • Read data chunks at a time so as to not fill the workspace, using the "Records" variant option.
          path ← '/path/to/file.csv'    ⍝ The file path as simple character vector
          ReadCSV10←⎕CSV⍠'Records' 10   ⍝ A function to read CSV 10 records at a time
          tn←path ⎕NTIE 0               ⍝ Tie the file - this locks it from use by other applications
          first10 ← ReadCSV10 tn        ⍝ Read the first 10 records (rows)
          second10 ← ReadCSV10 tn       ⍝ Read the next 10
          ≢¨first10 second10
    10 10
          first10 second10
    ┌──────────┬──────────┐
    │┌──┬─────┐│┌──┬─────┐│
    ││1 │JQZUK│││11│DECJM││
    │├──┼─────┤│├──┼─────┤│
    ││2 │ANPYW│││12│PXPGL││
    │├──┼─────┤│├──┼─────┤│
    ││3 │WYVSR│││13│SYSCN││
    │├──┼─────┤│├──┼─────┤│
    ││4 │ZOGOX│││14│EKDPS││
    │├──┼─────┤│├──┼─────┤│
    ││5 │CXKRS│││15│XCOHA││
    │├──┼─────┤│├──┼─────┤│
    ││6 │BFTYO│││16│RDAHR││
    │├──┼─────┤│├──┼─────┤│
    ││7 │VFLAS│││17│KPUTW││
    │├──┼─────┤│├──┼─────┤│
    ││8 │BAFYD│││18│TPDOD││
    │├──┼─────┤│├──┼─────┤│
    ││9 │XPEBP│││19│BGIVA││
    │├──┼─────┤│├──┼─────┤│
    ││10│UVBFG│││20│IITSO││
    │└──┴─────┘│└──┴─────┘│
    └──────────┴──────────┘
          ⎕NUNTIE tn                    ⍝ Don't forget to untie the file after use!
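Several of these features can be combined in a small round trip. This is a sketch under stated assumptions: the file name csv_demo.csv and the sample table are made up for the example, and any earlier copy of the file is deleted first because writing is assumed to fail if the file already exists.

```apl
      path←'csv_demo.csv'                         ⍝ hypothetical file name
      1 ⎕NDELETE path                             ⍝ remove any earlier copy; the 1 suppresses the error if there is none
      table←3 2⍴'id' 'name' 1 'JQZUK' 2 'ANPYW'   ⍝ header row followed by two records
      table ⎕CSV path                             ⍝ write the table to file
      (data header)←⎕CSV path ⍬ 4 1               ⍝ numeric where possible, header returned separately
      header≡'id' 'name'
1
      data≡2 2⍴1 'JQZUK' 2 'ANPYW'
1
      ⎕NDELETE path
```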

JSON

JSON Convert ⎕JSON documentation
⎕JSON Table Support

JavaScript Object Notation (JSON) can be translated to and from APL.

  • Lists can be represented as APL vectors

          1⎕JSON (1 2 3)'ABCD'
    [[1,2,3],"ABCD"]
  • Objects can be represented as APL namespaces.

          0⎕JSON '{"name":"David", "age": 42}'
    #.[JSON object]
  • Both can be represented as a matrix of depth, name, value and type columns somewhat similar to that used by ⎕XML.

          0 (⎕JSON ⎕OPT'Format' 'M')'[{"name":"David", "age": 42}, {"name": "Sandra", "age": 42}]'
    ┌─┬────┬──────┬─┐
    │0│    │      │2│
    ├─┼────┼──────┼─┤
    │1│    │      │1│
    ├─┼────┼──────┼─┤
    │2│name│David │4│
    ├─┼────┼──────┼─┤
    │2│age │42    │3│
    ├─┼────┼──────┼─┤
    │1│    │      │1│
    ├─┼────┼──────┼─┤
    │2│name│Sandra│4│
    ├─┼────┼──────┼─┤
    │2│age │42    │3│
    └─┴────┴──────┴─┘

JSON is not only a convenient way to represent nested data structures, but also a convenient data representation for the modern web since it is natively handled by JavaScript.

A JSON object in Dyalog uses dot-syntax to access members. Some JSON object keys are invalid APL names, so Dyalog works around this using special characters:

      (⎕JSON'{"$name": "steve", "3var": "what"}').⎕nl-⍳9
┌─────┬─────────┐
│⍙3var│⍙⍙36⍙name│
└─────┴─────────┘
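Keys that are already valid APL names need no translation, and the members behave like ordinary variables. A small illustration, reusing JSON text from earlier on this page (the name person is introduced only for this example):

```apl
      person←⎕JSON'{"name":"David", "age": 42}'
      person.name
David
      person.age+1
43
```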

Be aware of incompatible namespaces, although most of the time you will be converting data rather than namespaces.

      'ns'⎕NS⍬
      ns.fn←{⍵}
      ⎕JSON ns
DOMAIN ERROR: JSON export: item "fn" of the right argument cannot be 
converted (⎕IO=1)
      ⎕JSON ns
      ∧

Recall the expression for an empty JSON object.
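As a reminder, and assuming the earlier chapters used the same form, one such expression converts a new, empty namespace:

```apl
      ⎕JSON ⎕NS ⍬
{}
```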

Using ⎕JSON, we can also display error information in a human-readable format.
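A minimal sketch of this, assuming the diagnostic object ⎕DMX is what we want to display: provoke an error, then export ⎕DMX with the "Compact" variant option set to 0 so that the JSON is spread over indented lines. The exact fields shown depend on the interpreter version, so no output is reproduced here.

```apl
      ÷0
DOMAIN ERROR: Divide by zero
      ÷0
      ∧
      (⎕JSON⍠'Compact' 0)⎕DMX      ⍝ shows the contents of ⎕DMX as indented JSON text
```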

XML

XML Convert ⎕XML documentation

⎕XML converts between XML character vectors and nested matrices of node depth, tag name, value, attribute key/value pairs and markup description columns.

      ⎕XML'<name born="1920">Ken</name><name born="1925">Jean</name>'
┌─┬────┬────┬───────────┬─┐
│0│name│Ken │┌────┬────┐│5│
│ │    │    ││born│1920││ │
│ │    │    │└────┴────┘│ │
├─┼────┼────┼───────────┼─┤
│0│name│Jean│┌────┬────┐│5│
│ │    │    ││born│1925││ │
│ │    │    │└────┴────┘│ │
└─┴────┴────┴───────────┴─┘

Binary files and other arbitrary file types

Native Files Cheat Sheet
System Functions Categorised

In the chapter on selecting from arrays there was an example of reading a text file using ⎕NGET. Before Dyalog version 15.0, reading text files required a couple of extra steps. Some ⎕N... native file functions are general and can be used to read and write any type of file. As a simple example, here we tie the file words.txt, read the data and store it in a variable, and finally untie the file.

Note

For multi-user systems, take care to set appropriate file access permissions when using ⎕NCREATE, ⎕NTIE and ⎕NLOCK.

      tn←'assets/words.txt'⎕NTIE 0
      ⎕←10↑words←(⎕UCS 10)(≠⊆⊢)⎕NREAD tn 82(⎕NSIZE tn)0
┌─┬───┬────┬────┬─────┬────┬──────┬────┬──────┬────┐
│A│A's│AA's│AB's│ABM's│AC's│ACTH's│AI's│AIDS's│AM's│
└─┴───┴────┴────┴─────┴────┴──────┴────┴──────┴────┘
      ⎕NUNTIE⎕NNUMS
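The same functions can write data as well as read it. The following sketch creates a file, appends five bytes, and reads them back using conversion type 83 (1-byte integers). The file name demo.bin is an assumption for this example, and any earlier copy is removed first because ⎕NCREATE is assumed to fail if the file already exists.

```apl
      path←'demo.bin'                     ⍝ hypothetical file name
      1 ⎕NDELETE path                     ⍝ remove any earlier copy; the 1 suppresses the error if there is none
      tn←path ⎕NCREATE 0                  ⍝ create and tie the file
      (⎕UCS'Hello')⎕NAPPEND tn 83         ⍝ append 72 101 108 108 111 as 1-byte integers
      ⎕NSIZE tn                           ⍝ file size in bytes
5
      ⎕UCS ⎕NREAD tn 83 (⎕NSIZE tn) 0     ⍝ read all bytes from position 0 and convert to characters
Hello
      ⎕NUNTIE tn
      1 ⎕NDELETE path                     ⍝ tidy up
```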

⎕MAP

Map File ⎕MAP documentation

The memory mapping function ⎕MAP allows you to treat a file on disk as if it were a variable in the workspace. This is useful if you are working with data that cannot fit inside the available workspace memory. One approach might be to read the data in chunks and process one chunk at a time (for example, see the "Records" variant option for ⎕CSV). Another approach is to use ⎕MAP.

text ← 80 ¯1 ⎕MAP '/path/to/file.txt'

You must specify the type according to the Data Representation ⎕DR of the data to be read.
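To see which type codes apply, ⎕DR can be applied to sample arrays. The results below are for the Unicode edition of Dyalog (the Classic edition reports a different code for characters). The second expression is a sketch with a hypothetical path, mapping a whole file as 1-byte integers in the same way as the character example above.

```apl
      ⎕DR¨'abc' (1 0 1) (1 2 3) 1.5          ⍝ character, Boolean, 1-byte integer, 8-byte float
80 11 83 645
      bytes←83 ¯1 ⎕MAP '/path/to/file.bin'   ⍝ hypothetical path; ¯1 lets the length follow from the file size
      10↑bytes                               ⍝ only the parts that are used need to be read from disk
```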

APL Component files

Chapter N of Mastering Dyalog APL
Component File documentation

If it is only APL systems that need to store data, the most convenient and efficient way to store that data is in APL component files.

System functions that deal with component files begin ⎕F....

Tie and untie

In Dyalog, component files have the extension .dcf (Dyalog Component File) and must be tied and untied.

A component file may be exclusively tied (⎕FTIE) or have a shared tie (⎕FSTIE). With an exclusive tie, no other process may access the file.

tn←'cfile'⎕FCREATE 0   ⍝ The file is exclusively tied
⎕FUNTIE tn             ⍝ The file is untied, it can now be used by other applications and processes

The next time we want to use this file, we can use ⎕FTIE instead of ⎕FCREATE. The right argument to these functions specifies a tie number (which can be different each time the file is tied), but with a right argument of 0 the next available tie number is used (component file tie numbers start at 1).

tn←'cfile'⎕FTIE 0   ⍝ The file on disk is cfile.dcf, but this extension is assumed if not specified 

The structure of a component file is analogous to a nested vector of arrays. We add new values by appending them to the end of a file.

(3 3⍴⍳9)⎕FAPPEND tn
(↑'Dave' 'Sam' 'Ellie' 'Saif')⎕FAPPEND tn
nested←2 2⍴'this' 0 'that' (1 2 3)
nested ⎕FAPPEND tn

Each array stored in a component file (a component) is referred to by its index in the file (its component number), starting from 1 (not affected by ⎕IO).

      ⎕FREAD¨tn,¨1 2 3
┌─────┬─────┬────────────┐
│1 2 3│Dave │┌────┬─────┐│
│4 5 6│Sam  ││this│0    ││
│7 8 9│Ellie│├────┼─────┤│
│     │Saif ││that│1 2 3││
│     │     │└────┴─────┘│
└─────┴─────┴────────────┘

A component can be replaced by any other array.

      'Hello'⎕FREPLACE tn 2
      ⎕FREAD tn 2
Hello

Use ⎕FSIZE to find the range of components and file size:

      ⎕FSIZE tn
1 4 1744 1.8446744073709552E19

The elements of ⎕FSIZE are:

  • [1] The number of the first component
  • [2] 1 + the number of the last component (that is, where a new component will be if ⎕FAPPEND is used)
  • [3] The current size of the file in bytes
  • [4] The file size limit in bytes

Components can be removed from the beginning or end of a component file, with the ⎕FDROP function analogous to ⍺↓⍵.

      ⎕FDROP tn  1
      ⎕FDROP tn ¯1
      ⎕FREAD¨tn,¨1 2 3
FILE INDEX ERROR: cfile.dcf: No such component
      ⎕FREAD¨tn,¨1 2 3
      ∧
      ⎕FREAD tn 2   ⍝ Only component number 2 remains
Hello

After use, don't forget to untie all tied component files using ⎕FUNTIE ⎕FNUMS.

Multi-user access

If you are working on a system through which multiple users need to access the same component files, it is important to become familiar with multi-user access techniques and potential pitfalls. In particular, you will need to use ⎕FSTIE, ⎕FHOLD, ⎕FSTAC and probably ⎕AN.

Multi-user access can mean manual access by actual human users, or automated access by separate computers or processes.
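As a rough sketch of the kind of pattern involved, the following updates a shared counter. It assumes a hypothetical component file named counter whose first component holds a number. ⎕FHOLD is cooperative: it only protects against other processes that also use ⎕FHOLD before touching the file.

```apl
      tn←'counter'⎕FSTIE 0      ⍝ shared tie, so other processes can tie the file too
      ⎕FHOLD tn                 ⍝ wait until no one else holds the file, then hold it
      n←⎕FREAD tn 1             ⍝ read the current value
      (n+1)⎕FREPLACE tn 1       ⍝ write the new value before anyone else can read the old one
      ⎕FHOLD ⍬                  ⍝ release the hold
      ⎕FUNTIE tn
```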

SQL Databases

SQL Interface Guide

SQAPL ships with Dyalog and can be used out-of-the-box provided that a database is installed and a corresponding ODBC data source has been set up.

'SQA'⎕CY'sqapl'
SQA.Connect cid odbc_datasource_name sql_password sql_user
SQA.Do cid 'USE my_database'
SQA.Do cid 'SELECT * FROM my_table'

Some freely available ODBC drivers allow you to connect to databases and are sufficient for most use cases, such as the MySQL ODBC Connector or the MariaDB ODBC Connector. If you cannot find one which works for your particular hardware and software, Dyalog resells Progress DataDirect ODBC drivers, but these require a different version of SQAPL which is licensed separately. Contact Dyalog sales if you require the use of Progress DataDirect ODBC drivers.

Problem set 13

Indian Summer

IndiaRainfall.csv is a file of comma separated values. It is adapted from IndiaRainfallSource.csv to remove incomplete records.

The India Meteorological Department (IMD) has shared this dataset under Govt. Open Data License - India. It can be downloaded from the links above or from the Kaggle data science website.

The data contains the total measured monthly rainfall in millimeters for 30 regions in India from the years 1915 to 2015 inclusive.

  1. Load the data into the workspace

    By default, ⎕CSV will load all fields as text data:

          ⎕←3↑1 2↓⎕CSV'assets/IndiaRainfall.csv'

    With the following parameters, ⎕CSV will try to interpret all fields as numeric, and fall back to text if that fails. It will also import the first line as a separate array:

          (raindata header)←⎕CSV'assets/IndiaRainfall.csv' ⍬ 4 1
          ⎕←3↑0 2↓raindata

    Bonus

    Try reading IndiaRainfallSource.csv and removing the missing records for yourself. When data sets contain a very small amount of missing data, sometimes it is appropriate to estimate those values in a process called imputation. Most of the time, it is best to just remove the sections containing missing records.

  2. What was the total rainfall in Punjab in 1995?

  3. Which month in which region had the highest rainfall?
  4. Use a least squares linear fit to estimate the total rainfall in all 30 regions in 2018.
  5. Use a least squares linear fit to estimate the total rainfall in Punjab in 2018.

    Hint

    No one would expect you to derive an expression for the least squares linear fit with little APL experience. If you have done it, kudos to you. The expression Mv(⊢⌹1,∘⍪⊣)Nv from APLcart will compute coefficients of a least squares linear fit given a vector of X values Mv and a vector of Y values Nv.

  6. Inspect the data in IndiaRainfallSource.csv to see how close the true values were to your estimates. Were they within your standard error?

    Hint

    If the error e is a vector of the differences between Y values predicted by the linear fit and the actual Y values

    \[e_i=Y_i^{\text{predicted}}-Y_i^{\text{actual}}\]

    then an estimate for the variance is given by

    \[s^2=\sum_{i=1}^n{{e_i^2}\over{n-2}}\]

    where the standard deviation (standard error) is \(s\).
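The fit expression and the standard error formula from the hints can be tried on a small made-up data set before applying them to the rainfall data. This is only a sketch: the values of x and y are invented, and the names coef, pred, e and s are introduced for this example.

```apl
      x←1 2 3 4 5                    ⍝ made-up X values
      y←3.1 4.9 7.2 8.8 11.1         ⍝ made-up Y values, roughly 1+2×x
      coef←x(⊢⌹1,∘⍪⊣)y               ⍝ intercept and slope of the least squares line
      pred←(1,⍪x)+.×coef             ⍝ Y values predicted by the fit
      e←pred-y                       ⍝ differences between predicted and actual values
      s←((+/e*2)÷(≢e)-2)*0.5         ⍝ standard error: square root of the variance estimate
      1 6+.×coef                     ⍝ extrapolate the fit to x=6
```

The matrix 1,⍪x has a column of ones alongside the X values, which is why the first coefficient is the intercept and the second is the slope.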

MarkDown Sort

Write a program which reads in a markdown file, rearranges the sections by the alphabetical order of their headers, and writes the sorted file to a new file. For extra credit, include a method by which the user can decide whether to overwrite the existing file or provide the name or path to a new file. For example files, feel free to use any of the source files for these course materials.


Fun facts
If you are not very familiar with the workings of modern software, you might be surprised to see how accessible file types are. Many text editors might try to open a wide range of files by interpreting their data as text. In the audio editing program Audacity, native files can be inspected and manipulated as audio waveforms. These are a couple of techniques used in an art style called databending.