Knowledge Engineering


Character encoding

Control Characters


Control characters are often significant.

Common security errors involving control characters:

Escape Sequences


ASCII (American Standard Code for Information Interchange) defines 128 characters.

  • Python:

    from __future__ import print_function
    for i in range(0,128):
        print("{0:<3d} {1!r} {1:s}.".format(i, chr(i)))


Unicode encodings:

  • UTF-1
  • UTF-5
  • UTF-6
  • UTF-8
  • UTF-9, UTF-18
  • UTF-16
  • UTF-32

UTF-8 is a variable-width Unicode character encoding which can represent every Unicode code point using one to four 8-bit code units.
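UTF-8 uses between one and four 8-bit code units per code point. A minimal sketch with Python's built-in `str.encode` (the sample characters are arbitrary choices for illustration, not taken from the text above):

```python
# Each code point encodes to between one and four bytes in UTF-8.
samples = ["A", "é", "€", "😀"]  # illustrative characters (assumption)

utf8_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_lengths[ch] = len(encoded)
    print("U+{:04X} {!r} -> {} byte(s): {}".format(
        ord(ch), ch, len(encoded), encoded.hex()))
```

ASCII characters keep their single-byte values, which is why ASCII text is also valid UTF-8.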

Logic, Reasoning, and Inference



Propositional Calculus

  • Premise P
  • Conclusion Q
Modus ponens
  • P -> Q – Premise 1 P1 P_1 (“P sub 1”)
  • P – Premise 2 P2 P_2 (“P sub 2”)
  • Q – Conclusion Q Q_0 (“Q sub 0”)

Predicate Logic

  • Universe of discourse
  • Predicate
    • ∃ – There exists – Existential quantifier
    • ∀ – For all – Universal quantifier
Existential quantification
  • ∃ – “There exists” is the Existential quantifier symbol.
  • An existential quantifier is true (“holds true”) if there is one (or more) example in which the condition holds true.
  • An existential quantifier is satisfied by one (or more) examples.
Universal quantification
  • ∀ – “For all” is the Universal quantifier symbol.
  • A universal quantification is disproven by one counterexample where the condition does not hold true.
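Over a finite universe of discourse, the two quantifiers can be sketched with Python's built-in `any()` and `all()`. The universe and the predicate here are assumptions chosen for illustration:

```python
# Universe of discourse: a small, finite set of integers (assumption).
universe = [2, 4, 6, 7]


def is_even(x):
    """Predicate P(x): x is even."""
    return x % 2 == 0


# Existential quantification: true if at least one example satisfies P.
exists_even = any(is_even(x) for x in universe)

# Universal quantification: false as soon as one counterexample is found.
all_even = all(is_even(x) for x in universe)

# The counterexamples that disprove the universal claim.
counterexamples = [x for x in universe if not is_even(x)]

print(exists_even, all_even, counterexamples)
```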

Hoare Logic

  • precondition P
  • command C
  • postcondition Q


First-order Logic

First-order logic (FOL)

  • Terms
    • Variables
      • x, y, z
      • x, x_0 (“x subscript 0”, “x sub 0”)
    • Functions
      • f(x) – function symbol (arity 1)
      • a – constant symbol (arity 0) ( a() )
  • Formulas (“formulae”)
    • Equality
      • = – equality
    • Logical Connectives (“unary”, “binary”, sequence/tuple/list)
      • ¬, ~, ! – negation (unary)
      • ...
      • ^, &&, and – conjunction
      • v, ||, or – disjunction
      • -> – implication
      • <-> – biconditional
      • ...
      • XOR
      • NAND
    • Grouping Operators
      • Parentheses ( )
      • Brackets < >
    • Relations
      • P(x) – predicate symbol (n_args=1, arity 1, valence 1)
      • R(x) – relation symbol (n_args=1, arity 1, valence 1)
      • Q(x,y) – binary predicate/relation symbol (n_args=2, ...)
    • Quantifier Symbols “universe relation”
    • ...

Data Engineering

Data Engineering is about the 5 Ws (who, what, when, where, why) and how data are stored.

Who: schema:author @westurner ;
What: schema:name “WRD R&D Documentation”@en ;
When: schema:codeRepository <> ;
Where: schema:codeRepository <> ;
Why: schema:description “Documentation purposes”@en ;
How: schema:programmingLanguage :ReStructuredText ;
How: schema:runtimePlatform [ :Python, :CPython, :Sphinx ] ;

File Structures

Git File Structures

Git specifies a number of file structures: Git Objects, Git References, and Git Packfiles.

Git implements something like on-disk shared snapshot objects with commits, branching, merging, and multi-protocol push/pull semantics:

Git Packfile
“Git is a content-addressable filesystem.”
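"Content-addressable" means that an object's name is derived from its content. A minimal sketch, assuming a repository that uses Git's default SHA-1 object format: a blob's ID is the SHA-1 of a short header followed by the file content. `git_blob_id` is a hypothetical helper name, not part of Git:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # The header is the object type, a space, the content length, and a NUL byte.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


empty_id = git_blob_id(b"")
print(empty_id)
```

Identical content always produces the same ID, which is what lets Git store each distinct blob only once.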

Bup (backup) is a backup system based on git packfiles and rolling checksums.

[bup is a very] efficient backup system based on the Git Packfile format, providing fast incremental saves and global deduplication (among and within files, including virtual machine images).

Torrent file structure

A bittorrent torrent file is an encoded manifest of tracker, DHT, and web seed URIs; and segment checksum hashes.

See: BitTorrent, Named Data Networking, Web Distribution

File Locking

File locking is one strategy for synchronization with concurrency and parallelism.

Data Structures


An array is a data structure for unidimensional data.

  • Arrays must be resized when data grows beyond the initial shape of the array.
  • Sparse arrays are sparsely allocated.
  • A multidimensional array is said to be a matrix.


A matrix is a data structure for multidimensional data; a multidimensional array.


A list is a data structure with nodes that link to a next and/or previous node.


A graph is a system of nodes connected by edges; an abstract data type for which there are a number of suitable data structures.


DFS (Depth-first search) is a graph traversal algorithm.

# Given a tree:

# DFS:
1, 1.1, 1.2, 2, 2.1, 2.2

See also: Bulk Synchronous Parallel, Firefly Algorithm


BFS (Breadth-first search) is a graph traversal algorithm.

# Given a tree:

# BFS:
1, 2, 1.1, 1.2, 2.1, 2.2
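
Both traversal orders can be reproduced with a short sketch. The tree itself is not shown above, so the dictionary below is an assumption reconstructed from the listed node labels, with a hypothetical unprinted "root" node as the parent of 1 and 2:

```python
from collections import deque

# Hypothetical tree matching the node labels listed above;
# "root" is an assumed parent node that is not included in the output.
tree = {
    "root": ["1", "2"],
    "1": ["1.1", "1.2"],
    "2": ["2.1", "2.2"],
}


def dfs(node):
    """Depth-first (pre-order): visit a child, then its whole subtree."""
    order = []
    for child in tree.get(node, []):
        order.append(child)
        order.extend(dfs(child))
    return order


def bfs(node):
    """Breadth-first: visit every node at one depth before going deeper."""
    order = []
    queue = deque(tree.get(node, []))
    while queue:
        current = queue.popleft()
        order.append(current)
        queue.extend(tree.get(current, []))
    return order


print("DFS:", dfs("root"))
print("BFS:", bfs("root"))
```
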
Topological Sorting

A DAG (directed acyclic graph) has a topological sorting, or is topologically sorted.

  • The unix tsort utility does a topological sorting of a space and newline delimited list of edge labels:
$ tsort --help
Usage: tsort [OPTION] [FILE]
Write totally ordered list consistent with the partial ordering in FILE.
With no FILE, or when FILE is -, read standard input.

    --help     display this help and exit
    --version  output version information and exit

GNU coreutils online help: <>
For complete documentation, run: info coreutils 'tsort invocation'

$ echo -e '1 2\n2 3\n3 4\n2 a' | tsort
  • Installing a set of packages with dependencies is a topological sorting problem; plus e.g. version and platform constraints (as solvable with a SAT constraint satisfaction solver (see Conda (pypi:pycosat)))
  • A topological sorting can identify the “root” of a directed acyclic graph.
    • Information gain can be useful for less discrete problems.
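
A topological sorting can also be computed in a few lines. This is a minimal sketch of Kahn's algorithm (one common technique, not necessarily what tsort implements), applied to the same edge pairs as the tsort example; `topological_sort` is a hypothetical helper name:

```python
from collections import defaultdict, deque


def topological_sort(edges):
    """Kahn's algorithm: repeatedly emit nodes with no remaining predecessors."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1
        indegree[before] += 0  # make sure every node has an entry

    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no topological ordering exists")
    return order


# The same edge pairs as the tsort example above.
edges = [("1", "2"), ("2", "3"), ("3", "4"), ("2", "a")]
order = topological_sort(edges)
print(order)
```

Several orderings can be valid for the same graph; the only guarantee is that each node appears after all of its predecessors.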

Compression Algorithms


File Extension: .bz2

bzip2 is an Open Source lossless compression algorithm based upon the Burrows-Wheeler algorithm.

  • bzip2 is usually slower than gzip or zip, but more space efficient


gzip is a compression format based on DEFLATE (LZ77 and Huffman coding).

  • gzip is similar to zip, in that both are based upon DEFLATE
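
Both gzip and bzip2 are available in the Python standard library. A minimal sketch that compresses the same bytes with each and checks that the round trip is lossless; the sample data is an arbitrary assumption, and the relative sizes will differ for other inputs:

```python
import bz2
import gzip

# Repetitive sample data (an arbitrary assumption) compresses well with both.
data = b"knowledge engineering " * 1000

gz = gzip.compress(data)
bz = bz2.compress(data)

# Lossless: decompressing returns exactly the original bytes.
assert gzip.decompress(gz) == data
assert bz2.decompress(bz) == data

print("original:", len(data), "gzip:", len(gz), "bzip2:", len(bz))
```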


File Extension: .tar

tar is a file archiving format which stores, for each file, a header record (path and attributes) immediately followed by the file's data, all concatenated into one file.

  • TAR = ( header record + file data ), repeated for each file
  • .tar.gz is tar + gzip
  • .tar.bz2 is tar + bzip2

TAR and gzip or bzip2 can be streamed over SSH:

tar czf - . | ssh remote "( cd ~/ ; cat > file.tar.gz )"
tar cjf - . | ssh remote "( cd ~/ ; cat > file.tar.bz2 )"

See also: zip (Windows)


zip is a lossless file archive and compression format.

Hash Functions

Hash functions (or checksums) are one-way functions designed to produce short, effectively unique identifiers for blocks or whole files in order to verify data integrity.

  • A hash is the output of a hash function.
  • In Python, dict keys must be hashable (must have a __hash__ method).
  • In Java, Scala, and many other languages dicts are called HashMaps.
  • MD5 is a checksum algorithm.
  • SHA is a group of checksum algorithms.


A CRC (Cyclic Redundancy Check) is a hash function for error detection based upon an extra check value.


MD5 is a 128-bit hash function which is now broken, and deprecated in favor of SHA-2 or better.



  • SHA-0 – 160 bit (retracted 1993)
  • SHA-1 – 160 bit (deprecated 2010)
  • SHA-2 – sha-256, sha-512
  • SHA-3 (2012)
shasum -a 1
shasum -a 224
shasum -a 256
shasum -a 384
shasum -a 512
shasum -a 512224
shasum -a 512256
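
The same digests can be computed from Python. A minimal sketch using the standard library's `hashlib` and `zlib` modules; the inputs are chosen for illustration:

```python
import hashlib
import zlib

data = b""  # the empty input has well-known digests

md5_hex = hashlib.md5(data).hexdigest()        # 128-bit digest
sha1_hex = hashlib.sha1(data).hexdigest()      # 160-bit digest
sha256_hex = hashlib.sha256(data).hexdigest()  # 256-bit digest (SHA-2)
crc = zlib.crc32(b"hello")                     # 32-bit CRC check value

for name, digest in [("md5", md5_hex), ("sha1", sha1_hex), ("sha256", sha256_hex)]:
    # Each hex character encodes 4 bits of the digest.
    print(name, len(digest) * 4, "bits", digest)
print("crc32", hex(crc))
```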


Filesystems (file systems) determine how files are represented in a persistent physical medium.


RAID (redundant array of independent disks) is a set of configurations for Hard Drives and SSDs which stripe and/or mirror data, with or without parity.

RAID 0 -- striping,        -,             no parity ... throughput
RAID 1 -- no striping,  mirroring,        no parity ...
RAID 2 -- bit striping,    -,   Hamming-code parity ... legacy
RAID 3 -- byte striping,   -,      dedicated parity ... uncommon
RAID 4 -- block striping,  -,      dedicated parity
RAID 5 -- block striping,  -,    distributed parity ... min. 3; n-1 rebuild
RAID 6 -- block striping,  -, 2x distributed parity
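
Single-parity levels such as RAID 4 and RAID 5 commonly compute parity as the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the others. A minimal sketch of that idea (the block contents and the `xor_blocks` helper are illustrative assumptions, not a real RAID implementation):

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks from one stripe.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second block, then rebuild it from the survivors plus parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```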

RAID Implementations:


MBR (Master Boot Record) is a boot record format and a disk partition scheme.

  • DOS and Windows use MBR partition tables.
  • Many/most UNIX variants support MBR partition tables.
  • Linux supports MBR partition tables.
  • Most PCs since 1983 boot from MBR partition tables.
  • When a PC boots, it reads the MBR on the first configured drive in order to determine where to find the bootloader.


GPT (GUID Partition Table) is a boot record format and a disk partition scheme wherein partitions are assigned GUIDs (Globally Unique Identifiers).


LVM (Logical Volume Manager) is an Open Source software disk abstraction layer with snapshotting, copy-on-write, online resize and allocation and a number of additional features.

  • In LVM, there are Volume Groups (VG), Physical Volumes (PV), and Logical Volumes (LV).
  • LVM can do striping and high-availability software RAID.
  • LVM and device-mapper are now part of the Linux kernel tree (the LVM Linux kernel modules are built and included with most distributions’ default kernel build).
  • LVM Logical Volumes can be resized online (without e.g. rebooting to busybox or a LiveCD); but many Filesystems support only online grow (and not online shrink).
  • There is feature overlap between LVM and btrfs (pooling, snapshotting, copy-on-write).


ext2, ext3, and ext4 are the ext (extended filesystem) Open Source on-disk filesystems.

  • ext filesystems are the default filesystems of many Linux distributions.
  • Windows machines can access ext2, ext3, and ext4 filesystems with ext2explore and ext2fsd.
  • OS X machines can access ext2, ext3, and ext4 filesystems with OSXFuse and FUSE-EXT2.


FAT is a group of on-disk filesystem standards.

  • FAT is used on cross-platform USB drives.
  • FAT is found on older Windows and DOS machines.
  • FAT12, FAT16, and FAT32 are all FAT filesystem standards.
  • FAT32 has a maximum filesize of 4GB and a maximum volume size of 2 TB.
  • Windows machines can read and write FAT partitions.
  • OS X machines can read and write FAT partitions.
  • Linux machines can read and write FAT partitions.


FileExt: .iso

ISO9660 is an ISO standard for disc drive images which specifies a standard for booting from a filesystem image.

  • Many Operating System distributions are distributed as ISO9660 .iso files.

  • ISO9660 and Linux:

    • An ISO9660 ISO can be loop mounted:

      mount -o loop,ro -t iso9660 ./path/to/file.iso /mnt/cdrom
    • An ISO9660 CD can be mounted:

      mount -o ro -t iso9660 /dev/cdrom /mnt/cdrom
  • Most CD/DVD burning utilities support ISO9660 .iso files.

  • ISO9660 is useful in that it specifies how to encode the boot sector (El Torito) and partition layout.

  • Nowadays, ISO9660 .iso files are often converted to raw drive images and written to bootable USB Mass Storage devices (e.g. to write an install / recovery disk for Debian, Ubuntu, Fedora, Windows).


HFS+ (Hierarchical Filesystem) or Mac OS Extended, is the filesystem for Mac OS 8.1+ and OS X.


NTFS is a proprietary journaling filesystem.


FUSE (Filesystem in Userspace) is an API for implementing filesystems as userspace programs rather than as kernel code.

  • FUSE support is included in the Linux kernel since 2.6.14.
  • FUSE is available for most POSIX platforms.

Interesting FUSE implementations:

Network Filesystems


Ceph is an Open Source network filesystem (a distributed database for files with attributes like owner, group, permissions) written in C++ and Perl which runs over top of one or more on-disk filesystems.


CIFS (Common Internet File System) is a centralized network filesystem protocol.

  • Samba smbd is one implementation of a CIFS network file server.


DDFS (Disco Distributed File System) is a distributed network filesystem written in Python and C.

  • DDFS is like a Python implementation of HDFS (which is written in Java).


HDFS (Hadoop Distributed File System) is an Open Source distributed network filesystem.


NFS (Network File System) is an Open Source centralized network filesystem.


SMB (Server Message Block) is a centralized network filesystem.

  • CIFS is a dialect of SMB; CIFS has itself been superseded by SMB 2 and SMB 3.


WebDAV (Web Distributed Authoring and Versioning) is a network filesystem protocol built with HTTP.

  • WebDAV specifies a number of unique HTTP methods:
    • PROPFIND (ls, stat, getfacl),
    • PROPPATCH (touch, setfacl)
    • MKCOL (mkdir)
    • COPY (cp)
    • MOVE (mv)
    • LOCK (File Locking)
    • UNLOCK ()


Relational Databases

Relational Algebra

What doesn’t SQL do?


Drizzle is an Open Source relational database “for the cloud” which was forked from MySQL 6.0.

  • Drizzle stores all data as UTF-8.
  • Drizzle has a minimal core and a plugin API.

PostgreSQL is an Open Source relational database.

  • PostgreSQL has native support for storing and querying JSON.
  • PostgreSQL has support for geographical queries (PostGIS).

SQLite is a serverless Open Source relational database which stores all data in one file.

  • SQLite support is included in the Python standard library (the sqlite3 module).
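
A minimal sketch of the standard library's `sqlite3` module. The table name, columns, and rows are hypothetical, and `":memory:"` is used so that nothing is written to disk:

```python
import sqlite3

# ":memory:" keeps the database in RAM; a filename would store it in one file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (name TEXT, language TEXT)")  # hypothetical schema
conn.executemany(
    "INSERT INTO docs (name, language) VALUES (?, ?)",
    [("WRD R&D Documentation", "ReStructuredText"), ("notes", "Markdown")],
)
rows = conn.execute(
    "SELECT name FROM docs WHERE language = ? ORDER BY name",
    ("ReStructuredText",),
).fetchall()
print(rows)
conn.close()
```

The `?` placeholders pass values separately from the SQL text, which avoids building queries by string concatenation.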

Virtuoso Open Source edition is a multi-paradigm relational database / XML document database / RDF triplestore.

  • Virtuoso supports ODBC, JDBC, and DB-API relational database access.
  • Virtuoso powers DBpedia.

Graph Databases

Graph Queries


Blazegraph is an Open Source graph database written in Java with support for Gremlin, Blueprints, RDF, RDFS and OWL inferencing, SPARQL.


Blueprints is an Open Source graph database API (and reference graph data model).

Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model.

Blueprints is analogous to the JDBC, but for graph databases. As such, it provides a common set of interfaces to allow developers to plug-and-play their graph database backend.

Moreover, software written atop Blueprints works over all Blueprints-enabled graph databases.

Within the TinkerPop software stack, Blueprints serves as the foundational technology for:

  • Pipes: A lazy, data flow framework
  • Gremlin: A graph traversal language
  • Frames: An object-to-graph mapper
  • Furnace: A graph algorithms package
  • Rexster: A graph server

Gremlin is an Open Source domain-specific language for traversing property graphs.

  • Gremlin works with databases that implement the Blueprints graph database API.

Neo4j is an Open Source HA graph database written in Java.

Distributed Databases

See: Distributed Algorithms


Apache Accumulo is an Open Source distributed database key/value store written in Java based on BigTable which adds realtime queries, streaming iterators, row-level ACLs and a number of additional features.

  • Accumulo supports MapReduce-style computation.
  • Accumulo supports streaming iterator computation.
  • Accumulo supports HDFS.
  • Accumulo implements a programmatic Java query API.

Google BigTable is an open reference design for a distributed key/value column store and a proprietary production database system.

  • BigTable functionality overlaps with that of the newer Pregel and Spanner distributed databases.
  • Cloud BigTable is a PaaS / SaaS service with Java integration through an adaptation of HBase API.
Apache Beam

Apache Beam is an open source batch and streaming parallel data processing framework with support for Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.


Apache Cassandra is an Open Source distributed key/value super column store written in Java.


Apache Hadoop is a collection of Open Source distributed computing components; particularly for MapReduce-style computation over Hadoop HDFS distributed filesystem.


Apache HBase is an Open Source distributed key/value super column store based on BigTable written in Java that does MapReduce-style computation over Hadoop HDFS.


Apache Parquet is an Open Source columnar storage format for Distributed Databases.

Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.
  • The Parquet format and Parquet metadata are encoded with Thrift:
  • See also: CSV, CSVW

Presto is an Open Source distributed query engine designed to query multiple datastores at once.


Apache Spark is an Open Source distributed computation platform.

Distributed Algorithms

Distributed Databases and distributed Information Systems implement Distributed Algorithms designed to solve for Confidentiality, Integrity, and Availability.

As separate records / statements to be yield-ed or emitted:

See Also:


A DHT (Distributed Hash Table) is a distributed key/value store for storing values under a consistent file checksum hash which can be looked up with e.g. an exact string match.


MapReduce is a distributed algorithm for distributed computation.
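
The map, shuffle, and reduce phases can be sketched in a single process. This is only an illustration of the data flow (a real MapReduce system distributes each phase across machines); the function names and input documents are hypothetical:

```python
from collections import defaultdict

documents = ["a graph is a system", "a list is a structure"]  # illustrative input


def map_phase(doc):
    """Map: emit one (key, value) pair per word."""
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key."""
    return key, sum(values)


pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```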


Bulk Synchronous Parallel

Bulk Synchronous Parallel (BSP) is a distributed algorithm for distributed computation.

Distributed Computing Protocols


CORBA (Common Object Request Broker Architecture) is a distributed computing protocol now defined by OMG with implementations in many languages.


An ESB (Enterprise Service Bus) is a centralized distributed computing component which relays (or brokers) messages with or as a message queue (MQ).


MPI (Message Passing Interface) is a distributed computing protocol for structured data interchange with implementations in many languages.


XML Remote Procedure Call defines method names with parameters and values for making function calls with XML.

See also:

Protocol Buffers

Protocol Buffers (PB) is a standard for structured data interchange.

  • Protocol Buffers messages are typically smaller and faster to parse than equivalent JSON.

See also:


Thrift is a standard for structured data interchange in the style of Protocol Buffers.

  • Thrift's binary encoding is typically smaller and faster to parse than equivalent JSON.

See also:


SOA (Service Oriented Architecture) is a collection of Web Standards (e.g. WS-*) and architectural patterns for distributed computing.


There are many web service specifications; the names of many of them start with WS-.


WSDL (Web Services Description Language) is a web standard for describing web services and the schema of their inputs and outputs.


JSON-WSP (JSON Web-Service Protocol) is a web standard protocol for describing services and request and response objects.

  • JSON-WSP is similar in function to WSDL and CORBA IDL.

See also: Linked Data Platform (LDP)



REST (Representational State Transfer) is a pattern for interacting with web resources using regular HTTP methods like GET, POST, PUT, and DELETE.


WAMP (Web Application Messaging Protocol) defines Publish/Subscribe (PubSub) and Remote Procedure Call (RPC) over WebSocket, JSON, and URIs.

Using WAMP, you can have a browser-based UI, the embedded device and your backend talk to each other in real-time:

Search Engine Indexing


ElasticSearch is an Open Source realtime search server written in Java built on Apache Lucene with a RESTful API for indexing JSON documents.

  • ElasticSearch supports geographical (bounded) queries.
  • ElasticSearch can build better indexes for faster search response times when ElasticSearch Mappings are specified.
  • ElasticSearch mappings can be (manually) transformed to JSON-LD @context mappings:


Apache Nutch is an Open Source distributed web crawler and search engine written in Java and implemented on top of Lucene.

  • Nutch has a pluggable storage and indexing API with support for e.g. Solr, ElasticSearch.


Whoosh is an Open Source search indexing service written in Python.


Xapian is an Open Source search library written in C++ with bindings for many languages.

Information Retrieval

  • Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press. 2008.

Time Standards

International Atomic Time (TAI)

International Atomic Time (TAI, from the French Temps Atomique International) is an international standard for extremely precise time keeping, and is the basis for UTC Earth time and for Terrestrial Time (Earth and Space).

Long Now Dates

 2015    # ISO8601 date
02015    # 5-digit Y10K date

Unix Time

Unix time is the number of seconds that have elapsed since 1970-01-01T00:00:00Z (00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970), not counting leap seconds:

0                       # Unix time
1970-01-01T00:00:00Z    # ISO8601 timestamp

1435255816              # Unix time
2015-06-25T18:10:16Z    # ISO8601 timestamp
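
The conversions above can be reproduced with Python's `datetime` module. A minimal sketch; the two helper function names are hypothetical:

```python
from datetime import datetime, timezone


def unix_to_iso8601(seconds):
    """Convert Unix time to an ISO8601 UTC timestamp with a trailing Z."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def iso8601_to_unix(text):
    """Convert a 'YYYY-MM-DDTHH:MM:SSZ' timestamp back to Unix time."""
    moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return int(moment.replace(tzinfo=timezone.utc).timestamp())


print(unix_to_iso8601(0))
print(unix_to_iso8601(1435255816))
print(iso8601_to_unix("2015-06-25T18:10:16Z"))
```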


Unix time does not count leap seconds.

See also: Swatch Internet Time (Beat Time)

Year Zero

  • The Gregorian Calendar (e.g. Common Era, Julian Calendar) does not include a year zero; (1 BCE is followed by 1 CE).
  • Astronomical year numbering includes a year zero.
  • Before Present dates do not specify a year zero. (because they are relative to the current (or published) date).

Astronomical year numbering

  • Astronomical year numbering includes a year zero:

Tools with support for Astronomical year numbering:

Before Present (BP)

Before Present (BP) dates are relative to the current date (or date of publication); e.g. “2.6 million years ago”.

Common Era (CE)

Common Era and Year Zero:

5000 BCE == -5000 CE
   1 BCE ==    -1 CE
   0 BCE ==     0 CE
   0  CE ==     0 BCE
   1  CE ==     1 CE
2015  CE ==  2015 CE


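Note: these are off by one relative to Astronomical year numbering, in which 1 BCE is year 0 and n BCE is year -(n - 1); the Common Era itself has no year zero.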

Common Era and Python datetime calculations:

# Paleolithic Era (2.6m years ago -> 12000 years ago)
# "2.6m years ago" = (2.6m - (2015)) BCE = 2597985 BCE = -2597985 CE

2597985 BCE == -2597985 CE

### Python datetime w/ scientific notation string formatter
>>> import datetime
>>> year = 2015  # the current year assumed in the comment above
>>> '{:.6e}'.format(2.6e6 - year)
'2.597985e+06'

### Python datetime supports (dates >= 1 CE; datetime.MINYEAR == 1).
>>> datetime.date(1, 1, 1)
datetime.date(1, 1, 1)
>>> datetime.datetime(1, 1, 1)
datetime.datetime(1, 1, 1, 0, 0)

### Python pypi:arrow supports (dates >= 1 CE).
>>> !pip install arrow
>>> arrow.get(1, 1, 1)
<Arrow [0001-01-01T00:00:00+00:00]>

### astropy.time.Time supports (1 BCE <= dates >= 1 CE) and/or *Year Zero*
>>> !conda install astropy
>>> import astropy.time
>>> # TimeJulianEpoch (Julian date (jd) ~= Common Era (CE))
>>> astropy.time.Time(-2.6e6, format='jd', scale='utc')
<Time object: scale='utc' format='jd' value=-2600000.0>

Time Zones


UTC (Coordinated Universal Time) is the primary terrestrial Earth-based clock time.

  • Earth Time Zones are specified as offsets from UTC.
  • UTC is determined by International Atomic Time (TAI), with occasional leap seconds to account for the difference between Earth’s rotational time and the actual passage of time as measured by cesium atomic clocks (the SI second; see QUDT).
  • Many/most computer systems work with UTC, but are not exactly synchronized with International Atomic Time (TAI) (see also: RTC, NTP and time drift).

US Time Zones

Time Zone names, URIs, and ISO8601 UTC offsets:

Table of US Time Zones

Time Zone names, URNs, URIs                   UTC Offset   UTC DST Offset
Coordinated Universal Time, UTC, Zulu         -0000 Z      +0000 Z
Atlantic, Antarctica (Palmer), AST, ADT       -0400 AST    -0300 ADT
America/St_Thomas, America/Virgin             -0400        -0400
Eastern, EST, EDT                             -0500 EST    -0400 EDT
Central, CST, CDT                             -0600 CST    -0500 CDT
Mountain, MST, MDT                            -0700 MST    -0600 MDT
Pacific, PST, PDT                             -0800 PST    -0700 PDT
Alaska, AKST, AKDT                            -0900 AKST   -0800 AKDT
Hawaii Aleutian, HAST, HADT                   -1000 HAST   -0900 HADT
Samoa Time Zone, SST                          -1100 SST    -1100 SST
Chamorro, Guam                                +1000        +1000
Antarctica (Amundsen, McMurdo), South Pole    +1200        +1300

US Daylight Saving Time

Currently, daylight saving time starts on the second Sunday in March and ends on the first Sunday in November, with the time changes taking place at 2:00 a.m. local time.

With a mnemonic word play referring to seasons, clocks “spring forward and fall back” — that is, in spring (technically late winter) the clocks are moved forward from 2:00 a.m. to 3:00 a.m., and in fall they are moved back from 2:00 am to 1:00 am.

Daylight Saving Time starts and ends on the following dates:

Year DST start date DST end date
2015 2015-03-08 02:00 2015-11-01 02:00
2016 2016-03-13 02:00 2016-11-06 02:00
2017 2017-03-12 02:00 2017-11-05 02:00
2018 2018-03-11 02:00 2018-11-04 02:00
2019 2019-03-10 02:00 2019-11-03 02:00
2020 2020-03-08 02:00 2020-11-01 02:00
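
The dates in this table follow from the rule stated above (second Sunday in March, first Sunday in November). A minimal sketch that recomputes them; the helper names are hypothetical, and the rule is only assumed to hold for the years shown:

```python
from datetime import date, timedelta


def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday (n >= 1) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0 and Sunday is 6.
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))


def us_dst_dates(year):
    """US DST start (second Sunday in March) and end (first Sunday in November)."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)


for year in range(2015, 2021):
    start, end = us_dst_dates(year)
    print(year, start.isoformat(), end.isoformat())
```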


ISO8601 is an ISO standard for specifying Gregorian dates, times, datetime intervals, durations, and recurring datetimes.

  • The date command can print ISO8601-compatible datestrings:

    $ date +'%FT%T%z'
    $ date +'%F %T%z'
    2016-01-01 22:11:59-0600
  • Roughly, an ISO8601 datetime is specified as: year, dash, month, dash, day, (T or a space character), hour, colon, minute, colon, second, (Z [for UTC] or a time zone offset, e.g. -0600 or +0000); where the dashes and colons are optional.

  • ISO8601 specifies a standard for time intervals: start date, forward-slash, end date.

  • ISO8601 specifies a standard for durations: P, then the number of years Y, months M, and days D; then T and the number of hours H, minutes M, and seconds S (e.g. P1Y2M3DT4H5M6S).

  • A Z timezone specifies UTC (Coordinated Universal Time) (or “Zulu”) time.

  • Many/most W3C standards (such as XSD) specify ISO8601 time formats:

A few examples of ISO8601:

2014-10-23T20:59:30+00:00   # UTC / Zulu
2014-10-23T20:59:30Z        # UTC / Zulu
2014-10-23T20:59:30-06:00   # CST
2014-10-23T20:59:30-06      # CST
2014-10-23T20:59:30-05:00   # CDT
2014-10-23T20:59:30-05      # CDT
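
Timestamps with an explicit offset can be parsed and normalized to UTC with Python's `datetime` module. A minimal sketch; note that `datetime.fromisoformat` accepts only a subset of ISO8601 in older Python versions (for example, the trailing `Z` and the short `-06` offset may be rejected), so the full `-06:00` form is used here:

```python
from datetime import datetime, timezone

# Parse a timestamp with an explicit UTC offset (CST in the examples above).
local = datetime.fromisoformat("2014-10-23T20:59:30-06:00")

# Normalize to UTC and format with a trailing Z.
as_utc = local.astimezone(timezone.utc)
utc_text = as_utc.strftime("%Y-%m-%dT%H:%M:%SZ")

print(local.isoformat())
print(utc_text)
```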


ISO8601 does not define separate fields for milliseconds, microseconds, nanoseconds, picoseconds, femtoseconds, or attoseconds; sub-second precision is written as a decimal fraction of the seconds (e.g. 20:59:30.123).


NTP (Network Time Protocol) is a standard for synchronizing clock times.

  • Most Operating Systems and mobile devices support NTP.
  • NTP clients calculate time drift (or time skew) and network latency and then gradually adjust the local system time to the most recently retrieved server time.
  • Many OS distributions run their own NTP servers (in order to reduce load on the core NTP pool servers).
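
The offset and delay calculation can be sketched from the four timestamps of one request/response exchange. The formulas are the standard NTP clock offset and round-trip delay estimates; the function name and the example timestamps are hypothetical:

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """
    t0: client send time (client clock)    t1: server receive time (server clock)
    t2: server send time (server clock)    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2  # estimated server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)         # round-trip network delay
    return offset, delay


# Hypothetical exchange: the client clock is 5 s behind the server,
# each network leg takes 0.1 s, and the server takes 0.01 s to reply.
offset, delay = ntp_offset_and_delay(t0=100.0, t1=105.1, t2=105.11, t3=100.21)
print(round(offset, 6), round(delay, 6))
```

The offset estimate is exact only when the two network legs take the same time; asymmetric paths introduce an error of up to half the round-trip delay.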

Linked Data

5 ★ Linked Data

★ Publish data on the Web in any format (e.g., PDF, JPEG) accompanied by an explicit Open License (expression of rights).


★★ Publish structured data on the Web in a machine-readable format (e.g. XML).


★★★ Publish structured data on the Web in a documented, non-proprietary data format (e.g. CSV, KML).


★★★★ Publish structured data on the Web as RDF (e.g. Turtle, RDFa, JSON-LD, SPARQL).


★★★★★ In your RDF, have the identifiers be links (URLs) to useful data sources.

See: Semantic Web

Web Standards

Web Names




IEC (International Electrotechnical Commission) is a standards body.


IETF (Internet Engineering Task Force) is a standards body.


ISO (International Organization for Standardization) is a standards body.


OMG (Object Management Group) is a standards body.


W3C (World Wide Web Consortium) is a standards body.


HTTP (HyperText Transfer Protocol) is an Open Source text-based request-response TCP/IP protocol for text and binary data interchange.

  • HTTPS (Secure HTTP) wraps HTTP in SSL/TLS to secure HTTP.


xmlns: @prefix http: <> .
xmlns: @prefix http-headers: <> .
xmlns: @prefix http-methods: <> .
xmlns: @prefix http-statusCodes: <> .

HTTP-in-RDF is a standard for representing HTTP as RDF.


HTTPS (HTTP over SSL) is HTTP wrapped in TLS/SSL.

  • TLS (Transport Layer Security)
  • SSL (Secure Sockets Layer)


HTTP STS (HTTP Strict Transport Security) is a standardized extension for notifying browsers that all requests should be made over HTTPS indefinitely or for a specified time period.

See also: HTTPS Everywhere


CSS (Cascading Style Sheets) define the presentational aspects of HTML and a number of mobile and desktop web frameworks.

  • CSS is designed to ensure separation of data and presentation. With javascript, the separation is then data, code, and presentation.


RTMP is a TCP/IP protocol for streaming audio, video, and data originally for Flash which is now Open Source.


URI Scheme: ws://

WebSocket is a full-duplex (two-way) TCP/IP protocol for audio, video, and data which can interoperate with HTTP Web Servers.

  • WebSockets are often more efficient than other methods for realtime HTTP like HTTP Streaming and long polling.
  • WebSockets work with many/most HTTP proxies

  • Python: pypi:gevent-websocket, pypi:websockets (asyncio), pypi:autobahn (pypi:twisted, asyncio)

See also: WebRTC


WebRTC is a web standard for decentralized or centralized streaming of audio, video, and data in browser, without having to download any plugins.


WebRTC is supported by a growing number of browsers:

Notably, Internet Explorer and Safari still require a plugin to handle WebRTC.


HTTP/2 (HTTP2) is the second major version of HTTP, standardized in 2015 as RFC 7540.

  • HTTP/2 is largely derived from the SPDY protocol.


HTML (HyperText Markup Language) is an Open Source standard for representing documents with tags, attributes, and hyperlinks.

Recent HTML standards include HTML4, XHTML, and HTML5.


HTML4 is the fourth generation HTML standard.


XHTML is an XML-conforming HTML standard which is being superseded by HTML5.

Compared to HTML4, XHTML requires closing tags, supports additional namespace declarations, and expects things to be wrapped in CDATA blocks, among a few other notable differences.

XHTML has not gained the widespread adoption of HTML4, and is being largely superseded by HTML5.


HTML5 is the fifth generation HTML standard with many new (and removed) features.

Like its predecessors, HTML5 is not case sensitive, but it is recommended to use lowercased tags and attributes.

Differences Between HTML4 and HTML5

  • HTML5 does not require closing tags (many browsers had already implemented routines for auto-closing broken markup).
  • Frames have been removed
  • Presentational attributes have been removed (in favor of CSS)

HTML 5.1

HTML 5.1 is in the works:


XML (Extensible Markup Language) is a standard for representing data with tags and attributes.

Like HTML, XML is derived from SGML.


xmlns: @prefix xsd: <> .

XSD (XML Schema Datatypes) are standard datatypes for things like strings, integers, floats, and dates for XML and also RDF.


JSON (JavaScript Object Notation) is a standard for representing data in a JavaScript compatible way; with a restricted set of data types.

Conforming JSON does not contain JavaScript code, only data. It is still not safe to eval untrusted JSON, because a non-conforming document could contain code; use a JSON parser instead.

There are many parsers for JSON.
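
A minimal sketch with the `json` module from the Python standard library, showing that a parser builds data structures and rejects input that is code rather than data. The sample document is an illustrative assumption:

```python
import json

text = '{"name": "WRD R&D Documentation", "tags": ["docs", "rdf"], "stars": 5}'

# json.loads only builds data structures; it never executes the input.
record = json.loads(text)
print(record["name"], record["tags"], record["stars"])

# Input that is code rather than JSON is rejected with an error.
try:
    json.loads('__import__("os").getcwd()')
    rejected = False
except json.JSONDecodeError:
    rejected = True
print(rejected)
```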

JSON-LD adds RDF Linked Data support to JSON with @context.


Extension: .csv
MIME Type: text/csv

CSV (Comma Separated Values) is a flat file representation for tabular data with rows and columns.

Most spreadsheet tools can export (raw and computed) data from a sheet into a CSV file, for use with many other tools.
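
A minimal sketch of writing and reading CSV with Python's standard `csv` module. The column names and rows are illustrative, and `io.StringIO` stands in for a .csv file on disk:

```python
import csv
import io

# Hypothetical sheet contents; io.StringIO stands in for a .csv file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["year", "dst_start", "dst_end"])
writer.writerow([2015, "2015-03-08", "2015-11-01"])
writer.writerow([2016, "2016-03-13", "2016-11-06"])

# Read the rows back as dictionaries keyed by the header row.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0]["dst_start"], len(rows))
```

CSV carries no datatypes, so every value is read back as a string (the year is "2015", not 2015).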


CSVW (CSV on the Web) is a set of relatively new standards for representing CSV rows and columns as RDF (and JSON / JSON-LD) along with metadata.


xmlns: @prefix rdf: <> .

RDF (Resource Description Framework) is a standard data model for representing data as triples.



Useful Resources

See also: RDF Triplestores

RDF Interfaces

RDF Interfaces is an Open Source standard for RDF APIs (e.g. as implemented by RDF libraries and RDF Triplestores).

  • createBlankNode –> BlankNode
  • createNamedNode –> NamedNode
  • createLiteral –> Literal
  • createTriple –> Triple (RDFNode s, RDFNode p, RDFNode o)
  • createGraph –> []Triple
  • createAction –> TripleAction (TripleFilter, TripleCallback)
  • createProfile –> Profile
  • createTermMap –> TermMap
  • createPrefixMap –> PrefixMap

Implementations of RDF Interfaces:


Extension: .nt
MIME Type: application/n-triples

N-Triples is a standard for serializing RDF triples to text.


Extension: .rdf
MIME Type: application/rdf+xml

RDF/XML is a standard for serializing RDF as XML.


TriX is a standard which extends the RDF/XML RDF serialization standard with named graphs.


Extension: .n3
MIME Type: text/n3

N3 (Notation3) is a standard which extends the Turtle RDF serialization standard with a few extra features.

  • => implies (useful for specifying production rules)


Extension: .ttl
MIME Type: text/turtle

Turtle is a standard for serializing RDF triples into human-readable text.


Extension: .trig
MIME Type: application/trig

TriG (...) extends the Turtle RDF standard to allow multiple named graphs to be expressed in one file (as triples with a named graph IRI (“quads”)).

Triples without a specified named graph are, by default, part of the “Default Graph”.
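The relationship between quads, named graphs, and the default graph can be sketched with plain tuples. This is not TriG syntax or a parser; the ex: names and the DEFAULT_GRAPH marker are assumptions for illustration.

```python
from collections import defaultdict

DEFAULT_GRAPH = None  # hypothetical marker for "no named graph given"

# Each quad is (subject, predicate, object, graph); all names are made up.
quads = [
    ("ex:alice", "ex:knows", "ex:bob", "ex:graph1"),
    ("ex:bob", "ex:knows", "ex:carol", "ex:graph2"),
    ("ex:carol", "ex:name", "Carol", DEFAULT_GRAPH),
]

# Group the triples by the graph they belong to.
graphs = defaultdict(list)
for s, p, o, g in quads:
    graphs[g].append((s, p, o))

print(sorted(k for k in graphs if k is not DEFAULT_GRAPH))
print(graphs[DEFAULT_GRAPH])
```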


RDFa (RDF in attributes) is a standard for storing structured data (RDF triples) in HTML (XHTML, HTML5) attributes. Structured data can be included in an HTML page as RDFa.

RDFa 1.1 Core Context

The RDFa 1.1 Core Context defines a number of commonly used vocabulary namespaces and URIs (prefix mappings).

An example RDFa HTML5 fragment with vocabularies drawn from the RDFa 1.1 Core Context:

<div vocab="schema:">
  <div typeof="schema:Thing">
    <span property="schema:name">RDFa 1.1 JSON-LD Core Context</span>
    <a property="schema:url"></a>
  </div>
</div>

An example JSON-LD document with the RDFa 1.1 Core Context:

{"@context": "",
 "@graph": [
   {"@type": "schema:Thing",
    "schema:name": "RDFa 1.1 JSON-LD Core Context",
    "schema:url": ""}
 ]}

Note: the schema: vocabulary is included in the RDFa 1.1 Core Context. It does, in many places, reimplement terms from other vocabularies, e.g. for consistency.

There is also an RDF version of the vocabulary, which, for example, maps schema:name to rdfs:label; and an OWL version.


JSON-LD (JSON Linked Data) is a standard for expressing RDF Linked Data as JSON.

JSON-LD specifies a @context for regular JSON documents which maps JSON attributes to URIs with datatypes and, optionally, languages.
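The key-to-URI mapping can be sketched in a few lines of Python. This is a simplified illustration and not the JSON-LD expansion algorithm; the document, the example.org IRIs, and the expand_keys helper are all hypothetical.

```python
import json

# A hypothetical JSON-LD document; the IRIs are illustrative only.
doc = json.loads("""
{
  "@context": {
    "name": "http://example.org/vocab#name",
    "homepage": {"@id": "http://example.org/vocab#homepage", "@type": "@id"}
  },
  "name": "Example",
  "homepage": "http://example.org/",
  "unmapped": "skipped"
}
""")

def expand_keys(doc):
    """Replace each JSON key with the IRI its @context entry maps it to."""
    context = doc["@context"]
    expanded = {}
    for key, value in doc.items():
        term = context.get(key)
        if key == "@context" or term is None:
            continue  # keys without a mapping are skipped in this sketch
        # A term definition is either an IRI string or a dict with "@id".
        iri = term["@id"] if isinstance(term, dict) else term
        expanded[iri] = value
    return expanded

print(expand_keys(doc))
```

The same JSON keys stay readable for ordinary JSON consumers, while the @context supplies the URIs needed to read the document as RDF.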


xmlns: @prefix rdfs: <> .

RDFS (RDF Schema) is an RDF standard for classes and properties.

A few notable RDFS classes:

  • rdfs:Resource (everything in RDF)
  • rdfs:Literal (strings, integers)
  • rdfs:Class

A few notable / frequently used properties:

  • rdfs:label
  • rdfs:comment
  • rdfs:seeAlso
  • rdfs:domain
  • rdfs:range
  • rdfs:subPropertyOf
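As an illustration of what rdfs:domain and rdfs:range mean, the sketch below derives rdf:type triples from them: the subject of a property gets the domain class and the object gets the range class. The ex: terms and the infer_types helper are made up for this example.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema and data triples using made-up ex: terms.
schema = {
    ("ex:author", "rdfs:domain", "ex:Book"),
    ("ex:author", "rdfs:range", "ex:Person"),
}
data = {("ex:moby_dick", "ex:author", "ex:melville")}

def infer_types(schema, data):
    """Derive rdf:type triples from rdfs:domain and rdfs:range statements."""
    inferred = set()
    for prop, rule, cls in schema:
        for s, p, o in data:
            if p != prop:
                continue
            if rule == "rdfs:domain":
                inferred.add((s, RDF_TYPE, cls))   # subject is in the domain
            elif rule == "rdfs:range":
                inferred.add((o, RDF_TYPE, cls))   # object is in the range
    return inferred

print(sorted(infer_types(schema, data)))
```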

OWL builds upon many RDFS concepts.


xmlns: @prefix dcterms: <> .
xmlns: @prefix dctypes: <> .

DCTYPES (Dublin Core Types) and DCTERMS (Dublin Core Terms) are standards for common types, classes, and properties that have been mapped to XML and RDF.


xmlns: @prefix earl:

W3C EARL (Evaluation and Reporting Language) is an RDFS vocabulary for automated, semi-automated, and manual test results.

RDF Data Cubes

xmlns: @prefix qb: <> .

The RDF Data Cubes vocabulary is a standard RDF vocabulary for expressing linked multi-dimensional statistical data and aggregations.

  • Data Cubes have dimensions, attributes, and measures
  • Pivot tables and crosstabulations can be expressed with the RDF Data Cubes vocabulary
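To make dimensions and measures concrete, here is a small sketch that arranges observations into a crosstabulation. It uses plain Python dictionaries and not the qb: terms themselves; the observation data and the crosstab helper are hypothetical.

```python
# Hypothetical observations: two dimensions (year, region) and one measure.
observations = [
    {"year": 2020, "region": "north", "population": 10},
    {"year": 2020, "region": "south", "population": 20},
    {"year": 2021, "region": "north", "population": 11},
    {"year": 2021, "region": "south", "population": 22},
]

def crosstab(observations, row_dim, col_dim, measure):
    """Arrange observations as {row value: {column value: summed measure}}."""
    table = {}
    for obs in observations:
        row = table.setdefault(obs[row_dim], {})
        row[obs[col_dim]] = row.get(obs[col_dim], 0) + obs[measure]
    return table

table = crosstab(observations, "year", "region", "population")
print(table)
```

Swapping the row and column arguments pivots the same observations the other way, which is the sense in which one cube supports many tables.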


SKOS (Simple Knowledge Organization System) is an RDF standard vocabulary for linking concepts and vocabulary terms.


XKOS (Extended Knowledge Organization System) is an RDF standard which extends SKOS for linking concepts and statistical measures.


FOAF (Friend of a Friend) is an RDF standard vocabulary for expressing social networks and contact information.


xmlns: @prefix sh: <> .

W3C SHACL (Shapes Constraint Language) is a language for describing RDF and RDFS graph shape constraints.


SIOC (Semantically Interlinked Online Communities) is an RDF standard for online social networks and resources like blog, forum, and mailing list posts.


xmlns: @prefix oa: <> .

OA (Open Annotation) is an RDF standard for commenting on anything with a URI.


Schema.org is a vocabulary for expressing structured data on the web. Schema.org can be expressed as microdata, RDF, RDFa, and JSON-LD.



The site is served over HTTPS, but the terms are HTTP URIs.

TopBraid RDF

xmlns: @prefix schema: <> .
xmlns: @prefix schemax: <> .

TopBraid maintains more complete OWL RDF transformations of the schema: vocabulary.


SPARQL is a text-based query and update language for RDF triples (and quads).


  • SPARQL query requests and responses are sent over HTTP; however, it’s best – and often required – to build SPARQL queries in a server application, on behalf of clients.
  • SPARQL default LIMIT clauses and paging windows could allow for more efficient caching
  • See: LDP for more of a resource-based RESTful API that can be implemented on top of the graph pattern queries supported by SPARQL.
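A server application that builds queries on behalf of clients might page results with LIMIT and OFFSET, roughly as sketched below. The paged_query function, its parameters, and the fixed query text are assumptions for illustration; sending the query to an endpoint is left out.

```python
def paged_query(page, page_size=100):
    """Build one page of a fixed SELECT query using LIMIT and OFFSET."""
    # Coerce to int so client-supplied parameters cannot inject query text.
    page = int(page)
    page_size = int(page_size)
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    return (
        "SELECT ?s ?p ?o\n"
        "WHERE { ?s ?p ?o }\n"
        "ORDER BY ?s ?p ?o\n"
        "LIMIT %d OFFSET %d" % (page_size, page * page_size)
    )

print(paged_query(2, page_size=50))
```

Because every client then receives the same query text for the same page number, responses for a given page are easier to cache. The ORDER BY clause is included because paging without a stable order may return overlapping or missing rows.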


xmlns: @prefix ldp: <> .

LDP (Linked Data Platform) is a standard for building HTTP REST APIs for RDF Linked Data.


  • HTTP REST API for Linked Data Platform Containers (LDPC) containing Linked Data Platform Resources (LDPR)
  • Server-side Paging


OWL (Web Ontology Language) layers semantics, reasoning, inference, and entailment capabilities onto RDF (and general logical set theory).

A few notable OWL classes:

  • owl:Class a owl:Class ; rdfs:subClassOf rdfs:Class (RDFS)
  • owl:Thing a owl:Class – universal class
  • owl:Nothing a owl:Class – empty class
  • owl:Restriction a rdfs:Class ; rdfs:subClassOf owl:Class

A few OWL Property types:

  • owl:DatatypeProperty
  • owl:ObjectProperty
  • owl:ReflexiveProperty
  • owl:IrreflexiveProperty
  • owl:SymmetricProperty
  • owl:TransitiveProperty
  • owl:FunctionalProperty
  • owl:InverseFunctionalProperty
  • owl:OntologyProperty
  • owl:AnnotationProperty
  • owl:AsymmetricProperty
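Two of these property types lend themselves to a short illustration of inference. The sketch below computes the statements entailed by a symmetric and by a transitive property over plain Python pairs; it is not an OWL reasoner, and the function names are hypothetical.

```python
def symmetric_closure(pairs):
    """For an owl:SymmetricProperty: if (a, b) holds, then (b, a) holds."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """For an owl:TransitiveProperty: (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:        # nothing new was derived: fixed point
            return closure
        closure |= new

# Made-up statements: an "ancestor of" style chain a -> b -> c -> d.
chain = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(chain)))
print(sorted(symmetric_closure({("x", "y")})))
```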

  • owl:minCardinality
  • owl:cardinality
  • owl:maxCardinality


  • owl:intersectionOf
  • owl:unionOf
  • owl:complementOf
  • owl:oneOf


  • owl:allValuesFrom
  • owl:someValuesFrom



The PROV (Provenance) ontology is an OWL RDF standard for expressing data provenance (who, what, when, and how, to a certain extent).


xmlns: @prefix dbpedia-owl: <> .

DBpedia is an OWL RDF vocabulary for expressing structured data from Wikipedia sidebar infoboxes.

DBpedia is currently the most central (most linked to and from) RDF vocabulary. (see: LODCloud)


DBpedia is generated by batch extraction on a regular basis.


xmlns: @prefix qudt: <> .
xmlns: @prefix qudt-1.1:  <> .

QUDT (Quantities, Units, Dimensions, and Types) is an RDF standard vocabulary for representing physical units.

  • QUDT is composed of a number of sub-vocabularies
  • QUDT maintains conversion factors for Metric and Imperial Units
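How conversion factors of this kind can be applied is sketched below, assuming a linear conversion of the form si_value = value * multiplier + offset. The table, the unit keys, and the function names are hypothetical; the numbers are standard unit definitions and are not read from QUDT.

```python
# Hypothetical conversion table: unit -> (multiplier, offset) to an SI unit.
TO_SI = {
    "meter": (1.0, 0.0),
    "foot": (0.3048, 0.0),
    "kelvin": (1.0, 0.0),
    "fahrenheit": (5.0 / 9.0, 459.67 * 5.0 / 9.0),
}

def to_si(value, unit):
    """Convert a value in the given unit to the corresponding SI unit."""
    multiplier, offset = TO_SI[unit]
    return value * multiplier + offset

def from_si(value, unit):
    """Invert to_si: convert an SI value back to the given unit."""
    multiplier, offset = TO_SI[unit]
    return (value - offset) / multiplier

def convert(value, source, target):
    """Convert between two units of the same kind by way of the SI unit."""
    return from_si(to_si(value, source), target)

print(convert(10, "foot", "meter"))
print(to_si(32, "fahrenheit"))
```

Converting by way of a common base unit means each unit needs only one pair of factors. Nothing in this sketch checks that the source and target units measure the same kind of quantity.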


  • qudt:SpaceAndTimeUnit

       rdf:type owl:Class ;
       rdfs:label "Space And Time Unit"^^xsd:string ;
       rdfs:subClassOf qudt:PhysicalUnit ;
               [ rdf:type owl:Restriction ;
                 owl:hasValue "UST"^^xsd:string ;
                 owl:onProperty qudt:typePrefix
               ] .
  • QUDT Namespaces:

    @prefix qudt:           <> .
    @prefix qudt-1.1:       <> .
    @prefix qudt-dimension: <> .
    @prefix qudt-quantity:  <> .
    @prefix qudt-unit-1.1:  <> .
    @prefix unit:           <> .

This diagram explains how the vocabularies are linked and derived:

QUDT Quantities


xmlns: @prefix quantity: <> .


xmlns: @prefix qudt-quantity: <> .

QUDT Quantities is an RDF schema and vocabulary for describing physical quantities.

Examples from :

  • qudt-quantity:Time

        rdf:type qudt:SpaceAndTimeQuantityKind ;
        rdfs:label "Time"^^xsd:string ;
        qudt:description "Time is a basic component of the measuring system used to sequence events, to compare the durations of events and the intervals between them, and to quantify the motions of objects."^^xsd:string ;
        qudt:symbol "T"^^xsd:string ;
        skos:exactMatch <> .
    # ...
        qudt:quantityKind qudt-quantity:Time .
  • qudt-quantity:AreaTimeTemperature

        rdf:type qudt:ThermodynamicsQuantityKind ;
        rdfs:label "Area Time Temperature"^^xsd:string .
    # ...
        qudt:quantityKind qudt-quantity:AreaTimeTemperature .

QUDT Units

xmlns: @prefix unit: <> .
xmlns: @prefix qudt-unit-1.1:  <> .

The QUDT Units Ontology is an RDF vocabulary defining many units of measure.


  • unit:SecondTime

          rdf:type qudt:SIBaseUnit , qudt:TimeUnit ;
          rdfs:label "Second"^^xsd:string ;
          qudt:abbreviation "s"^^xsd:string ;
          qudt:code "1615"^^xsd:string ;
                  "1"^^xsd:double ;
                  "0.0"^^xsd:double ;
          qudt:symbol "s"^^xsd:string ;
          skos:exactMatch <> .
    # ...

  • unit:HorsepowerElectric

        rdf:type qudt:NotUsedWithSIUnit , qudt:PowerUnit ;
        rdfs:label "Horsepower Electric"^^xsd:string ;
        qudt:abbreviation "hp/V"^^xsd:string ;
        qudt:code "0815"^^xsd:string ;
        qudt:symbol "hp/V"^^xsd:string .
  • unit:SystemOfUnits_SI

          rdf:type qudt:SystemOfUnits ;
          rdfs:label "International System of Units"^^xsd:string ;
          qudt:abbreviation "SI"^^xsd:string ;
                  unit:ArcMinute , unit:Day , unit:MinuteTime , unit:DegreeAngle , unit:ArcSecond , unit:ElectronVolt , unit:RevolutionPerHour , unit:Femtometer , unit:DegreePerSecond , unit:DegreeCelsius , unit:Liter , unit:MicroFarad , unit:AmperePerDegree , unit:RevolutionPerMinute , unit:MicroHenry , unit:Kilometer , unit:Revolution , unit:Hour , unit:PicoFarad , unit:Gram , unit:DegreePerSecondSquared , unit:MetricTon , unit:CubicCentimeter , unit:SquareCentimeter , unit:CubicMeterPerHour , unit:KiloPascal , unit:DegreePerHour , unit:UnifiedAtomicMassUnit , unit:MilliHenry , unit:KilogramPerHour , unit:KiloPascalAbsolute , unit:NanoFarad , unit:RadianPerMinute , unit:RevolutionPerSecond ;
          qudt:systemBaseUnit unit:Kilogram , unit:Unitless , unit:Kelvin , unit:Meter , unit:SecondTime , unit:Mole , unit:Candela , unit:Ampere ;
                  unit:PerCubicMeter , unit:WattPerSquareMeter , unit:Volt , unit:WattPerMeterKelvin , unit:CoulombPerCubicMeter , unit:Becquerel , unit:WattPerSquareMeterSteradian , unit:KelvinPerSecond , unit:Gray , unit:RadianPerSecond , unit:VoltPerMeter , unit:HenryPerMeter , unit:WattPerSteradian , unit:JouleMeterPerMole , unit:CoulombMeter , unit:PerTeslaMeter , unit:Pascal , unit:LumenPerWatt , unit:KilogramMeterPerSecond , unit:SquareMeterKelvin , unit:MoleKelvin , unit:MeterKelvinPerWatt , unit:Steradian , unit:AmperePerMeter , unit:SquareMeterKelvinPerWatt , unit:JouleSecond , unit:MeterPerFarad , unit:KilogramPerSecond , unit:HertzPerTesla , unit:KilogramMeterSquared , unit:WattPerSquareMeterQuarticKelvin , unit:PerMeterKelvin , unit:JoulePerCubicMeterKelvin , unit:JoulePerSquareTesla , unit:JoulePerCubicMeter , unit:MeterPerKelvin , unit:AmperePerSquareMeter , unit:CubicCoulombMeterPerSquareJoule , unit:CoulombPerMeter , unit:Katal , unit:CubicMeter , unit:LumenSecond , unit:Coulomb , unit:MolePerKilogram , unit:CubicMeterPerKilogramSecondSquared , unit:PerMeter , unit:AmperePerRadian , unit:CoulombPerKilogram , unit:QuarticCoulombMeterPerCubicEnergy , unit:Tesla , unit:JoulePerKilogram , unit:MeterKelvin , unit:MeterPerSecond , unit:NewtonMeter , unit:CandelaPerSquareMeter , unit:Siemens , unit:CoulombSquareMeter , unit:KilogramPerCubicMeter , unit:KilogramSecondSquared , unit:Watt , unit:AmperePerJoule , unit:VoltPerSecond , unit:JoulePerKilogramKelvinPerCubicMeter , unit:PascalPerSecond , unit:CubicMeterPerMole , unit:KilogramPerMeter , unit:PascalSecond , unit:Joule , unit:HertzPerVolt , unit:KilogramPerSquareMeter , unit:PerTeslaSecond , unit:MolePerCubicMeter , unit:PerSecond , unit:JoulePerKelvin , unit:RadianPerSecondSquared , unit:Newton , unit:CubicMeterPerKelvin , unit:GrayPerSecond , unit:SquareMeterPerSecond , unit:CubicMeterPerKilogram , unit:KilogramPerMole , unit:SquareMeterPerKelvin , unit:SquareMeterSteradian , unit:TeslaSecond , 
unit:Ohm , unit:KelvinPerWatt , unit:JoulePerKilogramKelvinPerPascal , unit:WattSquareMeter , unit:MeterKilogram , unit:WattSquareMeterPerSteradian , unit:Hertz , unit:VoltPerSquareMeter , unit:CubicMeterPerSecond , unit:JoulePerMoleKelvin , unit:TeslaMeter , unit:JoulePerMole , unit:Lux , unit:FaradPerMeter , unit:PerMole , unit:JouleSecondPerMole , unit:AmpereTurnPerMeter , unit:VoltMeter , unit:SecondTimeSquared , unit:AmpereTurn , unit:JoulePerKilogramKelvin , unit:CoulombPerSquareMeter , unit:NewtonPerKilogram , unit:JoulePerSquareMeter , unit:Weber , unit:Henry , unit:MeterPerSecondSquared , unit:KilogramKelvin , unit:Sievert , unit:NewtonPerMeter , unit:WattPerSquareMeterKelvin , unit:SquareCoulombMeterPerJoule , unit:Lumen , unit:Farad , unit:HertzPerKelvin , unit:SquareMeter , unit:JoulePerTesla , unit:Radian , unit:KelvinPerTesla , unit:NewtonPerCoulomb , unit:CoulombPerMole ;
                  unit:Hecto , unit:Nano , unit:Tera , unit:Atto , unit:Kilo , unit:Yocto , unit:Yotta , unit:Deci , unit:Zepto , unit:Pico , unit:Femto , unit:Milli , unit:Micro , unit:Zetta , unit:Mega , unit:Centi , unit:Giga , unit:Peta , unit:Deca , unit:Exa ;
          skos:exactMatch <> .


Wikidata is an Open Source collaboratively edited knowledgebase.

  • DBpedia scrapes data from Wikipedia Infoboxes periodically. Wikidata is a database with forms, datatypes, and alphanumerical identifiers (which do not change or redirect).
  • Wikidata SPARQL, RDF, and OWL will be powered by Blazegraph.

Semantic Web Tools

Semantic Web Tools are designed to work with RDF formats.

See also: RDF Triplestores


CKAN (Comprehensive Knowledge Archive Network) is an Open Source data repository web application and API written in Python with support for RDF.


Protégé is a knowledge management software application with support for RDF, OWL, and a few different reasoners.

Web Protégé is a web-based version of Protégé with many similar features.

Protégé is a Free and Open Source software tool.


RDFHDT (RDF Header Dictionary Triples) is an optimized binary format for storing and working with very many triples in highly compressed form.

HDT-IT is a software application for working with RDFHDT datasets:

Semantic Web Schema Resources


Lookup RDF vocabularies, classes, and properties


LOV (“Linked Open Vocabularies”) is a web application for cataloging and viewing metadata of and links between vocabularies (RDF, RDFS, OWL)

  • All of the vocabularies stored in LOV as a bubble chart:

  • LOV has a “suggest a vocabulary” feature

  • Many of the vocabularies stored in LOV can also be searched or looked up from


The LOD (“Linking Open Data”) cloud diagram visualizes the nodes and edges of the Linked Open Data Cloud